27 bugs, 8 features, zero regressions
Big session. Ran an end-to-end audit of the entire app, closed 27 bugs, then started on 8 features pulled from competitive research. 2,523 tests passing at the end.
The security ones matter most. Three sandbox escape paths in GrepTool, GlobTool, and DiffFileTool -- path arguments weren't being re-anchored to the workspace root, so a carefully crafted path could read files outside it. Command injection in RunTestsTool: test arguments flowed straight into the shell. All four closed before anything touched a release tag.
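A minimal sketch of both fixes, with hypothetical names (the actual tool code isn't shown here): resolve every path argument against the workspace root before use, and build test invocations as an argv list instead of a shell string.

```python
from pathlib import Path

def anchor_to_workspace(workspace_root: str, user_path: str) -> Path:
    # Resolve symlinks and ".." segments *before* the containment check,
    # so "../../etc/passwd" can't slip past a naive prefix comparison.
    root = Path(workspace_root).resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes the workspace")
    return candidate

def build_test_command(test_args: list[str]) -> list[str]:
    # Argv list, never a shell string: each argument stays a single token,
    # so "foo; rm -rf /" is just a strange test name, not a second command.
    return ["pytest", *test_args]  # pytest is an assumed runner
```

The command list would then go to something like `subprocess.run(cmd, shell=False)` rather than being interpolated into a shell string.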
The "why is this broken" ones. Image attachments via the file picker rejected every image as "binary." Chat mode's agent was going rogue with docs attached: trying code tools, burning iterations, then hallucinating a fake answer. Research mode was firing web searches against the embedded doc content instead of the actual question. All three were integration bugs that passed unit tests but fell apart under real usage.
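I don't have the picker's actual code, but the shape of the image fix is roughly this: sniff well-known image signatures before the catch-all binary check, instead of letting "contains null bytes" reject every image.

```python
# Hypothetical sketch, not the app's actual code.
IMAGE_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}

def classify_attachment(data: bytes) -> str:
    for magic, mime in IMAGE_MAGIC.items():
        if data.startswith(magic):
            return mime          # images are attachable, not "binary"
    if b"\x00" in data[:8192]:
        return "binary"          # still reject true opaque binaries
    return "text/plain"
```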
The off-by-ones. Circuit breaker was tripping at exactly 80% of budget, killing valid conversations at the threshold. Budget enforcement blocked at the limit instead of when exceeded. Air-gap mode wasn't disabling the cloud boost toggle in the UI -- it was enforcing air-gap at the network layer, but the toggle was still live. These are the bugs that make people lose trust. Fixed.
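The comparison fixes are tiny, but the semantics matter. Roughly, with illustrative names and an assumed 80% warn ratio:

```python
WARN_RATIO = 0.8  # warn here, don't kill the conversation

def budget_state(used: int, limit: int) -> str:
    # Before: tripped at used >= 0.8 * limit and blocked at used >= limit.
    # After: warn at or above 80%, block only once the limit is *exceeded*.
    if used > limit:  # strictly greater: the request at the limit still runs
        return "blocked"
    if used >= limit * WARN_RATIO:
        return "warning"
    return "ok"
```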
Onboarding completion never persisted, so the checklist reappeared every restart. Session deletion orphaned memory entries. The memory route crashed on corrupt JSON. The write mutex had no timeout and could hang the app forever. Plus 363 lines of dead CSS from themes we removed weeks ago. And 12 more across the stack.
What I shipped on top of that:
- Chat mode now has read-only code tools (grep, glob, file read, code search). You can ask the model about your codebase without switching to Code mode. This was the #1 feature request after the last audit.
- FIM (fill-in-the-middle) inline completions now default ON. They were behind an opt-in toggle that nobody found. Better first-run experience.
- Smart tool stripping. Some small models repeatedly try to call tools that are blocked for their context. After a few failures, the tool list gets removed entirely and the model is forced to respond as text. Reduces the "agent flailing" loop that kills small-model usability.
- Settings progressive disclosure. New users see Theme, Models, Profile, Boost. That's it. Everything advanced lives behind a toggle. The settings page was turning into a wall.
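The tool-stripping behavior above can be sketched like this -- the threshold and the request shape are assumptions, since the post doesn't specify either:

```python
MAX_BLOCKED_TOOL_CALLS = 3  # assumed threshold

def maybe_strip_tools(request: dict, blocked_calls: int) -> dict:
    # After repeated attempts to call tools that are blocked in this context,
    # drop the tool list entirely so the model has to answer as plain text.
    if blocked_calls >= MAX_BLOCKED_TOOL_CALLS:
        request = {k: v for k, v in request.items() if k != "tools"}
        request["tool_choice"] = "none"  # assuming an OpenAI-style request dict
    return request
```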
Next session: cloud provider as primary (not just boost -- for users without a local GPU), auto-verify after code changes (run tests and build, feed errors back), panel hand-off buttons ("Fix this" from Debug or Research jumps into Agent), and conversation export (JSON + ZIP backup).