Anthropic’s Claude Code CLI briefly found itself in an unusual spotlight after its source code was published by mistake, and a new deep dive digs into what that code reveals about how the tool actually works. In a GitHub gist, Haseeb Qureshi walks through the internals of Claude Code’s “harness”—the scaffolding around the model that makes the agent feel responsive, resilient, and usable in long-running coding sessions—while repeatedly contrasting the same design problems as solved in OpenAI’s Codex. The write-up is here: claude-code-harness-deep-dive.md.
A CLI built around an event stream (and a surprising UI choice)
One of the more interesting implementation details is Claude Code’s async-generator request lifecycle. Rather than splitting “text streaming,” “tool calls,” and “progress updates” into separate pathways, the system runs everything through a single yield-based stream that different frontends can render their own way.
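To make that concrete, here is a minimal sketch of the idea (not Anthropic's actual code): one generator yields every kind of event as a single discriminated union, and each frontend just switches on the event kind. The event names and the example turn are illustrative assumptions.

```typescript
// A single unified event stream: text deltas, tool calls, and progress
// updates all flow through one yield-based channel.
type AgentEvent =
  | { kind: "text"; delta: string }
  | { kind: "tool_call"; name: string; args: unknown }
  | { kind: "progress"; message: string };

// Hypothetical turn; the real harness drives this from model output.
function* runTurn(): Generator<AgentEvent> {
  yield { kind: "progress", message: "thinking" };
  yield { kind: "tool_call", name: "read_file", args: { path: "src/index.ts" } };
  yield { kind: "text", delta: "The file defines " };
  yield { kind: "text", delta: "the CLI entrypoint." };
}

// Any frontend (TUI, JSON logger, SDK consumer) renders the same stream
// its own way by switching on `kind`.
const rendered: string[] = [];
for (const ev of runTurn()) {
  switch (ev.kind) {
    case "text": rendered.push(ev.delta); break;
    case "tool_call": rendered.push(`[tool:${ev.name}]`); break;
    case "progress": rendered.push(`(${ev.message})`); break;
  }
}
```

The payoff of the single-stream design is that new event types only require a new union member and a new `case`, rather than a new plumbing path per frontend.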
On the interface side, the gist notes that Claude Code’s terminal UI is effectively a React app rendered with Ink: the CLI entrypoint is a .tsx file, and message bubbles, tool output, prompts, and markdown each get their own components.
Context management is where the harness shows up
The deep dive spends a good chunk on what happens when sessions get long and context limits become a real product constraint. Claude Code reportedly uses four compaction strategies, including proactive and reactive summarization, an SDK-only truncation mode, and a feature-flagged “context collapse” mechanism that can persist reversible collapse commits in the transcript. That’s set against Codex’s more compact, Rust-centric approach with fewer layers and more protocol-level minimization.
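As a rough illustration of the "reactive" strategy only, here is one plausible shape for budget-triggered compaction: when the transcript exceeds a token budget, older messages are replaced by a summary stub while the recent tail is kept verbatim. The function names, the token estimate, and the threshold are all assumptions, not Claude Code's real values.

```typescript
// Sketch of reactive compaction: summarize old messages once the
// estimated token count crosses a budget.
interface Message { role: "user" | "assistant"; content: string }

const TOKEN_BUDGET = 40;                                  // illustrative, not the real limit
const estimateTokens = (m: Message) => Math.ceil(m.content.length / 4);

function compact(
  history: Message[],
  summarize: (msgs: Message[]) => string,                 // in practice, a model call
): Message[] {
  const total = history.reduce((n, m) => n + estimateTokens(m), 0);
  if (total <= TOKEN_BUDGET) return history;              // under budget: leave as-is
  const keep = history.slice(-2);                         // keep the recent tail verbatim
  const old = history.slice(0, -2);                       // everything older gets summarized
  return [{ role: "assistant", content: summarize(old) }, ...keep];
}
```

The proactive variant would run the same replacement ahead of the limit, and a reversible "collapse" would additionally persist the original messages so the operation can be undone.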
Also notable: Claude Code’s system prompt is split at a dynamic boundary marker so the “static” portion can be cached globally, while per-session material (like CLAUDE.md and environment details) is appended after the boundary.
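The mechanics of that split are simple to sketch: everything before the marker is stable across sessions and therefore cacheable, while everything after it varies per session. The marker string and function below are hypothetical stand-ins.

```typescript
// Split a system prompt at a boundary marker so the static prefix can be
// cached globally and only the per-session suffix varies.
const BOUNDARY = "<!-- dynamic-boundary -->";             // hypothetical marker

function splitPrompt(systemPrompt: string): { cacheable: string; perSession: string } {
  const i = systemPrompt.indexOf(BOUNDARY);
  if (i === -1) return { cacheable: systemPrompt, perSession: "" };
  return {
    cacheable: systemPrompt.slice(0, i),                  // identical every session -> cache hit
    perSession: systemPrompt.slice(i + BOUNDARY.length),  // CLAUDE.md, environment details, etc.
  };
}
```

The design matters because prompt caches typically key on exact prefixes: keeping all session-specific material after the boundary means the expensive static portion never invalidates the cache.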
Internal builds, feature flags, and operational scars
Other sections highlight how much of this system is shaped by production realities: internal vs external prompt variants guarded by build-time checks, Bun-powered dead code elimination via feature flags, and intentionally asymmetric session persistence (user messages awaited, assistant messages fire-and-forget) to keep resume behavior reliable.
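The asymmetric-persistence idea can be sketched with a simulated write queue (everything here is an assumption about shape, not Anthropic's implementation): the user message is made durable before the turn continues, while the assistant message is queued and flushed later, so a crash mid-turn can never lose the message that triggered the turn.

```typescript
// Simulated session persistence: one durable (awaited) write path and one
// fire-and-forget path backed by a pending-write queue.
const disk: string[] = [];                     // stand-in for the session file
const pendingWrites: Array<() => void> = [];   // simulated in-flight async writes

function writeAwaited(line: string): void {
  disk.push(line);                             // caller "blocks" until durable
}
function writeFireAndForget(line: string): void {
  pendingWrites.push(() => disk.push(line));   // completes later, best effort
}

function handleTurn(userMsg: string, reply: string): void {
  writeAwaited(`user: ${userMsg}`);            // awaited in the real code
  writeFireAndForget(`assistant: ${reply}`);   // fire-and-forget in the real code
}

handleTurn("hi", "hello");
// If the process died right here, the user message would already be on
// disk, but the assistant message might not be -- which is safe, because
// resume can regenerate the assistant turn from the user message.
const userDurable = disk.includes("user: hi");
const assistantDurable = disk.includes("assistant: hello");
pendingWrites.forEach((w) => w());             // on a healthy run, writes flush
```

The asymmetry is the point: losing an assistant message is recoverable on resume, losing a user message is not.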
There are also smaller but telling details—like a “thinking rules” comment that reads like a warning label—and a permission pipeline that feeds denial results back into the model as tool output, so the agent can adapt rather than stall.
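A minimal sketch of that denial-as-tool-output pattern, with an invented policy and function names: instead of aborting on a blocked tool call, the harness returns a structured "denied" result that goes back into the conversation like any other tool output.

```typescript
// A denied tool call produces a normal tool result describing the denial,
// so the model can adapt (ask the user, pick another tool) instead of stalling.
type ToolResult = { toolName: string; ok: boolean; output: string };

const allowedTools = new Set(["read_file"]);   // illustrative permission policy

function runTool(name: string, exec: () => string): ToolResult {
  if (!allowedTools.has(name)) {
    return {
      toolName: name,
      ok: false,
      output: `Permission denied for ${name}; ask the user for approval or try another approach.`,
    };
  }
  return { toolName: name, ok: true, output: exec() };
}
```

Feeding the denial back as data rather than raising an error keeps the agent loop alive: from the model's perspective, "permission denied" is just another observation to plan around.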
For the full breakdown (including the Codex comparisons and the “harness over model” conclusion), the original is worth reading: https://gist.github.com/Haseeb-Qureshi/d0dc36844c19d26303ce09b42e7188c1.
