Addy Osmani on agent harness engineering

Addy Osmani argues that coding agents are not just models but systems built from prompts, tools, sandboxes, and guardrails. His “agent harness engineering” thesis: turn failures into rules, because a decent model in a great harness can beat a great model in a bad one.

TL;DR

  • Agent = model + harness: Agents function as systems shaped by surrounding infrastructure, not standalone models
  • Harness components: Prompts, tool definitions, context rules, hooks, sandboxes, subagents, feedback loops, recovery paths
  • Operational layers: AGENTS.md memory, MCP servers, filesystem/sandbox access, subagent handoffs, deterministic checks, observability
  • Key claim: A decent model with a great harness beats a great model with a bad harness
  • Failures become rules: Conventions added to AGENTS.md, destructive commands blocked, planning separated from execution
  • Counterpoint noted: Some failures are judgment calls (refactor vs ship), less amenable to deterministic harness patches

Addy Osmani’s post on X argues that coding agents should be understood less as standalone models and more as systems built from the surrounding infrastructure. In the thread, Osmani describes “agent harness engineering” as the work of tightening that scaffolding every time an agent makes a mistake, so the same failure does not recur.

Osmani writes that “agent = model + harness,” and that “if you’re not the model, you’re the harness.” In that view, the harness includes prompts, tool definitions, context rules, hooks, sandboxes, subagents, feedback loops and recovery paths. Osmani also cites other voices in the field, including HumanLayer, Anthropic’s engineering team and Birgitta Böckeler, as converging on similar conclusions about how agents actually operate.

What the harness includes

The post breaks the harness into several parts: memory files such as AGENTS.md; tools and MCP servers; filesystem and sandbox access; orchestration for subagent handoffs; hooks for deterministic checks; and observability covering logs, traces, cost and latency. Osmani’s point is that a raw model does not become an agent until some layer gives it state, tool execution and constraints.
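To make the “state, tool execution and constraints” framing concrete, here is a minimal sketch of a harness loop in Python. It is not taken from Osmani’s post or from any real framework: the `Harness` class, the `done:` convention, the text-based action format and the `scripted_model` stand-in are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only.
Model = Callable[[List[str]], str]  # reads the transcript, returns the next action as text
Tool = Callable[[str], str]         # takes one string argument, returns a result string


@dataclass
class Harness:
    model: Model
    tools: Dict[str, Tool]
    memory: List[str] = field(default_factory=list)  # stands in for AGENTS.md-style memory
    max_steps: int = 5                               # a simple constraint on the loop

    def run(self, task: str) -> str:
        # State: the transcript starts from persistent memory plus the task.
        transcript = list(self.memory) + [f"task: {task}"]
        for _ in range(self.max_steps):
            action = self.model(transcript)
            if action.startswith("done:"):
                return action[len("done:"):].strip()
            # Tool execution: the first word names the tool, the rest is its argument.
            name, _, arg = action.partition(" ")
            tool = self.tools.get(name)
            result = tool(arg) if tool else f"error: unknown tool {name!r}"
            transcript.append(f"{action} -> {result}")
        return "stopped: step limit reached"


def scripted_model(transcript: List[str]) -> str:
    """Stand-in for a real model: call the 'upper' tool once, then finish."""
    if not any(line.startswith("upper") for line in transcript):
        return "upper hello"
    return "done: " + transcript[-1].split(" -> ")[1]


harness = Harness(model=scripted_model, tools={"upper": str.upper})
print(harness.run("shout a greeting"))
```

The model here is only a callable. Everything that turns it into something agent-like (the transcript, the tool lookup, the step limit) lives in the surrounding class, which is the distinction the post draws.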

That leads to a simple claim repeated throughout the thread: “A decent model with a great harness consistently beats a great model with a bad harness.” Osmani suggests that model selection matters, but that behavior is often dominated by the wrapper around the model rather than the model alone.

Failures become rules

A central theme in the post is that agent mistakes should become permanent configuration changes. If an agent ignores a convention, Osmani suggests adding it to AGENTS.md. If it runs a destructive command, the harness should block it. If it loses track of a long task, the system should split planning from execution.
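Two of these fixes can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not Osmani’s implementation: `BLOCKED_PATTERNS`, `check_command` and `add_rule` are hypothetical names, the deny-list is deliberately blunt and far from exhaustive, and the memory file is simply treated as a list of bullet lines. Splitting planning from execution is not covered here.

```python
import re
from pathlib import Path
from typing import List, Optional

# Hypothetical deny-list: illustrative patterns only, not a complete safety policy.
BLOCKED_PATTERNS: List[str] = [
    r"\brm\s+-(?:rf|fr)\b",        # recursive force delete
    r"\bgit\s+push\b.*--force",    # force push (also catches --force-with-lease)
    r"\bdrop\s+table\b",           # destructive SQL
]


def check_command(command: str) -> Optional[str]:
    """Return a reason string if the command should be blocked, else None."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return f"blocked by pattern: {pattern}"
    return None


def add_rule(memory_file: Path, rule: str) -> bool:
    """Append a convention to the memory file once; return False if already present."""
    existing = memory_file.read_text().splitlines() if memory_file.exists() else []
    line = f"- {rule}"
    if line in existing:
        return False
    with memory_file.open("a") as handle:
        handle.write(line + "\n")
    return True
```

A harness would call `check_command` before handing a shell command to the sandbox, and call `add_rule` (for example with a path to AGENTS.md) whenever a reviewer spots a convention the agent ignored. Pattern matching of this kind is easy to evade, so a real setup would likely pair it with sandbox-level restrictions.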

Osmani also quotes HumanLayer’s line that “It’s not a model problem. It’s a configuration problem.” The thread presents that as a counter to the habit of blaming the model whenever an agent behaves badly. Instead, the suggestion is to treat failures as a signal that some rule, hook or tool boundary needs to be tightened.

A counterpoint in the replies

The thread also drew at least one skeptical reply. A commenter argued that not every agent failure looks like a deterministic bug that can be patched away, and that some decisions are closer to judgment calls — for example, whether to refactor now or ship. That response points to a limitation in the harness-first view: some outputs may reflect discretion as much as enforcement.

Osmani closes by pointing to work from Fareed Khan on Claude Code’s architecture and to Flue, a framework from @FredKSchott that Osmani described as “solid” and apparently inspired by an earlier version of the post. The thread’s broader claim is that the most consequential engineering work around coding agents may be moving away from model comparison and toward the design of the runtime around them.

Source: Addy Osmani on X
