Addy Osmani’s latest post on long-running agents looks at where AI agents appear to be heading once a single chat window is no longer enough. The piece argues that the next phase is less about flashier tools and more about systems that can keep moving on a task across multiple sessions, sandboxes, and time spans without losing track of what happened before.
The article separates that idea into distinct categories, including long-horizon reasoning, long-running execution, and persistent agency. It also walks through why today's agents still hit familiar limits: context windows fill up, state disappears between sessions, and models tend to sound more certain about completion than they should.
From there, the post surveys how several major teams are approaching the problem. Anthropic’s harnesses, Cursor’s planner-worker-judge setup, and Google’s Agent Platform each take a slightly different path, but they seem to converge on the same broad structure: keep the model loop separate from the execution sandbox, add durable session history, and make verification part of the workflow rather than an afterthought.
The more practical part of the piece focuses on patterns that make these systems workable in production, such as checkpointing, human approval gates, layered memory, ambient processing, and coordinating multiple agents. Osmani also flags the trade-offs that still look unresolved, including cost, security, drift, and the difficulty of writing specs that survive a long autonomous run.
For readers following the shift from short-lived bots to agents that can keep working overnight or longer, the full post is worth a look. It pulls together the current thinking from practitioners and platform teams into one fairly compact overview.
Source: Addy Osmani