Cursor is widening access to its long-running agents, a research preview feature designed for autonomous coding tasks that run for many hours or even multiple days. The work builds on Cursor’s recent experiments with agents tackling bigger projects, most notably an internal effort that explored agents operating in a web browser and the failure modes that show up once tasks stretch across long horizons.
The core problem is familiar to anyone watching agents in the wild: even strong models can drift as decisions compound, turning a small early misunderstanding into a fully wrong implementation later. Cursor’s answer is a custom “harness” intended to keep these longer efforts on track and push them toward completion.
What Cursor is shipping, and who gets it
The research preview is now available at cursor.com/agents for Ultra, Teams, and Enterprise users. Cursor frames this as part of broader work toward “self-driving codebases”, where agents can take on larger chunks of engineering work with less constant supervision.
Why long-running agents behave differently
Cursor points to two principles that improved results for long-horizon work:
Planning before execution
Instead of immediately charging ahead, long-running agents propose a plan and wait for approval. Cursor’s reasoning is straightforward: when an agent runs unattended, upfront alignment matters more than fast iteration.
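The plan-then-approve flow can be pictured with a minimal Python sketch. Cursor has not published its harness, so everything here is an assumption made for illustration: the `Plan` type, the `run_with_approval` function, and the callback signatures are hypothetical names, not Cursor APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """Hypothetical plan: a goal plus an ordered list of human-readable steps."""
    goal: str
    steps: List[str] = field(default_factory=list)


def run_with_approval(
    plan: Plan,
    approve: Callable[[Plan], bool],
    execute_step: Callable[[str], str],
) -> List[str]:
    """Run the plan's steps in order, but only after the approver accepts it."""
    if not approve(plan):
        # No work starts without sign-off; the caller can revise and resubmit.
        return []
    return [execute_step(step) for step in plan.steps]
```

The point of the gate is that a rejected plan costs one review, whereas a rejected multi-hour run costs the whole run.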
Following through on tasks
Cursor also observed that frontier models can produce good code but still lose the plot: forgetting context, stopping early, or shipping only partial solutions. Its harness leans on a plan plus “multiple different agents checking each other’s work” to stay oriented and complete larger tasks.
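One way to read "agents checking each other's work" is a worker and reviewer loop over the plan's steps. The sketch below is an assumption about the general pattern, not a description of Cursor's harness; `complete_with_review`, `worker`, `reviewer`, and `max_attempts` are hypothetical names.

```python
from typing import Callable, List, Optional


def complete_with_review(
    steps: List[str],
    worker: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str, str], Optional[str]],
    max_attempts: int = 3,
) -> List[str]:
    """Run every step; a step counts as done only once the reviewer accepts it."""
    results = []
    for step in steps:
        feedback = None
        for _ in range(max_attempts):
            output = worker(step, feedback)
            # The reviewer returns None to accept, or feedback text to reject.
            feedback = reviewer(step, output)
            if feedback is None:
                results.append(output)
                break
        else:
            # Fail loudly instead of silently shipping a partial solution.
            raise RuntimeError(
                f"step not accepted after {max_attempts} attempts: {step}"
            )
    return results
```

Iterating over an explicit step list guards against stopping early, and the reviewer's veto guards against partial solutions being recorded as finished.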
Early results: bigger PRs, similar merge rates
In Cursor’s research preview, the company reports that long-running agents produced substantially larger PRs with merge rates comparable to other agents. Example runs cited in the preview include:
- Building a new chat platform integrated with an existing open-source tool (36 hours)
- Implementing a mobile app based on an existing web app (30 hours)
- Refactoring an authentication and RBAC system (25 hours)
Cursor also says agents commonly ran for more than a day and that resulting PRs often merged with minimal follow-up work.
What Cursor says it’s using internally
Cursor outlines several internal tasks whose resulting changes it says have since been merged:
- Video renderer optimization, including a migration to Rust and custom kernels while matching the output of the original logic
- Policy-driven network access for sandboxed code, including JSON-driven policy controls and a local HTTP proxy; Cursor describes a ~10k-line PR with few issues in a large test suite
- Sudo support in Cursor CLI, implementing secure password prompting and reasoning through Unix auth flows
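To make the policy-driven network access item above concrete, here is a minimal sketch of a JSON-driven allow/deny check of the kind a local proxy could consult before forwarding a request. The JSON schema, the host names, and the `is_allowed` function are all assumptions for illustration; Cursor's actual policy format is not described in the source.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical policy document; this is not Cursor's real schema.
POLICY_JSON = """
{
  "default": "deny",
  "allow": ["pypi.org", "*.pythonhosted.org"],
  "deny": ["internal.example.com"]
}
"""


def is_allowed(host: str, policy: dict) -> bool:
    """Return True if the policy permits connecting to host."""
    host = host.lower()
    # Deny rules are checked first so they always override allow rules.
    if any(fnmatchcase(host, pattern) for pattern in policy.get("deny", [])):
        return False
    if any(fnmatchcase(host, pattern) for pattern in policy.get("allow", [])):
        return True
    # Anything unmatched falls back to the default, which is deny if unset.
    return policy.get("default", "deny") == "allow"


policy = json.loads(POLICY_JSON)
```

Checking deny rules first and defaulting to deny keeps the sketch fail-closed: a host has to be explicitly allowed before sandboxed code can reach it.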
What’s next: parallelism and safe deployment at scale
Cursor positions long-running agents as an early milestone, with active work aimed at better collaboration across multiple long-running agents, parallel work streams, and tooling for managing and safely deploying the increasing volume of generated code.
