Cursor expands long-running coding agents to Ultra and Teams

Cursor has just rolled out a research preview of long-running coding agents for Ultra, Teams, and Enterprise users. A new harness adds upfront planning and multi-agent checks to curb drift on multi-hour tasks. Early tests show bigger PRs with similar merge rates.

TL;DR

  • Long-running agents (research preview): Autonomous multi-hour/multi-day coding runs; custom harness targets long-horizon drift and completion
  • Availability: At cursor.com/agents for Ultra, Teams, and Enterprise users
  • Plan-first workflow: Agents propose a plan and wait for approval before execution
  • Cross-checking: Multiple agents review each other’s work to prevent context loss, early stopping, and partial solutions
  • Reported results: Larger PRs with merge rates comparable to other agents; examples include 36h chat platform, 30h mobile app, 25h RBAC refactor
  • Next focus: Parallelism, multi-agent collaboration, and tooling for managing and safely deploying increased generated code volume

Cursor is widening access to its long-running agents, a research preview feature designed for autonomous, multi-hour (or multi-day) coding tasks. The work builds on Cursor’s recent experiments with agents tackling bigger projects—most notably an internal effort that explored agents operating in a web browser and the failure modes that show up once tasks stretch across long horizons.

That core problem is familiar to anyone watching agents in the wild: even strong models can drift after a few decisions compound, turning a small misunderstanding early on into a fully wrong implementation later. Cursor’s answer is a custom “harness” intended to keep these longer efforts on track and push them toward completion.

What Cursor is shipping, and who gets it

The research preview is now available at cursor.com/agents for Ultra, Teams, and Enterprise users. Cursor frames this as part of broader work toward “self-driving codebases”, where agents can take on larger chunks of engineering work with less constant supervision.

Why long-running agents behave differently

Cursor points to two principles that improved results for long-horizon work:

Planning before execution

Instead of immediately charging ahead, long-running agents propose a plan and wait for approval. Cursor’s reasoning is straightforward: when an agent runs unattended, upfront alignment matters more than fast iteration.
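That gating can be sketched in a few lines. This is a minimal illustration of the plan-then-approve pattern, not Cursor's actual harness; the `Plan` class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    goal: str
    steps: list[str] = field(default_factory=list)
    approved: bool = False  # flipped only by an explicit sign-off

def propose_plan(goal: str) -> Plan:
    # In a real harness an LLM would draft these steps; hard-coded here.
    return Plan(goal=goal, steps=["survey codebase", "draft changes", "run tests"])

def run(plan: Plan) -> list[str]:
    # Execution is gated on approval: an unattended agent never
    # starts work from a plan that was not signed off upfront.
    if not plan.approved:
        raise PermissionError("plan not approved; refusing to execute")
    return [f"executed: {step}" for step in plan.steps]

plan = propose_plan("add RBAC checks")
plan.approved = True  # a human (or supervising agent) signs off here
log = run(plan)
```

The point of the gate is exactly the trade-off Cursor describes: slower to start, but a misaligned plan gets caught before hours of unattended work compound it.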

Following through on tasks

Cursor also observed that frontier models can produce good code but still lose the plot—forgetting context, stopping early, or shipping only partial solutions. Its harness leans on a plan plus “multiple different agents checking each other’s work” to stay oriented and complete larger tasks.
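One way to picture that cross-checking is as a set of reviewer agents that must all sign off before a change lands. The sketch below is an assumption about the shape of such a check, not Cursor's implementation; a real harness would back each reviewer with its own model call rather than a simple predicate.

```python
from typing import Callable

def check_complete(diff: dict) -> bool:
    # Catch early stopping: every file the plan touches must appear
    # in the actual change set.
    return set(diff["planned_files"]) <= set(diff["changed_files"])

def check_tests(diff: dict) -> bool:
    # Catch partial solutions: the test suite must pass end to end.
    return diff["tests_passed"]

def cross_check(diff: dict, reviewers: list[Callable[[dict], bool]]) -> bool:
    # The change only proceeds if every reviewer agent approves it.
    return all(review(diff) for review in reviewers)

diff = {
    "planned_files": ["auth.py", "rbac.py"],
    "changed_files": ["auth.py", "rbac.py", "tests/test_rbac.py"],
    "tests_passed": True,
}
ok = cross_check(diff, [check_complete, check_tests])
```

Structuring the reviewers as independent checks means a single agent losing context does not silently ship an incomplete result.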

Early results: bigger PRs, similar merge rates

In Cursor’s research preview, the company reports that long-running agents produced substantially larger PRs with merge rates comparable to other agents. Example runs cited in the preview include:

  • Building a new chat platform integrated with an existing open-source tool (36 hours)
  • Implementing a mobile app based on an existing web app (30 hours)
  • Refactoring an authentication and RBAC system (25 hours)

Cursor also says agents commonly ran for more than a day and that resulting PRs often merged with minimal follow-up work.

What Cursor says it’s using internally

Cursor outlines several internal tasks it claims have since been merged:

  • Video renderer optimization, including a migration to Rust with custom kernels, while matching the output of the original logic
  • Policy-driven network access for sandboxed code, including JSON-driven policy controls and a local HTTP proxy; Cursor describes a ~10k-line PR with few issues in a large test suite
  • Sudo support in Cursor CLI, implementing secure password prompting and reasoning through Unix auth flows
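The network-policy item above can be illustrated with a small sketch. The JSON schema and function names here are hypothetical (Cursor has not published its policy format); the idea is simply that a local proxy consults a JSON allowlist before forwarding any outbound request from sandboxed code.

```python
import json
from urllib.parse import urlsplit

# Hypothetical policy document; Cursor's actual schema is not public.
POLICY_JSON = """
{
  "default": "deny",
  "allow_hosts": ["pypi.org", "files.pythonhosted.org"]
}
"""

policy = json.loads(POLICY_JSON)

def is_allowed(url: str) -> bool:
    # A local HTTP proxy would run this check on every outbound
    # request from the sandbox before deciding to forward it.
    host = urlsplit(url).hostname or ""
    if host in policy["allow_hosts"]:
        return True
    return policy["default"] == "allow"
```

With `default` set to `"deny"`, anything not explicitly allowlisted is blocked, which is the usual safe posture for code an agent runs unattended.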

What’s next: parallelism and safe deployment at scale

Cursor positions long-running agents as an early milestone, with active work aimed at better collaboration across multiple long-running agents, parallel work streams, and tooling for managing and safely deploying the increasing volume of generated code.

Source: https://cursor.com/blog/long-running-agents
