How Agents Manage Context: Patterns for Long-Running AI

A new piece by R. Lance Martin surveys context-management patterns for long-running agents, framing context as the scarce resource. He highlights a virtual agent filesystem and prompt caching as practical tactics to sustain extended runtimes.

TL;DR

  • Context scarcity: long tasks exhaust attention budgets and raise token cost, forcing architectural trade-offs for sustained agent execution.
  • Give agents a virtual computer (filesystem + shell): persistent storage, process spawning, and programmatic access shift bulky context off the in-memory prompt.
  • Multi-layer action space and progressive disclosure: expose a small set of atomic tools (e.g., shell) and fetch full tool metadata only on demand to reduce token bloat.
  • Offload and summarize to filesystem: store historical results and plans on disk, summarize or reintroduce selectively to avoid context rot.
  • Prompt caching and context isolation: cache common prompt prefixes to lower cost/latency; use sub-agents with isolated contexts for parallel map-reduce workflows (Ralph Loop).
  • Evolve context via reflection: distill trajectories and session diaries into persistent memories, skills, or prompt variants for continual improvement.
  • Future directions: models that learn their own context-management routines, richer multi-agent coordination with shared workspaces and merge queues, and new observability and human-in-the-loop abstractions. Full survey: https://rlancemartin.github.io/2026/01/09/agent_design/?

A short survey of practical patterns for long-running autonomous agents appears in this write-up on agent design and context management. It arrives after a banner year for agents (Meta’s acquisition of Manus for over $2B and Claude Code hitting a $1B run rate) and frames those commercial signals against the engineering problem that matters most: managing limited context effectively as agents tackle longer tasks.

Why context management matters

Large context windows are not a silver bullet. As task length grows, LLM performance degrades: models have a finite “attention budget,” and every token has cost and opportunity cost. The surveyed patterns treat context as a scarce resource and show how architectural choices—storage, action abstraction, and runtime caching—shape whether an agent can run for hours or years.

Concise tour of core patterns

The article groups recurring engineering choices into a compact set of patterns. Highlights:

  • Give agents a virtual computer (filesystem + shell). Persistent storage and programmatic access let agents spawn processes, persist artifacts, and shift bulky context off the in-memory prompt.
  • Multi-layer action space. Instead of loading dozens of tool specs into context, agents expose a small set of atomic tools (e.g., a shell tool) that execute richer operations on the computer, saving tokens and reducing in-prompt complexity.
  • Progressive disclosure. Only essential tool metadata is presented in the prompt; full descriptions or help text are retrieved on demand to avoid token bloat.
  • Offload and summarize to filesystem. Historical results and plans live on disk and are summarized or reintroduced selectively, which helps long-running loops avoid context rot.
  • Prompt caching and context isolation. Caching common prompt prefixes lowers cost and latency; sub-agents with isolated contexts support parallel work and map-reduce workflows (the so-called Ralph Loop).
  • Evolve context through reflection. Trajectories and session diaries can be distilled into persistent memories, skills, or prompt variants for continual improvement.
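To make the multi-layer action space and progressive disclosure patterns concrete, here is a minimal sketch of a tool registry that injects only one-line stubs into the prompt and returns a tool's full metadata on demand. All names here (`ToolRegistry`, `stubs`, `describe`) are illustrative assumptions, not APIs from the surveyed systems.

```python
# Hypothetical sketch: keep tool footprint in the prompt small by exposing
# one-line stubs, and fetch a tool's full spec only when the agent asks.

class ToolRegistry:
    def __init__(self):
        self._tools = {}  # name -> (one_line_summary, full_spec)

    def register(self, name, summary, full_spec):
        self._tools[name] = (summary, full_spec)

    def stubs(self):
        """Compact listing injected into every prompt."""
        return "\n".join(f"{name}: {s}" for name, (s, _) in self._tools.items())

    def describe(self, name):
        """Full metadata, retrieved on demand (e.g., via a `help` tool call)."""
        return self._tools[name][1]

registry = ToolRegistry()
registry.register("shell", "run a shell command",
                  "shell(cmd: str) -> stdout; runs in the agent's workspace")
registry.register("read_file", "read a file from the workspace",
                  "read_file(path: str) -> str; raises if path is missing")

prompt_section = registry.stubs()        # tiny, fixed cost per turn
full_spec = registry.describe("shell")   # paid only when needed
```

The point of the sketch is the asymmetry: `stubs()` costs a handful of tokens per tool on every turn, while the full descriptions stay out of context until a specific call warrants them.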
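The offload-and-summarize pattern can likewise be sketched in a few lines: persist a bulky tool result to the agent's filesystem and keep only a short summary plus a path in the live context. The `summarize` helper below is a stand-in for an LLM call, and all names and paths are assumptions for illustration.

```python
# Illustrative sketch: offload a large result to disk, keep a compact
# "[stored at <path>] <summary>" line in the in-memory context instead.

import hashlib
import pathlib

WORKSPACE = pathlib.Path("workspace")
WORKSPACE.mkdir(exist_ok=True)

def summarize(text, limit=120):
    # Stand-in for a real LLM summarizer: truncate for the sketch.
    return text[:limit] + ("..." if len(text) > limit else "")

def offload(result: str) -> str:
    """Persist a large result; return the compact line kept in context."""
    digest = hashlib.sha256(result.encode()).hexdigest()[:12]
    path = WORKSPACE / f"{digest}.txt"
    path.write_text(result)
    return f"[stored at {path}] {summarize(result)}"

big_output = "result line\n" * 5000   # far too large to keep in context
context_entry = offload(big_output)   # short pointer + summary instead
```

Because the full result survives on disk, the agent can reintroduce it selectively later (for example by re-reading the stored path), which is what lets long-running loops avoid context rot.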
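Context isolation with sub-agents reduces to a map-reduce shape: each sub-agent sees only its own shard, and the parent context only ever holds the short summaries. A minimal sketch, where `run_subagent` is a placeholder for a real LLM call with a fresh context window:

```python
# Minimal map-reduce sketch with isolated contexts. Each shard is handled
# independently; the parent never loads the shards themselves, only the
# per-shard summaries produced by the (placeholder) sub-agents.

from concurrent.futures import ThreadPoolExecutor

def run_subagent(shard):
    # Placeholder: a real sub-agent would run in its own context window.
    return f"summary of {len(shard)} items"

def map_reduce(items, shard_size=2):
    shards = [items[i:i + shard_size] for i in range(0, len(items), shard_size)]
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(run_subagent, shards))  # map: parallel shards
    return "\n".join(summaries)                           # reduce: summaries only
```

This is the structural idea behind parallel workflows like the Ralph Loop mentioned above: the expensive per-shard context is thrown away after each map step, so the parent's context grows with the number of shards, not the size of the underlying data.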

Where development appears headed

The piece sketches three forward paths: models that learn their own context-management routines (recursive LLM-style approaches), richer multi-agent coordination with shared workspaces and merge queues, and new abstractions for observability and human-in-the-loop control for long-running deployments.

For an accessible, example-rich tour that links to the original engineering posts and papers behind each pattern, see the full survey: https://rlancemartin.github.io/2026/01/09/agent_design/?
