Passive AGENTS.md Docs Beat Skills in Vercel's Next.js 16 Tests

Vercel's evaluation shows that embedding a compressed docs index in AGENTS.md enabled agents to achieve 100% pass rates on Next.js 16-specific coding tests, outperforming on-demand skills. Passive context removed invocation and sequencing fragility, while an 8KB index kept prompt size manageable.


TL;DR

  • Problem: agents’ pretraining can be outdated for fast-moving frameworks; Next.js 16 APIs (e.g., connection(), 'use cache', forbidden(), async cookies(), proxy.ts) often absent from model training data.
  • Methods compared: Skills (on-demand retrieval via agentskills.io) vs. project-root AGENTS.md with a compressed docs index pointing to .next-docs/.
  • Eval focus: behavior-only tests targeting Next.js 16-specific APIs, judged across Build, Lint, and Test with retries.
  • Result: AGENTS.md docs index achieved 100% final pass (100% Build/Lint/Test); skills ranged from 53% to 79% depending on prompting.
  • Why passive context won: no decision point to invoke docs, consistent availability each turn, and no sequencing fragility from explicit skill invocation.
  • Docs handling and tooling: compressed index reduced ~40KB to 8KB (~80% smaller) and points to retrievable files.

Vercel’s evaluation found that embedding a compressed docs index directly in AGENTS.md outperformed on-demand skills in agent-focused coding tests. The comparison targeted Next.js 16 APIs that are absent from current model training data, and the full write-up is available on the Vercel blog: https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals.

The problem at hand

AI coding agents frequently rely on pre-training knowledge that can be outdated for fast-moving frameworks. Next.js 16 introduces APIs such as 'use cache', connection(), and forbidden() that may not be present in model training sets. When agents lack version-matched docs, generated code can be incorrect or drift to older idioms. The evaluation focused on feeding agents version-accurate documentation so they can perform correct edits and implementations.
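
As a concrete example of this drift, cookies() became asynchronous in recent Next.js releases, and a model trained on older docs tends to emit the synchronous call. A minimal App Router sketch (assuming a Next.js 16 project; illustrative only, not from the eval suite):

```tsx
import { cookies } from "next/headers";

export default async function Page() {
  // Next.js 16: cookies() returns a Promise and must be awaited.
  // Models trained on older versions often omit the `await`.
  const cookieStore = await cookies();
  const theme = cookieStore.get("theme")?.value ?? "light";
  return <p>Theme: {theme}</p>;
}
```

Without version-matched docs in context, an agent has no signal that the older synchronous idiom it learned is now wrong.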

Two approaches compared

  • Skills: an open standard for packaging prompts, tools, and docs into an on-demand retrieval mechanism (agentskills.io). The expectation is that agents detect when framework knowledge is needed, invoke a skill, and read targeted documentation.
  • AGENTS.md: a project-root markdown file that supplies persistent context to agents. Content placed in AGENTS.md is available every turn (Claude Code uses CLAUDE.md similarly). The evaluation compared the Next.js docs skill against a compressed docs index injected into AGENTS.md that points to a .next-docs/ directory.

What the evals measured

The hardened eval suite removed leakage and focused on behavior. Tests targeted Next.js 16-specific features that models are unlikely to already know, including:

  • connection() for dynamic rendering
  • 'use cache' directive
  • cacheLife(), cacheTag()
  • forbidden(), unauthorized()
  • async cookies() and headers()
  • after(), updateTag(), refresh()
  • proxy.ts API proxying
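
For orientation, a short sketch of two of these APIs as they appear in a Next.js 16 App Router project (illustrative only; the file paths are assumptions and the snippets are not drawn from the eval suite):

```tsx
// app/page.tsx — awaiting connection() opts the page into dynamic,
// per-request rendering instead of build-time prerendering.
import { connection } from "next/server";

export default async function Page() {
  await connection();
  return <p>Rendered at {new Date().toISOString()}</p>;
}

// lib/posts.ts (separate file) — the 'use cache' directive marks the
// function's result as cacheable by the framework.
export async function getPosts() {
  "use cache";
  const res = await fetch("https://example.com/api/posts");
  return res.json();
}
```

APIs like these are exactly the kind of surface a model's pretraining is likely to miss, which is why the eval targeted them.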

Each configuration was judged across Build, Lint, and Test assertions with retries to reduce model variance.
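
The retry mechanic can be sketched as a simple wrapper around a flaky check (a hypothetical harness; the function name and default attempt count are assumptions, not Vercel's actual tooling):

```typescript
// Hypothetical retry wrapper: rerun a flaky eval check up to
// `attempts` times and count the configuration as passing if any
// single run succeeds. This smooths over model-level variance so
// the measured differences reflect the docs setup, not sampling noise.
async function passesWithRetries(
  check: () => Promise<boolean>,
  attempts = 3,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true;
  }
  return false;
}
```

A Build, Lint, or Test assertion would each be passed in as the `check` callback.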

Outcomes

Final pass rates across configurations:

  • Baseline (no docs): 53%
  • Skill (default): 53%
  • Skill with explicit instructions: 79%
  • AGENTS.md docs index: 100%

On the Build/Lint/Test breakdown, AGENTS.md hit 100% across all three. Skills, by contrast, either weren't invoked reliably (the skill triggered in only 44% of cases by default) or produced worse results when explicit prompt wording forced premature doc-first behavior.

Why passive context performed better

The working explanation centers on three factors:

  • No decision point. With AGENTS.md there’s no need for the agent to decide whether to consult docs—the context is present every turn.
  • Consistent availability. Skills require invocation and can load asynchronously; passive context is part of the system prompt.
  • No sequencing fragility. Skills create an ordering problem (explore project first vs. read docs first) that subtle wording changes can flip. Persistent context removes that brittleness.

Managing context size

Concern about context bloat was addressed by compressing the docs index. A full injection (~40KB) was reduced to 8KB (≈80% smaller) using a pipe-delimited, minified index that maps doc sections to files. The index does not put all doc content into the prompt; it points agents to retrievable files under .next-docs/ so agents can fetch specific files when needed.
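
The idea behind the index can be sketched as follows (a hypothetical reconstruction: the topic names, file paths, and exact serialization below are assumptions, not Vercel's actual index format):

```typescript
// Hypothetical sketch: map doc topics to files under .next-docs/,
// then serialize as a pipe-delimited, minified index. The agent reads
// the index from AGENTS.md and fetches only the files it needs.
const sections: Record<string, string> = {
  connection: "api-reference/functions/connection.mdx",
  "use-cache": "api-reference/directives/use-cache.mdx",
  forbidden: "api-reference/functions/forbidden.mdx",
};

// One entry per line: "<topic>|<relative path under .next-docs/>".
const index = Object.entries(sections)
  .map(([topic, path]) => `${topic}|${path}`)
  .join("\n");

// The minified form is much smaller than, say, pretty-printed JSON,
// while still telling the agent which file covers which topic.
const prettySize = JSON.stringify(sections, null, 2).length;
console.log(`${index.length} bytes vs ${prettySize} bytes`);
```

The key design choice is that the prompt carries only the map, not the doc content, so the context cost stays fixed as the docs grow.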

Example setup command included in the evaluation:

  • npx @next/codemod@canary agents-md

This codemod:

  1. Detects the Next.js version
  2. Downloads matching docs to .next-docs/
  3. Injects the compressed index into AGENTS.md

The codemod and related PR are available in the Next.js repo: https://github.com/vercel/next.js/pull/88961.

Practical implications for framework authors

  • For broad, general framework knowledge, passive context via AGENTS.md currently provides more reliable results than skills.
  • Skills remain useful for focused, action-specific workflows that users intentionally trigger (for example, automated upgrade or migration tasks).
  • Compressing an index and structuring docs for retrieval can achieve version-accurate guidance without exhausting the context window.
  • Evals should target APIs outside model pre-training to reveal the real benefits of versioned documentation.

Further details and the full evaluation are on the original Vercel post: https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals
