Passive AGENTS.md Docs Beat Skills in Vercel's Next.js 16 Tests

Vercel's evaluation shows that embedding a compressed docs index in AGENTS.md enabled agents to achieve 100% pass rates on Next.js 16-specific coding tests, outperforming on-demand skills. Passive context removed invocation and sequencing fragility, while an 8KB index kept prompt size manageable.


TL;DR

  • Problem: agents’ pretraining can be outdated for fast-moving frameworks; Next.js 16 APIs (e.g., connection(), 'use cache', forbidden(), async cookies(), proxy.ts) often absent from model training data.
  • Methods compared: Skills (on-demand retrieval via agentskills.io) vs. project-root AGENTS.md with a compressed docs index pointing to .next-docs/.
  • Eval focus: behavior-only tests targeting Next.js 16-specific APIs, judged across Build, Lint, and Test with retries.
  • Result: AGENTS.md docs index achieved 100% final pass (100% Build/Lint/Test); skills ranged from 53% to 79% depending on prompting.
  • Why passive context won: no decision point to invoke docs, consistent availability each turn, and no sequencing fragility from explicit skill invocation.
  • Docs handling and tooling: compressed index reduced ~40KB to 8KB (~80% smaller) and points to retrievable files.

Vercel’s evaluation found that embedding a compressed docs index directly in AGENTS.md outperformed on-demand skills in agent-focused coding tests. The comparison targeted Next.js 16 APIs that are absent from current model training data, and the full write-up is available on the Vercel blog: https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals.

The problem at hand

AI coding agents frequently rely on pre-training knowledge that can be outdated for fast-moving frameworks. Next.js 16 introduces APIs such as 'use cache', connection(), and forbidden() that may not be present in model training sets. When agents lack version-matched docs, generated code can be incorrect or drift to older idioms. The evaluation focused on feeding agents version-accurate documentation so they can perform correct edits and implementations.
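
As a concrete example of this drift, cookies() became asynchronous in recent Next.js releases, and a model trained on older docs tends to emit the synchronous call. A minimal App Router sketch (assuming a Next.js 16 project; illustrative only, not from the eval suite):

```tsx
import { cookies } from "next/headers";

export default async function Page() {
  // Next.js 16: cookies() returns a Promise and must be awaited.
  // Models trained on older versions often omit the `await`.
  const cookieStore = await cookies();
  const theme = cookieStore.get("theme")?.value ?? "light";
  return <p>Theme: {theme}</p>;
}
```

Without version-matched docs in context, an agent has no signal that the older synchronous idiom it learned is now wrong.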

Two approaches compared

  • Skills: an open standard for packaging prompts, tools, and docs into an on-demand retrieval mechanism (agentskills.io). The expectation is that agents detect when framework knowledge is needed, invoke a skill, and read targeted documentation.
  • AGENTS.md: a project-root markdown file that supplies persistent context to agents. Content placed in AGENTS.md is available every turn (Claude Code uses CLAUDE.md similarly). The evaluation compared the Next.js docs skill against a compressed docs index injected into AGENTS.md that points to a .next-docs/ directory.

What the evals measured

The hardened eval suite removed leakage and focused on behavior. Tests targeted Next.js 16-specific features that models are unlikely to already know, including:

  • connection() for dynamic rendering
  • 'use cache' directive
  • cacheLife(), cacheTag()
  • forbidden(), unauthorized()
  • async cookies() and headers()
  • after(), updateTag(), refresh()
  • proxy.ts API proxying
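
For orientation, a short sketch of two of these APIs as they appear in a Next.js 16 App Router project (illustrative only; the file paths are assumptions and the snippets are not drawn from the eval suite):

```tsx
// app/page.tsx — awaiting connection() opts the page into dynamic,
// per-request rendering instead of build-time prerendering.
import { connection } from "next/server";

export default async function Page() {
  await connection();
  return <p>Rendered at {new Date().toISOString()}</p>;
}

// lib/posts.ts (separate file) — the 'use cache' directive marks the
// function's result as cacheable by the framework.
export async function getPosts() {
  "use cache";
  const res = await fetch("https://example.com/api/posts");
  return res.json();
}
```

APIs like these are exactly the kind of surface a model's pretraining is likely to miss, which is why the eval targeted them.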

Each configuration was judged across Build, Lint, and Test assertions with retries to reduce model variance.
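
The retry mechanic can be sketched as a simple wrapper around a flaky check (a hypothetical harness; the function name and default attempt count are assumptions, not Vercel's actual tooling):

```typescript
// Hypothetical retry wrapper: rerun a flaky eval check up to
// `attempts` times and count the configuration as passing if any
// single run succeeds. This smooths over model-level variance so
// the measured differences reflect the docs setup, not sampling noise.
async function passesWithRetries(
  check: () => Promise<boolean>,
  attempts = 3,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true;
  }
  return false;
}
```

A Build, Lint, or Test assertion would each be passed in as the `check` callback.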

Outcomes

Final pass rates across configurations:

  • Baseline (no docs): 53%
  • Skill (default): 53%
  • Skill with explicit instructions: 79%
  • AGENTS.md docs index: 100%

On the Build/Lint/Test breakdown, AGENTS.md hit 100% across all three. Skills, by contrast, either weren't invoked reliably (the skill triggered in only 44% of cases by default) or produced worse results when explicit prompt wording forced premature doc-first behavior.

Why passive context performed better

The working explanation centers on three factors:

  • No decision point. With AGENTS.md there’s no need for the agent to decide whether to consult docs—the context is present every turn.
  • Consistent availability. Skills require invocation and can load asynchronously; passive context is part of the system prompt.
  • No sequencing fragility. Skills create an ordering problem (explore project first vs. read docs first) that subtle wording changes can flip. Persistent context removes that brittleness.

Managing context size

Concern about context bloat was addressed by compressing the docs index. A full injection (~40KB) was reduced to 8KB (≈80% smaller) using a pipe-delimited, minified index that maps doc sections to files. The index does not put all doc content into the prompt; it points agents to retrievable files under .next-docs/ so agents can fetch specific files when needed.
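
The idea behind the index can be sketched as follows (a hypothetical reconstruction: the topic names, file paths, and exact serialization below are assumptions, not Vercel's actual index format):

```typescript
// Hypothetical sketch: map doc topics to files under .next-docs/,
// then serialize as a pipe-delimited, minified index. The agent reads
// the index from AGENTS.md and fetches only the files it needs.
const sections: Record<string, string> = {
  connection: "api-reference/functions/connection.mdx",
  "use-cache": "api-reference/directives/use-cache.mdx",
  forbidden: "api-reference/functions/forbidden.mdx",
};

// One entry per line: "<topic>|<relative path under .next-docs/>".
const index = Object.entries(sections)
  .map(([topic, path]) => `${topic}|${path}`)
  .join("\n");

// The minified form is much smaller than, say, pretty-printed JSON,
// while still telling the agent which file covers which topic.
const prettySize = JSON.stringify(sections, null, 2).length;
console.log(`${index.length} bytes vs ${prettySize} bytes`);
```

The key design choice is that the prompt carries only the map, not the doc content, so the context cost stays fixed as the docs grow.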

Example setup command included in the evaluation:

  • npx @next/codemod@canary agents-md

This codemod:

  1. Detects the Next.js version
  2. Downloads matching docs to .next-docs/
  3. Injects the compressed index into AGENTS.md

The codemod and related PR are available in the Next.js repo: https://github.com/vercel/next.js/pull/88961.

Practical implications for framework authors

  • For broad, general framework knowledge, passive context via AGENTS.md currently provides more reliable results than skills.
  • Skills remain useful for focused, action-specific workflows that users intentionally trigger (for example, automated upgrade or migration tasks).
  • Compressing an index and structuring docs for retrieval can achieve version-accurate guidance without exhausting the context window.
  • Evals should target APIs outside model pre-training to reveal the real benefits of versioned documentation.

Further details and the full evaluation are on the original Vercel post: https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals
