Cloudflare’s new MCP server shrinks AI tool context to 1,000 tokens

Cloudflare has rolled out a new MCP server for its entire API, covering 2,500+ endpoints with just two tools: search() and execute(). The server-side “Code Mode” approach keeps context usage fixed and runs model-written code in a locked-down V8 sandbox.


TL;DR

  • Cloudflare MCP server: Code Mode minimizes MCP tool descriptions to reduce context-window token costs
  • Two-tool interface: search() queries typed OpenAPI (pre-resolved $refs); execute() runs authenticated API calls
  • Scale and footprint: ~1,000 tokens for 2,500+ endpoints across DNS, Zero Trust, Workers, and R2
  • Token reduction claim: 99.9% fewer input tokens versus per-endpoint tools (~1.17M tokens estimated)
  • Security and auth: Dynamic Worker V8 sandbox; no FS/env vars; external fetch disabled by default; OAuth 2.1 via Workers OAuth Provider
  • Availability and OSS: Available now at https://mcp.cloudflare.com/mcp; repo at https://github.com/cloudflare/mcp; Code Mode SDK open-sourced in https://github.com/cloudflare/agents/tree/main/packages/codemode

Cloudflare’s new MCP server for the entire Cloudflare API is built around an idea that’s quietly becoming a defining constraint for AI-assisted coding: tool access is useful, but tool descriptions are expensive. In MCP, every additional tool definition eats into a model’s context window, which can leave less room for the actual task. Cloudflare’s answer is Code Mode, a pattern that keeps the MCP “surface area” tiny while still providing coverage across a very large API.

Instead of shipping thousands of individual tools for each API operation, the server exposes just two: search() and execute(). The result, Cloudflare says, is a fixed MCP footprint of around 1,000 tokens even though the Cloudflare API spans more than 2,500 endpoints across products like DNS, Zero Trust, Workers, and R2.
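A rough sketch of what that fixed two-tool surface might look like as an MCP tool listing. The tool names come from Cloudflare’s post; the description text and input schemas here are illustrative assumptions, not the server’s actual definitions.

```javascript
// Hypothetical sketch: the entire MCP tool surface is two tools, so the
// token cost of the tool listing is constant no matter how many endpoints
// the underlying API has. Schemas are illustrative, not Cloudflare's.
const toolDefinitions = [
  {
    name: "search",
    description:
      "Run JavaScript against a typed view of the Cloudflare OpenAPI spec " +
      "($refs pre-resolved) and return narrowed results.",
    inputSchema: {
      type: "object",
      properties: { code: { type: "string" } },
      required: ["code"],
    },
  },
  {
    name: "execute",
    description:
      "Run JavaScript that makes authenticated Cloudflare API calls, " +
      "including pagination and chained operations.",
    inputSchema: {
      type: "object",
      properties: { code: { type: "string" } },
      required: ["code"],
    },
  },
];

// The listing stays at two entries whether the API has 25 or 2,500+ endpoints.
console.log(toolDefinitions.map((t) => t.name)); // → [ 'search', 'execute' ]
```

The point is that the model only ever pays for these two definitions in context, regardless of how large the API behind them grows.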

Code Mode, moved server-side

Cloudflare previously introduced Code Mode as a way to reduce context usage by having models write code against a typed SDK, then running it safely via a Dynamic Worker Loader. This new MCP server applies that approach server-side, so MCP clients don’t need special sandboxing capabilities.

The two exported tools map to two phases of agent work:

  • search(): runs model-written JavaScript against a typed representation of the OpenAPI spec, with $refs pre-resolved inline. Notably, the OpenAPI spec doesn’t get pasted into the model context; the agent queries it through code and receives narrowed results.
  • execute(): runs model-written JavaScript that can make authenticated calls to the Cloudflare API, including pagination and chained operations.
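To make the search() phase concrete, here is a minimal sketch of the kind of JavaScript a model might write: it scans a typed spec object (a tiny mock here, with $refs already resolved inline) and returns only matching paths, so the full spec never enters the model’s context. The mock spec and helper are hypothetical; real Cloudflare spec shapes may differ.

```javascript
// Tiny mock of a typed OpenAPI spec with $refs pre-resolved inline.
const spec = {
  paths: {
    "/zones/{zone_id}/rulesets": {
      get: { summary: "List zone rulesets", tags: ["rulesets"] },
    },
    "/zones/{zone_id}/dns_records": {
      get: { summary: "List DNS records", tags: ["dns"] },
    },
  },
};

// Model-written search code: narrow the spec to relevant endpoints and
// return only the hits, not the whole document.
function findEndpoints(spec, keyword) {
  const hits = [];
  for (const [path, ops] of Object.entries(spec.paths)) {
    for (const [method, op] of Object.entries(ops)) {
      if (path.includes(keyword) || (op.tags ?? []).includes(keyword)) {
        hits.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return hits;
}

console.log(findEndpoints(spec, "rulesets")); // one hit: GET /zones/{zone_id}/rulesets
```

Only the narrowed result (here, a single endpoint) flows back into the model’s context.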

Cloudflare quantifies the difference sharply: for a large API, Code Mode reduces input token usage by 99.9% compared with a “native” MCP server that enumerates tools per endpoint, which they estimate at 1.17 million tokens.
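The arithmetic behind that claim checks out against the numbers in the post:

```javascript
// The post's numbers: ~1,000 tokens for the Code Mode surface vs an
// estimated ~1.17M tokens for a per-endpoint tool catalog.
const codeModeTokens = 1_000;
const perEndpointTokens = 1_170_000;
const reduction = 1 - codeModeTokens / perEndpointTokens;
console.log(`${(reduction * 100).toFixed(1)}% fewer input tokens`); // → 99.9% fewer input tokens
```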

A concrete example: narrowing endpoints, then changing configuration

To show how discovery and execution fit together, Cloudflare walks through an agent request to protect an origin from DDoS attacks. The flow is deliberately mechanical:

  1. Use search() to programmatically scan the spec for relevant endpoints (in this case, zone paths tied to WAF and rulesets).
  2. Optionally inspect schemas (for example, pulling the enum of available ruleset phases to identify ddos_l7 and http_request_firewall_managed).
  3. Switch to execute() to call the API (listing rulesets, then fetching entrypoint rulesets for DDoS L7 and WAF managed).
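The execute() phase of the steps above can be sketched as model-written JavaScript that chains calls. The fetch here is a stub standing in for the sandbox’s authenticated API access, and the paths and response shapes are illustrative assumptions, not the documented Cloudflare API.

```javascript
// Stub for the sandbox's authenticated Cloudflare API client.
// Paths and response shapes are hypothetical.
async function fakeFetch(path) {
  const responses = {
    "/zones/abc123/rulesets": {
      result: [
        { id: "r1", phase: "ddos_l7" },
        { id: "r2", phase: "http_request_firewall_managed" },
      ],
    },
    "/zones/abc123/rulesets/phases/ddos_l7/entrypoint": {
      result: { id: "r1", rules: [] },
    },
  };
  return responses[path];
}

// Model-written execute() code: list rulesets, then chain a second call
// to fetch the entrypoint ruleset for the DDoS L7 phase.
async function getDdosEntrypoint(zoneId) {
  const { result: rulesets } = await fakeFetch(`/zones/${zoneId}/rulesets`);
  const ddos = rulesets.find((r) => r.phase === "ddos_l7");
  if (!ddos) return null;
  const { result } = await fakeFetch(
    `/zones/${zoneId}/rulesets/phases/ddos_l7/entrypoint`
  );
  return result;
}

getDdosEntrypoint("abc123").then((r) => console.log(r.id)); // → r1
```

The chaining happens inside one execute() call, which is how the whole flow stays within a handful of tool invocations.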

Cloudflare emphasizes that end-to-end—searching, inspecting schema details, listing existing rulesets, and fetching configurations—can be done in four tool calls, with the complexity residing inside the generated code rather than exploding the tool catalog.

Sandbox and auth: fewer moving parts in context, tighter execution boundaries

Both tools run inside a Dynamic Worker isolate, described as a lightweight V8 sandbox with no file system, no environment variables, and external fetches disabled by default (with outbound requests controllable when needed). In practice, this is a design aimed at making “agent writes code” less scary when that code is executed by a server.

On the authentication side, the MCP server is described as OAuth 2.1 compliant, using Workers OAuth Provider to downscope tokens to permissions granted during connection. Cloudflare also notes support for both user and account tokens passed as bearer tokens.
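As a sketch of the bearer-token side, this is the standard header shape a client would attach when connecting; the endpoint URL is from the post, but the snippet is generic OAuth 2.1 bearer usage, not a documented Cloudflare client configuration.

```javascript
// Placeholder credential, not a real token.
const token = "CLOUDFLARE_API_TOKEN";

// Standard bearer-token headers (Headers is a global in Node 18+).
const headers = new Headers({
  Authorization: `Bearer ${token}`,
  "Content-Type": "application/json",
});

// A client would send MCP messages to https://mcp.cloudflare.com/mcp with
// these headers; per the post, the server downscopes the token to the
// permissions granted during connection.
console.log(headers.get("Authorization")); // → Bearer CLOUDFLARE_API_TOKEN
```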

How Cloudflare frames it versus other context-reduction tactics

Cloudflare positions server-side Code Mode alongside a few other ways teams are coping with MCP tool sprawl:

  • Client-side Code Mode, where the agent runs code locally (but then the agent environment needs secure sandbox access).
  • CLIs, which can self-document capabilities but require shell access and expand the attack surface.
  • Dynamic tool search, which reduces the number of tools shown to the model, but each tool still costs tokens and the search system needs maintenance.

Their argument is that server-side Code Mode hits a practical balance for MCP: fixed token cost regardless of API size, progressive capability discovery, and no agent-side modifications.

Availability and setup

Cloudflare says the MCP server is available now at https://mcp.cloudflare.com/mcp, with setup details and configuration options in the Cloudflare MCP repository. The company also open-sourced a Code Mode SDK inside the Cloudflare Agents SDK, positioning it as a reusable pattern for building similar MCP servers.

Cloudflare’s original post: https://blog.cloudflare.com/code-mode-mcp/
