Coinbase says smarter AI routing cut spend nearly in half

Coinbase CEO Brian Armstrong says the company kept AI spending nearly flat as token usage rose. He credits better defaults, model routing, and cache-aware requests rather than tighter caps. The approach leans more on open-weight models and leaner context.

TL;DR

  • Coinbase reported AI spending “nearly flat” despite rising token usage, crediting defaults, routing, and caching rather than stricter caps
  • Model defaults: Some workflows routed to open-weight models like GLM 5.2 and Kimi 2.7 via an LLM gateway
  • Prompt/task routing: Custom harnesses preprocess prompts and route tasks across different models
  • Cache-aware requests: Reuse warm caches more often to reduce repeated work and cost
  • Context discipline: Fresh sessions when switching tasks; narrow file context; disconnect unused tools
  • Usage policy: 91% of employees not hitting caps; usage visibility emphasized, with higher spend expected to correlate with impact

Brian Armstrong claimed on X that Coinbase has been able to keep AI spending "nearly flat" even as token usage rises, arguing that the answer lies in "better defaults, routing, and caching" rather than stricter usage caps or more alerts.

In the post, the Coinbase chief executive outlined a few internal changes he said are already in motion. Those include defaulting some workflows toward open-weight models such as "GLM 5.2" and "Kimi 2.7" through the company’s LLM gateway, using custom harnesses to preprocess prompts and route tasks to different models, and making requests cache-aware so the system can reuse warm caches more often. Armstrong also said the company is encouraging engineers to keep context lean by starting fresh sessions when switching tasks, scoping file context narrowly, and disconnecting unused tools.
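Armstrong's post does not include implementation details, so the following is only a minimal sketch of how task routing and cache-aware prompt construction might look in general. The model names, the task list, and every function here are hypothetical placeholders, not Coinbase's gateway or harness.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical model tiers; placeholders, not Coinbase's actual configuration.
CHEAP_MODEL = "open-weight-default"
PREMIUM_MODEL = "frontier-model"

# Assumed routing table: task types listed here go to the cheaper default.
CHEAP_TASKS = {"autocomplete", "summarize", "classify"}


@dataclass(frozen=True)
class Request:
    system_prompt: str  # stable instructions shared across calls
    context: str        # file or tool context, kept narrow
    user_prompt: str    # the part that changes on every call
    task: str           # e.g. "summarize" or "refactor"


def pick_model(req: Request) -> str:
    """Route by task type: cheap default unless the task is not on the list."""
    return CHEAP_MODEL if req.task in CHEAP_TASKS else PREMIUM_MODEL


def build_prompt(req: Request) -> str:
    """Put the stable parts first so a prefix cache can match across calls."""
    parts = [req.system_prompt.strip(), req.context.strip(), req.user_prompt.strip()]
    return "\n\n".join(parts)


def prefix_key(req: Request) -> str:
    """Hash the stable prefix; equal keys mean a warm cache could be reused."""
    stable = req.system_prompt.strip() + "\n\n" + req.context.strip()
    return hashlib.sha256(stable.encode("utf-8")).hexdigest()


a = Request("You are a code assistant.", "file: utils.py", "Add a docstring.", "summarize")
b = Request("You are a code assistant.", "file: utils.py", "Rename the helper.", "refactor")
print(pick_model(a), pick_model(b))
print(prefix_key(a) == prefix_key(b))
```

In this sketch the two requests go to different models but share a prefix key, because only the trailing user prompt differs. Widening the file context or swapping tools mid-session would change the prefix, which is one plausible reason lean, stable context and cache reuse are mentioned together.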

A chart attached to the post, titled "AI Spend at Coinbase (Bars) -vs- Token Usage (Line)," appears to show multicolored stacked bars for AI spend and a black line for total company tokens. Both the bars and the line climb sharply toward the right side of the chart, which is consistent with Armstrong’s claim that token usage continues to rise but does not by itself show spend holding flat or falling. He said the effort has cut AI spend "nearly in half."

Armstrong also stated that 91% of employees were not hitting usage caps, which he used to argue against lowering limits. The post instead presents visibility as a control mechanism: workers can use as many tokens as they want, but their usage is visible, and higher spend is expected to correlate with more impact.
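As a rough illustration of visibility as a control, the sketch below totals tokens per user and computes the share of users who stayed under a cap. The event format, the cap value, and the function name are assumptions made for the example; the post does not describe how Coinbase reports usage.

```python
from collections import defaultdict


def usage_report(events, cap_tokens):
    """Total tokens per user and the share of users below the cap.

    events: iterable of (user, tokens) pairs (hypothetical log format).
    cap_tokens: per-user cap for the reporting period (assumed value).
    """
    totals = defaultdict(int)
    for user, tokens in events:
        totals[user] += tokens
    under_cap = [user for user, total in totals.items() if total < cap_tokens]
    share_under = len(under_cap) / len(totals) if totals else 0.0
    return dict(totals), share_under


# Made-up sample data, for illustration only.
events = [("a", 100), ("b", 900), ("a", 50), ("c", 1200)]
totals, share_under = usage_report(events, cap_tokens=1000)
print(totals)
print(f"{share_under:.0%} of users under the cap")
```

A report like this surfaces who is using how much without blocking anyone, which is the trade-off the post describes: the cap stays where it is and the totals are visible.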

The thread drew a mix of applause and skepticism from other X users, including questions about cache coherence, whether the company self-hosts its open-weight models, and how product quality or productivity changes when cheaper models are used more often. Some replies also pushed the discussion toward open-source models and post-training, though Coinbase has not publicly expanded on those points in the thread.

Source: Brian Armstrong on X
