Coinbase says smarter AI routing cut spend nearly in half

Coinbase CEO Brian Armstrong says the company kept AI spending nearly flat as token usage rose. He credits better defaults, model routing, and cache-aware requests rather than tighter caps. The approach leans more on open-weight models and leaner context.

June 27, 2026


TL;DR

Spend: Coinbase reported AI spending "nearly flat" despite rising token usage, crediting defaults, routing, and caching rather than stricter caps
Model defaults: Some workflows are routed to open-weight models such as GLM 5.2 and Kimi 2.7 via an LLM gateway
Prompt/task routing: Custom harnesses preprocess prompts and route tasks across different models
Cache-aware requests: Requests reuse warm caches more often to reduce repeated work and cost
Context discipline: Fresh sessions when switching tasks; narrow file context; unused tools disconnected
Usage policy: 91% of employees are not hitting caps; usage stays visible, and higher spend is expected to correlate with impact

Brian Armstrong claimed on X that Coinbase has been able to keep AI spending "nearly flat" even as token usage rises, arguing that the answer lies in "better defaults, routing, and caching" rather than stricter usage caps or more alerts.

In the post, the Coinbase chief executive outlined a few internal changes he said are already in motion. Those include defaulting some workflows toward open-weight models such as "GLM 5.2" and "Kimi 2.7" through the company's LLM gateway, using custom harnesses to preprocess prompts and route tasks to different models, and making requests cache-aware so the system can reuse warm caches more often. Armstrong also said the company is encouraging engineers to keep context lean by starting fresh sessions when switching tasks, scoping file context narrowly, and disconnecting unused tools.
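Armstrong's post does not include implementation details, so the following is only an illustrative sketch of how default routing and cache-aware requests might fit together. It assumes a provider that caches on an exact prompt prefix. The task categories, the `classify` heuristic, the `FALLBACK_MODEL` placeholder, and the `Request` shape are all hypothetical; only the two model names come from the post.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical routing table: task type -> default model.
# The model names appear in the post; the task categories are assumptions.
DEFAULT_ROUTES = {
    "autocomplete": "GLM 5.2",
    "summarize": "Kimi 2.7",
}
FALLBACK_MODEL = "frontier-model"  # placeholder, not named in the post


@dataclass(frozen=True)
class Request:
    model: str
    prefix: str     # stable part: system prompt plus tool definitions
    suffix: str     # volatile part: the current task
    cache_key: str  # identifies the reusable prefix for a given model


def classify(prompt: str) -> str:
    """Toy preprocessing step: guess a task type from keywords."""
    text = prompt.lower()
    if "summarize" in text or "tl;dr" in text:
        return "summarize"
    if text.startswith("complete:"):
        return "autocomplete"
    return "other"


def build_request(system_prompt: str, tools: list[str], prompt: str) -> Request:
    """Pick a default model and keep the cacheable prefix byte-identical."""
    model = DEFAULT_ROUTES.get(classify(prompt), FALLBACK_MODEL)
    # Sorting the tool list means the same tools always yield the same prefix,
    # so a warm prefix cache can be reused across requests.
    prefix = system_prompt + "\n" + "\n".join(sorted(tools))
    key = hashlib.sha256(f"{model}\n{prefix}".encode()).hexdigest()
    return Request(model=model, prefix=prefix, suffix=prompt, cache_key=key)
```

In this sketch, two summarization requests that share a system prompt and tool set produce the same `cache_key` even if the tools are listed in a different order, while dropping an unused tool shortens the prefix that has to be processed at all. That is one way the "lean context" advice and the caching advice could reinforce each other.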

A chart attached to the post, titled "AI Spend at Coinbase (Bars) -vs- Token Usage (Line)," appears to show multicolored stacked bars for AI spend and a black line for total company tokens. The bars and line both climb sharply toward the right side of the chart, matching Armstrong’s claim that token usage continues to rise even as spend falls. He said the effort has cut AI spend "nearly in half."

Armstrong also stated that 91% of employees were not hitting usage caps, which he used to argue against lowering limits. The post instead presents visibility as a control mechanism: workers can use as many tokens as they want, but their usage is visible, and higher spend is expected to correlate with more impact.
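As a rough illustration of visibility without enforcement, a report like the one below could surface who is near a cap without blocking anyone. This is a sketch under stated assumptions: the `usage_report` function, its input shape, and the cap value are hypothetical and do not describe Coinbase's tooling.

```python
def usage_report(tokens_by_user: dict[str, int], cap: int) -> dict:
    """Summarize token usage against a cap without enforcing the cap."""
    if not tokens_by_user:
        return {"under_cap_pct": 0.0, "top_users": []}
    # Count users whose usage stays below the cap.
    under = sum(1 for used in tokens_by_user.values() if used < cap)
    pct = 100.0 * under / len(tokens_by_user)
    # List the heaviest users so their spend is visible, not blocked.
    top = sorted(tokens_by_user.items(), key=lambda kv: kv[1], reverse=True)[:3]
    return {"under_cap_pct": round(pct, 1), "top_users": top}
```

A figure like the 91% Armstrong cites would correspond to `under_cap_pct` here; the list of top users is the part that makes heavy usage visible so it can be weighed against impact.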

The thread drew a mix of applause and skepticism from other X users, including questions about cache coherence, whether the company self-hosts its open-weight models, and how product quality or productivity changes when cheaper models are used more often. Some replies also pushed the discussion toward open-source models and post-training, though Coinbase has not publicly expanded on those points in the thread.

Source: Brian Armstrong on X


