Brian Armstrong claimed on X that Coinbase has been able to keep AI spending "nearly flat" even as token usage rises, arguing that the answer lies in "better defaults, routing, and caching" rather than stricter usage caps or more alerts.
In the post, the Coinbase chief executive outlined several internal changes he said are already in motion. Those include defaulting some workflows to open-weight models such as "GLM 5.2" and "Kimi 2.7" through the company’s LLM gateway, using custom harnesses to preprocess prompts and route tasks to different models, and making requests cache-aware so the system can reuse warm caches more often. Armstrong also said the company is encouraging engineers to keep context lean by starting fresh sessions when switching tasks, scoping file context narrowly, and disconnecting unused tools.
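Armstrong's post does not include implementation details, so the following is only a minimal sketch of how default routing and cache-aware requests could fit together in a gateway. Everything in it is an assumption made for illustration: the model labels, the `HARD_TASKS` set, the `Gateway` class, and the in-memory set of warm prefixes are hypothetical and are not Coinbase's code or any vendor's API.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical model labels; placeholders, not the models named in the post.
DEFAULT_MODEL = "open-weight-default"
PREMIUM_MODEL = "frontier-premium"

# Hypothetical task labels that justify the more expensive model.
HARD_TASKS = {"architecture_review", "security_audit"}


def route(task_type: str) -> str:
    """Use the cheaper default unless the task is flagged as hard."""
    return PREMIUM_MODEL if task_type in HARD_TASKS else DEFAULT_MODEL


def build_prompt(system: str, tools: list[str], user: str) -> tuple[str, str]:
    """Place stable content first so repeated requests share one prefix.

    Sorting the tool list keeps the prefix byte-identical across calls,
    which is what a prefix-based cache needs in order to match.
    """
    prefix = system + "\n" + "\n".join(sorted(tools))
    return prefix, user


@dataclass
class Gateway:
    """Toy gateway that tracks which (model, prefix) pairs are warm."""

    warm_prefixes: set[str] = field(default_factory=set)
    hits: int = 0
    misses: int = 0

    def send(self, task_type: str, system: str, tools: list[str], user: str) -> dict:
        model = route(task_type)
        prefix, _suffix = build_prompt(system, tools, user)
        # A cache entry is specific to a model, so the key includes it.
        key = hashlib.sha256((model + "\0" + prefix).encode()).hexdigest()
        cached = key in self.warm_prefixes
        if cached:
            self.hits += 1
        else:
            self.misses += 1
            self.warm_prefixes.add(key)
        return {"model": model, "prefix_cached": cached}


gw = Gateway()
first = gw.send("refactor", "You are a code assistant.", ["grep", "edit"], "Rename foo")
second = gw.send("refactor", "You are a code assistant.", ["edit", "grep"], "Rename bar")
third = gw.send("security_audit", "You are a code assistant.", ["edit", "grep"], "Audit auth")
```

In this sketch the second request reuses the warm prefix even though its tools arrive in a different order, while the third is routed to the premium label and starts with a cold cache. A real deployment would rely on the serving layer's own prompt cache rather than an in-memory set.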
A chart attached to the post, titled "AI Spend at Coinbase (Bars) -vs- Token Usage (Line)," appears to show multicolored stacked bars for AI spend and a black line for total company tokens. The bars and line both appear to climb sharply toward the right side of the chart; the rising line matches Armstrong’s claim that token usage continues to grow. He also said the effort has cut AI spend "nearly in half."
Armstrong also stated that 91% of employees were not hitting usage caps, which he used to argue against lowering limits. The post instead presents visibility as a control mechanism: workers can use as many tokens as they want, but their usage is visible, and higher spend is expected to correlate with more impact.
The thread drew a mix of applause and skepticism from other X users, including questions about cache coherence, whether the company self-hosts its open-weight models, and how product quality or productivity changes when cheaper models are used more often. Some replies also steered the discussion toward open-source models and post-training, though Coinbase has not publicly expanded on those points in the thread.
Source: Brian Armstrong on X
