Claude Code cache change sparks reports of fast quota burn

Anthropic’s Claude Code prompt-cache TTL reportedly dropped from one hour to five minutes, and developers say their quotas are draining far faster in long, high-context sessions. The Register has more on the pricing mechanics, possible bugs, and what Anthropic may change next.

TL;DR

  • Prompt-cache TTL changed: ~1 hour (around Feb 1) rolled back to 5 minutes (around Mar 7), impacting long sessions
  • Token economics: 5-minute cache write +25%; 1-hour cache write +100%; cache reads ~10% base price
  • Anthropic position: Claude Code auto-selects TTL; no global TTL setting planned; 5-minute TTL suits one-shot requests
  • Quota impacts reported: $200/month users saw quota limits start in March; faster “burn rate” affecting usability
  • Context window multiplier: paid plans support 1M-token context; cache misses can be costly after breaks
  • Possible mitigations under review: investigating 400,000-token default context, keeping 1M tokens configurable

Anthropic’s Claude Code cache confusion is turning into a practical problem for developers who rely on long-running, high-context coding sessions—especially as more reports point to quotas draining faster than expected after a behind-the-scenes change to prompt-cache TTL.

A five-minute cache meets long-session workflows

The core of the complaint: Anthropic previously introduced a one-hour prompt cache for Claude Code context around February 1, then rolled it back to a five-minute cache around March 7. Developer Sean Swanson documented the shift in a GitHub bug report, arguing that the shorter TTL “is disproportionately punishing” for the kind of extended, iterative sessions Claude Code tends to enable.

In Claude’s pricing mechanics, caching is a major lever because context is expensive: it is extra data shipped with every prompt (code, instructions, background) so the model stays consistent across turns. Prompt caching cuts the repeated work by reusing previously processed prompt prefixes, but the economics can be unintuitive:

  • Writing to the five-minute cache costs 25% more than standard input tokens
  • Writing to the one-hour cache costs 100% more
  • Reading from cache costs roughly 10% of the base input price
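Those multipliers are easier to reason about with a quick sketch. The snippet below uses arbitrary base-input-token units and the reported multipliers; the function names and the 200K-token session are illustrative, not Anthropic's accounting:

```python
# Relative cost of processing input tokens, using the multipliers
# reported above: 5-minute cache writes cost 1.25x base, 1-hour
# writes 2.0x, and cache reads roughly 0.1x the base input price.
MULTIPLIERS = {
    "no_cache": 1.00,
    "write_5min": 1.25,
    "write_1h": 2.00,
    "read": 0.10,
}

def relative_cost(tokens: int, kind: str) -> float:
    """Cost of processing `tokens` input tokens, in base-token units."""
    return tokens * MULTIPLIERS[kind]

# A hypothetical 200K-token context: written to the 5-minute cache once,
# then read back on four follow-up turns before the TTL expires.
cached = relative_cost(200_000, "write_5min") + 4 * relative_cost(200_000, "read")
uncached = 5 * relative_cost(200_000, "no_cache")
print(f"cached: {cached:,.0f}  uncached: {uncached:,.0f}")
```

Even with the 25% write premium, a context that gets reused a few times within the TTL costs a fraction of resending it uncached each turn, which is why a TTL that expires mid-session stings.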

Anthropic staffer Jarred Sumner responded that the five-minute TTL can be cheaper overall because many Claude Code requests are effectively one-shot calls—cache once, never revisit—so paying the higher one-hour write cost doesn’t help. Sumner also noted Claude Code picks TTL automatically, with no global setting planned.
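Sumner's argument reduces to back-of-envelope arithmetic: a one-hour write (2.0x base) only beats repeated five-minute writes (1.25x each) once a session would otherwise re-write the cache at least twice within the hour. A hypothetical worked example under those reported multipliers:

```python
WRITE_5MIN = 1.25  # reported multiplier for a 5-minute cache write
WRITE_1H = 2.00    # reported multiplier for a 1-hour cache write

def cheaper_ttl(rewrites_within_hour: int) -> str:
    """Which TTL costs less for a given number of 5-minute cache
    re-writes a session would otherwise trigger over an hour?"""
    cost_5min = rewrites_within_hour * WRITE_5MIN
    return "1h" if WRITE_1H < cost_5min else "5min"

print(cheaper_ttl(1))  # one-shot request: 5-minute cache wins
print(cheaper_ttl(2))  # 2 x 1.25 = 2.5 > 2.0: 1-hour cache wins
```

For the one-shot calls Sumner describes, the five-minute TTL is strictly cheaper; the one-hour premium only pays off for sessions with long idle gaps between turns.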

Context windows, cache misses, and quota shock

Swanson later acknowledged that fast-moving subagent-style sessions can benefit from the cheaper write path when caches “almost never expire.” Still, the broader issue remains: Swanson said that after months on a $200/month plan, quota limits only started hitting in March—and the new “burn rate” is changing how usable the service feels.

Claude Code creator Boris Cherny also flagged another multiplier: the 1M-token context window on paid plans. Cache misses at that size can be punishing; Cherny noted that stepping away for over an hour and returning to a stale session can trigger a full miss. Anthropic is reportedly investigating a 400,000-token default context window, while keeping 1M tokens as an option via configuration.
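The scale of a full miss follows from the same relative units: coming back to a stale session means paying the full cache-write price on the whole context again, and at 1M tokens that is 2.5x the cost of the same miss at the proposed 400,000-token default. A hypothetical illustration using the reported five-minute write multiplier:

```python
WRITE_5MIN = 1.25  # reported multiplier for a 5-minute cache write

def miss_cost(context_tokens: int) -> float:
    """Cost, in base-token units, of rebuilding the cache after a full miss."""
    return context_tokens * WRITE_5MIN

full = miss_cost(1_000_000)   # 1M-token context window on paid plans
default = miss_cost(400_000)  # proposed 400K-token default
print(f"1M miss: {full:,.0f}  400K miss: {default:,.0f}  "
      f"ratio: {full / default:.1f}x")
```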

Meanwhile, some Pro users ($20/month) report quota exhaustion to the point of getting as few as two prompts in five hours, and others suggest caching bugs may be skewing the numbers enough that TTL debates don’t fully explain what’s happening.
