Claude Code cache change sparks reports of fast quota burn

Anthropic’s Claude Code prompt-cache TTL reportedly dropped from one hour to five minutes, and developers say their quotas are draining far faster in long, high-context sessions. The Register has more on the pricing mechanics, possible bugs, and what Anthropic may change next.

TL;DR

  • Prompt-cache TTL changed: ~1 hour (around Feb 1) rolled back to 5 minutes (around Mar 7), impacting long sessions
  • Token economics: 5-minute cache write +25%; 1-hour cache write +100%; cache reads ~10% base price
  • Anthropic position: Claude Code auto-selects TTL; no global TTL setting planned; 5-minute TTL suits one-shot requests
  • Quota impacts reported: $200/month users saw quota limits start in March; faster “burn rate” affecting usability
  • Context window multiplier: paid plans support 1M-token context; cache misses can be costly after breaks
  • Possible mitigations under review: investigating 400,000-token default context, keeping 1M tokens configurable

Anthropic’s Claude Code cache confusion is turning into a practical problem for developers who rely on long-running, high-context coding sessions—especially as more reports point to quotas draining faster than expected after a behind-the-scenes change to prompt-cache TTL.

A five-minute cache meets long-session workflows

The core of the complaint: Anthropic previously introduced a one-hour prompt cache for Claude Code context around February 1, then rolled it back to a five-minute cache around March 7. Developer Sean Swanson documented the shift in a GitHub bug report, arguing that the shorter TTL “is disproportionately punishing” for the kind of extended, iterative sessions Claude Code tends to enable.

In Claude’s pricing mechanics, caching is a major lever because context is expensive: it is extra data shipped with every prompt (code, instructions, background) so the model stays consistent across turns. Prompt caching cuts the repeated work by reusing previously processed prompt prefixes, but the economics can be unintuitive:

  • Writing to the five-minute cache costs 25% more than standard input tokens
  • Writing to the one-hour cache costs 100% more
  • Reading from cache costs roughly 10% of the base input price
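Those multipliers are easier to reason about with a quick sketch. The snippet below uses arbitrary base-input-token units and the reported multipliers; the function names and the 200K-token session are illustrative, not Anthropic's accounting:

```python
# Relative cost of processing input tokens, using the multipliers
# reported above: 5-minute cache writes cost 1.25x base, 1-hour
# writes 2.0x, and cache reads roughly 0.1x the base input price.
MULTIPLIERS = {
    "no_cache": 1.00,
    "write_5min": 1.25,
    "write_1h": 2.00,
    "read": 0.10,
}

def relative_cost(tokens: int, kind: str) -> float:
    """Cost of processing `tokens` input tokens, in base-token units."""
    return tokens * MULTIPLIERS[kind]

# A hypothetical 200K-token context: written to the 5-minute cache once,
# then read back on four follow-up turns before the TTL expires.
cached = relative_cost(200_000, "write_5min") + 4 * relative_cost(200_000, "read")
uncached = 5 * relative_cost(200_000, "no_cache")
print(f"cached: {cached:,.0f}  uncached: {uncached:,.0f}")
```

Even with the 25% write premium, a context that gets reused a few times within the TTL costs a fraction of resending it uncached each turn, which is why a TTL that expires mid-session stings.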

Anthropic staffer Jarred Sumner responded that the five-minute TTL can be cheaper overall because many Claude Code requests are effectively one-shot calls—cache once, never revisit—so paying the higher one-hour write cost doesn’t help. Sumner also noted Claude Code picks TTL automatically, with no global setting planned.
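Sumner's argument reduces to back-of-envelope arithmetic: a one-hour write (2.0x base) only beats repeated five-minute writes (1.25x each) once a session would otherwise re-write the cache at least twice within the hour. A hypothetical worked example under those reported multipliers:

```python
WRITE_5MIN = 1.25  # reported multiplier for a 5-minute cache write
WRITE_1H = 2.00    # reported multiplier for a 1-hour cache write

def cheaper_ttl(rewrites_within_hour: int) -> str:
    """Which TTL costs less for a given number of 5-minute cache
    re-writes a session would otherwise trigger over an hour?"""
    cost_5min = rewrites_within_hour * WRITE_5MIN
    return "1h" if WRITE_1H < cost_5min else "5min"

print(cheaper_ttl(1))  # one-shot request: 5-minute cache wins
print(cheaper_ttl(2))  # 2 x 1.25 = 2.5 > 2.0: 1-hour cache wins
```

For the one-shot calls Sumner describes, the five-minute TTL is strictly cheaper; the one-hour premium only pays off for sessions with long idle gaps between turns.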

Context windows, cache misses, and quota shock

Swanson later acknowledged that fast-moving subagent-style sessions can benefit from the cheaper write path when caches “almost never expire.” Still, the broader issue remains: Swanson said that after months on a $200/month plan, quota limits only started hitting in March—and the new “burn rate” is changing how usable the service feels.

Claude Code creator Boris Cherny also flagged another multiplier: the 1M-token context window on paid plans. Cache misses at that size can be punishing; Cherny noted that stepping away for over an hour and returning to a stale session can trigger a full miss. Anthropic is reportedly investigating a 400,000-token default context window, while keeping 1M tokens as an option via configuration.
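The scale of a full miss follows from the same relative units: coming back to a stale session means paying the full cache-write price on the whole context again, and at 1M tokens that is 2.5x the cost of the same miss at the proposed 400,000-token default. A hypothetical illustration using the reported five-minute write multiplier:

```python
WRITE_5MIN = 1.25  # reported multiplier for a 5-minute cache write

def miss_cost(context_tokens: int) -> float:
    """Cost, in base-token units, of rebuilding the cache after a full miss."""
    return context_tokens * WRITE_5MIN

full = miss_cost(1_000_000)   # 1M-token context window on paid plans
default = miss_cost(400_000)  # proposed 400K-token default
print(f"1M miss: {full:,.0f}  400K miss: {default:,.0f}  "
      f"ratio: {full / default:.1f}x")
```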

Meanwhile, some Pro users ($20/month) report quota exhaustion to the point of getting as few as two prompts in five hours, and others suggest caching bugs may be skewing the numbers enough that TTL debates don’t fully explain what’s happening.
