Anthropic’s Claude Code cache confusion is turning into a practical problem for developers who rely on long-running, high-context coding sessions—especially as more reports point to quotas draining faster than expected after a behind-the-scenes change to prompt-cache TTL.
A five-minute cache meets long-session workflows
The core of the complaint: Anthropic previously introduced a one-hour prompt cache for Claude Code context around February 1, then rolled it back to a five-minute cache around March 7. Developer Sean Swanson documented the shift in a GitHub bug report, arguing that the shorter TTL “is disproportionately punishing” for the kind of extended, iterative sessions Claude Code tends to enable.
In Claude’s pricing mechanics, caching is a major lever because context is expensive: every request re-sends the code, instructions, and background the model needs to stay grounded across a session. Prompt caching cuts that repeated work by reusing already-processed prompt prefixes. But the economics can be unintuitive:
- Writing to the five-minute cache costs 25% more than base input tokens
- Writing to the one-hour cache costs 100% more
- Reading from cache costs roughly 10% of the base price
Anthropic staffer Jarred Sumner responded that the five-minute TTL can be cheaper overall because many Claude Code requests are effectively one-shot calls—cache once, never revisit—so paying the higher one-hour write cost doesn’t help. Sumner also noted Claude Code picks TTL automatically, with no global setting planned.
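Sumner’s point can be made concrete with a toy cost model. This is a minimal sketch, not Anthropic’s billing logic: only the multipliers (1.25× five-minute writes, 2× one-hour writes, ~0.1× reads) come from the article, while the context size, hit counts, and the assumption that a miss simply re-sends the full context at base price are illustrative inputs.

```python
# Toy model of cache economics, in units of base-input-token cost.
# Multipliers are from the article; everything else is a hypothetical input.

def session_cost(context_tokens: int, reuse_reads: int, hits_within_ttl: int,
                 write_multiplier: float, read_multiplier: float = 0.10) -> float:
    """Cost of one cached context: one cache write, then a mix of
    discounted cache hits and full-price misses for later reads."""
    write = context_tokens * write_multiplier
    hits = min(reuse_reads, hits_within_ttl)
    misses = reuse_reads - hits
    reads = hits * context_tokens * read_multiplier
    miss_cost = misses * context_tokens  # a miss re-sends everything at 1.0x
    return write + reads + miss_cost

ctx = 200_000  # hypothetical cached context size

# One-shot request (cache once, never revisit): the cheaper 5-minute
# write wins, since the 1-hour write premium buys nothing.
one_shot_5min = session_cost(ctx, reuse_reads=0, hits_within_ttl=0,
                             write_multiplier=1.25)  # 250,000
one_shot_1hr = session_cost(ctx, reuse_reads=0, hits_within_ttl=0,
                            write_multiplier=2.00)   # 400,000

# Long session with 10 follow-ups: in this toy model the 5-minute cache
# expires after 3 hits, so the remaining reads pay full price, and the
# 1-hour cache comes out far cheaper despite its pricier write.
long_5min = session_cost(ctx, reuse_reads=10, hits_within_ttl=3,
                         write_multiplier=1.25)      # 1,710,000
long_1hr = session_cost(ctx, reuse_reads=10, hits_within_ttl=10,
                        write_multiplier=2.00)       # 600,000
```

The crossover is the whole dispute in miniature: which TTL is “cheaper” depends entirely on how often the cache is actually re-read before it expires.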
Context windows, cache misses, and quota shock
Swanson later acknowledged that fast-moving subagent-style sessions can benefit from the cheaper write path when caches “almost never expire.” Still, the broader issue remains: Swanson said that after months on a $200/month plan without incident, quota limits only began to bite in March, and the new “burn rate” is changing how usable the service feels.
Claude Code creator Boris Cherny also flagged another multiplier: the 1M-token context window on paid plans. Cache misses at that size can be punishing; Cherny noted that stepping away for over an hour and returning to a stale session can trigger a full miss. Anthropic is reportedly investigating a 400,000-token default context window, while keeping 1M tokens as an option via configuration.
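The scale of Cherny’s concern is easy to quantify with back-of-envelope arithmetic. Using only the ~10% read price from the article, the calculation below compares the extra base-price tokens a full miss incurs at the two window sizes mentioned; the formula itself is an illustrative simplification, not a billing spec.

```python
# Extra base-price tokens billed when a stale session forces a full
# re-send instead of a ~10%-price cache read. Window sizes are the 1M
# current paid option and the reported 400K candidate default.
READ_MULTIPLIER = 0.10

def miss_penalty(context_tokens: int) -> float:
    """Difference between paying full price and paying the cache-read price
    for the same context."""
    return context_tokens * (1.0 - READ_MULTIPLIER)

print(miss_penalty(1_000_000))  # 900000.0 extra tokens on a 1M-token miss
print(miss_penalty(400_000))    # 360000.0 with the smaller default window
```

On these numbers, shrinking the default window cuts the worst-case miss penalty by more than half, which is one plausible reading of why Anthropic would look at a 400,000-token default.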
Meanwhile, some Pro users ($20/month) report quota exhaustion to the point of getting as few as two prompts in five hours, and others suggest caching bugs may be skewing the numbers enough that TTL debates don’t fully explain what’s happening.
