Claude Code’s 1M-token memory can backfire without session discipline

Power users say Claude Code’s million-token context enables longer autonomous workflows—but also invites “context rot” as logs and dead ends pile up. The fix is active session management: rewind, compact with guidance, start fresh, or offload noisy work to subagents.


TL;DR

  • 1M-token context enables long Claude Code runs but raises the risk of context pollution; larger windows shift failure modes rather than eliminating them
  • Context rot: performance can degrade around 300–400k tokens, as tool output, file reads, and dead ends accumulate
  • Session-shaping tools: Continue, /rewind (Esc Esc), /clear, /compact, subagents; each turn is a branching decision point
  • Workflow rule: new task generally means new session; adjacent tasks may reuse context to avoid re-reading files
  • Rewind habit: remove failed reasoning by rewinding after relevant setup, then re-prompt; rewind remains a prompt cache hit
  • Compaction guidance: compacts are lossy and can degrade late; compact proactively and “give it a hint” on summary focus

Claude Code power users are learning that a million-token context window cuts both ways. In a thread on X, Thariq outlined how Claude Code sessions can grow into a real asset for longer, more autonomous work—and just as easily drift into “context pollution” without deliberate session management.

The key point: a larger context window doesn’t remove the need to manage context. It mostly changes the failure modes. Instead of hard-stopping early, long sessions can quietly degrade as more tool output, file reads, and dead-end attempts accumulate.

Context, compaction, and “context rot”

In the thread, Thariq notes that for the “1MM context model,” some level of context rot can show up around ~300–400k tokens, while emphasizing that it depends on the task.

When a session approaches the context limit (a hard cutoff), Claude Code relies on compaction—summarizing the session into a smaller description to continue in a fresh window. Compaction can also be triggered manually.

Every turn is a branching point

A practical framing in the post is that once Claude finishes a turn, the next message is a decision point—because each additional “continue” adds more weight (and more noise) to the working set.

Thariq lists a set of alternatives to simply continuing:

  • Continue in the same session
  • /rewind (Esc Esc) to jump back to a previous message and drop everything after it from context
  • /clear to start a new session with a distilled brief
  • Compact to replace the full transcript with a summary
  • Subagents to delegate work to a fresh context and bring back only the result

The emphasis isn’t that any one option is always best, but that these are tools for shaping what stays in the model’s “line of sight.”
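As an illustrative sketch only (not from the thread and not how Claude Code is implemented), the per-turn choice can be phrased as a heuristic over estimated token usage and task boundaries. All names and thresholds below are assumptions:

```python
# Illustrative heuristic only: the thresholds and action names mirror the
# options listed above, but none of this is Claude Code's actual logic.

def next_step(tokens_used: int, new_task: bool, failed_attempt: bool,
              noisy_exploration: bool, window: int = 1_000_000) -> str:
    """Pick a session-shaping action for the next turn."""
    if new_task:
        return "/clear"      # new task generally means new session
    if failed_attempt:
        return "/rewind"     # drop the dead end, re-prompt with what was learned
    if noisy_exploration:
        return "subagent"    # keep raw tool output out of the parent session
    if tokens_used > 0.3 * window:
        return "/compact"    # ~300-400k is where the thread says rot can appear
    return "continue"
```

The ordering encodes the thread's priorities: task boundaries and dead ends are handled before raw context size ever becomes the deciding factor.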

When to start fresh vs. when to keep going

A rule of thumb offered in the thread: a new task should generally mean a new session. The gray area is adjacent work where reusing context may be efficient—documentation for a feature that was just implemented is given as an example. In that case, keeping the session can avoid re-reading the same files, at the cost of carrying extra context that may not matter for the doc-writing step.

Rewind as a “good habit” for correction

If there’s one behavior Thariq calls out as a signal of strong context management, it’s rewind.

Rather than stacking corrective instructions on top of a failed attempt (“that didn’t work, try X instead”), Thariq suggests rewinding to just after relevant file reads or setup steps, then re-prompting with what was learned. That trims away the dead-end reasoning and output that would otherwise linger in context.

In a follow-up reply, Thariq also says rewind remains a prompt cache hit, addressing a concern about whether rewinding a long debugging conversation would cause a cache miss.

Why compacts go bad—and how to steer them

Compaction is described as lossy: it keeps the session moving, but it requires trusting Claude to decide what matters. Thariq’s explanation for “bad compacts” centers on predictability: if the model can’t infer where the work is headed, it may summarize the wrong things—especially when an autocompact fires after a long detour (like debugging), and the next step pivots to a different thread.

Two notable points from the thread:

  • Because of context rot, the model may be at its least sharp when compacting late in a long session.
  • With a 1M window, there’s more room to compact proactively and give guidance, such as focusing the summary on a specific refactor and dropping irrelevant debugging.

When asked about best practice for /compact, Thariq’s short prescription was: give it a hint.
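In practice, that hint can be a short focus instruction passed along with the command. The wording below is a hypothetical example, not taken from the thread:

```
/compact Focus on the payments refactor: keep the file list, the agreed API
changes, and open TODOs. Drop the database-debugging detour.
```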

Subagents as a way to avoid dragging tool output around

For work that generates lots of intermediate output that won’t be needed later, Thariq recommends subagents. The thread describes subagents as running in their own clean context window, returning only a synthesized result to the parent session.

A simple heuristic is provided: will the work require the raw tool output later, or just the conclusion? If it’s the conclusion, it’s a subagent candidate.
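The subagent pattern itself can be sketched in a few lines: noisy intermediate work happens in an isolated scope, and only a synthesized result crosses back to the caller. This is a toy illustration; none of the names come from Claude Code's actual subagent API:

```python
# Hypothetical sketch of the subagent pattern: do the search-heavy work in
# an isolated "context" and return only the conclusion to the parent.

def run_subagent(task: str, search_corpus: list[str]) -> str:
    """Stands in for a fresh context window doing grep-heavy work."""
    hits = [line for line in search_corpus if task in line]
    # The raw hits stay local; only the synthesis crosses back.
    return f"{len(hits)} match(es) for {task!r}"

corpus = ["def parse_config():", "parse_config is called in main", "unrelated"]
summary = run_subagent("parse_config", corpus)
# The parent session sees only `summary`, never the raw hit list.
```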

This also comes up in response to a request for “surgical” removal of unhelpful tool-call tokens: Thariq notes that removing tokens from a transcript can confuse Claude, and points to subagents as the answer for search-heavy workflows.

Open questions from the thread

The replies read like a checklist of what developers still want from long-context coding tools:

  • A way to see what percentage of the context window is left
  • More control over selectively removing unwanted context (without full compaction)
  • Desktop feature parity questions (for example, /rewind availability)
  • Discussion of LLM cache management and prompt caching mechanics

The underlying theme is consistent: context management is becoming a first-class part of AI-assisted coding workflows, especially as session length stops being the limiting factor.

Original source: https://x.com/trq212/status/2044548257058328723
