Why Claude Fable 5 thrives on loops and memory

In a thread on X, Lance Martin argues that “loops” and memory are now central to getting useful results from Claude Fable 5, the Mythos-class model Anthropic has been promoting for agentic work. His post focuses on two patterns: self-correction loops that let a model respond to environmental feedback, and memory that carries lessons across sessions.

Martin’s first example centers on what he calls “self-correction loops.” He points to Anthropic primitives such as /goal in Claude Code and “Outcomes” in Claude Managed Agents as ways to turn a task into an evaluation loop. In his telling, Fable 5 handles that setup well: a goal or rubric supplies feedback, the model iterates on its work, and the loop continues until the rubric is satisfied.
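
The loop Martin describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic’s implementation of /goal or Outcomes: `propose` and `rubric` are hypothetical placeholders for a model call and a grader, and the number-guessing task is a toy chosen only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    passed: bool   # did the attempt satisfy the rubric?
    feedback: str  # what to fix if it did not


def self_correction_loop(
    propose: Callable[[str], str],
    rubric: Callable[[str], Verdict],
    max_iters: int = 10,
) -> Tuple[str, int, bool]:
    """Ask for an attempt, grade it, feed the grade back, repeat.

    `propose` is a hypothetical stand-in for a model call that receives the
    latest feedback; `rubric` is a hypothetical stand-in for a goal or grader.
    Returns (last attempt, iterations used, whether the rubric passed).
    """
    feedback = ""
    attempt = ""
    for i in range(1, max_iters + 1):
        attempt = propose(feedback)
        verdict = rubric(attempt)
        if verdict.passed:
            return attempt, i, True
        feedback = verdict.feedback
    return attempt, max_iters, False


# Toy "model": narrows in on a hidden number using the rubric's feedback.
def make_guesser(lo: int = 0, hi: int = 100) -> Callable[[str], str]:
    state = {"lo": lo, "hi": hi, "guess": None}

    def propose(feedback: str) -> str:
        if feedback == "too low":
            state["lo"] = state["guess"] + 1
        elif feedback == "too high":
            state["hi"] = state["guess"] - 1
        state["guess"] = (state["lo"] + state["hi"]) // 2
        return str(state["guess"])

    return propose


# Toy rubric: passes only when the attempt equals the target.
def rubric_for(target: int) -> Callable[[str], Verdict]:
    def rubric(attempt: str) -> Verdict:
        value = int(attempt)
        if value == target:
            return Verdict(True, "")
        return Verdict(False, "too low" if value < target else "too high")

    return rubric


print(self_correction_loop(make_guesser(), rubric_for(37)))  # → ('37', 3, True)
```

The `max_iters` budget is an added assumption; a loop with no cap can keep spending tokens if the rubric is never met.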

To test that idea, Martin used an open-source benchmark called Parameter Golf, which asks an agent to modify training code, launch training, read logs, and decide what to do next. He compared Fable 5 with Opus 4.7 using Claude Managed Agents and a self-hosted sandbox with eight H100 GPUs. For judging, he emphasizes that verification should happen in a separate context window rather than by having the model critique its own output: he notes that a verifier sub-agent “tends to outperform” self-critique, and that Outcomes can spawn a grader sub-agent automatically.
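
A rough sketch of the separate-context idea, assuming a context window can be modeled as a plain list of messages: the worker’s intermediate reasoning stays in its own context, while the grader starts fresh and sees only the task and the candidate output. `Context`, `run_with_verifier`, `toy_worker` and `toy_grader` are hypothetical names; real sub-agent spawning in Claude Managed Agents goes through Anthropic’s own APIs, which this does not reproduce.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)


@dataclass
class Context:
    """One context window, modeled as an ordered list of messages."""
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))


def run_with_verifier(
    task: str,
    worker: Callable[[Context], str],
    grader: Callable[[Context], bool],
) -> Tuple[str, bool, Context, Context]:
    # The worker gets its own context and may fill it with intermediate reasoning.
    worker_ctx = Context()
    worker_ctx.add("user", task)
    output = worker(worker_ctx)

    # The grader starts from a fresh context: it sees the task and the final
    # output only, not the worker's reasoning, so it is not anchored on it.
    grader_ctx = Context()
    grader_ctx.add("user", f"Task: {task}\nCandidate: {output}\nPASS or FAIL?")
    passed = grader(grader_ctx)
    return output, passed, worker_ctx, grader_ctx


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_worker(ctx: Context) -> str:
    ctx.add("assistant", "Reasoning: 6 * 7 should be 42.")
    ctx.add("assistant", "42")
    return "42"


def toy_grader(ctx: Context) -> bool:
    _, prompt = ctx.messages[-1]
    return "Candidate: 42" in prompt


output, passed, w, g = run_with_verifier("Compute 6 * 7.", toy_worker, toy_grader)
print(passed, len(w.messages), len(g.messages))  # → True 3 1
```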

According to Martin, Fable 5 improved the training pipeline “~6x more than Opus 4.7.” He describes Fable 5 as favoring larger structural changes and persisting through setbacks, including a quantization regression that did not derail its best run. Opus 4.7, by contrast, appears to have leaned toward smaller scalar tweaks after an initial modest gain.

Martin’s second example turns to memory across sessions. He says memory is another area where Fable 5 excels, and frames it as an outer loop: Claude writes notes during one session, then retrieves them later. Using a benchmark from Continual Learning Bench 1.0, he compared Fable 5, Opus 4.7 and Sonnet 4.6 on sequential SQL questions split across sessions.
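
As a sketch of that outer loop, assuming notes are kept in a local JSON file and retrieved by tag overlap: one session writes a lesson, and a later session, represented by a fresh object with no shared state, reads it back. `NoteStore`, the file name and the example note are hypothetical; this is not Anthropic’s memory tool or the Continual Learning Bench harness.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List


class NoteStore:
    """Notes persisted to a JSON file so they outlive a single session."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def _load(self) -> List[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def write(self, session: int, note: str, tags: Iterable[str]) -> None:
        # During a session: append a lesson learned, with tags for lookup.
        notes = self._load()
        notes.append({"session": session, "note": note, "tags": sorted(tags)})
        self.path.write_text(json.dumps(notes, indent=2))

    def consult(self, tags: Iterable[str]) -> List[str]:
        # In a later session: retrieve every note sharing at least one tag.
        wanted = set(tags)
        return [n["note"] for n in self._load() if wanted & set(n["tags"])]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.json"
    # Session 1: record a hypothetical lesson about the schema.
    NoteStore(path).write(1, "orders.created_at is stored in UTC", {"orders", "time"})
    # Session 2: a fresh object (standing in for a fresh context) reads it back.
    print(NoteStore(path).consult({"orders"}))  # → ['orders.created_at is stored in UTC']
```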

In that setup, Martin describes a progression of “fail, investigate, verify, distill, consult.” Sonnet 4.6, he says, tends to stop around the first step, keeping only rough failure notes. Opus 4.7 generally gets farther, but Martin says verification coverage remains low. Fable 5, by contrast, “tends to complete the progression,” with its strongest runs reaching “up to 73% (22 of 30)” verification coverage, according to the post.
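
The progression can be expressed as an ordered set of stages. The sketch below assumes a run only “reaches” a stage if it has also hit every earlier one, and assumes verification coverage means verified items divided by total items; the post does not spell out either definition. Only the stage names and the 22-of-30 figure come from Martin’s post.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Stage(IntEnum):
    FAIL = 1
    INVESTIGATE = 2
    VERIFY = 3
    DISTILL = 4
    CONSULT = 5


def furthest_stage(events: Iterable[str]) -> Optional[Stage]:
    """Last stage reached without skipping any earlier stage (assumed rule)."""
    seen = {Stage[e.upper()] for e in events}
    reached: Optional[Stage] = None
    for stage in Stage:  # iterates in definition order
        if stage not in seen:
            break
        reached = stage
    return reached


def verification_coverage(verified: int, total: int) -> float:
    # Assumed definition: share of items whose notes were verified.
    return verified / total


print(furthest_stage(["fail"]).name)                            # → FAIL
print(furthest_stage(["fail", "investigate", "distill"]).name)  # → INVESTIGATE
print(f"{verification_coverage(22, 30):.0%}")                   # → 73%
```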

The thread drew a mix of technical curiosity and skepticism in replies. One commenter asked whether the model is “mythos,” another questioned whether some requests get flagged, and others raised concerns about token usage and cost. Filecoin also chimed in to say that “the outer loop across sessions is only as good as the memory behind it,” adding that “verifiable, portable storage” gives an agent a record it can trust.

Martin closes by arguing that, rather than steering Fable 5 directly, it is often better to design loops that let it self-correct and manage its own context. He points readers to Anthropic docs and Claude Code for more on prompting, /goal, Managed Agents and memory.

Source: X
