Why Claude Fable 5 thrives on loops and memory

Lance Martin says Claude Fable 5 performs best when built around self-correction loops and cross-session memory. Citing Parameter Golf and Continual Learning Bench, he reports stronger gains than with Opus 4.7, while replies to his thread debate cost and verification.


TL;DR

  • Loops + memory: Lance Martin argues both are central to getting useful results from Claude Fable 5 in agentic tasks
  • Self-correction loops: Uses /goal (Claude Code) and Outcomes (Claude Managed Agent) to iterate until a rubric passes
  • Parameter Golf benchmark: Agent edits training code, runs training, reads logs, and iterates; compared Fable 5 vs Opus 4.7
  • Separate verification: Prefer verifier sub-agent over self-critique; Outcomes can spawn a grader sub-agent automatically
  • Reported results: Fable 5 improved the training pipeline ~6x more than Opus 4.7, with more structural changes and persistence through regressions
  • Cross-session memory: Outer-loop notes and retrieval; Continual Learning Bench 1.0 SQL sessions show Fable 5 up to 73% (22/30) verification coverage

In a thread on X, Lance Martin argues that “loops” and memory are now central to getting useful results from Claude Fable 5, the Mythos-class model Anthropic has been promoting for agentic work. His post focuses on two patterns: self-correction loops that let a model respond to environmental feedback, and memory that carries lessons across sessions.

Martin’s first example centers on what he calls “self-correction loops.” He points to Anthropic primitives such as /goal in Claude Code and “Outcomes” in Claude Managed Agent as ways to turn a task into an evaluation loop. In his telling, Fable 5 handles that setup well: a goal or rubric gives the model feedback, the model iterates, and it continues until the rubric is satisfied.
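The loop Martin describes follows a familiar pattern: grade, feed the feedback back, revise, and stop when the rubric passes. The Python below is a minimal sketch under stated assumptions. It is not how /goal or Outcomes are implemented. `propose_revision` stands in for a model call and `rubric` for a grader, and both are hypothetical placeholders. The toy "task" (landing within 0.5 of 10) exists only so that the loop actually runs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RubricResult:
    passed: bool
    feedback: str


@dataclass
class LoopTrace:
    # Each entry is (candidate, feedback) for one iteration.
    attempts: list[tuple[float, str]] = field(default_factory=list)


def self_correction_loop(
    initial: float,
    propose_revision: Callable[[float, str], float],
    rubric: Callable[[float], RubricResult],
    max_iters: int = 20,
) -> tuple[float, bool, LoopTrace]:
    """Grade, feed feedback back, revise; stop on a pass or when the budget runs out."""
    trace = LoopTrace()
    candidate = initial
    for _ in range(max_iters):
        result = rubric(candidate)
        trace.attempts.append((candidate, result.feedback))
        if result.passed:
            return candidate, True, trace
        # The rubric's feedback drives the next attempt.
        candidate = propose_revision(candidate, result.feedback)
    return candidate, False, trace


# Hypothetical toy stand-ins: the "task" is to land within 0.5 of 10.
def toy_rubric(x: float) -> RubricResult:
    if abs(x - 10.0) <= 0.5:
        return RubricResult(True, "pass")
    return RubricResult(False, "too low" if x < 10.0 else "too high")


def toy_propose(x: float, feedback: str) -> float:
    return x + 2.0 if feedback == "too low" else x - 1.0


final, ok, trace = self_correction_loop(0.0, toy_propose, toy_rubric)
print(final, ok, len(trace.attempts))  # 10.0 True 6
```

The `max_iters` budget matters in practice. Without it, a rubric that can never be satisfied would keep the loop (and the token bill) running indefinitely.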

To test that idea, Martin used an open-source benchmark called Parameter Golf, which asks an agent to modify training code, launch training, read logs, and decide what to do next. He compared Fable 5 with Opus 4.7 through Claude Managed Agents and a self-hosted sandbox with 8xH100 GPUs. For judging, he emphasizes that verification should happen in a separate context window rather than by having the model critique its own output. He notes that a verifier sub-agent “tends to outperform” self-critique, and that Outcomes can spawn a grader sub-agent automatically.
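The point about separate context windows can be shown with a toy model of context. In the sketch below, a "context" is just a list of messages. The worker's context accumulates its own confident reasoning. The verifier starts fresh and sees only the rubric and the result. Every name here (`run_worker`, `spawn_verifier`, the `val_loss` metric and its values) is a hypothetical illustration, not how Outcomes spawns its grader. In a real system, the verifier would be a separate model call with its own context window.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """Stand-in for a context window: an ordered list of messages."""
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)


def run_worker(task: str) -> tuple[Context, dict[str, float]]:
    # Hypothetical worker: its context holds the task, its own reasoning and the result.
    ctx = Context()
    ctx.add(f"TASK: {task}")
    ctx.add("REASONING: this change should help; I am confident it is correct.")
    metrics = {"val_loss": 1.31}  # made-up number standing in for a training run
    ctx.add(f"RESULT: {metrics}")
    return ctx, metrics


def spawn_verifier(rubric: dict[str, float], metrics: dict[str, float]) -> tuple[Context, bool]:
    # Fresh context: the verifier sees the rubric and the result, never the worker's reasoning.
    ctx = Context()
    ctx.add(f"RUBRIC: {rubric}")
    ctx.add(f"RESULT: {metrics}")
    # Each rubric entry is an upper bound; a missing metric counts as a failure.
    passed = all(metrics.get(name, float("inf")) <= limit for name, limit in rubric.items())
    return ctx, passed


worker_ctx, metrics = run_worker("lower validation loss")
verifier_ctx, passed = spawn_verifier({"val_loss": 1.25}, metrics)
print(passed)  # False: the worker was confident, but the independent check disagrees
print(len(worker_ctx.messages), len(verifier_ctx.messages))  # 3 2
```

Keeping the worker's self-assessment out of the grader's context is the design choice being illustrated. The grader cannot be talked into a pass by reasoning it never sees.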

According to Martin, Fable 5 improved the training pipeline “~6x more than Opus 4.7.” He describes Fable 5 as favoring larger structural changes and persisting through setbacks, including a quantization regression that did not derail its best run. Opus 4.7, by contrast, appears to have leaned toward smaller scalar tweaks after an initial modest gain.

Martin’s second example turns to memory across sessions. He says memory is another area where Fable 5 excels, and frames it as an outer loop: Claude writes notes during one session, then retrieves them later. Using a benchmark from Continual Learning Bench 1.0, he compared Fable 5, Opus 4.7 and Sonnet 4.6 on sequential SQL questions split across sessions.
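The outer loop Martin describes (write notes in one session, retrieve them in a later one) can be sketched with a file-backed note store. This is a minimal illustration under stated assumptions and does not reflect Anthropic's memory implementation. `NoteStore`, its JSON format, the naive keyword retrieval, and the example SQL lesson are all hypothetical. The only property that matters is that the notes outlive the object (the "session") that wrote them.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class NoteStore:
    """Hypothetical file-backed notes that persist across sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def write(self, topic: str, lesson: str) -> None:
        # Append a distilled lesson and persist the whole list to disk.
        notes = self._load()
        notes.append({"topic": topic, "lesson": lesson})
        self.path.write_text(json.dumps(notes, indent=2))

    def retrieve(self, query: str) -> list[str]:
        # Naive keyword retrieval: lessons whose topic shares a word with the query.
        words = set(query.lower().split())
        return [n["lesson"] for n in self._load() if words & set(n["topic"].lower().split())]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    # Session 1: the agent records a lesson after a failed query (hypothetical example).
    NoteStore(path).write("orders date filter", "Cast the date column before comparing.")
    # Session 2: a fresh NoteStore instance consults the earlier notes.
    print(NoteStore(path).retrieve("filter orders by date"))
```

A real system would likely use better retrieval than word overlap. Still, the structure is the same: the value of the outer loop depends on what gets written down and whether it can be found again.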

In that setup, Martin describes a progression of “fail, investigate, verify, distill, consult.” Sonnet 4.6, he says, tends to stop around the first step, keeping only rough failure notes. Opus 4.7 generally gets farther, but Martin says verification coverage remains low. Fable 5, by contrast, “tends to complete the progression,” with its strongest runs reaching “up to 73% (22 of 30)” verification coverage, according to the post.

The thread drew a mix of technical curiosity and skepticism in replies. One commenter asked whether the model is “mythos,” another questioned whether some requests get flagged, and others raised concerns about token usage and cost. Filecoin also chimed in to say that “the outer loop across sessions is only as good as the memory behind it,” adding that “verifiable, portable storage” gives an agent a record it can trust.

Martin closes by arguing that, rather than steering Fable 5 directly, it is often better to design loops that let it self-correct and manage its own context. He points readers to Anthropic docs and Claude Code for more on prompting, /goal, Managed Agents and memory.

Source: X
