Anthropic’s Claude Opus 4.6 arrives as an incremental but notable step for Opus-class models, with a focus on stronger coding capabilities, longer-running agentic workflows, and markedly improved long-context handling. The release combines model upgrades, new developer controls, and product integrations aimed at complex knowledge work.
Key technical advances
Opus 4.6 targets several developer and knowledge-worker pain points:
- 1M token context (beta): Opus 4.6 is the first Opus-class model to offer a million-token context window in beta, enabling much larger documents and longer agent sessions.
- Improved coding and agentic planning: Stronger planning, debugging, and code-review skills that operate more reliably across large codebases and longer sessions.
- Better long-context retrieval and reasoning: Demonstrated gains at retrieving and reasoning over buried details in vast text corpora, reducing “context rot.”
- Expanded output capacity: Support for up to 128k output tokens so large outputs do not need frequent splitting.
- Safety updates: Comprehensive safety evaluations show low rates of misaligned behavior and lower over-refusal compared with recent models.
Developer controls and API features
Several platform features accompany the model to support longer, more autonomous workflows:
- Adaptive thinking: The model can decide when deeper reasoning is useful rather than relying on a binary on/off for extended thinking.
- Effort controls: Four effort levels — low, medium, high (default), and max — let developers trade intelligence, latency, and cost.
- Context compaction (beta): Automatic summarization and replacement of older context as conversations near configured thresholds, helping agents continue longer tasks without hitting hard limits.
- Premium long-context pricing: prompts beyond 200k tokens are billed at premium rates. Separately, US-only inference is available at 1.1× standard token pricing.
- API model name: the model is available as claude-opus-4-6 via the Claude API.
These controls let the model "think longer" on hard tasks while giving developers levers to reduce cost and latency on simpler requests.
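As a concrete illustration, a request to the model might be shaped as below. This is a minimal sketch: only the model id (claude-opus-4-6) and the four effort levels (low, medium, high, max) come from the text above; the "effort" field name and its placement in the request body are assumptions for illustration, not a confirmed API shape.

```python
import json

# Hypothetical Messages-style request body for Opus 4.6.
# "effort" is an assumed parameter name; the model id and the
# low/medium/high/max levels are from the release notes above.
payload = {
    "model": "claude-opus-4-6",
    "max_tokens": 2048,
    "effort": "medium",  # trade intelligence vs. latency and cost
    "messages": [
        {"role": "user", "content": "Review this diff for bugs."}
    ],
}

print(json.dumps(payload, indent=2))
```

Dropping the effort field would fall back to the documented default (high), so the field only needs to be set when tuning cost or latency.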
Benchmark and capability highlights
Opus 4.6 posts leading scores across multiple evaluations cited by Anthropic:
- Top performance on agentic coding evaluation Terminal-Bench 2.0.
- Leading score on Humanity’s Last Exam, a multidisciplinary reasoning test.
- GDPval-AA: Opus 4.6 outperforms the industry’s next-best model (OpenAI’s GPT-5.2) by about 144 Elo points, and exceeds Opus 4.5 by 190 Elo points on this economically oriented benchmark.
- BrowseComp: Opus 4.6 achieved the best result reported for deep, multi-step web search tasks.
- Needle-in-a-haystack retrieval: On the 8-needle 1M variant of MRCR v2, Opus 4.6 scored 76% versus 18.5% for Sonnet 4.5, illustrating a qualitative improvement in long-context retrieval.
Additional reported gains cover root-cause analysis, multilingual coding, cybersecurity vulnerability detection, long-term coherence, and life-sciences reasoning.
Product integrations and workflows
The release includes product-level updates to leverage Opus 4.6’s strengths in real workflows:
- Agent teams (Claude Code research preview): Multiple subagents can run in parallel and coordinate autonomously, useful for tasks that split into independent, read-heavy subtasks such as codebase reviews.
- Claude in Excel: Upgrades for handling long-running, multi-step spreadsheet tasks, inferring structure from unstructured data and applying multi-step changes in one pass.
- Claude in PowerPoint: Released as a research preview for Max, Team, and Enterprise plans; the model reads slide layouts, masters, and fonts to generate on-brand decks.
- Cowork workflows: Opus 4.6’s multitasking and agentic abilities are highlighted for driving autonomous workflows across documents, spreadsheets, and presentations.
Safety and cybersecurity approach
Anthropic reports one of the most comprehensive safety evaluation suites applied to any model to date, including new probes for user wellbeing and refusal behavior. Opus 4.6 shows a safety profile comparable to or better than its predecessor, with low rates of deceptive or otherwise misaligned outputs and fewer over-refusals. Given the model’s upgraded cybersecurity capabilities, Anthropic also introduced six new cybersecurity probes and describes efforts to use the model defensively — for example, to help find and patch vulnerabilities in open-source software.
Availability and pricing
Claude Opus 4.6 is available today on claude.ai, via the Claude API, and on major cloud platforms. Standard API pricing remains $5 / $25 per million tokens; premium pricing applies for prompts exceeding 200k tokens (listed at $10 / $37.50 per million input/output tokens). US-only inference is available at 1.1× token pricing. Developers can call the model as claude-opus-4-6.
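The listed rates can be turned into a rough cost estimator. This sketch assumes premium rates apply to the entire request once the prompt exceeds 200k tokens (rather than only to the overage), and that the 1.1× US-only multiplier applies to the whole token bill; both are assumptions about billing mechanics not spelled out above.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      us_only: bool = False) -> float:
    """Estimate Opus 4.6 API cost from the listed rates.

    Standard: $5 / $25 per million input/output tokens.
    Premium (prompt > 200k tokens): $10 / $37.50 per million.
    US-only inference: 1.1x token pricing.
    Assumption: premium rates cover the whole request, not just
    the tokens beyond 200k.
    """
    if input_tokens > 200_000:
        in_rate, out_rate = 10.0, 37.50  # premium long-context rates
    else:
        in_rate, out_rate = 5.0, 25.0    # standard rates
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
    if us_only:
        cost *= 1.1  # US-only inference multiplier
    return cost

# A 300k-token prompt with a 20k-token reply hits the premium tier:
# 0.3M * $10 + 0.02M * $37.50 = $3.75
print(round(estimate_cost_usd(300_000, 20_000), 2))
```

Under these assumptions, a request just under the 200k threshold is billed entirely at standard rates, so batching strategy around that boundary can matter for cost.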
For full technical details, benchmarks, and the system card with methodology and safety test descriptions, see the original announcement: https://www.anthropic.com/news/claude-opus-4-6