MiniMax-M2.7 is MiniMax’s latest model update, positioned around a theme that is increasingly common in agent-centric releases: improving not just the model, but the system that trains and evaluates it. Alongside benchmark claims and workflow targets such as software engineering and “professional workspace” tasks, MiniMax frames M2.7 as a step toward models that can recursively improve their own harness: the scaffolding that gathers feedback, builds evals, and drives iteration.
What MiniMax is claiming in M2.7
MiniMax says M2.7 is its “first model which deeply participated in its own evolution,” and reports an 88% win rate against M2.5. The announcement breaks the improvements into three main buckets:
Software engineering performance, with an ops-adjacent metric
M2.7 is described as “production-ready SWE,” citing:
- SWE-Pro: 56.22%
- Terminal Bench 2: 57.0%
MiniMax also highlights a real-world operational outcome rather than a leaderboard number: it says M2.7 has, in some cases, reduced intervention-to-recovery time for online incidents to 3 minutes. Several replies in the thread treated that stat as more meaningful than lab-style scoring, since it maps directly to incident response loops.
Agentic behavior and tool reliability
On the agent side, MiniMax says M2.7 was trained for Agent Teams and a “tool search tool,” and reports 97% skill adherence across 40+ complex skills. In the same section, MiniMax claims M2.7 is “on par with Sonnet 4.6” in OpenClaw, a comparison that drew both curiosity and some immediate anecdotal testing from early commenters.
“Professional workspace” and file editing
The third pillar is office-style knowledge work: MiniMax says M2.7 supports multi-turn, high-fidelity Office file editing, and claims SOTA performance for “professional knowledge.”
The more interesting part: a harness that updates itself
In a follow-up tweet, MiniMax argues the iteration loop matters as much as the model: the company describes an internal harness that autonomously collects feedback, builds evaluation sets for internal tasks, and then continuously iterates on its own architecture, skills/MCP implementation, and memory mechanisms.
That framing is notable because it shifts attention from single-pass training runs to an ongoing system: evaluation doesn’t just measure the model, it becomes a mechanism the model (and its surrounding stack) uses to evolve.
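MiniMax hasn’t published implementation details for this harness, but the described loop (collect feedback, turn it into eval cases, gate each iteration on the growing eval set) can be illustrated in miniature. The sketch below is purely hypothetical: every name (`harness_iteration`, `feedback_to_eval_case`, the toy agents) is invented for illustration and implies nothing about MiniMax’s actual internals.

```python
# Illustrative sketch (all names hypothetical): a minimal self-updating
# harness loop. Logged feedback becomes regression-style eval cases, and a
# candidate agent is only promoted if it beats the current agent on the
# accumulated eval set.

def feedback_to_eval_case(feedback):
    """Convert one logged failure/correction into an eval case."""
    return {"input": feedback["input"], "expected": feedback["expected"]}

def score(agent, eval_set):
    """Fraction of eval cases the agent answers correctly."""
    if not eval_set:
        return 0.0
    hits = sum(1 for case in eval_set
               if agent(case["input"]) == case["expected"])
    return hits / len(eval_set)

def harness_iteration(current_agent, candidate_agent, eval_set, new_feedback):
    """One loop: grow the eval set from feedback, then gate the candidate."""
    eval_set = eval_set + [feedback_to_eval_case(f) for f in new_feedback]
    if score(candidate_agent, eval_set) > score(current_agent, eval_set):
        return candidate_agent, eval_set   # promote the candidate
    return current_agent, eval_set         # keep the incumbent

# Toy demo: agents are plain functions; feedback records a wrong answer
# the current agent produced (it fails to strip whitespace).
current = lambda x: x.upper()
candidate = lambda x: x.upper().strip()
feedback = [{"input": "  hi ", "expected": "HI"}]

agent, evals = harness_iteration(current, candidate, [], feedback)
print(agent("  ok "))  # the stripping candidate wins on the new case -> OK
```

The point of the toy is the shape of the loop, not the agents: evaluation data accumulates from real usage, so each iteration is gated on a harder, more representative test set than the last, which is the property MiniMax’s framing emphasizes.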
Access links MiniMax shared
MiniMax provided links for getting started across product surfaces:
- MiniMax Agent: https://t.co/aIzrFYcfUz
- API: https://t.co/fHRdSV7BwZ
- Token Plan: https://t.co/BDCycxepZw
Some replies also immediately raised practical deployment questions, such as whether model weights will be published and how availability maps to specific “coding plan” tiers, though MiniMax didn’t answer those in the thread.