MiniMax M2.5 arrives as a productivity-focused frontier model
MiniMax M2.5 debuted as an open-source frontier model built for real-world productivity rather than pure benchmark chasing. The release, announced alongside links to the MiniMax Agent, an API, and a public CodingPlan, highlights performance across coding, web comprehension, and tool-calling workloads, along with efficiency gains and an economical runtime option.
Key performance and efficiency claims
The announcement lists several headline figures:
- SWE-Bench: 80.2%, positioned as strong coding performance.
- BrowseComp: 76.3%, reflecting web comprehension ability.
- BFCL (tool-calling/agentic capability): 76.8%, a focal point for autonomous workflows.
- 37% faster execution on complex tasks compared with prior iterations.
- A runtime option priced at $1 per hour at 100 tokens per second (tps), presented as enabling cost-effective scaling for long-horizon agents; a back-of-envelope conversion of that figure follows this list.
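Taken at face value, the quoted hourly rate converts to a per-token price. The sketch below assumes sustained throughput at exactly the advertised 100 tps for a full hour; it illustrates the arithmetic only and is not a published rate card.

```python
# Back-of-envelope conversion of the quoted "$1/hour at 100 tps" figure
# into a per-million-token price. Assumes sustained throughput at exactly
# 100 tokens/sec for the full hour; real workloads will vary.
PRICE_PER_HOUR_USD = 1.00
TOKENS_PER_SECOND = 100

tokens_per_hour = TOKENS_PER_SECOND * 3600                    # 360,000 tokens
cost_per_million = PRICE_PER_HOUR_USD / tokens_per_hour * 1_000_000

print(f"{tokens_per_hour:,} tokens/hour")                     # 360,000 tokens/hour
print(f"${cost_per_million:.2f} per million tokens")          # ~$2.78 per million tokens
```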
Links shared with the announcement include the MiniMax Agent (https://t.co/aIzrFYcfUz), the API (https://t.co/fHRdSV7BwZ), and a public CodingPlan (https://t.co/FDhZBBjQrX).
How it stacks up against other frontiers
Community-shared comparisons put those numbers in context:
- At 80.2% on SWE-Bench, M2.5 is roughly level with other high-performing models on that benchmark.
- On BFCL, the model's 76.8% was noted as ahead of several competitors on tool- and API-usage tasks.
- On multi-codebase bug-fixing evaluations, M2.5 shows improvements that edge it ahead on some aggregated metrics.
- BrowseComp remains an area where other models still lead in some comparisons.
Separately, MiniMax announced an upgraded internal benchmark, VIBE-Pro, which increases task complexity and domain coverage; the team reports M2.5 performs on par with Opus 4.5 on that suite (link: https://t.co/j9F36GkY7P).
Focus on tool-calling and agents
The release frames effective search and tool-calling as prerequisites for autonomously handling complex tasks, citing industry benchmark performance on BrowseComp and the Berkeley Function Calling Leaderboard (https://t.co/ONhdzQF3sL). The reported BFCL gains suggest training or architecture choices that prioritize reliable API/tool invocation and the structured interactions that agentic workflows depend on.
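For readers unfamiliar with what BFCL-style evaluations exercise, the sketch below shows the shape of a structured tool call. It assumes an OpenAI-compatible chat-completions interface; the base URL and model id are placeholders, not documented MiniMax values.

```python
# Minimal sketch of the kind of structured tool call that BFCL-style
# evaluations score. Assumes an OpenAI-compatible chat-completions API;
# endpoint and model id below are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="minimax-m2.5",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# A tool-call-capable model should return structured arguments rather than
# free text; fidelity on exactly this is what BFCL measures.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```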
Community response and open questions
Reaction on social media mixed technical enthusiasm with practical queries:
- Questions about parameter counts and whether weights will be released appeared repeatedly.
- Multiple users asked about hardware and deployment (local vs cloud, Mac Studio specs, tokens/sec targets) and the model’s runtime characteristics.
- A few users reported parsing or output fragmentation issues in early experiments; one client-side mitigation is sketched after this list.
- A community contributor shared Cursor integration rules and a GitHub repo to help with tool-calling when using custom/open models: https://t.co/Y9t2L6hmhG (test link: https://t.co/FfLnShsh9b).
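On the fragmentation reports, a common client-side workaround is to buffer streamed tool-call argument chunks and attempt parsing only once they form valid JSON. This is an illustrative sketch under that assumption, not a confirmed fix for the issues users described.

```python
# Hedged sketch: buffer fragmented tool-call argument chunks and parse
# only when the accumulated buffer is valid JSON. Illustrative only; the
# reported issues may have a different root cause.
import json

def parse_when_complete(chunks):
    """Accumulate streamed argument fragments; return parsed arguments
    once the buffer is valid JSON, or None if the stream ends first."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            return json.loads(buffer)
        except json.JSONDecodeError:
            continue  # not yet complete; keep buffering
    return None

# Example: arguments arriving split across three stream events.
print(parse_when_complete(['{"ci', 'ty": "Ber', 'lin"}']))  # {'city': 'Berlin'}
```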
Engagement on the announcement tweet indicated strong interest from individual developers and organizations alike.
What’s available now
Alongside performance claims, MiniMax published links to the Agent, API, and CodingPlan (https://t.co/aIzrFYcfUz, https://t.co/fHRdSV7BwZ, https://t.co/FDhZBBjQrX) for those looking to evaluate integration paths or explore the model’s behavior in agentic setups.
Original announcement: https://x.com/MiniMax_AI/status/2021980761210134808