MiniMax M2.5: Open-Source Frontier Model Tuned for Productivity and Agents

MiniMax M2.5 debuts as an open-source frontier model built for real-world productivity, prioritizing tool-calling and agentic workflows. MiniMax highlights SWE-Bench 80.2%, BFCL 76.8%, BrowseComp 76.3%, 37% faster execution on complex tasks, and a $1/hour runtime option.

TL;DR

  • MiniMax M2.5: Open-source frontier model positioned for real-world productivity over benchmark optimization
  • Benchmarks: SWE-Bench 80.2%, BrowseComp 76.3%, BFCL 76.8% (tool-calling/agentic capability)
  • Efficiency and cost: 37% faster on complex tasks; a $1/hour runtime option at 100 tokens/sec
  • Comparisons: BFCL noted ahead of several competitors; BrowseComp still trails some models; multi-codebase bug-fixing improved
  • VIBE-Pro: Upgraded internal benchmark; M2.5 reported on par with Opus 4.5 (https://t.co/j9F36GkY7P)
  • Resources: MiniMax Agent (https://t.co/aIzrFYcfUz), API (https://t.co/fHRdSV7BwZ), CodingPlan (https://t.co/FDhZBBjQrX)

MiniMax M2.5 arrives as a productivity-focused frontier model

MiniMax M2.5 debuted as an open-source frontier model built with an emphasis on real-world productivity rather than pure benchmark chasing. Announced alongside links to a MiniMax Agent, an API, and a public CodingPlan, the release highlights performance across coding, web comprehension, and tool-calling workloads, plus efficiency gains and an economical runtime option.

Key performance and efficiency claims

The announcement lists several headline figures:

  • SWE-Bench: 80.2%, positioned as strong coding performance.
  • BrowseComp: 76.3%, reflecting web comprehension ability.
  • BFCL (tool-calling/agentic capability): 76.8%, a focal point for autonomous workflows.
  • 37% faster execution on complex tasks compared with prior iterations.
  • A runtime price of $1 per hour at 100 tokens per second, presented as enabling cost-effective scaling for long-horizon agents.
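
Taken at face value, the $1/hour and 100 tokens/sec figures imply an effective per-token price. The quick sketch below derives it from only those two announced numbers; the per-million-token cost is a back-of-envelope calculation, not a price quoted by MiniMax:

```python
# Back-of-envelope cost math from the announced figures:
# $1/hour runtime at a sustained 100 tokens/sec.
PRICE_PER_HOUR = 1.00   # USD, from the announcement
TOKENS_PER_SEC = 100    # throughput, from the announcement

tokens_per_hour = TOKENS_PER_SEC * 3600
cost_per_million = PRICE_PER_HOUR / tokens_per_hour * 1_000_000

print(f"Tokens per hour: {tokens_per_hour:,}")          # 360,000
print(f"Effective cost: ${cost_per_million:.2f} / 1M tokens")
```

At sustained throughput this works out to roughly $2.78 per million tokens, which is the arithmetic behind the "cost-effective scaling" framing for long-horizon agents.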

Links shared with the announcement include the MiniMax Agent (https://t.co/aIzrFYcfUz), the API (https://t.co/fHRdSV7BwZ), and a public CodingPlan (https://t.co/FDhZBBjQrX).

How it stacks up against other frontier models

Community-shared comparisons put those numbers in context:

  • SWE-Bench at 80.2% is roughly level with other high-scoring models on that benchmark.
  • On BFCL, the model scores 76.8, which was noted as ahead of several competitors in tool and API usage tasks.
  • On multi-codebase bug-fixing evaluations, M2.5 shows improvements that edge it ahead on some aggregated metrics.
  • BrowseComp remains an area where other models still lead in some comparisons.

Separately, MiniMax announced an upgraded internal benchmark, VIBE-Pro, which increases task complexity and domain coverage; the team reports M2.5 performs on par with Opus 4.5 on that suite (link: https://t.co/j9F36GkY7P).

Focus on tool-calling and agents

The release emphasizes effective search and tool-calling as prerequisites for autonomous handling of complex tasks, citing industry benchmark performance in BrowseComp and the Berkeley Function Calling Leaderboard (https://t.co/ONhdzQF3sL). The reported BFCL gains suggest the model’s architecture or training prioritized reliable API/tool invocation and structured interactions useful for agentic workflows.
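
To make the BFCL claim concrete, the sketch below shows the round-trip that function-calling benchmarks exercise: a tool declared in the JSON-schema style most chat APIs accept, a model-emitted call, and a dispatch step. The tool name, arguments, and dispatch loop are illustrative assumptions; nothing here is MiniMax-specific, since the announcement does not document the API surface itself:

```python
import json

# 1. A tool declared in the OpenAI-style JSON-schema format that
#    BFCL-type evaluations (and most agent frameworks) use.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# 2. A model that scores well on tool-calling should emit a
#    well-formed call like this (name + JSON-encoded arguments).
model_tool_call = {
    "name": "get_weather",
    "arguments": json.dumps({"city": "Berlin"}),
}

# 3. The agent loop parses the call and routes it to a local function.
def get_weather(city: str) -> str:
    return f"Weather for {city}: (stub result)"

def dispatch(call: dict) -> str:
    registry = {"get_weather": get_weather}
    args = json.loads(call["arguments"])
    return registry[call["name"]](**args)

print(dispatch(model_tool_call))  # Weather for Berlin: (stub result)
```

Benchmarks like BFCL essentially score how reliably a model produces step 2: a valid call, with the right function name and correctly typed arguments, given only the schema in step 1.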

Community response and open questions

Reaction on social media mixed technical enthusiasm with practical queries:

  • Questions about parameter counts and whether weights will be released appeared repeatedly.
  • Multiple users asked about hardware and deployment (local vs cloud, Mac Studio specs, tokens/sec targets) and the model’s runtime characteristics.
  • A few users reported parsing or output fragmentation issues in early experiments.
  • A community contributor shared Cursor integration rules and a GitHub repo to help with tool-calling when using custom/open models: https://t.co/Y9t2L6hmhG (test link: https://t.co/FfLnShsh9b).

Engagement on the announcement tweet indicated significant interest across developers and organizations.

What’s available now

Alongside performance claims, MiniMax published links to the Agent, API, and CodingPlan (https://t.co/aIzrFYcfUz, https://t.co/fHRdSV7BwZ, https://t.co/FDhZBBjQrX) for those looking to evaluate integration paths or explore the model’s behavior in agentic setups.

Original announcement: https://x.com/MiniMax_AI/status/2021980761210134808
