Kimi K2.5: Open-Source Multimodal Model and Agent Swarm

Kimi K2.5 is an open-source multimodal model pretrained on ~15 trillion vision and text tokens, focused on coding-with-vision and productivity. It introduces a PARL-powered Agent Swarm that can spawn up to 100 sub-agents and deliver sizable runtime speedups.

TL;DR

  • Kimi K2.5 — trained on ~15T mixed visual+text tokens and released as an open-source native multimodal model
  • Four interaction modes on Kimi.com and the Kimi App: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta), with free Agent Swarm beta credits for some high-tier paid accounts
  • Coding-with-vision strengths: generates complete front-end interfaces with interactive layouts and scroll-triggered animations, supports image/video-to-code and visual debugging, and improves over K2 on the internal Kimi Code Bench
  • Kimi Code (open-sourced): terminal CLI, integrates with VSCode/Cursor/Zed, accepts images/videos, and is recommended for agentic coding via the K2.5 Agent — http://www.kimi.com/code
  • Agent Swarm (PARL): trainable orchestrator that spawns up to 100 sub-agents and coordinates up to 1,500 tool calls; uses staged reward shaping and a Critical Steps latency metric; internal evaluations report up to 80% end-to-end runtime reduction and up to 4.5× wall-clock reduction vs single-agent runs
  • Office productivity and benchmarks: handles dense long-form work (examples: 10,000-word papers, 100-page documents), multi-step tool use, and shows +59.3% on AI Office and +24.3% on General Agent vs K2 Thinking; common experimental settings include temperature=1.0, top-p=0.95, and context length up to 256k tokens

Kimi K2.5 is introduced as a new open-source, native multimodal model trained on roughly 15T mixed visual and text tokens, designed to push capabilities in both coding and vision while introducing a self-directed agent-swarm execution paradigm. The model is exposed through Kimi.com, the Kimi App, the public API, and the open-source Kimi Code CLI.

What K2.5 brings

K2.5 targets three practical areas: coding with vision, coordinated multi-agent execution, and office productivity. It arrives with four interaction modes on Kimi.com and the Kimi App—K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta)—and the Agent Swarm experience is currently in beta with free credits for some high-tier paid accounts.

Coding with vision

K2.5 is positioned as the strongest open-source model to date for coding tasks, with particular strength in front-end development. Key capabilities include:

  • Turning conversational prompts into complete front-end interfaces with interactive layouts and scroll-triggered animations.
  • Image/video-to-code generation and visual debugging, enabled by joint vision-text pretraining at scale.
  • Improved results on an internal evaluation suite, Kimi Code Bench, showing consistent gains over the previous K2 model across building, debugging, refactoring, testing, and scripting tasks.

Kimi Code (open-sourced) runs in the terminal and integrates with IDEs such as VSCode, Cursor, and Zed. It accepts images and videos as inputs, can discover and migrate existing skills, and is recommended for agentic coding workflows through the K2.5 Agent experience.

Agent Swarm: scaling out with PARL

Rather than simply scaling up a single agent, K2.5 introduces a research-preview Agent Swarm that can spawn up to 100 sub-agents and coordinate as many as 1,500 tool calls. The training approach, PARL, uses a trainable orchestrator that decomposes tasks and instantiates frozen sub-agents to run subtasks concurrently.
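
The announcement does not publish PARL's implementation, but the orchestrator pattern it describes — decompose a task, then fan sub-agents out concurrently under a spawn cap — can be sketched in a few lines. All names below (`run_subagent`, `orchestrate`, the toy decomposition) are hypothetical illustrations, not Kimi's code:

```python
import asyncio

MAX_SUBAGENTS = 100  # spawn cap described in the announcement

async def run_subagent(subtask: str, sem: asyncio.Semaphore) -> str:
    """Hypothetical frozen sub-agent: executes one subtask independently."""
    async with sem:
        await asyncio.sleep(0)  # stand-in for model/tool calls
        return f"done:{subtask}"

async def orchestrate(task: str) -> list[str]:
    """Trainable-orchestrator sketch: decompose, then fan out concurrently."""
    subtasks = [f"{task}/part{i}" for i in range(4)]  # toy decomposition
    sem = asyncio.Semaphore(MAX_SUBAGENTS)
    # gather() runs all sub-agents concurrently and preserves order
    return await asyncio.gather(*(run_subagent(s, sem) for s in subtasks))

results = asyncio.run(orchestrate("build-report"))
print(results)
```

The semaphore is what enforces the 100-sub-agent ceiling; everything else is ordinary structured concurrency.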

To encourage real parallel strategies, the training pipeline applies staged reward shaping—an auxiliary reward that initially incentivizes parallelism and is annealed away as training progresses—helping avoid “serial collapse” where an orchestrator reverts to sequential execution. A latency-focused metric called Critical Steps (inspired by critical-path analysis) is used to evaluate speedups: spawning subtasks only helps if it shortens the critical path. Internal evaluations report up to an 80% reduction in end-to-end runtime for complex tasks and up to 4.5× wall-clock reduction versus single-agent runs in some scenarios.
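The two training ideas above — an annealed parallelism bonus and a critical-path-style latency metric — can be made concrete with a small sketch. The function names, the bonus coefficient, and the toy dependency graphs are assumptions for illustration only; the post does not specify PARL's actual formulas:

```python
from functools import lru_cache

def critical_steps(deps: dict[str, list[str]]) -> int:
    """Longest dependency chain in a step DAG: a critical-path-style
    latency proxy in the spirit of the Critical Steps metric."""
    @lru_cache(maxsize=None)
    def depth(step: str) -> int:
        return 1 + max((depth(d) for d in deps[step]), default=0)
    return max(map(depth, deps), default=0)

def shaped_reward(task_reward: float, n_parallel: int,
                  step: int, anneal_steps: int = 1000) -> float:
    """Staged shaping sketch: a parallelism bonus annealed to zero,
    so the final policy is judged on the task reward alone."""
    beta = max(0.0, 1.0 - step / anneal_steps)
    return task_reward + beta * 0.01 * n_parallel

# Serial plan a -> b -> c: critical path of 3 steps.
serial = {"a": [], "b": ["a"], "c": ["b"]}
# Parallel plan where b and c both depend only on a: 2 steps.
parallel = {"a": [], "b": ["a"], "c": ["a"]}
print(critical_steps(serial), critical_steps(parallel))  # 3 2
```

The comparison shows why spawning sub-agents only pays off when it shortens the longest chain: the parallel plan does the same three steps but its critical path is one step shorter.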

Office productivity and benchmarks

On internal productivity benchmarks, K2.5 handles dense, long-form knowledge work—documents, spreadsheets, PDFs, and slide decks—with multi-step tool use and long outputs (examples include 10,000-word papers and 100-page documents). On two internal benchmarks (AI Office and General Agent), K2.5 reportedly improves over the previous K2 Thinking model by 59.3% and 24.3%, respectively.

Experimental settings cited include temperature = 1.0, top-p = 0.95, and a context length up to 256k tokens for many K2.5 runs. Tool-augmented and vision benchmark protocols are described in the technical notes linked from the announcement.
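
For readers unfamiliar with these knobs, temperature rescales the logits before sampling and top-p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability reaches p. A self-contained sketch of standard nucleus sampling (the toy vocabulary and logits are made up; this is the generic algorithm, not Kimi's decoder):

```python
import math
import random

def top_p_sample(logits: dict[str, float], p: float = 0.95,
                 temperature: float = 1.0,
                 rng: random.Random = random.Random(0)) -> str:
    """Nucleus (top-p) sampling: softmax with temperature, keep the
    smallest high-probability set summing to >= p, sample from it."""
    z = max(logits.values())  # subtract max for numerical stability
    probs = {t: math.exp((l - z) / temperature) for t, l in logits.items()}
    total = sum(probs.values())
    probs = {t: v / total for t, v in probs.items()}
    nucleus, cum = [], 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        nucleus.append((tok, pr))
        cum += pr
        if cum >= p:
            break
    toks, weights = zip(*nucleus)
    return rng.choices(toks, weights=weights, k=1)[0]

print(top_p_sample({"the": 2.0, "a": 1.0, "cat": 0.1}, p=0.95))
```

With temperature = 1.0 the distribution is used as-is, and p = 0.95 trims only the long tail of unlikely tokens, which matches the relatively exploratory settings cited for the K2.5 runs.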

K2.5 is available through the Kimi ecosystem—Kimi.com, the Kimi App, the public API, and the open-source Kimi Code—with additional details, benchmarks, and technical notes in the original announcement.

Read the original announcement: https://www.kimi.com/blog/kimi-k2-5.html
