Kimi K2.5 is introduced as a new open-source, native multimodal model trained on roughly 15T mixed visual and text tokens, designed to push capabilities in both coding and vision while introducing a self-directed agent-swarm execution paradigm. The model is exposed through Kimi.com, the Kimi App, the public API, and the open-source Kimi Code CLI.
What K2.5 brings
K2.5 targets three practical areas: coding with vision, coordinated multi-agent execution, and office productivity. It arrives with four interaction modes on Kimi.com and the Kimi App—K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta)—and the Agent Swarm experience is currently in beta with free credits for some high-tier paid accounts.
Coding with vision
K2.5 is positioned as the strongest open-source model to date for coding tasks, with particular strength in front-end development. Key capabilities include:
- Turning conversational prompts into complete front-end interfaces with interactive layouts and scroll-triggered animations.
- Image/video-to-code generation and visual debugging, enabled by joint vision-text pretraining at scale.
- Improved results on an internal evaluation suite, Kimi Code Bench, showing consistent gains over the previous K2 model across building, debugging, refactoring, testing, and scripting tasks.
Kimi Code (open-sourced) runs in the terminal and integrates with IDEs such as VSCode, Cursor, and Zed. It accepts images and videos as input, can discover and migrate existing skills, and is the recommended entry point for agentic coding workflows through the K2.5 Agent experience.
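As a rough illustration of an image-to-code request, the payload might be assembled as below. The model identifier ("kimi-k2.5"), the message schema, and the field names follow a common OpenAI-style convention and are assumptions for illustration, not Kimi's documented API:

```python
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Assemble a hypothetical image-to-code request body.

    The model name and content-part schema are illustrative guesses
    in an OpenAI-style format; consult the official API docs for the
    real field names.
    """
    return {
        "model": "kimi-k2.5",  # hypothetical model identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_vision_request(
    "Recreate this mockup as a responsive HTML/CSS page.",
    "https://example.com/mockup.png",
)
print(json.dumps(request, indent=2))
```

The same shape extends naturally to multiple images (additional content parts in the same user message).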
Agent Swarm: scaling out with PARL
Rather than simply scaling up a single agent, K2.5 introduces a research-preview Agent Swarm that can spawn up to 100 sub-agents and coordinate as many as 1,500 tool calls. The training approach, PARL, uses a trainable orchestrator that decomposes tasks and instantiates frozen subagents to run subtasks concurrently.
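The orchestrate-then-fan-out pattern can be sketched with plain asyncio. In PARL the orchestrator is itself a trained model and the subagents are frozen model instances; the hard-coded decomposition and the `subagent`/`orchestrator` names below are illustrative scaffolding, not Kimi's implementation:

```python
import asyncio

async def subagent(name: str, subtask: str) -> str:
    """A frozen worker: executes one subtask independently."""
    await asyncio.sleep(0.01)  # stands in for tool calls / model inference
    return f"{name} finished: {subtask}"

async def orchestrator(task: str) -> list[str]:
    """Decompose a task and execute the pieces concurrently."""
    subtasks = [f"{task} / part {i}" for i in range(4)]  # naive decomposition
    coros = [subagent(f"agent-{i}", st) for i, st in enumerate(subtasks)]
    # gather() runs the subagents concurrently rather than one after another
    return await asyncio.gather(*coros)

results = asyncio.run(orchestrator("build a report"))
for r in results:
    print(r)
```

With independent subtasks, wall-clock time is roughly the slowest subtask rather than the sum of all of them, which is the whole point of fanning out.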
To encourage genuinely parallel strategies, the training pipeline applies staged reward shaping: an auxiliary reward initially incentivizes parallelism and is annealed away as training progresses, helping avoid “serial collapse,” where the orchestrator reverts to sequential execution. A latency-focused metric called Critical Steps (inspired by critical-path analysis) is used to evaluate speedups: spawning subtasks only helps if it shortens the critical path. Internal evaluations report up to an 80% reduction in end-to-end runtime for complex tasks, and up to a 4.5× wall-clock speedup over single-agent runs in some scenarios.
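A minimal sketch of the two ideas, assuming a linear anneal schedule and unit-cost steps. Both formulas are illustrative guesses, not the published PARL definitions:

```python
def shaped_reward(task_reward: float, num_parallel: int,
                  step: int, anneal_steps: int = 1000) -> float:
    """Task reward plus an auxiliary parallelism bonus that is
    linearly annealed to zero over `anneal_steps` training steps."""
    weight = max(0.0, 1.0 - step / anneal_steps)  # 1 -> 0 over training
    bonus = weight * num_parallel                 # rewards fan-out early on
    return task_reward + bonus

def critical_steps(deps: dict[str, list[str]]) -> int:
    """Length of the longest dependency chain in a subtask DAG,
    counting each step as unit cost. Spawning more subtasks only
    helps if it shortens this critical path."""
    memo: dict[str, int] = {}

    def depth(node: str) -> int:
        if node not in memo:
            memo[node] = 1 + max((depth(p) for p in deps.get(node, [])),
                                 default=0)
        return memo[node]

    return max(depth(n) for n in deps)

# Four subtasks: b and c both depend on a; d depends on b and c.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(critical_steps(dag))  # 3: the path a -> b -> d, despite 4 subtasks
```

Running b and c in parallel keeps the critical path at 3 steps instead of the 4 a fully serial plan would take, which is exactly the speedup the metric is meant to capture.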
Office productivity and benchmarks
Internal productivity benchmarks also show K2.5 handling dense, long-form knowledge work (documents, spreadsheets, PDFs, and slide decks) with multi-step tool use and long outputs; examples include 10,000-word papers and 100-page documents. On two internal benchmarks (AI Office and General Agent), K2.5 reportedly improves over the previous K2 Thinking model by 59.3% and 24.3%, respectively.
Experimental settings cited include temperature = 1.0, top-p = 0.95, and a context length up to 256k tokens for many K2.5 runs. Tool-augmented and vision benchmark protocols are described in the technical notes linked from the announcement.
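To make the cited top-p = 0.95 setting concrete, here is a minimal nucleus-sampling filter: only the smallest set of highest-probability tokens whose cumulative mass reaches p is kept, then renormalized. The toy distribution is invented for illustration and this is not the actual serving stack:

```python
def top_p_filter(probs: dict[str, float], p: float = 0.95) -> dict[str, float]:
    """Keep the highest-probability tokens until their cumulative
    probability reaches p, then renormalize the kept set."""
    kept: dict[str, float] = {}
    total = 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        total += pr
        if total >= p:  # nucleus reached; drop the long tail
            break
    return {tok: pr / total for tok, pr in kept.items()}

# Toy next-token distribution (invented for illustration).
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
filtered = top_p_filter(probs, p=0.95)
print(sorted(filtered))  # "zebra" falls outside the 0.95 nucleus
```

Sampling then proceeds over the renormalized nucleus; with temperature = 1.0 the model's probabilities are used as-is rather than sharpened or flattened.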
K2.5 is available through the Kimi ecosystem—Kimi.com, the Kimi App, the public API, and the open-source Kimi Code—with additional details, benchmarks, and technical notes in the original announcement.
Read the original announcement: https://www.kimi.com/blog/kimi-k2-5.html