DeepSeek V4 Pro and Flash benchmarked against Claude Opus

A recent benchmark write-up by Darko at Kilo Blog tests DeepSeek V4 Pro and Flash against Claude Opus 4.7 and Kimi K2.6 using a heavier FlowGraph workflow. It digs into where each model shines, where it stumbles, and how pricing shifts the value equation.

May 20, 2026

TL;DR

Benchmark scope: DeepSeek V4 Pro and V4 Flash vs Claude Opus 4.7 and Kimi K2.6, using a heavier FlowGraph setup
V4 Pro performance: Stronger than Kimi K2.6, but behind Claude Opus 4.7 on tougher workflow segments
V4 Pro economics: A temporary pricing promotion and lower cache pricing make it more attractive on cost than its list price alone suggests
V4 Flash positioning: Dramatically cheaper than others, with more uneven workflow handling overall
Tool/agent behavior: V4 Flash tool-calling held up better than expected at its low price point
Failure analysis focus: Scheduling, recovery, validation, and build integrity; coordination-heavy edge cases still differentiate models

Kilo Blog’s latest benchmark write-up looks at DeepSeek V4 Pro and DeepSeek V4 Flash alongside Claude Opus 4.7 and Kimi K2.6, using the same heavier FlowGraph setup the publication used in an earlier comparison. The post paints DeepSeek’s new open-weight pair as relevant contenders, with Pro landing in the middle of the pack and Flash aiming at the ultra-low-cost end of the market.

According to the blog, V4 Pro came in with a stronger showing than Kimi K2.6, while still trailing Claude Opus 4.7 on the tougher parts of the workflow. The write-up also notes that DeepSeek’s temporary pricing promotion and lower cache pricing make the model more attractive on cost than its list price alone might suggest.
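To make the pricing point concrete, here is a minimal sketch of how a promotional discount and a cheaper cached-input rate could pull a run's effective cost below what list price implies, and how that feeds a cost-per-point figure. Everything in it is an assumption for illustration: the `Pricing` fields, the `run_cost` and `cost_per_point` helpers, the idea that the discount applies uniformly to all token types, and every number are hypothetical and are not taken from the Kilo Blog post or from DeepSeek's price list.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Per-million-token prices in USD. All values used here are placeholders."""
    input_miss: float       # uncached input tokens
    input_hit: float        # cached input tokens (typically cheaper)
    output: float           # generated tokens
    discount: float = 0.0   # promotional discount as a fraction, e.g. 0.75


def run_cost(p: Pricing, input_tokens: int, output_tokens: int,
             cache_hit_ratio: float) -> float:
    """Effective USD cost of one run, assuming the discount applies to every token type."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    list_cost = (miss * p.input_miss + hit * p.input_hit
                 + output_tokens * p.output) / 1_000_000
    return list_cost * (1.0 - p.discount)


def cost_per_point(cost: float, score: float) -> float:
    """USD spent per benchmark point; lower is better."""
    if score <= 0:
        raise ValueError("score must be positive")
    return cost / score


# Hypothetical prices and token counts, chosen only to show the effect.
list_price = Pricing(input_miss=2.0, input_hit=0.5, output=4.0)
promo_price = Pricing(input_miss=2.0, input_hit=0.5, output=4.0, discount=0.75)

no_cache = run_cost(list_price, 1_000_000, 500_000, cache_hit_ratio=0.0)
with_cache = run_cost(list_price, 1_000_000, 500_000, cache_hit_ratio=0.5)
with_promo = run_cost(promo_price, 1_000_000, 500_000, cache_hit_ratio=0.5)
```

Under these placeholder inputs, each step (cache hits, then the promotion) lowers the cost of the same run, so a model's cost per point can improve without its score changing at all.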

V4 Flash, meanwhile, appears to be a different kind of product altogether. The post describes it as dramatically cheaper than the other models in the comparison, but also more uneven in how it handled the workflow. Even so, the author suggests its agent/tool-calling behavior held up better than expected for such a low-price run.

The rest of the article focuses on where each model stumbled in a complex backend-building task, especially around scheduling, recovery, validation, and build integrity. Rather than treating benchmark scores as the whole story, the post argues that the gap between open-weight and proprietary models is narrowing in broad coverage, while the hardest coordination-heavy edge cases remain the real separator.

Readers interested in the detailed failure cases, the cost-per-point breakdown, and the side-by-side scoring table can find them in the full post on Kilo Blog.

Source: Kilo Blog
