Kilo Blog’s latest benchmark write-up looks at DeepSeek V4 Pro and DeepSeek V4 Flash alongside Claude Opus 4.7 and Kimi K2.6, using the same heavier FlowGraph setup the publication used in an earlier comparison. The post presents DeepSeek’s new open-weight pair as credible contenders, with Pro landing in the middle of the pack and Flash aimed at the ultra-low-cost end of the market.
According to the blog, V4 Pro came in with a stronger showing than Kimi K2.6, while still trailing Claude Opus 4.7 on the tougher parts of the workflow. The write-up also notes that DeepSeek’s temporary pricing promotion and lower cache pricing make the model more attractive on cost than its list price alone might suggest.
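The pricing point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the assumption that the promotion discounts only uncached input tokens, and every number in the usage example are hypothetical and are not taken from the Kilo Blog post or from any vendor's price list.

```python
def effective_input_price(list_price, promo_discount, cache_price, cache_hit_rate):
    """Blended price per million input tokens.

    Assumption (hypothetical): the promotion discounts uncached tokens only,
    and cached tokens are billed at a separate, lower cache price.
    """
    if not 0.0 <= promo_discount <= 1.0:
        raise ValueError("promo_discount must be between 0 and 1")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    discounted = list_price * (1.0 - promo_discount)
    # Weight the two prices by the share of tokens served from cache.
    return cache_hit_rate * cache_price + (1.0 - cache_hit_rate) * discounted


def cost_per_point(total_cost, score):
    """Total run cost divided by benchmark score; lower is better."""
    if score <= 0:
        raise ValueError("score must be positive")
    return total_cost / score


# Placeholder numbers, chosen only to show the calculation.
blended = effective_input_price(
    list_price=2.0, promo_discount=0.5, cache_price=0.2, cache_hit_rate=0.75
)
per_point = cost_per_point(total_cost=3.0, score=60)
```

With a high cache hit rate, the blended price can sit well below the list price, which is the effect the write-up describes. Dividing total run cost by score gives a cost-per-point figure of the kind the post reports.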
V4 Flash, meanwhile, appears to be a different kind of product altogether. The post describes it as dramatically cheaper than the other models in the comparison, but also more uneven in how it handled the workflow. Even so, the author suggests its agent and tool-calling behavior held up better than expected for a model at that price.
The rest of the article focuses on where each model stumbled in a complex backend-building task, especially around scheduling, recovery, validation, and build integrity. Rather than treating benchmark scores as the whole story, the post argues that the gap between open-weight and proprietary models is narrowing on broad task coverage, while the hardest coordination-heavy edge cases remain what separates them.
Readers interested in the detailed failure cases, the cost-per-point breakdown, and the side-by-side scoring table can find the full post on Kilo Blog.
Source: Kilo Blog