Kilo Blog has published a detailed comparison of Claude Opus 4.8 and MiniMax M3, testing both models on the same code audit and tracking how they performed on cost, speed, and issue detection. The setup is intentionally narrow and practical: one codebase, one prompt, and a side-by-side look at how each model handled a real review task.
The results appear to show a familiar tradeoff. MiniMax M3 delivered a surprisingly strong audit for a far lower bill, while Claude Opus 4.8 generally pushed further as its reasoning level increased. The write-up suggests that the better-performing Claude settings also came with a much steeper cost, and that extra spending did not always translate into proportionally better output.
What makes the piece interesting is that it compares multiple Claude configurations against a single MiniMax run instead of reducing model choice to one headline number. That approach lets the authors show where the higher-end model wins, where it falters, and where a cheaper alternative keeps up better than expected. The full article covers the methodology, the findings, and how the models scaled under the same workload.
There is also a useful discussion of timing and efficiency: the article notes that wall-clock time tracked token usage more closely than it tracked which model was run. For teams weighing AI-assisted code review or other repeated auditing tasks, the comparison offers a concrete way to think about coverage versus spend.
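One way to make the coverage-versus-spend question concrete is to look at cost per issue found, and at the marginal cost of each extra issue a pricier run surfaces. The sketch below is only an illustration of that arithmetic: the `AuditRun` type, the function names, and every number in it are hypothetical placeholders, not figures from the Kilo Blog article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditRun:
    label: str          # hypothetical run name, not a real model setting
    cost_usd: float     # total bill for the run
    issues_found: int   # distinct issues the run reported


def cost_per_issue(run: AuditRun) -> float:
    """Average spend per issue; infinite if the run found nothing."""
    if run.issues_found == 0:
        return float("inf")
    return run.cost_usd / run.issues_found


def marginal_cost_per_issue(baseline: AuditRun, candidate: AuditRun) -> Optional[float]:
    """Extra spend per extra issue when moving from baseline to candidate.

    Returns None when the candidate finds no additional issues, since the
    extra spend then buys no extra coverage.
    """
    extra_issues = candidate.issues_found - baseline.issues_found
    if extra_issues <= 0:
        return None
    return (candidate.cost_usd - baseline.cost_usd) / extra_issues


# Placeholder numbers chosen for easy arithmetic, NOT results from the article.
cheap = AuditRun("cheap-run", cost_usd=1.0, issues_found=8)
premium = AuditRun("premium-run", cost_usd=5.0, issues_found=10)

print(cost_per_issue(cheap))                     # 0.125
print(cost_per_issue(premium))                   # 0.5
print(marginal_cost_per_issue(cheap, premium))   # 2.0
```

With these made-up inputs, the pricier run finds more issues, but each additional issue costs far more than the average issue in the cheaper run. That is the kind of non-proportional return the comparison describes.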
The full breakdown, including the audit setup and the per-run results, is available in the original post on Kilo Blog.
Source: Kilo Blog

