Insight
New benchmark on AI models' code review value
Factory Research tested 13 AI models on 50 real pull requests to compare accuracy and cost. GPT-5.2 and Claude Opus 4.6 lead on quality, but pricing shifts the value story—especially for Kimi K2.5 and other budget picks.