Composer 1.5 Boosts Coding Reasoning with 20× Reinforcement Learning

Composer 1.5 scales reinforcement learning 20× to improve multi-step coding reasoning while preserving low latency. It adds 'thinking tokens' and self-summarization to tackle harder problems and continue work across context windows.

TL;DR

  • RL post-training scaled 20×: Composer 1.5 was built by scaling reinforcement learning far beyond Composer 1 on the same pretrained base, with post-training compute exceeding pretraining compute. https://cursor.com/blog/composer-1-5
  • Outperforms Composer 1 on a range of coding problems, with larger gains on harder tasks requiring deeper reasoning and iterative exploration
  • Generates “thinking tokens”: explicit intermediate reasoning tokens used to plan steps, kept minimal for simple tasks and extended for complex ones
  • Uses self-summarization during RL training to condense progress when context limits are reached, enabling recursive summarization for long-running problems
  • Positioned for interactive use and recommended over Composer 1 for most coding workflows; model access and pricing in the docs: https://cursor.com/docs/models

Composer 1.5 arrives as an iteration focused on balancing speed and deeper reasoning for code-related tasks. Built by scaling reinforcement learning 20× further on the same pretrained base model, the release emphasizes stronger problem-solving on real-world coding challenges while keeping interactive latency low. The post-training compute invested in Composer 1.5 even surpasses the amount used to pretrain the base model, signaling a substantial commitment to RL-based fine-tuning.

What changed under the hood

Composer 1.5’s improvements stem from extended RL post-training rather than a new base model. The result is a model that, according to internal benchmarks, outperforms Composer 1 on a range of coding problems and continues to climb in capability as tasks grow harder. The gains are most pronounced on challenging examples where deeper reasoning and iterative exploration matter.

Reasoning via “thinking tokens”

A defining behavior of Composer 1.5 is that it explicitly generates thinking tokens while processing queries. These tokens represent intermediate reasoning about the codebase and planned next steps. Training emphasized a balanced policy: produce minimal thinking for straightforward problems to stay fast and responsive, but allow extended thought when problems demand it. This trade-off aims to keep day-to-day interactions snappy while preserving the capacity for multi-step solutions.
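The trade-off described above can be pictured as a budget policy that maps estimated task difficulty to a reasoning-token allowance. The function and thresholds below are purely illustrative assumptions for intuition; they are not Cursor's API or the actual policy learned during RL training.

```python
# Hypothetical sketch of an adaptive "thinking budget": easy queries stay
# near the minimum to keep latency low, hard ones scale up for multi-step
# reasoning. All names and numbers here are assumptions, not Cursor's.

def thinking_budget(difficulty: float,
                    min_tokens: int = 32,
                    max_tokens: int = 4096) -> int:
    """Map an estimated difficulty in [0, 1] to a reasoning-token budget."""
    difficulty = max(0.0, min(1.0, difficulty))
    # Interpolate geometrically so the budget grows smoothly but steeply
    # as tasks get harder, rather than linearly.
    return round(min_tokens * (max_tokens / min_tokens) ** difficulty)
```

In a learned policy this mapping would be implicit in the model's weights; the sketch only makes the stated behavior (minimal thinking for simple tasks, extended thinking for complex ones) concrete.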

Managing longer work with self-summarization

To handle tasks that exceed available context, Composer 1.5 incorporates self-summarization as part of RL training. When context runs out during training, the model practices producing useful summaries that condense progress and decisions; for difficult problems this summarization can trigger recursively. That capability helps the model continue exploring for solutions across context windows and preserves accuracy even as context length varies.

Practical notes

The release is positioned for interactive use, with Composer 1.5 recommended over its predecessor for most coding workflows. For pricing and model access details, refer to the official model documentation: https://cursor.com/docs/models.

Original source: https://cursor.com/blog/composer-1-5
