Composer 1.5 Boosts Coding Reasoning with 20× Reinforcement Learning

Composer 1.5 scales reinforcement learning 20× to improve multi-step coding reasoning while preserving low latency. It adds 'thinking tokens' and self-summarization to tackle harder problems and continue work across context windows.

TL;DR

  • RL post-training scaled 20×: Composer 1.5 was built by scaling reinforcement learning far beyond Composer 1 on the same pretrained base, with post-training compute exceeding pretraining compute. https://cursor.com/blog/composer-1-5
  • Outperforms Composer 1 on a range of coding problems, with larger gains on harder tasks requiring deeper reasoning and iterative exploration
  • Generates “thinking tokens”: explicit intermediate reasoning tokens used to plan steps, kept minimal for simple tasks and extended for complex ones
  • Uses self-summarization during RL training to condense progress when context limits are reached, enabling recursive summarization for long-running problems
  • Positioned for interactive use and recommended over Composer 1 for most coding workflows; model access and pricing in the docs: https://cursor.com/docs/models

Composer 1.5 arrives as an iteration focused on balancing speed and deeper reasoning for code-related tasks. Built by scaling reinforcement learning 20× further on the same pretrained base model, the release emphasizes stronger problem-solving on real-world coding challenges while keeping interactive latency low. The post-training compute invested in Composer 1.5 even surpasses the amount used to pretrain the base model, signaling a substantial commitment to RL-based fine-tuning.

What changed under the hood

Composer 1.5’s improvements stem from extended RL post-training rather than a new base model. The result is a model that, according to internal benchmarks, outperforms Composer 1 on a range of coding problems and continues to climb in capability as tasks grow harder. The gains are most pronounced on challenging examples where deeper reasoning and iterative exploration matter.

Reasoning via “thinking tokens”

A defining behavior of Composer 1.5 is that it explicitly generates thinking tokens while processing queries. These tokens represent intermediate reasoning about the codebase and planned next steps. Training emphasized a balanced policy: produce minimal thinking for straightforward problems to stay fast and responsive, but allow extended thought when problems demand it. This trade-off aims to keep day-to-day interactions snappy while preserving the capacity for multi-step solutions.
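The trade-off described above can be pictured as a budget policy that maps estimated task difficulty to a reasoning-token allowance. The function and thresholds below are purely illustrative assumptions for intuition; they are not Cursor's API or the actual policy learned during RL training.

```python
# Hypothetical sketch of an adaptive "thinking budget": easy queries stay
# near the minimum to keep latency low, hard ones scale up for multi-step
# reasoning. All names and numbers here are assumptions, not Cursor's.

def thinking_budget(difficulty: float,
                    min_tokens: int = 32,
                    max_tokens: int = 4096) -> int:
    """Map an estimated difficulty in [0, 1] to a reasoning-token budget."""
    difficulty = max(0.0, min(1.0, difficulty))
    # Interpolate geometrically so the budget grows smoothly but steeply
    # as tasks get harder, rather than linearly.
    return round(min_tokens * (max_tokens / min_tokens) ** difficulty)
```

In a learned policy this mapping would be implicit in the model's weights; the sketch only makes the stated behavior (minimal thinking for simple tasks, extended thinking for complex ones) concrete.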

Managing longer work with self-summarization

To handle tasks that exceed available context, Composer 1.5 incorporates self-summarization as part of RL training. When context runs out during training, the model practices producing useful summaries that condense progress and decisions; for difficult problems this summarization can trigger recursively. That capability helps the model continue exploring for solutions across context windows and preserves accuracy even as context length varies.

Practical notes

The release is positioned for interactive use, with Composer 1.5 recommended over its predecessor for most coding workflows. For pricing and model access details, refer to the official model documentation: https://cursor.com/docs/models.

Original source: https://cursor.com/blog/composer-1-5
