Codex-Spark arrives as a research preview aimed at reducing latency in interactive coding workflows by pairing a compact, inference-optimized model with wafer-scale hardware. The rollout targets ChatGPT Pro users via the Codex app, Codex CLI, and a VS Code extension, with limited API access offered to select partners.
A model tuned for live coding
Codex-Spark makes responsiveness a primary design goal alongside capability. It is presented as a small, inference-optimized model intended for tasks that benefit from rapid iteration: precise edits, plan revisions, contextual questions about a codebase, and quick UI or styling experiments. Cited benchmarks include SWE-Bench Pro and Terminal-Bench 2.0, on which Codex-Spark reportedly outperforms GPT-5.1-Codex-mini while completing tasks more quickly.
Hardware and throughput
The defining infrastructure detail is the Cerebras Wafer-Scale Engine, a processor family built for high-throughput inference. Codex-Spark is reported to run at over 1,000 tokens/s, enabling near-instant feedback in live coding environments. The Cerebras architecture emphasizes large on-chip memory and the ability to scale to thousands of systems, extending fast memory capacity into the multi-terabyte range to support larger models for both training and inference.
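To make the throughput figure concrete, here is a back-of-envelope sketch in Python; the decode rate is the reported number, while the response sizes are illustrative assumptions rather than published data.

```python
# Convert a decode rate into wall-clock streaming time for a full response.
# DECODE_RATE_TPS comes from the announcement; the token counts below are
# illustrative assumptions, not published numbers.

DECODE_RATE_TPS = 1_000  # reported Codex-Spark throughput, tokens/second

def stream_seconds(response_tokens: int, rate_tps: float = DECODE_RATE_TPS) -> float:
    """Seconds to stream a response, ignoring network and time-to-first-token."""
    return response_tokens / rate_tps

# A small inline edit, a medium diff, and a long explanation, respectively.
for tokens in (150, 600, 2_000):
    print(f"{tokens:>5} tokens -> {stream_seconds(tokens):.2f}s")
```

At this rate even a 2,000-token answer streams in about two seconds, which is what makes the feedback loop feel near-instant in an editor.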
Where it’s available now
- Launch format: research preview.
- Initial access: ChatGPT Pro users in the Codex app, Codex CLI, and the VS Code extension.
- API access: being given to a small group of design partners for early experimentation; a hypothetical usage sketch follows this list.
- Model capabilities: currently text-only with a 128k context window.
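The partner-facing API surface has not been published, so the following is an illustration only: a minimal Python sketch assuming Codex-Spark is served through the standard OpenAI SDK's streaming chat interface under a placeholder model name, measuring time-to-first-token, the metric a low-latency tier most directly improves.

```python
import time

from openai import OpenAI  # standard OpenAI Python SDK (openai>=1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder: the real model identifier for design partners is not public.
MODEL = "codex-spark"

start = time.perf_counter()
first_token_at = None
chunks = []

# A standard streaming chat-completions call; whether Codex-Spark uses this
# exact endpoint is an assumption made for illustration.
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Rename fetch_user to load_user across this file."}],
    stream=True,
)
for event in stream:
    delta = event.choices[0].delta.content
    if delta:  # the first chunk may carry only role metadata
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks.append(delta)

total = time.perf_counter() - start
print(f"time to first token: {first_token_at - start:.3f}s, total: {total:.3f}s")
print("".join(chunks))
```

For an interactive tier, time-to-first-token is the number that most shapes how "live" a session feels; total decode time matters mainly for long diffs and explanations.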
Infrastructure work and rollout plan
The deployment pairs model selection with infrastructure changes on the inference path: improvements to response streaming, faster session initialization, and rewrites of key inference components. These backend optimizations are planned to propagate across the Codex family over the coming weeks. The partnership frames Codex-Spark as a faster tier that complements GPU-backed production stacks for latency-sensitive workloads, with plans to extend ultra-fast inference to larger frontier models in 2026.
Limits and next steps
Codex-Spark is currently limited to text input and the announced 128k context window; multimodal support, larger model variants, and longer context lengths are slated for later phases as the production deployment is evaluated. Access will expand gradually as additional capacity comes online.
Original source: https://www.cerebras.ai/blog/openai-codexspark
