OpenAI’s GPT-5.3-Codex-Spark arrives as a research-preview model tuned for latency-sensitive coding workflows. Built in collaboration with Cerebras, Codex-Spark targets interactive development where responsiveness and steerability matter as much as raw capability.
What Codex-Spark is designed to do
Codex-Spark is described as a small, inference-optimized Codex model built for real-time software development. It emphasizes low-latency interaction on iterative coding tasks, enabling rapid feedback during live editing and prototyping sessions. Its strengths include precise code edits, plan revision, and contextual answers grounded in the active codebase, and it is also framed as useful for quickly visualizing layout changes, refining styling, and testing UI adjustments.
Performance and benchmarks
A key operational claim is inference speed exceeding 1,000 tokens per second, enabled by deployment on Cerebras hardware. On agentic software-engineering suites such as SWE-Bench Pro and Terminal-Bench 2.0, Codex-Spark reportedly outperforms GPT-5.1-Codex-mini while completing tasks in a fraction of the time. That combination of capability and throughput is positioned to reduce waiting during extended agentic runs and interactive development sessions.
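To give a rough sense of what that throughput means in practice, the sketch below converts a steady generation rate into wall-clock wait time. The token count and the slower baseline rate are illustrative assumptions, not benchmark figures from the announcement:

```python
# Back-of-the-envelope: how generation throughput translates into wait time.
# The patch size and the baseline rate are illustrative assumptions.

def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream num_tokens at a steady generation rate."""
    return num_tokens / tokens_per_second

patch_tokens = 600       # size of a typical code edit (assumed)
spark_rate = 1000.0      # >1,000 tok/s claimed for Codex-Spark
baseline_rate = 100.0    # hypothetical slower model for comparison

print(f"Codex-Spark: {generation_time_s(patch_tokens, spark_rate):.1f}s")
print(f"Baseline:    {generation_time_s(patch_tokens, baseline_rate):.1f}s")
```

At these assumed numbers, a 600-token patch streams in under a second rather than several seconds, which is the kind of difference that keeps an interactive editing loop feeling responsive.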
Cerebras hardware and scaling
Codex-Spark’s throughput is enabled by the Cerebras Wafer-Scale Engine. The hardware is highlighted for having the largest on-chip memory among AI processors, which supports high-speed inference at thousands of tokens per second per user. The architecture is described as scalable across many systems, extending fast memory capacity into the multi-terabyte range to accommodate larger models. The partnership indicates plans to extend ultra-fast inference to larger frontier models in 2026.
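A rough sketch of how that scaling arithmetic might work: the 44 GB per-wafer figure is Cerebras's published on-chip SRAM capacity for the WSE-3, while the model sizes and precision below are illustrative assumptions, not details from the announcement:

```python
# Back-of-the-envelope: wafers needed to hold a model's weights entirely in
# on-chip SRAM. Per-wafer capacity is the published WSE-3 SRAM figure;
# model sizes and bytes-per-parameter are illustrative assumptions.
import math

SRAM_PER_WAFER_GB = 44  # Cerebras WSE-3 on-chip SRAM

def wafers_needed(params_billions: float, bytes_per_param: int = 2) -> int:
    """Systems needed to fit all weights in SRAM (2 bytes/param ~ fp16)."""
    weight_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
    return math.ceil(weight_gb / SRAM_PER_WAFER_GB)

print(wafers_needed(70))    # a 70B-parameter model at fp16
print(wafers_needed(1000))  # a hypothetical 1T-parameter model at fp16
```

Under these assumptions, a trillion-parameter model at 16-bit precision needs roughly 2 TB of weight storage, which is how clustering dozens of wafer-scale systems reaches the multi-terabyte fast-memory range described above.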
Integration and availability
The model is rolling out as a research preview for ChatGPT Pro users across the Codex app, CLI, and VS Code extension. API access is being offered to select design partners. Deployment on wafer-scale hardware is presented as a way to keep latency-sensitive Codex workflows responsive while preserving steerability during iterative development.
Original source: https://www.cerebras.ai/blog/openai-codexspark
