GPT-5.3-Codex-Spark: OpenAI and Cerebras Enable Real-Time Coding

OpenAI's GPT-5.3 Codex-Spark is a latency-optimized Codex model for real-time coding, hitting 1,000+ tokens/sec on Cerebras wafer-scale hardware. Rolling out as a research preview to ChatGPT Pro (Codex app, CLI, VS Code), it enables fast, steerable edits and prototyping.

TL;DR

  • GPT-5.3-Codex-Spark: Research-preview, small inference-optimized Codex model for latency-sensitive coding workflows
  • Designed for real-time development: Rapid feedback, precise code edits, plan revision, codebase-contextual answers, UI/layout tweaks
  • Throughput claim: >1,000 tokens/second via Cerebras deployment for responsive interactive sessions
  • Benchmarks: Stronger than GPT-5.1-Codex-mini on SWE-Bench Pro and Terminal-Bench 2.0, faster task completion
  • Cerebras scaling: Wafer-Scale Engine with the largest on-chip memory of any AI processor; scales to multi-terabyte fast memory across systems
  • Availability: Rolling out to ChatGPT Pro via Codex app, CLI, VS Code extension; API for select design partners

OpenAI’s GPT-5.3-Codex-Spark arrives as a research-preview model tuned for latency-sensitive coding workflows. Built in collaboration with Cerebras, Codex-Spark targets interactive development where responsiveness and steerability matter as much as raw capability.

What Codex-Spark is designed to do

Codex-Spark is described as a small, inference-optimized Codex model focused on real-time software development. The model emphasizes low-latency interaction for iterative coding tasks, enabling rapid feedback in live editing and prototyping sessions. Its strengths include precise code edits, plan revision, and contextual answers that reference the active codebase. The model is also framed as useful for quickly visualizing layout changes, refining styling, and testing UI adjustments.

Performance and benchmarks

A key operational claim is inference speed exceeding 1,000 tokens per second, made possible by deployment on Cerebras hardware. On agentic software-engineering suites such as SWE-Bench Pro and Terminal-Bench 2.0, Codex-Spark reportedly produces more capable responses than GPT-5.1-Codex-mini while completing tasks in a fraction of the time. That combination of capability and throughput is positioned to reduce waiting during extended agentic runs and interactive development sessions.
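
To put that number in perspective, the back-of-envelope sketch below compares wall-clock generation time at the claimed 1,000 tokens per second against a slower baseline. The 250 tokens/second baseline and the response lengths are illustrative assumptions, not published figures.

```python
# Back-of-envelope: wall-clock time to generate a response at a given
# decode throughput. The 250 tok/s baseline and the response lengths
# are illustrative assumptions, not figures from OpenAI or Cerebras.
SPARK_TOK_PER_SEC = 1_000     # claimed lower bound for Codex-Spark
BASELINE_TOK_PER_SEC = 250    # assumed conventional deployment (hypothetical)

for response_tokens in (200, 1_000, 5_000):
    spark_s = response_tokens / SPARK_TOK_PER_SEC
    baseline_s = response_tokens / BASELINE_TOK_PER_SEC
    print(f"{response_tokens:>5} tokens: ~{spark_s:.1f}s on Spark "
          f"vs ~{baseline_s:.1f}s on the assumed baseline")
```

At these rates a typical multi-hundred-token edit returns in well under a second, which is what makes an interactive editing loop feel instantaneous rather than turn-based.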

Cerebras hardware and scaling

Codex-Spark’s throughput is enabled by the Cerebras Wafer-Scale Engine. The hardware is highlighted for having the largest on-chip memory among AI processors, which supports high-speed inference at thousands of tokens per second per user. The architecture is described as scalable across many systems, extending fast memory capacity into the multi-terabyte range to accommodate larger models. The partnership indicates plans to extend ultra-fast inference to larger frontier models in 2026.
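
To see why multi-terabyte fast memory matters for model size, a rough weight-footprint calculation helps. The parameter counts and 16-bit precision below are illustrative assumptions, since OpenAI has not disclosed model sizes.

```python
# Rough weight-memory footprint at 16-bit precision (2 bytes per parameter).
# Parameter counts are illustrative; actual model sizes are undisclosed.
BYTES_PER_PARAM = 2  # fp16/bf16

for params_billion in (20, 200, 2_000):
    terabytes = params_billion * 1e9 * BYTES_PER_PARAM / 1e12
    print(f"{params_billion:>5}B params -> ~{terabytes:.2f} TB of weights")
```

By this arithmetic, keeping a trillion-parameter-class model entirely in fast memory requires capacity in the terabyte range, consistent with the scaling claim above.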

Integration and availability

The model is rolling out as a research preview for ChatGPT Pro users across the Codex app, CLI, and VS Code extension. API access is being offered to select design partners. Deployment on wafer-scale hardware is presented as a way to keep latency-sensitive Codex workflows responsive while preserving steerability during iterative development.
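
For partners with API access, a latency-sensitive workflow would presumably stream tokens as they are generated. The sketch below uses the standard streaming interface of the OpenAI Python SDK; the model identifier `gpt-5.3-codex-spark` is a guess based on the announced name, as no public API model ID has been published.

```python
# Minimal streaming sketch with the OpenAI Python SDK (pip install openai).
# The model ID below is hypothetical; API access is limited to design partners.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed name, not a confirmed model ID
    messages=[{
        "role": "user",
        "content": "Rename the fetch_user helper to get_user in this file "
                   "and update every call site.",
    }],
    stream=True,  # deliver tokens incrementally for a responsive session
)

# Print tokens as they arrive; at 1,000+ tok/s the edit appears near-instantly.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```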

Original source: https://www.cerebras.ai/blog/openai-codexspark
