OpenAI today released GPT‑5.3‑Codex, an incremental update to its specialized programming model. This version merges the code‑specific capabilities of the GPT‑5.2‑Codex branch with the general‑purpose reasoning of the standard GPT‑5.2 model, and OpenAI reports that it runs approximately 25% faster than its predecessor.
What changed technically
GPT‑5.3‑Codex emphasizes sustained, agentic workflows. As part of its own development, early versions of the model were used internally to instrument and accelerate its training, debugging, and deployment. The model's architecture and training produced improvements across coding, frontend development, and desktop‑level computer use, with a focus on maintaining context during long‑running tasks and supporting real‑time steering while the agent executes.
Key performance highlights from the release include:
| Benchmark | GPT‑5.3‑Codex | GPT‑5.2‑Codex | GPT‑5.2 |
| --- | --- | --- | --- |
| SWE‑Bench Pro | 56.8% | 56.4% | 55.6% |
| Terminal‑Bench 2.0 | 77.3% | 64.0% | 62.2% |
| OSWorld‑Verified | 64.7% | 38.2% | 37.9% |
| GDPval (wins or ties) | 70.9% (matches the GPT‑5.2 high) | n/a | n/a |
| Cybersecurity CTF challenges | 77.6% | 67.4% | 67.7% |
| SWE‑Lancer IC Diamond | 81.4% | 76.0% | 74.6% |
These figures underscore stronger terminal and computer‑use skills, substantial gains on real‑world agentic benchmarks, and notable improvements on cybersecurity challenge tasks.
Web and front-end work at scale
The model was tested on longer web development projects, including two games built and iterated on autonomously: a racing game (multiple racers, eight maps, items) and a diving game (exploration, resource management). Playable demos of both games are linked from the announcement. For routine web tasks, GPT‑5.3‑Codex produces more production‑ready defaults from underspecified prompts, such as better handling of pricing toggles and richer testimonial components in landing pages.
Beyond code: a collaborator across the product lifecycle
GPT‑5.3‑Codex is presented as a tool for more than code generation. The model supports debugging, deployment, monitoring, PRD drafting, user research summarization, spreadsheet and slide generation, and writing tests and analyzing metrics. It also demonstrated strong performance on GDPval tasks, which simulate professional knowledge work across 44 occupations. The Codex app surfaces frequent progress updates and allows inline steering during execution; follow‑up behavior can be adjusted in Settings > General > Follow-up behavior.
How it helped build itself
Early iterations of GPT‑5.3‑Codex were used internally to speed up research and engineering tasks: monitoring training runs, diagnosing context rendering issues, proposing fixes, building diagnostic classifiers, scaling GPU clusters, and summarizing large experiment datasets. OpenAI cites these internal uses as accelerants to the model's own development and deployment.
Cybersecurity posture and ecosystem support
GPT‑5.3‑Codex is classified as High capability for cybersecurity tasks under the Preparedness Framework and is specifically trained to identify software vulnerabilities. Alongside that classification, the release announces mitigations and programs intended to support defensive use:
- A Trusted Access for Cyber pilot program.
- Expansion of the private beta for Aardvark, the security research agent.
- Partnerships with open‑source maintainers to provide free codebase scanning (example: Next.js disclosures).
- A commitment of $10M in API credits for defensive security research, building on an earlier $1M grant program; applications are available through the Cybersecurity Grant Program.
Availability and infrastructure
GPT‑5.3‑Codex is available now to paid ChatGPT plans across the Codex app, CLI, IDE extension, and web. API access is planned for a later date. The rollout also notes that the model was co‑designed, trained, and served on NVIDIA GB200 NVL72 systems.
Original announcement: https://openai.com/index/introducing-gpt-5-3-codex/