GPT-5.3-Codex: A 25% Faster Agentic Coding Model for the Full Product Lifecycle

GPT-5.3-Codex is a 25% faster agentic coding model that merges advanced code generation with reasoning to run long, tool-enabled engineering workflows. It's available in the Codex app, CLI, IDE extension, and web, with API access coming soon.

TL;DR

  • Agentic coding assistant; 25% faster than GPT‑5.2‑Codex, combining GPT‑5.2‑Codex code strengths with GPT‑5.2 reasoning for long‑running, steerable workflows; used internally to instrument its own training and deployment
  • Strong benchmark gains: Terminal‑Bench 2.0 77.3% (prev 64.0%/62.2%); OSWorld‑Verified 64.7% (prev 38.2%/37.9%); Cybersecurity CTF 77.6% (prev 67.4%/67.7%); SWE‑Lancer IC Diamond 81.4%
  • Web/front‑end at scale: autonomously built two playable games (racing and diving) and produces more production‑ready defaults for routine web tasks
  • Broader product‑lifecycle collaborator: supports debugging, deployment, monitoring, PRDs, user‑research summarization, spreadsheets/slides, tests/metrics; Codex app provides progress updates and inline steering (Settings > General > Follow‑up behavior)
  • High capability for cybersecurity; trained to identify software vulnerabilities; mitigations include Trusted Access for Cyber pilot, expanded Aardvark private beta, partnerships for free codebase scanning (example: Next.js), and $10M in API credits via the Cybersecurity Grant Program
  • Availability and infra: launched early February 2026, available now to paid ChatGPT plans via Codex app, CLI, IDE extension, and web (API planned later); co‑designed, trained, and served on NVIDIA GB200 NVL72 systems

OpenAI today released GPT‑5.3‑Codex, an incremental update to its specialized programming model. The new version merges the coding strengths of the 5.2-Codex branch with the general-purpose reasoning of the standard GPT-5.2 model and, according to OpenAI, runs approximately 25% faster than its predecessor.

What changed technically

GPT‑5.3‑Codex emphasizes sustained, agentic workflows. Early versions were used internally to instrument and accelerate the model's own training, debugging, and deployment, making the model part of its own development path. Its architecture and training produced improvements across coding, frontend development, and desktop-level computer use, with a focus on maintaining context during long-running tasks and allowing real‑time steering while the agent executes.

Key performance highlights from the release include:

  • SWE‑Bench Pro: 56.8% (previous models: 56.4% GPT‑5.2‑Codex, 55.6% GPT‑5.2)
  • Terminal‑Bench 2.0: 77.3% (prev: 64.0%, 62.2%)
  • OSWorld‑Verified: 64.7% (prev: 38.2%, 37.9%)
  • GDPval (wins or ties): 70.9% (matching GPT‑5.2's previous high)
  • Cybersecurity CTF challenges: 77.6% (prev: 67.4%, 67.7%)
  • SWE‑Lancer IC Diamond: 81.4% (prev: 76.0%, 74.6%)

These figures underscore stronger terminal and computer‑use skills, substantial gains on real‑world agentic benchmarks, and notable improvements on cybersecurity challenge tasks.
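To make the gains concrete, here is a small sketch that computes the percentage-point improvement of GPT-5.3-Codex over the stronger of the two previous models on each benchmark, using only the scores listed above (the dictionary layout is illustrative, not an official data format):

```python
# Percentage-point gains of GPT-5.3-Codex over the better of the two
# previous models (GPT-5.2-Codex and GPT-5.2), using the reported scores.
scores = {
    # benchmark: (new, prev_codex, prev_base)
    "SWE-Bench Pro":         (56.8, 56.4, 55.6),
    "Terminal-Bench 2.0":    (77.3, 64.0, 62.2),
    "OSWorld-Verified":      (64.7, 38.2, 37.9),
    "Cybersecurity CTF":     (77.6, 67.4, 67.7),
    "SWE-Lancer IC Diamond": (81.4, 76.0, 74.6),
}

# Compare against the best previous score, rounding to one decimal place.
gains = {
    name: round(new - max(prev_codex, prev_base), 1)
    for name, (new, prev_codex, prev_base) in scores.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain} pts")
```

The spread is wide: OSWorld-Verified jumps 26.5 points while SWE-Bench Pro moves only 0.4, consistent with the release's framing that computer-use and terminal skills improved most.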

Web and front-end work at scale

The model was tested on longer web development projects, including two games built and iterated autonomously: a racing game (with multiple racers, eight maps, and items) and a diving game (exploration and resource management). Playable demos of both games are linked from the announcement. For routine web tasks, GPT‑5.3‑Codex produces more production-ready defaults from underspecified prompts — for example, better handling of pricing toggles and richer testimonial components in landing pages.

Beyond code: a collaborator across the product lifecycle

GPT‑5.3‑Codex is presented as a tool for more than code generation. The model supports debugging, deployment, monitoring, PRD drafting, user research summarization, spreadsheet and slide generation, and tests/metrics work. It also demonstrated strong performance on GDPval tasks that simulate professional knowledge work across 44 occupations. The Codex app surfaces frequent progress updates and allows inline steering during execution; follow‑up behavior can be adjusted in Settings > General > Follow-up behavior.

How it helped build itself

Early iterations of GPT‑5.3‑Codex were used to speed up research and engineering tasks: monitoring training runs, diagnosing context rendering issues, proposing fixes, building diagnostic classifiers, scaling GPU clusters, and summarizing large experiment datasets. Those internal uses are cited as accelerants to development and deployment.

Cybersecurity posture and ecosystem support

GPT‑5.3‑Codex is classified as High capability for cybersecurity tasks under the Preparedness Framework and is specifically trained to identify software vulnerabilities. Alongside that classification, the release announces mitigations and programs intended to support defensive use:

  • A Trusted Access for Cyber pilot program.
  • Expansion of the private beta for Aardvark, the security research agent.
  • Partnerships with open‑source maintainers to provide free codebase scanning (example: Next.js disclosures).
  • A commitment of $10M in API credits for defensive security research, building on an earlier $1M grant program; applications are available through the Cybersecurity Grant Program.

Availability and infrastructure

GPT‑5.3‑Codex is available now to paid ChatGPT plans across the Codex app, CLI, IDE extension, and web. API access is planned for a later date. The rollout also notes that the model was co‑designed, trained, and served on NVIDIA GB200 NVL72 systems.

Original announcement: https://openai.com/index/introducing-gpt-5-3-codex/
