12/18/2025 | News release | Distributed by Public on 12/18/2025 12:29
The most advanced agentic coding model for professional software engineering and defensive cybersecurity.
Today we're releasing GPT-5.2-Codex, the most advanced agentic coding model yet for complex, real-world software engineering. GPT-5.2-Codex is a version of GPT-5.2 further optimized for agentic coding in Codex, including improvements on long-horizon work through context compaction, stronger performance on large code changes like refactors and migrations, improved performance in Windows environments, and significantly stronger cybersecurity capabilities.
As our models continue to advance along the intelligence frontier, we've observed that these improvements also translate to capability jumps in specialized domains such as cybersecurity . For example, just last week, a security researcher using GPT-5.1-Codex-Max with Codex CLI found and responsibly disclosed (opens in a new window)a vulnerability in React that could lead to source code exposure.
GPT-5.2-Codex has stronger cybersecurity capabilities than any model we've released so far. These advances can help strengthen cybersecurity at scale, but they also raise new dual-use risks that require careful deployment. While GPT-5.2-Codex does not reach a 'High' level of cyber capability under our Preparedness Framework, we're designing our deployment approach with future capability growth in mind.
We're releasing GPT-5.2-Codex today in all Codex surfaces for paid ChatGPT users, and working towards safely enabling access to GPT-5.2-Codex for API users in the coming weeks. In parallel, we're piloting invite-only trusted access to upcoming capabilities and more permissive models for vetted professionals and organizations focused on defensive cybersecurity work. We believe that this approach to deployment will balance accessibility with safety.
GPT-5.2-Codex builds on GPT-5.2's strengths in professional knowledge work and GPT-5.1-Codex-Max 's frontier agentic coding and terminal-using capabilities. GPT-5.2-Codex is now better at long-context understanding, reliable tool calling, improved factuality, and native compaction, making it a more dependable partner for long running coding tasks, while remaining token-efficient in its reasoning.
GPT-5.2-Codex achieves state-of-the-art performance on SWE-Bench Pro and Terminal-Bench 2.0, benchmarks designed to test agentic performance on a wide variety of tasks in realistic terminal environments. It is also much more effective and reliable at agentic coding in native Windows environments, building on capabilities introduced in GPT-5.1-Codex-Max.
With these improvements, Codex is more capable at working in large repositories over extended sessions with full context intact. It can more reliably complete complex tasks like large refactors, code migrations, and feature builds - continuing to iterate without losing track, even when plans change or attempts fail.
InSWE-Bench Pro , a model is given a code repository and must generate a patch to solve a realistic software engineering task. Terminal-Bench 2.0is a benchmark for testing AI agents in real terminal environments. Tasks include compiling code, training models and setting up servers.
Stronger vision performance enables GPT-5.2-Codex to more accurately interpret screenshots, technical diagrams, charts, and UI surfaces shared during coding sessions.
Codex can take design mocks and quickly translate them to functional prototypes, and you can pair with Codex to take these prototypes to production.
When charting performance on one of our core cybersecurity evaluations over time, we see a sharp jump in capability starting with GPT-5-Codex, another large jump with GPT-5.1-Codex-Max and now a third jump with GPT-5.2-Codex. We expect that upcoming AI models will continue on this trajectory. In preparation, we are planning and evaluating as though each new model could reach 'High' levels of cybersecurity capability, as measured by our Preparedness Framework (opens in a new window). While GPT-5.2-Codex has not yet reached 'High' level of cyber capability, we are preparing for future models that cross that threshold. Due to the increased cyber capabilities, we have added additional safeguards at the in model and in the product, which are outlined in the system card .
The Professional Capture-the-Flag (CTF)eval measures how often the model can solve advanced, multi-step real-world challenges (requiring professional-level cybersecurity skills) in a Linux environment.
Modern society runs on software, and its reliability depends on strong cybersecurity-keeping critical systems in banking, healthcare, communications, and essential services online, protecting sensitive data, and ensuring people can trust the software they rely on every day. Vulnerabilities can exist long before anyone knows about them, and finding, validating, and fixing them often depends on a community of engineers and independent security researchers equipped with the right tools.
On December 11, 2025, the React team published three security vulnerabilities affecting apps built with React Server Components. What made this disclosure notable was not only the vulnerabilities themselves, but how they were uncovered.
Andrew MacPherson, a principal security engineer at Privy (a Stripe company), was using GPT-5.1-Codex-Max with Codex CLI and other coding agents to reproduce and study a different critical React vulnerability disclosed the week prior, known as React2Shell (opens in a new window)(CVE-2025-55182 (opens in a new window)). His goal was to evaluate how well the model could assist with real-world vulnerability research.
He initially attempted several zero-shot analyses, prompting the model to examine the patch and identify the vulnerability it addressed. When that did not yield results, he shifted to a higher-volume, iterative prompting approach. When those approaches did not succeed, he guided Codex through standard defensive security workflows-setting up a local test environment, reasoning through potential attack surfaces, and using fuzzing to probe the system with malformed inputs. While attempting to reproduce the original React2Shell issue, Codex surfaced unexpected behaviors that warranted deeper investigation. Over the course of a single week, this process led to the discovery of previously unknown vulnerabilities, which were responsibly disclosed to the React team.
This demonstrates how advanced AI systems can materially accelerate defensive security work in widely used, real-world software. At the same time, capabilities that help defenders move faster can also be misused by bad actors.
As agentic systems become more capable in cybersecurity-relevant tasks, we are making it a core priority to ensure these advances are deployed responsibly-pairing every gain in capability with stronger safeguards, tighter access controls, and ongoing collaboration with the security community.
When charting performance on one of our core cybersecurity evaluations over time, we see a sharp jump in capability starting with GPT-5-Codex, another large jump with GPT-5.1-Codex-Max and now a third jump with GPT-5.2-Codex. We expect that upcoming AI models will continue on this trajectory. In preparation, we are planning and evaluating as though each new model could reach 'High' levels of cybersecurity capability, as measured by our Preparedness Framework (opens in a new window). While GPT-5.2-Codex has not yet reached 'High' level of cyber capability, we are preparing for future models that cross that threshold. Due to the increased cyber capabilities, we have added additional safeguards at the model layer and in the product, which are outlined in the system card .
Security teams can run into restrictions when attempting to emulate threat actors, analyze malware to support remediation, or stress test critical infrastructure. We are developing a trusted access pilot to remove that friction for qualifying users and organizations and enable trusted defenders to use frontier AI cyber capabilities to accelerate cyberdefense.
Initially the pilot program will be invite-only for vetted security professionals with a track record of responsible vulnerability disclosure and organizations with a clear professional cybersecurity use case. Qualifying participants will get access to our most capable models for defensive use-cases to enable legitimate dual-use work.
If you're a security professional or part of an organization doing ethical security work like vulnerability research or authorized red-teaming, we invite you to express interest in joining and share feedback on what you'd like to see from the program here (opens in a new window).
GPT-5.2-Codex represents a step forward in how advanced AI can support real-world software engineering and cybersecurity-helping developers and defenders tackle complex, long-horizon work, and strengthening the tools available for responsible security research.
By rolling GPT-5.2-Codex out gradually, pairing deployment with safeguards, and working closely with the security community, we're aiming to maximize defensive impact while reducing the risk of misuse. What we learn from this release will directly inform how we expand access over time, as both the software and cyber frontiers continue to advance.