San Francisco, Feb 5, 2026, 11:14 PST
- OpenAI rolled out GPT-5.3-Codex, claiming it delivers a 25% speed boost for Codex users
- OpenAI claims the model can tackle longer tasks combining coding, research, and tool use
- OpenAI is pitching Codex as more than just a code generator—it’s aiming to be a versatile “computer work” assistant.
OpenAI dropped GPT-5.3-Codex on Thursday, a fresh update to its Codex agent. The company claims it’s 25% faster and designed for complex tasks that mix research, tool use, and code changes. It’s now live on paid ChatGPT plans, spanning the Codex app, command-line interface, IDE extension, and web. Benchmarks show solid jumps: 56.8% on SWE-Bench Pro and 77.3% on Terminal-Bench 2.0. OpenAI also revealed that early builds of the model helped with debugging training and handling deployment—calling this the first AI “instrumental in creating itself.” (OpenAI)
This update is significant since coding now serves as the testing ground for “agentic” AI—systems capable of planning and executing tasks, not merely finishing text. OpenAI CEO Sam Altman noted this week that “code plus generalized computer use is even much more powerful,” highlighting Codex, which over a million developers used just last month. (Reuters)
The move comes amid fierce competition with Anthropic and others, as investors keep an eye on whether AI agents will undercut traditional software firms. On Thursday, Anthropic rolled out what it described as an upgraded model, Claude Opus 4.6. The breakneck speed of these new launches has contributed to a selloff in certain software stocks, Reuters reported. (Reuters)
OpenAI’s angle isn’t just about generating code snippets—it’s about handling the entire cycle: build, test, fix, ship, then repeat. According to the company, SWE-Bench Pro simulates real software engineering tasks spanning several programming languages, while Terminal-Bench checks if an agent can effectively interact within a terminal environment.
OpenAI is extending Codex well beyond engineering tasks. The new GPT-5.3-Codex aims to assist across the entire software lifecycle and adjacent knowledge work, from drafting product requirement documents and editing copy to generating deliverables like slide decks and spreadsheet analyses.
The bigger capability brings a higher risk profile. OpenAI’s system card marks this as the company’s first launch treated as “High capability” in cybersecurity under its Preparedness Framework: because OpenAI cannot rule out that the model has reached that threshold, it is rolling out safeguards as a precaution. Some advanced cyber-related functions will be closely monitored, and heavy users might have to verify their identity via a “Trusted Access for Cyber” program to keep access.
Despite improvements in AI agents, the handoff still feels clunky. Mike Krieger, head of Anthropic Labs, told a San Francisco summit this week that most users aren’t ready to let AI fully control their computers—a caution many security and engineering teams quietly share.
On the infrastructure front, OpenAI revealed it co-designed, trained, and deployed GPT-5.3-Codex on Nvidia’s GB200 NVL72 systems. This highlights just how critical scarce compute hardware and efficient infrastructure have become in the AI race.
Anthropic is pushing a similar angle on longer-term, tool-enabled models. In its announcement, it cited GitHub Chief Product Officer Mario Rodriguez, who said early tests revealed Claude Opus 4.6 managing “complex, multi-step coding work” along with “agentic workflows that demand planning and tool calling.” (Anthropic)
OpenAI revealed plans to roll out API access for GPT-5.3-Codex but didn’t specify when. At this stage, the real battleground is the tools developers use daily — apps, terminals, IDEs — and how securely these AI agents can be unleashed.