
OpenAI launched a macOS app for its Codex coding tool on Monday. The app lets multiple agents work in parallel and supports agentic workflows, and it arrives less than two months after the release of GPT-5.2-Codex, the company’s most powerful coding model.
Artificial intelligence systems now handle substantial portions of the software development work that was previously done by hand, with swarms of agents and subagents performing the routine parts of programming. Developers are still experimenting with interfaces and formats for human-AI collaboration, which makes it hard for AI research organizations to keep pace with evolving practices. Agentic software development, in which AI agents work autonomously on coding assignments, is currently the prevailing approach, as shown by applications such as Claude Code and Cowork.
OpenAI first introduced Codex as a command-line tool in April and followed it with a web-based version a month later. The macOS app is the company’s attempt to catch up with these agentic practices.
The app can run multiple agents at the same time and combines their capabilities with current agentic workflows. OpenAI is positioning the release to attract users from competing tools like Claude Code.
GPT-5.2-Codex serves as OpenAI’s leading coding model. During a press call, CEO Sam Altman stated, “If you really want to do sophisticated work on something complex, 5.2 is the strongest model by far.” He added, “However, it’s been harder to use, so taking that level of model capability and putting it in a more flexible interface, we think is going to matter quite a bit.”
Coding benchmarks paint a mixed picture for GPT-5.2. The model holds the top score on TerminalBench, which measures how well AI systems handle command-line programming tasks, but Gemini 3 and Claude Opus trail by amounts that fall within the benchmark’s margin of error. On SWE-bench, which tests whether AI systems can resolve real-world software bugs, GPT-5.2 shows no clear advantage over competitors.
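To illustrate why a top score that sits within the margin of error is not a decisive lead, here is a minimal Python sketch. It assumes one simple rule of thumb: two scores are indistinguishable when their error intervals overlap. The model names, scores, and margin used with it are hypothetical placeholders, not published benchmark results, and the actual benchmarks may compute uncertainty differently.

```python
from typing import Dict, Optional


def intervals_overlap(a: float, b: float, margin: float) -> bool:
    """True if [a - margin, a + margin] and [b - margin, b + margin] overlap."""
    return abs(a - b) <= 2 * margin


def clear_leader(scores: Dict[str, float], margin: float) -> Optional[str]:
    """Return the top scorer only if it beats the runner-up beyond the margin.

    Expects at least two entries. Returns None when the top two scores
    are too close to separate.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    if intervals_overlap(top_score, runner_up_score, margin):
        return None
    return top_name


# Hypothetical scores: the leader is ahead, but not by enough to matter.
close_race = {"model_a": 62.0, "model_b": 61.0, "model_c": 60.5}
print(clear_leader(close_race, margin=1.5))
```

With these placeholder numbers the function reports no clear leader, which mirrors the situation described for TerminalBench.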
Agentic applications are difficult to benchmark because of their complexity, and real-world performance can differ markedly among the leading models depending on how individual users work with them.
The Codex macOS app includes several features aimed at matching or surpassing those of Claude-based applications. Users can set up automations that run in the background on a preset schedule, with the results held in a queue for review when the user returns. The app also offers a choice of agent personalities to suit different preferences, ranging from a pragmatic style focused on efficiency to an empathetic style more attuned to the user.
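The scheduled-automation pattern described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not OpenAI’s implementation or any real Codex API: the `Automation` class, the `run_due` and `drain` helpers, and the example job are all hypothetical, and time is passed in explicitly as plain seconds so the example stays deterministic.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Callable, List, Tuple


@dataclass
class Automation:
    """A hypothetical background job (not a real Codex API object)."""
    name: str
    interval_s: int              # run every `interval_s` seconds
    task: Callable[[], str]      # produces the output to be reviewed later
    next_run: int = 0            # next due time, in seconds


def run_due(automations: List[Automation], now: int, review_queue: Queue) -> None:
    """Run every automation that is due at `now` and queue its output."""
    for auto in automations:
        if now >= auto.next_run:
            review_queue.put((auto.name, auto.task()))
            auto.next_run = now + auto.interval_s


def drain(review_queue: Queue) -> List[Tuple[str, str]]:
    """Return all queued outputs, oldest first, for the user to examine."""
    items = []
    while not review_queue.empty():
        items.append(review_queue.get())
    return items


# Simulate three scheduler ticks while the user is away.
jobs = [Automation("triage-issues", 3600, lambda: "3 new issues labeled")]
pending = Queue()
for now in (0, 1800, 3600):
    run_due(jobs, now, pending)

# The hourly job ran at 0 and 3600, so two results are waiting for review.
for name, output in drain(pending):
    print(name, output)
```

Keeping the outputs in a queue rather than acting on them immediately is what lets the user review the results after the fact.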
OpenAI emphasizes how much the app can speed up development. “You can use this from a clean sheet of paper, brand new, to make a really quite sophisticated piece of software in a few hours,” Altman said. “As fast as I can type in new ideas, that is the limit of what can get built.”
Together, these features place the Codex macOS app squarely in the competitive field of agentic coding tools and extend the steady expansion of the Codex platform that began with its command-line debut.