OpenAI has taken the unusually transparent step of publishing an in-depth technical breakdown of how its Codex CLI coding agent works under the hood. The post, written by OpenAI engineer Michael Bolin, provides the clearest demonstration yet of how a production-grade AI agent coordinates large language models, tools, and user input to perform real-world software development tasks.
At the core of Codex is what OpenAI calls the agent loop: an iterative cycle that alternates between model inference and tool execution. Each cycle begins when Codex constructs a prompt from structured input, such as system instructions, developer constraints, user messages, environmental context, and the available tools, and sends it to OpenAI's Responses API for inference.
The model's output can take one of two forms. It may produce an assistant message for the user, or it may request a tool invocation, such as executing a shell command, reading a file, or calling a planning or search utility. When a tool call is requested, Codex executes it locally (within the defined sandbox limits), appends the result to the prompt, and queries the model again. The loop continues until the model emits a final assistant message that signals the end of the conversational turn.
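The shape of that loop can be sketched in a few lines of Python. The model here is a stub and the helper names are hypothetical; the real Codex implementation sends the prompt to the Responses API and enforces sandboxing, but the control flow is the same:

```python
# Minimal agent-loop sketch: alternate between (stubbed) model inference
# and local tool execution until a final assistant message is produced.

def run_tool(name, args):
    # Hypothetical local tool executor; sandbox enforcement omitted.
    if name == "shell":
        return f"ran: {args['cmd']}"
    raise ValueError(f"unknown tool: {name}")

def stub_model(prompt):
    # Fake inference: request one tool call, then finish the turn.
    if not any(m["role"] == "tool" for m in prompt):
        return {"type": "tool_call", "name": "shell", "args": {"cmd": "ls"}}
    return {"type": "assistant_message", "text": "Listed the project files."}

def agent_loop(user_message):
    prompt = [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": user_message},
    ]
    while True:
        output = stub_model(prompt)
        if output["type"] == "tool_call":
            result = run_tool(output["name"], output["args"])
            prompt.append({"role": "tool", "content": result})
            continue  # feed the tool result back in and infer again
        return output["text"]  # final assistant message ends the turn

print(agent_loop("What files are in this project?"))
```

Each tool result is appended to the prompt before the next inference call, which is exactly what makes the prompt grow turn over turn.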
Although this high-level pattern is common to many AI agents, OpenAI's write-up stands out for its specificity. Bolin explains how prompts are assembled item by item, how roles (system, developer, user, assistant) determine priority, and how small design decisions, such as the order of tools in a list, can have an outsized effect on performance.
One of the most notable architectural decisions is Codex's fully stateless interaction model. Codex resends the entire conversation history with every request rather than relying on server-side conversation memory via the optional previous_response_id parameter. This approach simplifies the infrastructure and enables zero data retention (ZDR) for customers who require strict privacy guarantees.
The drawback is obvious: every interaction enlarges the prompt, so the total data sent grows quadratically over the course of a conversation. OpenAI mitigates this through aggressive prompt caching, which lets the model reuse earlier computation as long as the new prompt is an exact prefix extension of the previous one. When caching works, inference costs grow linearly rather than quadratically.
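The arithmetic behind that claim is easy to check with illustrative numbers (these are not Codex's actual figures, just a model of the growth rates):

```python
# Compare cumulative tokens processed over N turns, with and without
# exact-prefix caching. Assume each turn appends roughly `step` tokens.

def tokens_processed(turns, step, cached):
    total, prompt_len = 0, 0
    for _ in range(turns):
        prompt_len += step
        # Without caching the whole prompt is reprocessed every turn;
        # with a prefix-cache hit, only the new suffix is.
        total += step if cached else prompt_len
    return total

print(tokens_processed(100, 500, cached=False))  # 2,525,000 (quadratic)
print(tokens_processed(100, 500, cached=True))   # 50,000 (linear)
```

Over 100 turns the uncached agent reprocesses roughly fifty times as many tokens, which is why prefix stability matters so much to the design.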
However, this constraint imposes severe discipline on the system. Changing tools, switching models, altering sandbox permissions, or reordering tool definitions mid-conversation can cause cache misses and significantly degrade performance. Bolin notes that early support for the Model Context Protocol (MCP) exposed exactly this kind of fragility, forcing the team to carefully redesign how the tool list handles dynamic updates.
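A toy comparison makes the fragility concrete. A real cache compares token prefixes against the serialized prompt, but the principle is visible even with plain character prefixes (the serialization format here is an assumption):

```python
import json

def shared_prefix_len(old, new):
    # Length of the common leading run of two serialized prompts.
    a, b = json.dumps(old), json.dumps(new)
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

base = {"tools": ["shell", "read_file", "plan"], "messages": ["hi"]}
appended = {"tools": ["shell", "read_file", "plan"], "messages": ["hi", "ok"]}
reordered = {"tools": ["read_file", "shell", "plan"], "messages": ["hi", "ok"]}

# Appending to the end preserves nearly the whole prefix; reordering
# the tool list diverges near the start, so the cache would miss.
print(shared_prefix_len(base, appended) > shared_prefix_len(base, reordered))
```

Because tool definitions sit near the front of the prompt, any churn there invalidates everything after it.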
That rapid growth also collides with another hard limit: the model's context window. Both input and output tokens count against this limit, so long-running agents that perform hundreds of tool calls risk running out of available context.
To address this, Codex employs automatic conversation compaction. When the token count exceeds a configurable threshold, Codex replaces the full conversation history with a condensed representation generated via a dedicated responses/compact API endpoint. Notably, this compacted context includes an encrypted payload that stores the model's latent understanding of earlier interactions, allowing it to continue making consistent inferences without access to the full raw history.
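In outline, the trigger logic might look like the sketch below. The names, the word-count tokenizer, and the placeholder payload are all assumptions for illustration; in Codex the condensed representation comes back from the server-side compaction endpoint rather than being built locally:

```python
COMPACT_THRESHOLD = 8  # configurable; real thresholds are token counts

def estimate_tokens(history):
    # Crude stand-in for a real tokenizer: count words in all messages.
    return sum(len(m["content"].split()) for m in history)

def maybe_compact(history, threshold=COMPACT_THRESHOLD):
    if estimate_tokens(history) <= threshold:
        return history
    # Stand-in for the compaction call: replace the full history with
    # one condensed item carrying an opaque (here, fake) payload.
    summary = {
        "role": "system",
        "content": "[compacted context]",
        "encrypted_payload": "<opaque blob from the API>",
    }
    return [summary]

history = [
    {"role": "user", "content": "please refactor the parser module"},
    {"role": "assistant", "content": "done, all tests pass cleanly"},
]
history = maybe_compact(history)  # 10 words > threshold, so compacted
print(len(history), history[0]["content"])
```

Running the check after every turn is what lets compaction stay invisible to the user.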
Earlier versions of Codex required users to trigger compaction manually. The process is now automatic and nearly invisible, an important usability improvement as agents take on longer and more complex tasks.
OpenAI has historically been reluctant to publish detailed technical accounts of flagship products like ChatGPT. Codex is being handled differently. The result is a rare and frank accounting of the trade-offs involved in building real-world AI agents: performance versus privacy, flexibility versus cache efficiency, autonomy versus safety. Bolin does not shy away from describing bugs, inefficiencies, or hard-learned lessons, reinforcing the message that today's AI agents are powerful but far from magic.
Beyond Codex itself, the post serves as a blueprint for anyone building agents on top of a modern LLM API. It highlights emerging best practices that are quickly becoming industry standards, such as stateless design, prefix-stable prompts, and explicit context management.


