⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Brian Fioca + Bill Chen, OpenAI
Key Moments
OpenAI's Codex and GPT-5 training teams discuss AI agents, personality, tools, and trust in coding.
Key Insights
Codex Max offers extended runtimes (24+ hours) plus faster, more efficient problem-solving with enhanced tool use.
Agent 'personality' training focuses on communication, planning, and self-correction for developer trust.
Codex models are optimized for OpenAI's open-source coding harness, while mainline GPT-5 models are more general-purpose.
Agentic behavior is shifting towards higher abstraction layers, with agents building upon agents (sub-agents).
Evaluation (evals) is crucial for building trust and improving AI agents, moving from academic to applied use cases.
Coding agents are evolving into general computer interaction tools, extending beyond code generation to personal automation.
INTRODUCING CODEX MAX AND ITS CAPABILITIES
The discussion begins with the introduction of Codex Max, highlighting its significant advancements over previous models. A key feature is its extended runtime: it can operate for 24 hours or more, a substantial leap for long-running tasks. The name 'Max' signifies not only this sustained operation but also maximization of speed and efficiency in problem-solving, in contrast with 'Pro,' which implies slower, more methodical execution. Codex Max aims to reach correct answers faster on similar classes of problems, underscoring a focus on both endurance and raw performance.
TRAINING AGENTS FOR TRUST AND EFFECTIVENESS
A significant focus in training AI models, particularly for coding assistance, is developing 'personality' to foster trust with developers. This involves instilling behavioral characteristics like clear communication about ongoing processes, strategic planning before execution, and self-checking of work. These software engineering best practices translate into measurable agent behaviors. Close collaboration with leading coding partners provides insight into specific needs, guiding the training towards each partner's particular requirements and ensuring the AI functions as a reliable pair programmer. This human-like approach to agent interaction is essential for user adoption and for maximizing the utility of AI tools.
DISTINGUISHING CODEX MODELS FROM MAINSTREAM MODELS
A crucial distinction is made between the Codex line of models and OpenAI's mainline models. Codex is specifically designed and optimized for OpenAI's coding harness (the open-source Codex CLI), creating a dedicated coding agent; its availability via API makes it a focused tool for developers building within this framework. In contrast, mainline models like GPT-5 are general-purpose, capable of handling a broader range of tasks beyond coding. While they possess coding capabilities, their generality allows greater steerability with diverse tools, but it can also mean slower performance or more errors when they encounter unfamiliar tools, making Codex the preferred choice for bleeding-edge coding applications.
THE RISE OF AGENTS AND SUB-AGENTS
The conversation highlights a significant trend: the abstraction layer is moving upwards from the model to the agent level. Instead of just optimizing individual models, the focus is shifting towards packaging entire agents, allowing developers to build on top of them. This pattern enables integration of agents like Codex into platforms without needing to manage every model release or API change. Coding is presented as a prime example of agentic behavior, but the concept extends to agents that can spawn other agents (sub-agents) to perform tasks in parallel or hand off context. This leads to more complex systems where agents can, for instance, generate custom plugins for software, making applications self-customizable.
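The sub-agent pattern described above can be sketched as a parent agent fanning subtasks out to child runs in parallel and merging the results. This is a minimal illustration, not OpenAI's implementation; `run_subagent` is a hypothetical stand-in for a real model or Codex invocation.

```python
# Sketch of agents building on agents: a parent agent splits a task into
# subtasks, runs a sub-agent on each in parallel, and collects the results.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(subtask: str) -> str:
    """Placeholder for invoking a sub-agent (e.g. a model call) on one subtask."""
    return f"result for {subtask!r}"

def run_parent_agent(task: str, subtasks: list[str]) -> dict[str, str]:
    """Fan subtasks out to sub-agents in parallel, then hand results back."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_subagent, subtasks))
    return dict(zip(subtasks, results))

report = run_parent_agent(
    "refactor module",
    ["update imports", "rename symbols", "fix tests"],
)
```

In practice the parent would also hand off context (files touched, constraints) to each child and reconcile conflicting edits, which is where most of the real complexity lives.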
THE CRITICAL ROLE OF EVALUATION AND TRUST-BUILDING
Building trust in AI agents is paramount, and this relies heavily on robust evaluation (evals). OpenAI is developing platform tooling for agent traces, rollout traces, and grading systems to monitor and improve agent behavior. This involves moving beyond academic evaluations to applied ones that capture real-world use cases. The analogy of hiring a PhD student is used: the AI needs a job description (prompt), mentorship, guardrails, and performance reviews (evals) to improve. This iterative process, where customer feedback fuels model training, is key to ensuring AI capabilities align with user needs and deliver maximum useful impact.
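The run-and-grade loop described above can be sketched in a few lines: each case pairs a prompt with a grader, the harness runs the agent and reports a pass rate. This is an illustrative minimum, assuming a stub `agent` in place of a real model call; it is not OpenAI's eval tooling.

```python
# Minimal applied-eval harness: run each case through the agent, grade the
# output with that case's grader, and report the overall pass rate.
from typing import Callable

def agent(prompt: str) -> str:
    """Stub agent; a real harness would call a model here."""
    return prompt.upper()

def run_evals(cases: list[tuple[str, Callable[[str], bool]]]) -> float:
    """Return the fraction of cases whose grader accepts the agent's output."""
    passed = sum(grader(agent(prompt)) for prompt, grader in cases)
    return passed / len(cases)

cases = [
    ("say hi", lambda out: "HI" in out),
    ("say bye", lambda out: "BYE" in out),
]
rate = run_evals(cases)  # 1.0 here: both graders accept the stub's output
```

Real applied evals replace the lambda graders with rubric- or trace-based grading over full agent rollouts, but the shape of the loop stays the same.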
EXTENDING CODING AGENTS TO GENERAL COMPUTER INTERACTION
A compelling perspective shared is that coding agents are not just for code; they are evolving into general computer interaction agents, particularly for terminal-based tasks. They can be seen as a bridge from traditional command-line interactions to more intelligent, agent-driven operations. This generality allows them to automate personal tasks, such as organizing files, managing desktops, or even sorting through emails. The idea is that if an agent can write code to solve a problem, it can also string together commands or scripts within a terminal to achieve the same goal, effectively becoming a universal assistant for interacting with computers.
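The command-stringing idea above can be sketched as a loop in which a planner proposes shell commands for a goal and the agent executes them, capturing output to feed back. The planner here is a hard-coded stub standing in for a model; the goal string and commands are purely illustrative.

```python
# Sketch of a coding agent driving the terminal for a general task: a
# (stubbed) planner maps a goal to shell commands, which the agent runs in
# sequence, collecting each command's output.
import subprocess

def plan_commands(goal: str) -> list[list[str]]:
    """Stub planner: in practice a model would propose the next command."""
    if goal == "list files, then confirm":
        return [["ls"], ["echo", "done"]]
    return []

def run_agent(goal: str) -> list[str]:
    """Execute each planned command and collect its stdout."""
    outputs = []
    for cmd in plan_commands(goal):
        result = subprocess.run(cmd, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs

logs = run_agent("list files, then confirm")
```

A production agent would additionally sandbox the commands, ask for confirmation on destructive ones, and loop the output back into the planner, but the terminal is the same uniform interface either way.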
FUTURE VISIONS AND OPERATIONALIZING AI CAPABILITIES
Looking ahead to 2026, the vision includes more advanced computer use driven by AI, especially for applications lacking robust APIs, leveraging UI interaction. The extensibility of coding agents to perform more general tasks and the ability to build with sub-agents are anticipated. A primary goal is to elevate the trust level in these AI capabilities, allowing developers at all levels to tackle complex refactoring, implement new technologies, and generally perform better. This democratization of high-tier development capabilities aims to equip every team with intelligence previously only accessible at top-tier firms, enabling more efficient and sophisticated problem-solving.
Common Questions
How do Codex models differ from mainline models like GPT-5?
Codex is specifically optimized for coding tasks and designed to work seamlessly within its harness and API. Mainline models, like GPT-5, are more general-purpose and offer broader steerability across different types of tools, making them more adaptable but potentially slower for highly specialized coding tasks.