How is Codex different from the Codex CLI?

Codex in the cloud is more than a hosted CLI. It covers the broader form factor, including UI integration, scalability, caching, permissioning, and collaboration, which makes it a more comprehensive agentic software development platform.

What are the key features of the Codex model?

Codex models are trained to follow instructions closely, infer code style from the codebase, write concise PR descriptions, and run tests autonomously, reporting clearly which tests passed or failed.

What are the best practices for using Codex agents?

Key best practices include using `agents.md` for clear instructions, setting up linters and formatters, choosing modular code architecture, and adopting an abundance mindset by delegating tasks rather than micromanaging the agent's process.

How does Codex manage large codebases and context windows?

While the exact mechanism is still evolving, Codex agents are trained to be resourceful with their context window. They learn to handle large codebases and `agents.md` files by working out how to use tokens efficiently.

What is the typical task duration for Codex?

The hard cutoff for tasks is currently one hour, though some tasks ran for up to two hours during development. Around 30 minutes is a reasonable ballpark for complex tasks that require iteration and testing, and the average task is significantly shorter.

What are the security measures in Codex's compute platform?

Currently, internet access is cut off for running agents to ensure safety. While agents have passed initial security tests against exfiltration and prompt injection, OpenAI is conservatively limiting network access and evolving the platform based on learnings.

What is the vision for Codex beyond coding?

The long-term vision is for Codex to be part of a general AGI super assistant. The goal is to enable agents to handle most delegable work, freeing humans for ambiguous, creative, or hard-to-automate tasks, making AGI beneficial for all of humanity.


ChatGPT Codex: The Missing Manual

Latent Space Podcast

Science & Technology | 3 min read | 54 min video

May 16, 2025|9,809 views|189|12


TL;DR

OpenAI launched ChatGPT Codex, an autonomous software engineer agent, discussed by its core developers.

Key Insights

ChatGPT Codex is OpenAI's first cloud-hosted Autonomous Software Engineer (A-SWE).

The development of Codex evolved from giving reasoning models access to terminals to providing agents with their own cloud-based computers.

Codex aims to enhance developer productivity by acting as an independent software engineering agent, not just a code generator.

Key features include adherence to instructions, inferring code style, generating concise PR descriptions, and robust testing for changes.

Best practices for using Codex include utilizing an `agents.md` file for instructions, integrating linters/formatters, and ensuring code modularity and good architecture.

OpenAI is focusing on a 'one-shot' autonomous approach for Codex, contrasting with multi-shot human feedback models, with long-term goals of generalization and AGI integration.

THE ORIGIN STORY OF CHATGPT CODEX

The development of ChatGPT Codex stems from OpenAI's exploration of giving reasoning models access to tools, evolving from basic terminal access to creating sophisticated agents. Early experiments involved models modifying their own code, leading to the Codex CLI. The core idea then shifted to providing agents with their own cloud-based 'computers' to perform more complex, independent software engineering tasks safely and effectively, moving beyond simple code generation to full-cycle development.

FROM CLI TO CLOUD AGENT: THE EVOLUTION

The journey from the Codex CLI to the cloud-hosted Codex involved significant scope creep, driven by the realization that the product needed to be more than just a coding assistant. This expansion led to features like improved instruction adherence, inference of code style, better PR descriptions, and automated testing. The focus shifted towards creating an agent capable of genuine independent software engineering work, managing tasks over longer periods and processing information more comprehensively.

CORE FEATURES AND DEVELOPER EXPERIENCE

Codex is designed not just to write code, but to act as a full software engineer. Key features include superior instruction following, automatic code style inference, and the generation of concise, informative PR descriptions with cited code references. Its testing mechanism attempts to validate changes and clearly reports success or failure, and it can point out missing tooling that needs to be installed, such as pnpm. This comprehensive approach aims to make integrating AI-generated changes seamless for human developers.
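To make the idea of clear pass/fail reporting concrete, here is a minimal Python sketch of a check runner that distinguishes a passing check, a failing check, and a check that could not run because its tool is not installed. It is an illustration only and not how Codex is implemented: the names `CheckResult`, `run_check`, and `format_report` and the three status strings are assumptions made for this example.

```python
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    command: list[str]
    status: str  # "passed", "failed", or "missing-tool" (hypothetical labels)
    detail: str


def run_check(command: list[str]) -> CheckResult:
    """Run one check command and classify the outcome."""
    tool = command[0]
    # Report a missing tool explicitly instead of silently skipping the check.
    if shutil.which(tool) is None:
        return CheckResult(command, "missing-tool", f"'{tool}' is not installed")
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode == 0:
        return CheckResult(command, "passed", "")
    output = (completed.stderr or completed.stdout).strip()
    return CheckResult(command, "failed", output)


def format_report(results: list[CheckResult]) -> str:
    """Render one line per check so a reviewer can see what actually ran."""
    lines = []
    for result in results:
        line = f"[{result.status}] {' '.join(result.command)}"
        if result.detail:
            line += f": {result.detail}"
        lines.append(line)
    return "\n".join(lines)
```

Separating "failed" from "missing-tool" keeps a report honest: a reviewer can tell the difference between a change that broke the tests and an environment where the tests never ran.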

BEST PRACTICES FOR EFFECTIVE USE

To maximize Codex's utility, several best practices are recommended. These include creating an `agents.md` file to provide hierarchical instructions, integrating linters and formatters for in-loop verification, and ensuring codebases are modular and well-architected. The use of clear language in prompts, especially scoping instructions to specific directories, significantly aids the agent. Furthermore, adopting an 'abundance mindset'—sending tasks off without micro-managing—is encouraged for optimal workflow.
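One way to picture hierarchical instructions is as a walk from the repository root down to the directory being worked on, collecting each `agents.md` along the way so that more specific files come last. The Python sketch below illustrates that idea under stated assumptions: the episode summary does not describe how Codex actually resolves these files, and the function names `collect_agent_instructions` and `merged_instructions` are hypothetical.

```python
from pathlib import Path

AGENTS_FILENAME = "agents.md"  # file name as written in this summary


def collect_agent_instructions(repo_root: Path, target_dir: Path) -> list[Path]:
    """Return agents.md files from repo_root down to target_dir, outermost first."""
    repo_root = repo_root.resolve()
    target_dir = target_dir.resolve()
    # relative_to raises ValueError if target_dir is not inside repo_root.
    relative = target_dir.relative_to(repo_root)

    # Build the chain of directories: root, root/a, root/a/b, ...
    directories = [repo_root]
    for part in relative.parts:
        directories.append(directories[-1] / part)

    return [d / AGENTS_FILENAME for d in directories if (d / AGENTS_FILENAME).is_file()]


def merged_instructions(repo_root: Path, target_dir: Path) -> str:
    """Concatenate instructions, general first and most specific last."""
    root = repo_root.resolve()
    sections = []
    for path in collect_agent_instructions(repo_root, target_dir):
        label = path.relative_to(root).as_posix()
        sections.append(f"# from {label}\n{path.read_text()}")
    return "\n\n".join(sections)
```

Ordering the files from general to specific is a design choice for this sketch: it lets a subdirectory's file refine repository-wide guidance, which pairs naturally with prompts that scope work to a specific directory.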

THE 'ONE-SHOT' APPROACH AND FUTURE VISION

OpenAI's philosophy for Codex leans towards a 'one-shot' autonomous approach, where the agent ideally completes a task independently, contrasting with models requiring continuous human feedback. This ambitious goal signifies a move towards AGI, where agents will handle most routine and complex tasks, freeing humans for more ambiguous or creative work. The long-term vision includes extending this agentic capability beyond coding to all functional areas, ultimately aiming for a ubiquitous AGI super-assistant.

COMPUTE PLATFORM, SAFETY, AND ITERATIVE DEPLOYMENT

The Codex compute platform is evolving, with a focus on providing agents with necessary computational resources while maintaining strict safety and security constraints. Currently, internet access is cut off during agent execution to mitigate risks, though limited access is a future consideration. OpenAI emphasizes iterative deployment, treating the research preview as a thought experiment to gather feedback on form factors, multimodal inputs, and environmental customization to refine the product towards a full release.


Best Practices for Using Codex Agents

Practical takeaways from this episode

Do This

Lean into the idea of agents for independent software engineering work.

Ensure your model adheres to instructions and infers code style.

Train the model to write concise PR descriptions and titles.

Utilize commit hooks for agents; they are beneficial for autonomous work.

Invest gradually in `agents.md` for clear agent instructions.

Implement basic linting and formatting for agents.

Make your codebase discoverable, especially for new agents.

Use modular and testable code architecture.

Consider your choice of language, for example TypeScript over JavaScript, when building agent products.

Give agents sufficient time (up to an hour) for complex tasks.

Adopt an abundance mindset; delegate tasks and don't over-craft prompts.

Try using Codex on your phone to shift your perspective.

Allow agents access to necessary tools and environment setup.

Provide clear scoping guidance to agents (e.g., specify the subdirectory).

Allow models to learn and manage context window implicitly by giving them harder problems.

Avoid This

Do not let agents go loose on your local file system without proper sandboxing.

Do not assume agents will inherently know all software engineering practices without training or guidance.

Avoid overly complex `agents.md` files initially; start simple.

Do not hardcode all rules into prompts; train models to learn correct behavior.

Do not treat agents exactly like IDEs; they are meant for delegation and parallel work.

Do not restrict agents' access to tools or environments unnecessarily if safety allows.

Do not expect agents to immediately understand complex, poorly architected codebases.

Common Questions

ChatGPT Codex is a new offering from OpenAI designed to act as an agent for autonomous software engineering. It allows models to access a computer environment, use tools, and perform complex coding tasks independently.

Topics

AI Agents, AI & Machine Learning, Technology & Innovation, Programming & Software, Code Generation, Prompt Engineering, Autonomous Systems, Developer Tools, Software Development, Automation

Mentioned in this video

Companies

Smol AI

Co-host swyx is the founder of Smol AI.

Airplane

Josh previously worked at Airplane, a company he founded that built an internal tool platform.

OpenAI

The company where the Codex project was developed. The speakers discuss their experiences working there and the company's philosophy on AI development.

Software & Apps

Factory

Mentioned as an alternative AI coding agent that focuses on multi-shot human feedback, contrasting with Codex's one-shot approach.

JavaScript

Questioned as a language choice for building agent products, with TypeScript recommended instead.

Python

Recommended as a programming language over Ruby for use with AI agents.

RSpec

Mentioned in an anecdote where the AI couldn't figure out how to run RSpec in Rails, resulting in it only checking Ruby syntax.

Ruby

Discouraged for use with AI agents, with Python suggested as a better alternative.

GPT-3.5

Mentioned in the context of Airplane's past experiments with AI for building React views.

TypeScript

Suggested as a better alternative to JavaScript for building agent products, implying better type safety.

VS Code

Used as an analogy for how early project setup provides out-of-the-box checking, similar to what agents benefit from.

Devin

Mentioned as an alternative AI coding agent that focuses on multi-shot human feedback, contrasting with Codex's one-shot approach.

ChatGPT

Mentioned in the context of early experiments with AI agents and pair programming.

Codex CLI

The recently shipped command-line interface for Codex, built on learnings from earlier experiments with giving reasoning models access to terminals.

Organizations

Decibel

A partner and CTO at Decibel is introduced at the beginning of the podcast.

Codeforces

The name of the project, 'wham', was chosen after checking its presence in the codebase to ensure efficient prompting for agents.

Concepts

Dev Containers

Mentioned as a potential form factor for environment customization in Codex.
