Extreme Harness Engineering for the 1B token/day Dark Factory — Ryan Lopopolo, OpenAI Frontier

Key Moments
AI agents can now build complex software products, generating over a million lines of code with minimal human input; the key is not raw code quality but how the agents are 'harnessed'.
Key Insights
OpenAI's Frontier team developed an internal tool over 5 months, generating over 1 million lines of code with zero human-written code, resulting in a 10x faster development cycle compared to traditional methods.
The system evolved through multiple GPT-5 model iterations (5.1-5.4), requiring significant adaptation in build systems, moving from Makefiles to Bazel, Turbo, and Nx to meet sub-minute build-time objectives.
Human involvement has shifted from direct code review to post-merge analysis, with synchronous human attention identified as the primary bottleneck, leading to systems designed for agent autonomy.
The 'harness engineering' approach emphasizes defining non-functional requirements (observability, reliability, documentation) as text-based inputs that agents can directly process and enforce.
Symphony, an Elixir-based framework, demonstrates a novel approach to distributing software and ideas as 'ghost libraries,' enabling agents to reproduce complex systems from specifications with high fidelity.
The Frontier platform aims to enable enterprises to deploy observable, safe, and controllable AI agents, integrating with existing company infrastructure and security tooling, with a focus on agent SDKs and customizable safety specs.
AI agents can now build complex software without human code
Ryan Lopopolo from OpenAI's Frontier team discusses the emergence of 'harness engineering,' a paradigm shift where AI agents, specifically through OpenAI's Codex, are used to build complex software products. His team developed an internal tool over five months with their primary constraint being not to write any code themselves. This approach resulted in a codebase exceeding one million lines, achieving a development speed 10 times faster than traditional methods. The core idea is to leverage the advanced coding capabilities of AI models by providing them with the necessary 'harness'—the framework and tools—to perform tasks, effectively collapsing user journeys and product requirements into code.
Adapting to evolving AI capabilities and build system demands
The development process was iterative, progressing through multiple GPT-5 model generations (5.1 to 5.4). Each model iteration presented unique quirks and working styles, necessitating continuous adaptation of the codebase. A significant challenge was managing build times, especially after the introduction of background shells in model 5.3, which reduced the model's patience for long-running blocking scripts. The solution involved rapidly iterating through build systems, including Makefiles, Bazel, Turbo, and Nx, to ensure builds completed in under one minute. This was crucial for maintaining agent productivity, illustrating how the development environment must be as agile as the AI models it supports. Rapid iteration was affordable because of the low cost of tokens and the high parallelism of the models.
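The talk doesn't show the build configuration itself; as a rough sketch of the underlying principle (keep total wall-clock build time under the agent's patience budget by running independent package builds in parallel), here is a minimal Python illustration. The package names, durations, and `BUILD_BUDGET_SECONDS` constant are assumptions for demonstration, not the team's actual setup.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BUILD_BUDGET_SECONDS = 60  # the sub-minute target described in the talk


def build_package(job):
    """Stand-in for a real per-package build step: (name, duration)."""
    name, seconds = job
    time.sleep(seconds)  # simulates compile/bundle work
    return name


def parallel_build(packages):
    """Build independent packages concurrently; return names and wall-clock time."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(packages)) as pool:
        built = list(pool.map(build_package, packages.items()))
    return built, time.monotonic() - start
```

With four independent 0.1-second "builds," the wall-clock time is roughly 0.1 seconds rather than 0.4, which is the same leverage Turbo- or Nx-style task graphs provide for real monorepos.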
Shifting human roles to oversight and strategic direction
With AI agents handling the bulk of code generation, the role of human engineers has transformed. The primary bottleneck has shifted from direct code creation and review to synchronous human attention. Most code review now occurs post-merge, with human focus directed towards understanding where the agent makes mistakes and identifying areas for automation to prevent future time expenditure. This systemic shift requires a 'systems thinking mindset,' continuously evaluating agent performance and confidence in automation. For instance, the team invested heavily in providing agents with observability tools, such as traces and metrics, to ensure modularity, reliability, and code diagnosability, thereby reducing the need for constant human terminal supervision during development.
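The observability tooling mentioned above isn't specified in detail; a minimal, hypothetical sketch of the idea (every agent step emits a trace record that humans can review post-merge instead of watching the terminal) might look like the decorator below. `TRACES`, `traced`, and `apply_patch` are illustrative names, not the team's API.

```python
import functools
import time

TRACES = []  # stand-in for a real trace/metrics backend


def traced(step):
    """Record a span for each agent step, including failures, for later review."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                TRACES.append({"step": step, "ok": True,
                               "ms": (time.monotonic() - start) * 1000})
                return result
            except Exception:
                TRACES.append({"step": step, "ok": False,
                               "ms": (time.monotonic() - start) * 1000})
                raise
        return inner
    return wrap


@traced("apply_patch")
def apply_patch(diff):
    """Toy agent step: 'succeeds' when the diff looks like a unified diff."""
    return diff.startswith("--- ")
```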
Defining 'skills' and 'scaffolding' for agent autonomy
A key aspect of this approach is creating explicit 'skills' and 'scaffolds' that guide the AI agents. Instead of pre-defining strict scaffolds for agents to operate within, the focus shifted to providing a flexible framework where the agent, as the 'harness,' can make intelligent choices based on context. This includes using short markdown files for specifications (e.g., `spec.md`, `agent.mmd`) and structured 'skills' like 'Core Beliefs.md' or 'Tech Tracker.md.' These act as hooks for the agent (Codex) to review business logic, assess it against defined guardrails, and propose follow-up work. This method makes it cheaper to inject new knowledge and instructions into the system, ensuring agents can adapt and enforce process knowledge, such as requiring timeouts for network calls and updating documentation accordingly.
Dynamic interaction and feedback loops for continuous improvement
The system incorporates dynamic feedback loops to refine agent behavior. Initially, code-writing agents were too easily 'bullied' by review agents, leading to convergence issues. To counter this, prompts were adjusted to allow agents to push back on or defer feedback, mirroring how human engineers handle review comments. Review agents were also instructed to bias toward merging and to limit how many critical issues they surface. This flexibility is crucial because AI agents, by default, seek to follow instructions precisely. The process involves capturing instances where agents deviate from non-functional requirements, signaled by PR comments, failed builds, or misalignment with documentation, and funneling this information back into the system to improve future agent performance. This continuous 'gardening' of the codebase and agent behavior aims to maintain invariants and reduce code dispersion.
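The "bias toward merging" rule, combined with the P0/P2 priority scheme mentioned later in the entity list, suggests a simple merge gate: block only on issues more severe than a threshold. The sketch below is an assumed formalization of that policy, not code from the talk; `ReviewIssue` and `should_merge` are hypothetical names.

```python
from dataclasses import dataclass

# Lower number = more severe: P0 would "nuke the codebase" if merged; P2 is minor.
P0, P1, P2 = 0, 1, 2


@dataclass
class ReviewIssue:
    priority: int
    note: str


def should_merge(issues, threshold=P2):
    """Bias toward merging: block only on issues more severe than the threshold."""
    return all(issue.priority >= threshold for issue in issues)
```

Raising or lowering `threshold` is the single knob that trades review strictness against merge throughput, which is why instructing review agents to surface nothing above P2 effectively means "merge unless it's severe."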
Symphony: Distributing software and automating complex system generation
Symphony, an Elixir-based framework, represents a significant advancement in automating complex system generation. It allows for the creation of 'ghost libraries': specifications that agents can use to reproduce systems locally. The process involves agents analyzing existing code, generating a spec, and then having another agent implement that spec. The loop continues with review agents ensuring fidelity to the original system. Elixir and the Erlang VM were chosen for their robust process supervision and GenServer capabilities, ideal for orchestrating numerous asynchronous tasks. This approach lets humans focus on truly novel, 'hard and new' problems, trusting the agents to handle better-understood tasks, whether mundane or complex.
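The spec-implement-review cycle described above can be sketched as a small control loop. Symphony itself is Elixir; this Python version is only a structural illustration with the three agents passed in as plain callables, and the function name and retry budget are assumptions.

```python
def ghost_library_loop(source_code, spec_agent, impl_agent, review_agent,
                       max_rounds=3):
    """Symphony-style loop: spec the original, implement the spec, review for fidelity."""
    spec = spec_agent(source_code)               # agent 1: distill code into a spec
    for _ in range(max_rounds):
        candidate = impl_agent(spec)             # agent 2: reproduce the system from the spec
        if review_agent(source_code, candidate): # agent 3: check fidelity to the original
            return candidate
    raise RuntimeError("no faithful implementation within the round budget")
```

With toy agents (a reviewer that demands an exact match and an implementer whose second attempt succeeds), the loop converges on the second round; in the real system the same shape runs with model calls and supervised Erlang processes instead of lambdas.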
OpenAI Frontier: Enterprise-grade AI deployment and management
OpenAI Frontier is positioned as an enterprise platform for deploying AI agents safely and at scale, offering a suite of tools for AI transformation. Key components include an Agents SDK for building custom agents, and a platform that integrates with native enterprise identity management, security tooling, and workspace applications. A central 'control dashboard' provides IT, GRC, and security teams with oversight into agent deployment, individual agent trajectories, and adherence to regulatory requirements. The platform emphasizes making complex agents easy to compose safely, with features like the GPT OSS safeguard model allowing for customizable safety specs to prevent data exfiltration and ensure compliance with specific company policies. The goal is to provide a robust, observable, and controllable environment for AI deployment.
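The talk describes customizable safety specs without showing their format; one hedged way to picture them is as data (patterns plus a verdict) consulted by an enforcement layer before agent output leaves the boundary. Everything below, including the spec structure and the example patterns, is an assumption for illustration, not the GPT OSS safeguard model's actual interface.

```python
import re

# Hypothetical safety spec: policy as data, editable per company without code changes.
SAFETY_SPEC = {
    "block_patterns": [
        r"\bAKIA[0-9A-Z]{16}\b",            # AWS-style access key ID
        r"\bBEGIN( RSA)? PRIVATE KEY\b",    # PEM private key header
    ],
    "verdict_on_match": "block",
}


def screen_outbound(text, spec=SAFETY_SPEC):
    """Return 'block' if agent output matches an exfiltration pattern, else 'allow'."""
    for pattern in spec["block_patterns"]:
        if re.search(pattern, text):
            return spec["verdict_on_match"]
    return "allow"
```

Keeping the policy as data rather than code is what makes it customizable: GRC or security teams can extend the pattern list to cover proprietary identifiers without redeploying the agent.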
The future of software engineering: Agents as teammates
The overarching theme is the collaborative potential between humans and AI agents, fostering a paradigm where agents act as teammates. This involves building trust through mechanisms like clear documentation, automated testing, and observable agent trajectories, similar to how a human teammate would present their work. The process of internalizing dependencies, exemplified by potentially in-housing libraries like DataDog or Temporal, reduces reliance on external plugins and simplifies the system. The efficiency gained allows human engineers to tackle the most challenging problems—those that are 'pure whitespace' or require deep refactoring—while agents handle the more structured or repetitive tasks. This shift not only enhances productivity but also fundamentally redefines the practice of software engineering by integrating AI deeply into the development lifecycle, enabling continuous self-improvement and adaptation.
Common Questions
What is harness engineering, and why is it crucial?
Harness engineering involves the systems thinking and tooling necessary to deploy AI agents effectively. It's crucial because it allows complex user journeys to be collapsed into code, letting AI models handle the 'wiring' and execute work via prompts.
Topics
Mentioned in this video
A harness used for building AI products, enabling communication through prompts to let models handle wiring and operation. It's integral to Ryan Lopopolo's approach to agent-driven development.
Mentioned as part of Ryan Lopopolo's background, indicating experience with enterprise customers.
Mentioned as part of Ryan Lopopolo's background, indicating experience with enterprise customers.
A platform for code hosting and collaboration, integrated into the development workflow with features like PRs and CLIs.
Mentioned as a potential alternative to GitHub for code hosting, indicating the spec's adaptability.
Mentioned as an example of a company that would likely need OpenAI Frontier's enterprise AI solutions.
A platform for orchestrating workflows and long-running processes, mentioned as a core inspiration for Symphony and its focus on process supervision and resumability.
The virtual machine for Elixir, praised for its concurrency model and features like resumability, valuable for agent orchestration.
Mentioned as a service that is still paid for, even as dependencies are increasingly internalized.
The company where Ryan Lopopolo works, developing AI models and platforms like Frontier and Codex.
Referred to as GPT-5, with iterations 5.1, 5.2, 5.3, and 5.4, indicating advancements in OpenAI's models used in the development process.
A build system mentioned in the context of adapting the codebase for faster build times, alongside Nx.
Mentioned in the context of front-end architecture and complexity, specifically within an Electron single-app setup.
A framework used for building the application, noted for its main and renderer processes and its capability for MVC-style decomposition.
Mentioned as a 'tiny little bit of Python glue' used to spin up local development stacks.
A file used to define agent configurations and behaviors, mentioned alongside spec.md.
A markdown file or skill used to track and assess business logic against documented guardrails, proposing follow-up work for the agent.
Used as a communication channel where agents can be directed to perform tasks, such as updating documentation or fixing issues.
Mentioned in the context of packages within the repository's architecture.
The issue tracker used by the team, favored for its integration and ease of use.
The command-line interface for GitHub, used for interacting with repositories, creating pull requests, and viewing web UIs, noted for its token efficiency.
A code formatter mentioned in the context of CLIs and how agents can interact with them, focusing on the outcome (formatted or not) rather than individual file formatting steps.
A package manager mentioned in relation to its distributed script runner and the challenge of parsing large amounts of text from test suites.
Mentioned as a potential alternative to Linear for issue tracking, highlighting the flexibility of the spec to accommodate different tools.
A command-line tool discussed for its extensive flags and potential for being turned into micro-SaaS products.
A faster AI model mentioned as being useful for quick changes, documentation updates, and transforming feedback into lints, though its application for high-level reasoning is still being explored.
A linter mentioned in the context of adapting AI feedback into codebase infrastructure, specifically for transforming feedback into lints.
A company mentioned alongside Bolt and Replit as solving the zero-to-one product idea problem with AI, distinct from coding agents.
Mentioned alongside Lovable and Bolt as a company addressing the zero-to-one product idea challenge with AI, differentiating from coding agents.
A model within OpenAI Frontier that interfaces with safety specs, allowing enterprises to instrument agents to prevent data exfiltration and manage internal company information.
A dashboarding tool mentioned in the context of agents authoring JSON for dashboards and responding to alerts.
Mentioned as an example of a language that leverages shared types to reduce complexity, similar to how Elixir's runtime features aid process orchestration.
Mentioned as a technology that previously enabled shared types across front-end and back-end, now superseded or complemented by other approaches.
An open-source monitoring and alerting toolkit mentioned as an example of a tool run locally to enable a full development loop.
A programming language chosen for Symphony due to its process supervision and gen servers, which are well-suited for the type of process orchestration required.
A model related to safety specifications for enterprises, allowing customization of agent behavior to avoid exfiltration and manage proprietary information.
A tool used in the Symphony process for managing disconnected code, implementing specs, and reviewing implementations.
A web testing framework discussed in the context of integrating with the Electron app and the challenges of MCPs (injected context) that the agent might forget how to use.
Mentioned as a platform similar to Cursor that developers might use, where a similar level of review compression is expected.
Referred to as 'chat', used alongside specific models like 5.4 for tasks, and as a component within the broader AI workflow.
A system developed for iterative spec-driven development, leveraging Elixir and BEAM for process orchestration, aiming to remove human context-switching.
Mentioned as a previous model iteration, contrasting with the capabilities of newer models like 5.4.
A metric or assessment used by agents to evaluate business logic against guardrails, influencing proposed follow-up work.
A priority designation used by review agents, indicating that issues surfaced should not be greater than P2 to bias toward merging.
The highest priority level, indicating a critical issue that would 'nuke the codebase' if merged.