Building AGI with OpenAI's Structured Outputs API
Key Moments
OpenAI's Michelle on Structured Outputs API: Enabling agents, use cases, and the future of AI development.
Key Insights
OpenAI's Structured Outputs API aims to simplify agent development by ensuring reliable, structured data exchange with models.
Structured Outputs is designed as a more purpose-built solution for structured responses compared to Function Calling, which is for tool execution.
The API combines engineering constraints with model training to improve adherence to formats and reduce errors like excessive whitespace.
A new 'refusal' field in the API allows models to decline harmful or policy-violating requests while maintaining a clear developer experience.
OpenAI is continuously working on improving model performance, reducing latency, and expanding the capabilities of its API platform.
The API's roadmap includes exploring custom grammars beyond JSON schema and enhancing features for AI agents and enterprise use.
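To make the insights above concrete, a Chat Completions request using Structured Outputs might look like the following minimal sketch. The model name and schema contents are illustrative; the payload shape follows the `response_format` with `type: "json_schema"` described in OpenAI's documentation.

```python
import json

# Illustrative request body for POST /v1/chat/completions using
# Structured Outputs: response_format type "json_schema" with strict=True.
request_body = {
    "model": "gpt-4o-mini",  # illustrative model choice
    "messages": [
        {"role": "system", "content": "Extract the event details."},
        {"role": "user", "content": "Dinner with Sam on Friday at 7pm."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,  # enforce exact schema adherence
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["title", "day", "time"],
                "additionalProperties": False,  # required in strict mode
            },
        },
    },
}

print(json.dumps(request_body, indent=2))
```

With `strict: True`, every response either matches this schema exactly or arrives as a refusal, which is what makes the reliability claims above possible.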
THE EVOLUTION OF OPENAI'S API AND STRUCTURED OUTPUTS
The discussion begins by tracing the speaker's career path through notable tech companies like Coinbase, Stripe, and ultimately OpenAI, highlighting experiences with scaling challenges and product development. Joining OpenAI before the ChatGPT launch, the speaker was drawn to products like GitHub Copilot. The narrative then shifts to the genesis of Structured Outputs, stemming from the introduction of JSON mode at Dev Day last year. JSON mode was an initial step to constrain model output to JSON, but it had limitations, often leading to developers wanting more precise control over keys and values, which then paved the way for the more robust Structured Outputs API.
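JSON mode's guarantee, and the gap that motivated Structured Outputs, can be seen in a small sketch: passing `response_format={"type": "json_object"}` ensures the reply parses as JSON, but nothing pins down which keys appear. The key names below are invented for illustration.

```python
import json

# JSON mode request fragment: guarantees syntactically valid JSON only.
json_mode_format = {"type": "json_object"}

# Two replies that JSON mode would both consider valid:
reply_a = '{"name": "Ada", "email": "ada@example.com"}'
reply_b = '{"person": "Ada"}'  # valid JSON, but not the keys we wanted

parsed_a = json.loads(reply_a)
parsed_b = json.loads(reply_b)

# The developer still has to defend against missing keys by hand:
email = parsed_b.get("email")  # None — JSON mode alone cannot prevent this
```

That hand-rolled defensive parsing is exactly the control over keys and values that Structured Outputs moves into the API itself.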
STRUCTURED OUTPUTS VS. FUNCTION CALLING VS. JSON MODE
A key distinction is made between various API features. JSON mode is for basic JSON output constraints. Function Calling is specifically designed for enabling models to call external tools or functions, providing arguments for actual actions. Structured Outputs, however, is presented as a new response format optimized for getting the model to respond to a user in a structured way, distinct from tool invocation. While Function Calling has been adapted for structured responses, the new format is intended to provide more of the model's 'voice' and programmatic control for developers needing exact outputs for integration.
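The distinction can be made concrete by comparing the two request shapes (both are sketches with illustrative names): a Function Calling `tools` entry describes an action for the model to invoke, while a Structured Outputs `response_format` shapes the model's own reply to the user.

```python
# Function Calling: the model emits *arguments* for a tool the developer runs.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool name
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

# Structured Outputs: the model *answers the user* in a fixed shape.
answer_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_answer",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"summary": {"type": "string"}},
            "required": ["summary"],
            "additionalProperties": False,
        },
    },
}
```

The first produces arguments your code executes; the second produces the final response in the model's own voice, already in the shape your integration expects.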
DESIGN PRINCIPLES AND TECHNICAL IMPLEMENTATION
The development of Structured Outputs involved both engineering and research. The engineering side focuses on constraining the model's output, for example, by limiting available tokens to fit a schema. The research aspect involves training the model to better understand and adhere to desired formats. This dual approach addresses issues like models outputting excessive whitespace, which can occur with purely engineering-driven constraints. The API aims to be developer-friendly, with SDK integrations allowing the use of Pydantic or Zod objects, abstracting away serialization complexities for a smoother user experience.
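The engineering half of that dual approach can be caricatured in a few lines: instead of hoping the model emits valid JSON, the decoder itself emits the structural tokens and only lets the model fill the value slots. This is a toy sketch of the idea, not OpenAI's actual implementation.

```python
import json

def constrained_fill(keys, model_values):
    """Toy constrained decoding: the engine writes every structural token
    ('{', quotes, ':', ','), so the output always parses and has exactly
    `keys`; the 'model' only supplies the values."""
    parts = ["{"]
    for i, key in enumerate(keys):
        parts.append(json.dumps(key) + ": " + json.dumps(model_values[key]))
        if i < len(keys) - 1:
            parts.append(", ")
    parts.append("}")
    return "".join(parts)

out = constrained_fill(["city", "temp_c"], {"city": "Toronto", "temp_c": 21})
parsed = json.loads(out)  # guaranteed to parse, guaranteed key set
```

A purely mechanical constraint like this also hints at the whitespace problem: a mask that merely keeps the JSON valid would happily accept endless whitespace between tokens, which is why the training side is needed as well.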
HANDLING ERRORS AND MAINTAINING SAFETY
A significant feature highlighted is the 'refusal' field within the API. This allows the model to refuse requests that might be harmful or violate policies, even when operating under a specific schema. This is crucial for safety and maintaining model integrity. The decision to use a refusal field instead of standard HTTP error codes is explained by the unique nature of AI errors, which don't always fit traditional Web 2.0 paradigms and involve model-specific behaviors. This provides a clearer developer experience for handling such refusals gracefully.
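In practice the refusal surfaces as a `refusal` field on the assistant message rather than as an HTTP error, so handling it is ordinary branching. A hedged sketch, where the dicts below merely mimic the assistant-message shape:

```python
import json

def handle_message(message):
    """Return parsed structured content, or the refusal text if the
    model declined. `message` mimics a Chat Completions assistant message."""
    if message.get("refusal"):
        return {"ok": False, "refusal": message["refusal"]}
    return {"ok": True, "data": json.loads(message["content"])}

normal = {"role": "assistant", "content": '{"answer": 42}', "refusal": None}
refused = {"role": "assistant", "content": None,
           "refusal": "I can't help with that request."}

print(handle_message(normal))
print(handle_message(refused))
```

Because the refusal is a first-class field rather than a 4xx status, developers can distinguish "the model declined" from "the request failed" without parsing error strings.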
USE CASES AND THE FUTURE OF AGENTS
Structured Outputs is presented as a foundational building block for agentic applications, aiming to increase reliability in chained LLM calls from 95% to near 100%. Use cases include extracting structured data from unstructured text, dynamic UI generation using recursive schemas, and enabling more robust math tutoring systems by specifying step-by-step reasoning. The API's ability to handle nested structures makes it ideal for generating complex UIs. The speaker emphasizes that this feature is designed to make agentic workflows more stable and programmable, reducing the friction for developers building sophisticated AI applications.
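The dynamic-UI use case leans on recursive schemas: a component whose children are components of the same type, expressed in JSON Schema via `$ref`. A minimal sketch follows (the component fields are invented for illustration), together with a toy renderer walking such a tree:

```python
# Recursive JSON Schema: a UI component whose children are components too.
ui_schema = {
    "name": "ui_component",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "tag": {"type": "string", "enum": ["div", "button", "text"]},
            "label": {"type": "string"},
            "children": {
                "type": "array",
                "items": {"$ref": "#"},  # recursion: children are components
            },
        },
        "required": ["tag", "label", "children"],
        "additionalProperties": False,
    },
}

def render(node, depth=0):
    """Toy renderer: flatten a component tree into indented pseudo-HTML."""
    lines = ["  " * depth + f"<{node['tag']}>{node['label']}"]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines

tree = {"tag": "div", "label": "form", "children": [
    {"tag": "button", "label": "Submit", "children": []}]}
```

Because the schema references itself, a single Structured Outputs call can produce arbitrarily nested component trees that the frontend renders directly.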
MODEL SELECTION, FINE-TUNING, AND API ROADMAP
OpenAI offers various models, with recommendations to start with GPT-4o mini for cost-effectiveness and scale up to GPT-4o when higher performance is needed. The fine-tuning API is also highlighted as a powerful tool for achieving specific performance goals, with recent improvements making it more accessible. The roadmap includes exploring custom grammars beyond JSON Schema, enhancing agent capabilities, and continuing to improve reliability and reduce latency. Features like parallel function calling, batch processing for cost savings, and advancements in the Vision and Whisper APIs are also discussed as ongoing developments.
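The batch-processing option mentioned above works by uploading a `.jsonl` file of requests and collecting the results within a 24-hour window. Building those input lines is plain JSON; the `custom_id` values and model name below are illustrative.

```python
import json

# Each Batch API input line wraps one ordinary API request.
def batch_line(custom_id, prompt):
    return json.dumps({
        "custom_id": custom_id,              # your own correlation id
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",          # illustrative model choice
            "messages": [{"role": "user", "content": prompt}],
        },
    })

lines = [batch_line(f"user-{i}", f"Summarize activity for user {i}")
         for i in range(3)]
jsonl = "\n".join(lines)  # upload this file, then create the batch job
```

The `custom_id` is what lets you match each asynchronous result back to the originating request once the batch completes.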
Common Questions
What is Structured Outputs, and how does it differ from JSON mode?
Structured Outputs is an OpenAI API feature that constrains model responses to strictly adhere to a defined JSON schema. This drastically improves reliability for developers integrating LLMs into applications, ensuring outputs match expected formats and types, unlike earlier approaches such as JSON mode.
Topics
Mentioned in this video
A Stripe Press book recommended by Michelle for its ability to inspire hard work and provide insight into building incredible things.
A book by Richard Thaler that Michelle frequently references, offering insights into irrational decision-making and cognitive biases.
OpenAI's cheapest model and a recommended 'workhorse' for many use cases; a sensible starting point for cost-effectiveness.
A programming language mentioned in the context of potential future grammar support for structured outputs and available for notebook-based solutions.
OpenAI's API for processing visual inputs, usable across various OpenAI services and offering capabilities for data extraction from images.
A cost-effective API for bulk processing tasks with a 24-hour turnaround, ideal for user activation flows, evaluations, and other non-time-sensitive jobs.
A platform where Michelle discovered GitHub Copilot and later learned about Stripe and Coinbase, influencing her career path.
A company Michelle co-founded at the University of Waterloo as part of an entrepreneurship co-op program.
A product that excites many joining OpenAI, though Michelle's initial draw was GitHub Copilot. Its launch also created scaling challenges for the API platform, since ChatGPT and API accounts were tied together.
A TypeScript library for schema declaration and validation, mentioned as a way to use structured outputs easily by passing Zod objects.
An OpenAI API that offers hosted tools like file search and code interpreter, and supports statefulness for conversational AI applications.
A powerful model family available through OpenAI's API, with specific versions such as GPT-4 Turbo and GPT-4o offering different capabilities and tuning.
A company Michelle worked at as one of the first backend engineers, experiencing rapid growth and scaling challenges, including meltdowns during high-profile events.
A programming language Michelle learned during a rough early job at a bank, which she did not enjoy.
A version control system Michelle learned during her internship at Coinbase, highlighting her early engineering development.
A product that deeply impressed Michelle, leading her to join OpenAI, due to its high quality and transformative potential.
An OpenAI product showcased at a launch event that Michelle attended, which she found cool but less compelling for her career focus than Copilot.
The software development kit where the 'runs' beta helper for function calling was first implemented, allowing developers to close the loop in tool-calling conversations.
A project where constrained grammar mechanisms, like using Backus-Naur form, were first observed, influencing discussions around grammar in LLMs.
A tool offered through the Assistants API that allows models to execute code, simplifying complex tasks for developers.
An earlier generation of OpenAI models, with the discussion noting that it is nearing the end of its run for certain applications.
An OpenAI model recommended for advanced use cases when GPT-4o mini doesn't meet performance needs.
OpenAI's speech-to-text model, with discussions covering its API limitations (lack of diarization) and potential future improvements.
A socket-based approach used by OpenAI for the ChatGPT app, suggesting a direction for real-time, interactive APIs like speech-to-speech.
A use case for structured outputs, leveraging recursive schemas to generate nested UI components, making frontend development easier for backend engineers.
An earlier OpenAI feature designed to give models access to tools, distinct from Structured Outputs which focuses on response formatting rather than tool execution.
The first foray into constraining model output to JSON, introduced at Dev Day, which was useful but had limitations like generating excessive whitespace.
A new OpenAI API feature designed to ensure model outputs strictly adhere to a defined schema, improving reliability and developer experience over JSON Mode.
A related concept to function calling, emphasizing the model's ability to interact with external tools.
One of the companies where Michelle interned, providing her with significant learning experiences.
A startup opportunity for Michelle where she learned intensely as an engineer, working on production databases and being on call.
The current company Michelle works at, focusing on API platform and models, particularly structured outputs.
A company where Michelle interned, described as a different flavor of payments and inspiring her.
A team at OpenAI that collaborated on the structured outputs feature, ensuring the model could still refuse harmful requests within schema constraints.
An organization that collaborates with OpenAI on function calling leaderboards, used internally for evaluating model performance.
Michelle's alma mater, known for its co-op program and strong startup culture, which fostered her career development.