Building AGI with OpenAI's Structured Outputs API

Latent Space Podcast
Science & Technology · 4 min read · 73 min video
Sep 17, 2024
TL;DR

OpenAI's Michelle on Structured Outputs API: Enabling agents, use cases, and the future of AI development.

Key Insights

1. OpenAI's Structured Outputs API aims to simplify agent development by ensuring reliable, structured data exchange with models.
2. Structured Outputs is designed as a more purpose-built solution for structured responses compared to Function Calling, which is for tool execution.
3. The API combines engineering constraints with model training to improve adherence to formats and reduce errors like excessive whitespace.
4. A new 'refusal' field in the API allows models to decline harmful or policy-violating requests while maintaining a clear developer experience.
5. OpenAI is continuously working on improving model performance, reducing latency, and expanding the capabilities of its API platform.
6. The API's roadmap includes exploring custom grammars beyond JSON schema and enhancing features for AI agents and enterprise use.

THE EVOLUTION OF OPENAI'S API AND STRUCTURAL OUTPUTS

The discussion begins by tracing the speaker's career path through notable tech companies like Coinbase, Stripe, and ultimately OpenAI, highlighting experiences with scaling challenges and product development. Joining OpenAI before the ChatGPT launch, the speaker was drawn to products like GitHub Copilot. The narrative then shifts to the genesis of Structured Outputs, stemming from the introduction of JSON mode at Dev Day last year. JSON mode was an initial step to constrain model output to JSON, but it had limitations, often leading to developers wanting more precise control over keys and values, which then paved the way for the more robust Structured Outputs API.

STRUCTURED OUTPUTS VS. FUNCTION CALLING VS. JSON MODE

A key distinction is made between various API features. JSON mode is for basic JSON output constraints. Function Calling is specifically designed for enabling models to call external tools or functions, providing arguments for actual actions. Structured Outputs, however, is presented as a new response format optimized for getting the model to respond to a user in a structured way, distinct from tool invocation. While Function Calling has been adapted for structured responses, the new format is intended to provide more of the model's 'voice' and programmatic control for developers needing exact outputs for integration.
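The distinction above can be sketched as two request payloads. This is a minimal, illustrative sketch, assuming the Chat Completions API's `response_format` and `tools` parameters; the `support_reply` schema and `get_weather` tool are invented for the example, not from the episode.

```python
# A structured *response* to the user: Structured Outputs sets
# response_format to a JSON schema with "strict": true, and the
# model's reply text must conform to it.
structured_response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "support_reply",  # illustrative name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["answer", "confidence"],
            "additionalProperties": False,
        },
    },
}

# Function Calling, by contrast, describes a *tool* the model may invoke:
# the model emits arguments for your code to execute, not a user-facing reply.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}
```

Either payload would be passed to the same endpoint, e.g. `client.chat.completions.create(..., response_format=structured_response_format)` versus `client.chat.completions.create(..., tools=[weather_tool])`; the choice signals whether the structured data is the answer itself or an instruction to run a tool.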

DESIGN PRINCIPLES AND TECHNICAL IMPLEMENTATION

The development of Structured Outputs involved both engineering and research. The engineering side focuses on constraining the model's output, for example, by limiting available tokens to fit a schema. The research aspect involves training the model to better understand and adhere to desired formats. This dual approach addresses issues like models outputting excessive whitespace, which can occur with purely engineering-driven constraints. The API aims to be developer-friendly, with SDK integrations allowing the use of Pydantic or Zod objects, abstracting away serialization complexities for a smoother user experience.
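The Pydantic integration mentioned above can be sketched as follows. This assumes the openai-python beta helper `client.beta.chat.completions.parse` as it existed around the time of the episode; the `Step`/`MathReply` models and the `solve` helper are illustrative, and the API call requires an API key, so it is wrapped in a function that is never invoked here.

```python
from pydantic import BaseModel


# Illustrative Pydantic models: the SDK serializes these into a strict
# JSON schema, so the developer never hand-writes the schema.
class Step(BaseModel):
    explanation: str
    output: str


class MathReply(BaseModel):
    steps: list[Step]
    final_answer: str


def solve(question: str) -> MathReply:
    """Sketch only: requires the openai package and OPENAI_API_KEY."""
    from openai import OpenAI  # imported lazily so the sketch runs without it

    client = OpenAI()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        response_format=MathReply,  # a Pydantic class, not a raw schema dict
    )
    return completion.choices[0].message.parsed


# The schema the SDK derives can be inspected locally without any API call:
schema = MathReply.model_json_schema()
```

Passing the class directly and getting back a typed `.parsed` object is the abstraction the section describes: serialization and validation are handled by the SDK rather than the developer.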

HANDLING ERRORS AND MAINTAINING SAFETY

A significant feature highlighted is the 'refusal' field within the API. This allows the model to refuse requests that might be harmful or violate policies, even when operating under a specific schema. This is crucial for safety and maintaining model integrity. The decision to use a refusal field instead of standard HTTP error codes is explained by the unique nature of AI errors, which don't always fit traditional Web 2.0 paradigms and involve model-specific behaviors. This provides a clearer developer experience for handling such refusals gracefully.
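The branching this implies on the client side can be sketched like so. The `refusal` attribute mirrors the field on a Chat Completions message as described above; the `handle_message` helper and the stand-in response objects are invented for illustration, not real API calls.

```python
from types import SimpleNamespace


def handle_message(message):
    """Branch on the refusal field instead of treating it as an HTTP error.

    `message` mimics the shape of a completion message, which carries either
    normal content or a populated `refusal` string, but not both.
    """
    if getattr(message, "refusal", None):
        # The model declined (e.g. a harmful request); surface it gracefully
        # instead of trying to parse content that was never produced.
        return {"ok": False, "reason": message.refusal}
    return {"ok": True, "data": message.content}


# Illustrative stand-ins for two possible API responses:
normal = SimpleNamespace(content='{"answer": "42"}', refusal=None)
refused = SimpleNamespace(content=None, refusal="I can't help with that request.")
```

Because the request itself succeeded (the model ran and returned a well-formed message), a 200-with-refusal is a better fit than a 4xx/5xx status, which is the rationale the section gives.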

USE CASES AND THE FUTURE OF AGENTS

Structured Outputs is presented as a foundational building block for agentic applications, aiming to increase reliability in chained LLM calls from 95% to near 100%. Use cases include extracting structured data from unstructured text, dynamic UI generation using recursive schemas, and enabling more robust math tutoring systems by specifying step-by-step reasoning. The API's ability to handle nested structures makes it ideal for generating complex UIs. The speaker emphasizes that this feature is designed to make agentic workflows more stable and programmable, reducing the friction for developers building sophisticated AI applications.
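The recursive-schema idea for dynamic UI generation can be sketched as a self-referencing JSON schema. The `"$ref": "#"` self-reference follows the pattern OpenAI's Structured Outputs documentation uses for recursion; the component vocabulary (`div`, `button`, `text`) and the sample tree are invented for the example.

```python
# Each UI component may contain children of the same shape; the items
# entry points back at the whole schema ("$ref": "#"), allowing
# arbitrarily nested UI trees under strict mode.
ui_schema = {
    "name": "ui_component",  # illustrative name
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["div", "button", "text"]},
            "label": {"type": "string"},
            "children": {"type": "array", "items": {"$ref": "#"}},
        },
        "required": ["type", "label", "children"],
        "additionalProperties": False,
    },
}


def depth(node: dict) -> int:
    """Walk a UI tree produced under this schema and report nesting depth."""
    kids = node.get("children", [])
    return 1 + (max(map(depth, kids)) if kids else 0)


# A sample tree a model might emit under the schema above:
sample = {
    "type": "div", "label": "root",
    "children": [
        {"type": "button", "label": "ok", "children": []},
        {"type": "div", "label": "panel",
         "children": [{"type": "text", "label": "hi", "children": []}]},
    ],
}
```

Because every node is guaranteed to match the same shape, rendering code can recurse over the output without defensive parsing, which is what makes chained or agentic uses of this pattern reliable.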

MODEL SELECTION, FINE-TUNING, AND API ROADMAP

OpenAI offers various models, with the recommendation to start with GPT-4o mini for cost-effectiveness and scale up to GPT-4o when higher performance is needed. The fine-tuning API is also highlighted as a powerful tool for achieving specific performance goals, with recent improvements making it more accessible. The roadmap includes exploring custom grammars beyond JSON schema, enhancing agent capabilities, and continuing to improve reliability and reduce latency. Features like parallel function calling, batch processing for cost savings, and advancements in the Vision and Whisper APIs are also discussed as ongoing developments.

Common Questions

What is Structured Outputs, and how does it differ from JSON mode?

Structured Outputs is an OpenAI API feature that constrains model responses to strictly adhere to a defined JSON schema. This drastically improves reliability for developers integrating LLMs into applications, ensuring outputs match expected formats and types, unlike previous methods like JSON mode.

Topics

Mentioned in this video

Software & Apps
GPT-4o mini

OpenAI's cheapest model and the recommended 'workhorse' for many use cases, a good cost-effective starting point.

Python

A programming language mentioned in the context of potential future grammar support for structured outputs and available for notebook-based solutions.

Vision API

OpenAI's API for processing visual inputs, usable across various OpenAI services and offering capabilities for data extraction from images.

Batch API

A cost-effective API for bulk processing tasks with a 24-hour turnaround, ideal for user activation flows, evaluations, and other non-time-sensitive jobs.

Hacker News

A platform where Michelle discovered GitHub Copilot and later learned about Stripe and Coinbase, influencing her career path.

Readwise

A company Michelle co-founded at the University of Waterloo as part of an entrepreneurship co-op program.

ChatGPT

A product that excites many joining OpenAI, though Michelle's initial draw was GitHub Copilot. It also presented scaling challenges for the API due to tied accounts.

Zod

A TypeScript library for schema declaration and validation, mentioned as a way to use structured outputs easily by passing Zod objects.

Assistant API

An OpenAI API that offers hosted tools like file search and code interpreter, and supports statefulness for conversational AI applications.

GPT-4 Turbo

A powerful model available through OpenAI's API, with specific versions like 'gpt-4-turbo' and 'gpt-4-turbo-preview' offering different capabilities and tuning.

Clubhouse

A company Michelle worked at as one of the first backend engineers, experiencing rapid growth and scaling challenges, including meltdowns during high-profile events.

Visual Basic

A programming language Michelle learned during a rough early job at a bank, which she did not enjoy.

Git

A version control system Michelle learned during her internship at Coinbase, highlighting her early engineering development.

GitHub Copilot

A product that deeply impressed Michelle, leading her to join OpenAI, due to its high quality and transformative potential.

DALL-E

An OpenAI product showcased at a launch event that Michelle attended, which she found cool but less compelling for her career focus than Copilot.

Node.js SDK

The software development kit where the 'runs' beta feature for function calling was first implemented, allowing for closing the loop in conversations.

llama.cpp

A project where constrained grammar mechanisms, like using Backus-Naur form, were first observed, influencing discussions around grammar in LLMs.

Code Interpreter

A tool offered through the Assistant API that allows models to execute code, simplifying complex tasks for developers.

GPT-3.5

An earlier generation of OpenAI models, which the discussion notes is nearing the end of its run for certain applications.

GPT-4o

An OpenAI model recommended for advanced use cases when GPT-4o mini doesn't meet performance needs.

Whisper

OpenAI's speech-to-text model, with discussions covering its API limitations (lack of diarization) and potential future improvements.

LiveKit

A real-time communication framework OpenAI uses for the ChatGPT app, suggesting a direction for real-time, interactive APIs like speech-to-speech.
