How is OpenAI changing its organizational structure?

OpenAI is transitioning from a nonprofit to a for-profit structure. This shift has raised questions about increased funding for research versus its commitment to ensuring AI benefits everyone, coinciding with some leadership departures like the Chief Research Officer, VP of Research, and CTO.

What is GPT-01 and how is it different from GPT-4?

GPT-01 is a new OpenAI model that uses reinforcement learning to reason and learn from mistakes, excelling in advanced math and complex coding problems in ways GPT-4 struggles with. While more powerful for reasoning, it requires more computational power and time.

What is OpenAI's current stance on Artificial General Intelligence (AGI)?

Sam Altman suggests moving beyond the term AGI due to its buzzword status. OpenAI is focusing on continuously improving AI capabilities rather than specific AGI definitions. He anticipates significant, rapid progress towards systems that can perform complex cognitive tasks.

Are there legal restrictions on using AI for phone calls?

Yes, there are recent regulations around using AI for phone calls, similar to robocalling. Consent from the person being called is crucial. Developers are advised to tread carefully, as the legal landscape is still a 'gray area,' especially regarding individuals versus businesses.

How does OpenAI prioritize developer feedback for new features?

OpenAI's roadmap is heavily influenced by developer feedback, from high-level feature requests to intricate API details. They aim to build what developers need, encouraging them to create applications that are slightly beyond current model capabilities to guide future AI advancements.

What are the challenges with using WebSockets for real-time AI APIs?

While WebSockets are a natural fit for bidirectional streaming in real-time APIs, challenges include security concerns (API keys in client-side code), lack of widespread developer experience with real-time programming, and ensuring robust client-side handling without direct socket interaction.

How does OpenAI plan to support long audio inputs for models?

OpenAI is rolling out the ability to input URLs to MP3 files or other audio recordings directly into the chat completion API, eliminating the need to slice up long audio files and stream them through WebSockets. This aims to simplify multimodal input.

What is OpenAI's approach to AI safety and alignment?

OpenAI emphasizes iterative deployment, focusing on making capable models safer over time by confronting real-world problems as they emerge. They start with conservative safety measures for new technologies and gradually relax them as society adapts and real harms are understood, while also considering long-term Sci-Fi risks.

What is the future vision for AI agents?

Sam Altman and Kevin Will predict that 2025 will be a significant year for AI agents. These agents will be able to handle multi-turn interactions, perform complex tasks, and act on problems with the cognitive effort of multiple smart humans over days, drastically changing how humans interact with computers and solve problems.

How does OpenAI balance building for current user needs versus future AI capabilities?

OpenAI aims to build for features that AI models 'can just barely not do,' anticipating future capabilities to stay at the frontier. However, they also must address current user needs, fix bugs, and onboard new users to existing products, balancing rapid innovation with usability and broad adoption.

Will OpenAI release models for offline usage?

Sam Altman states that OpenAI is open to sharing models for offline usage, but it is not a high priority on the current roadmap due to resource constraints. He acknowledges the reasons for local models but indicates it's not a 'this year kind of thing.'

What are OpenAI's thoughts on open-source AI?

OpenAI views open source as 'awesome' and would like to contribute more, but prioritization is a challenge. They feel the 'on-device model' segment is well-served by existing open-source models and would only contribute something truly unique rather than small benchmark improvements. Spiritually, they support its existence.

What is the long-term vision for AI engagement and form factors?

The vision is a completely new way of using computers: interacting with a piece of glass, speaking naturally, and having incredible reasoning models, agents, and video models dynamically render custom interfaces in real-time based on requests. This will enable complex tasks that currently take humans years to be done in moments.

Key Moments

Building AGI in Real Time (OpenAI Dev Day 2024)

Q: What are new features like Automatic Prompt Caching and Model Distillation?

Automatic Prompt Caching gives developers a discount when the AI sees a repeated prompt. Model Distillation creates smaller, more efficient versions of powerful AI models, making them lighter-weight, more affordable, and more accessible without significant performance loss.

Latent Space Podcast

Science & Technology3 min read130 min video

Oct 4, 2024|4,555 views|67|2

latent space openai sam altman kevin weils devday ai engineering

Save to Pod

Key Moments

TL;DR

OpenAI DevDay 2024 unveils Realtime API, Vision Finetuning, Prompt Caching, '01' model, and company structure shifts.

Key Insights

OpenAI introduced a Realtime API using WebSockets and function calling for natural, instantaneous AI interactions.

Vision Finetuning allows custom AI models trained on image data, with applications in fields like medicine.

Prompt Caching offers discounts for repeated prompts, making API usage more affordable.

The new '01' model represents a leap in reasoning capabilities, excelling at complex math and coding problems.

OpenAI is transitioning towards a for-profit structure, with notable departures in leadership coincident with this shift.

Model Distillation creates smaller, efficient AI models from larger ones, increasing accessibility.

OpenAI is focused on responsible AI development, safety, and ethical considerations across all new releases.

REAL-TIME API: ELEVATING CONVERSATIONAL AI

OpenAI's DevDay 2024 prominently featured the new Realtime API, designed to enable more natural and instantaneous AI interactions. By leveraging persistent WebSocket connections and function calling, this API allows for seamless, human-like conversations, including interruptions and multi-turn dialogues. Demos showcased practical applications like a travel agent AI ordering food and a language learning app, highlighting its potential to revolutionize how users engage with AI systems. The underlying technology focuses on bridging the gap to human-level latency, making AI communication feel fluid and responsive.

VISION FINETUNING AND MODEL ADVANCEMENTS

A significant announcement was Vision Finetuning, enabling developers to customize AI models with their own image data. This capability opens doors for specialized applications, particularly in areas like medical diagnostics, where AI can be trained to identify subtle patterns in medical images. Beyond vision, OpenAI also introduced advanced tools like Prompt Caching, which offers cost savings by discounting repeated prompts, and Model Distillation, a technique to create smaller, more efficient versions of powerful AI models. These advancements aim to make sophisticated AI more accessible and affordable for a wider range of users and developers.

THE '01' MODEL: A NEW ERA OF REASONING

The introduction of the '01' model marks a substantial step forward, moving beyond simply scaling existing models. OpenAI describes '01' as a model capable of true reasoning, trained through reinforcement learning to learn from mistakes and solve problems more effectively. Demos illustrated its prowess in advanced mathematics and complex coding tasks, areas where previous models like GPT-4 sometimes struggled. While requiring more computational resources, '01' represents a new frontier in AI intelligence, with future iterations promising enhanced system prompts and structured outputs for even more sophisticated applications.

COMPANY SHIFTS AND STRATEGIC DIRECTION

DevDay 2024 also brought news of significant internal changes at OpenAI, including a move towards a for-profit structure. This shift has sparked considerable discussion, with concurrent departures of key leadership figures like the Chief Research Officer and CTO. While the financial implications for research funding are anticipated, questions linger about potential impacts on OpenAI's commitment to ensuring AI benefits everyone. The company emphasized a continued focus on safety and responsible development amidst these structural transformations.

EMPOWERING DEVELOPERS THROUGH NEW TOOLS

OpenAI's strategy at DevDay clearly focused on equipping developers with enhanced tools and capabilities. Beyond the core model and API announcements, features like automatic prompt caching and model distillation aim to streamline development and reduce costs. The emphasis on creating a 'pit of success' for fine-tuning, particularly with vision models, signals a commitment to making complex AI customization more approachable. This developer-centric approach is crucial for fostering innovation and enabling the creation of diverse, real-world AI applications.

THE FUTURE OF AI INTERACTION AND ETHICS

The discussions at DevDay, including the closing Q&A with CEO Sam Altman, touched upon the evolving nature of human-AI interaction. The move towards more natural interfaces, like voice and potentially video, alongside increasingly capable agents, redefines computing. OpenAI stressed its commitment to ethical development and safety, acknowledging the potential for misuse while striving for responsible innovation. The company's iterative deployment strategy and focus on learning from real-world usage underscore their approach to navigating the complex landscape of advanced AI and its societal impact.

Mentioned in This Episode

●Products

●Software & Apps

●Companies

●Organizations

●Studies Cited

●Concepts

●People Referenced

Model Performance Comparison: Distillation from GPT-4o to Mini

Data extracted from this episode

Model	Performance Hit	Cost Reduction
Distilled GPT-4o to 4 Mini	2%	15x cheaper

Common Questions

OpenAI's Real-time API allows for natural, instantaneous voice interactions with AI using persistent WebSocket connections. It employs function calling to access external tools and information, enabling dynamic responses and demonstrations like travel planning or ordering takeout.

Topics

Ai-Ethics Ai Agents AI & Machine Learning Technology & Innovation Business & Entrepreneurship AI Development Real-time AI AI Models Developer Platforms

Mentioned in this video

Concepts

fine-tuning

A process of customizing a pre-trained AI model on specific data to tailor it for a particular task or company's communication style and brand guidelines, now extended to include vision.

Model Distillation

A technique to create a smaller, more efficient version of a large, powerful AI model while retaining similar capabilities, making AI more accessible and runnable on less powerful hardware.

Swift UI

A UI framework for Apple platforms that, alongside Swift, has simplified iOS development, allowing AI models to assist in architecting and generating code.

Function Calling

A capability that allows AI models to access and utilize external tools and information, enabling them to pull up details from databases or interact with other services in real-time demos.

WebSocket

A communication protocol used by the Real-time API to establish persistent, bidirectional connections, allowing AI to respond instantly and handle interruptions in conversations.

Vision Fine-tuning

Applying fine-tuning techniques to image-based AI models, with applications in medicine for diagnosis and improving accuracy in tasks like object detection (bounding boxes) on bespoke formats.

artificial general intelligence

A term Sam Altman believes has become a buzzword, suggesting a shift in focus to continuous AI improvement rather than defining a specific AGI threshold.

People

Olivier Goldmont

Head of Product for the OpenAI platform, who led the Dev Day keynote, introduced new features, and discussed the global outreach strategy for developers and design choices behind new APIs.

Sam Altman

The CEO of OpenAI, who discussed the evolving understanding of AGI, emphasized responsible AI development, and participated in a Q&A session.

Elon Biegio

An OpenAI Developer Experience Engineer and 'strawberry shop owner' in a demo, who showcased the Real-time API's voice mode and function calling by role-playing a strawberry vendor for a phone order.

Michelle Pocas

A member of the OpenAI API team who discussed structured outputs and the voice mode API, sharing insights into the design decisions for the new websocket API.

Kevin Will

OpenAI's Chief Product Officer, who joined Sam Altman for a discussion on the future of AI and the big picture for the company.

Roman Huge

OpenAI's Head of Developer Experience and AI Engineer, who showcased live demos of the International Space Station tracker and a drone, and offered advice on consuming coding models and real-time APIs.

Simon Willison

A super blogger and returning guest co-host who live-blogged Dev Day, praised the real-time API's potential for web apps, and shared his own AI experiments and feature requests.

Ali Pullin

Co-founder of Co-op Labs, AI engineer, and World's Fair closing keynote speaker, who talked about fine-tuning GPT-4o to achieve high scores on S-bench and his techniques for generating human-like reasoning traces.

Software & Apps

AWS

Used as an analogy to describe OpenAI's expanding role beyond a model provider to an 'AI Cloud,' offering comprehensive services like storage and compute.

GPT-01 Mini

A smaller version of GPT-01, noted for its excellent performance in math, coding, and STEM subjects, making it suitable for specific, rooted-in-code tasks.

VS Code

A code editor that Michelle Pocas still uses with Copilot, despite trying out Cursor, highlighting it as her tool of choice.

Whisper

An AI model used for transcribing audio, specifically mentioned in the context of processing hour-long YouTube videos before multimodal capabilities were available.

GPT-2

Mentioned as a point of reference for the exponential growth in AI capabilities, suggesting that GPT-01 represents a similar 'scale moment' in AI development.

Devin

A software engineer agent from Cognition mentioned as utilizing 01 Preview in its own software, similar to how Cursor integrates OpenAI models.

Wonderlust

A travel app originally developed by Simon and Gis, later modified by Roman Huge to incorporate voice components and real-time calling abilities for the Dev Day demo.

Vim

A text editor that was controlled with voice mode in an internal hackathon project, demonstrating new ways of interacting with code.

E2B

A company mentioned as offering 'code interpreter as a service,' providing sandboxed environments for running and compiling code.

GPT-4

A previous OpenAI language model, compared to GPT-01 which surpasses it in advanced math and complex coding, but GPT-4 is still suitable for tasks like screenplay writing.

Cursor

A coding tool mentioned as a preferred environment for interacting with OpenAI's coding models, particularly GPT-01 Preview and Mini.

Code Interpreter

An API mentioned as a specific use case for the Assistants API, praised for its ability to run and compile code within a sandboxed environment.

NotebookLM

Google's notebook product, admired by Sam Altman for its cool format and ability to generate podcast-style voices, allowing users to create dynamic content from their documents.

Speak

An application doing 'cool things' with language translation, showcasing the practical applications of AI models in real-world scenarios.

Hacker News

An online forum where a user highlighted a detail about the Real-time API providing a text version of spoken content for storage and analysis.

Automatic Prompt Caching

An OpenAI feature that offers discounts when the AI sees a prompt it has processed before, making AI usage more affordable and efficient without requiring code changes from developers.

Twilio API

The API used in Elon Biegio's demo to make phone calls with AI agents, integrating voice mode and function calling.

International Space Station tracker

One of two applications showcased by Roman Huge on stage, demonstrating the capabilities of OpenAI's models in real-time.

Swift

A programming language that, along with Swift UI, has made iOS development easier, especially when combined with AI coding partners like GPT-01.

LiveKit

A partner solution mentioned for helping developers integrate with the real-time API, offering native plugging capabilities for various client-side and server-side voice interactions.

Google Gemini

A competitor's AI model praised for its bounding box capabilities and its ability to process long video inputs by slicing them into individual frames.

Genie

Co-op Labs' model, capable of software engineering tasks, which was developed using specific fine-tuning techniques and data pipelines to generate human-like reasoning traces, outperforming GPT-01 out of the box on S-bench.

React

A JavaScript library mentioned in the context of Genie's performance in UI development, where evaluating the model's output could be challenging without Vision fine-tuning.

Real-time API

A new API announced by OpenAI designed for real-time interaction with AI, using persistent WebSocket connections and function calling to enable more natural, interruptible voice conversations.

GPT-01

OpenAI's new model, described as a significant leap forward, trained with reinforcement learning to reason and learn from mistakes, excelling in advanced math and complex coding problems.

ChatGPT

Mentioned as an example of an existing voice mode, with potential future extensions for real-time video and image capabilities.

Xcode

An IDE mentioned as a development environment for iOS apps, where GPT-01 and ChatGPT are used as coding and brainstorming partners due to its less deep integration.

WebRTC

A client-side technology suggested for developers who want very robust real-time communication, potentially as an alternative to directly working with WebSockets at scale.

Open Router

A platform described as the 'Metamask for AI,' allowing users to bring their own API keys and securely manage their AI usage across different models.

GPT-4o

A model that Co-op Labs successfully fine-tuned to achieve higher scores than GPT-01 on S-bench, demonstrating that older models can be enhanced through custom reasoning.

Companies

Mentioned as an example of how people globally prefer voice notes for communication, highlighting the natural human preference for spoken interaction over typing.

Case Text

A company doing 'cool things' with coding, demonstrating advancements in AI applications for legal tech.

Twitter

Platform where Sam Altman saw examples of NotebookLM and its functionalities, and where his philosophy of iterative deployment was cultivated during his time there.

Anthropic

A rival AI company mentioned regarding its prompt caching duration (5 minutes) and 'projects' feature, which is admired by Kevin Will.

Co-op Labs

Ali Pullin's company, distinguished for its work in fine-tuning GPT-4o with synthetic datasets and for obscuring Chain of Thought traces as intellectual property.

Waymo

An autonomous driving company, whose self-driving car experience was used as an analogy to describe how quickly humans adapt to new AI capabilities, initially amazed then quickly accustomed.

GitHub

An AI coding assistant that was the primary reason Michelle Pocas joined OpenAI, indicating its significant impact on her work.

Products

drone

The second application demonstrated by Roman Huge, highlighting the coding models' ability to architect and wire up specific functionalities like frontend, backend, UDP, and WebSockets.

Locations

Agora

A partner solution like LiveKit, offering ways for developers to get started with OpenAI's real-time API, depending on specific use cases.

Minnesota

One of the US states that has partnered with OpenAI, demonstrating initial government adoption of AI tools.

Pennsylvania

Another US state mentioned as partnering with OpenAI, reflecting the growing collaboration between AI companies and government agencies.

Singapore

One of the international locations, along with San Francisco and London, where OpenAI planned to hold events to meet more developers, indicating a global outreach strategy.

Organizations

USAID

An organization mentioned as partnering with OpenAI, indicating efforts to leverage AI for global development and humanitarian challenges.

Harvard

Mentioned as a group like Case Text, doing 'cool things' with coding, suggesting its involvement in advanced AI applications.

Studies & Research

S-bench

A benchmark for AI models, where Co-op Labs' Genie model achieved high scores by fine-tuning GPT-4o, though its reasoning traces were withheld, mirroring OpenAI's later competitive approach.

Found this useful? Build your knowledge library

Get AI-powered summaries of any YouTube video, podcast, or article in seconds. Save them to your personal pods and access them anytime.

Get Started Free