The new OpenAI Agents Platform: CUA, Web Search, Responses API, Agents SDK!!
Key Moments
OpenAI launches new APIs (Responses, Web Search, Computer Use) and an Agents SDK to enable more sophisticated agentic workflows for developers.
Key Insights
OpenAI is launching a new Responses API to support advanced agentic workflows, complementing but not replacing the existing Chat Completions API.
The new Web Search tool, powered by a fine-tuned GPT-4o model, enhances search accuracy and provides real-time information with citations.
A Computer Use tool, derived from the Operator product, allows developers to build agents that can interact with computer interfaces.
The improved File Search API now includes metadata filtering, enabling more precise data retrieval for agents.
The new Agents SDK, an evolution of Swarm, offers built-in tracing plus type support and guardrails to simplify agent orchestration.
OpenAI is aiming for a unified API experience with the Responses API, designed to eventually encompass the functionality of both Chat Completions and the legacy Assistants API.
INTRODUCTION TO NEW OPENAI AGENT TOOLS
OpenAI is releasing a suite of new tools and APIs designed to empower developers in building sophisticated AI agents. The launch includes three new built-in tools: Web Search, an improved File Search, and Computer Use. Complementing these tools is a new Responses API, which is built to handle complex, multi-turn agentic workflows. Additionally, an updated Agents SDK aims to simplify the orchestration and development of these agents. This comprehensive release signals OpenAI's commitment to advancing the capabilities of AI agents for developers.
THE RESPONSES API: A UNIFIED PRIMITIVE
The new Responses API is positioned as the future primitive for agentic workflows, designed to support longer-horizon tasks and multi-turn interactions. The widely used Chat Completions API will remain supported for years, but the Responses API is intended to unify functionality from both Chat Completions and the legacy Assistants API. It simplifies tool integration, offering a more streamlined developer experience for building complex applications. The Responses API is also stateful: it stores conversation state for 30 days at no charge, which aids debugging and development, and developers can disable this storage.
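As a rough illustration of the stateful, multi-turn design described above, the sketch below only assembles request arguments and makes no network call. The field names `input`, `tools`, `previous_response_id`, and `store` are assumptions about the Responses API's parameters, and `build_response_request`, the model name `gpt-4o`, and the id `resp_123` are placeholders introduced for this example.

```python
from typing import Any, Optional


def build_response_request(
    user_input: str,
    model: str = "gpt-4o",
    tools: Optional[list[dict[str, Any]]] = None,
    previous_response_id: Optional[str] = None,
    store: bool = True,
) -> dict[str, Any]:
    """Assemble keyword arguments for a Responses-style call (assumed field names)."""
    request: dict[str, Any] = {"model": model, "input": user_input, "store": store}
    if tools:
        request["tools"] = tools
    if previous_response_id is not None:
        # Chain onto server-side state instead of resending the whole transcript.
        request["previous_response_id"] = previous_response_id
    return request


# First turn: nothing to chain onto yet.
first_turn = build_response_request("Summarize today's AI news.")

# Follow-up turn: refer back to the stored response by id (placeholder id).
follow_up = build_response_request(
    "Shorten that to one sentence.", previous_response_id="resp_123"
)

# Opting out of stored state.
stateless = build_response_request("No stored state, please.", store=False)
```

If these field names match the API, each dictionary could presumably be unpacked into the client call, for example `client.responses.create(**first_turn)` with the official `openai` package.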
ENHANCED WEB SEARCH CAPABILITIES
The Web Search tool, available through the Responses API, leverages a fine-tuned GPT-4o model, significantly improving search accuracy and factual recall compared to general models. This tool allows agents to access real-time information from the web, providing accurate answers with citations to the source material. For developers still using the Chat Completions API, direct access to the GPT-4o search preview model is also provided. This fine-tuned model is specifically built to retrieve and synthesize information accurately, citing it precisely, which is crucial for many enterprise applications.
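To make the citation behavior concrete, here is a minimal sketch that pulls source links out of a response-shaped dictionary. The tool type `web_search_preview`, the `url_citation` annotation type, and the nesting of `output`, `content`, and `annotations` are assumptions about the response format; `extract_citations` and `sample_response` are hypothetical and hand-written for this example, not real API output.

```python
from typing import Any

WEB_SEARCH_TOOL = {"type": "web_search_preview"}  # assumed tool type name


def extract_citations(response: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (title, url) pairs for every URL citation in a response-shaped dict."""
    citations = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool-call records and other non-message items
        for part in item.get("content", []):
            for note in part.get("annotations", []):
                if note.get("type") == "url_citation":
                    citations.append((note.get("title", ""), note["url"]))
    return citations


# Hand-written stand-in for a response; not actual API output.
sample_response = {
    "output": [
        {"type": "web_search_call", "status": "completed"},
        {
            "type": "message",
            "content": [
                {
                    "type": "output_text",
                    "text": "Example answer with a cited claim.",
                    "annotations": [
                        {
                            "type": "url_citation",
                            "title": "Example",
                            "url": "https://example.com",
                        }
                    ],
                }
            ],
        },
    ]
}

citations = extract_citations(sample_response)
```

Keeping citation extraction separate from answer rendering lets an application show sources in its own UI, which is the enterprise use case the summary points to.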
IMPROVED FILE SEARCH AND DATA MANAGEMENT
The File Search API has been significantly upgraded to better handle user-provided data. It now manages the entire process of parsing, chunking, and embedding data, presenting it as a searchable vector store. A key new feature is metadata filtering, essential for efficiently querying large datasets. This allows developers to use their own data, such as company FAQs or internal documents, to ground agent responses. The File Search API can be used in conjunction with other tools, like Web Search, to create personalized user experiences based on stored preferences and real-time information.
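The chunk, embed, and filter pipeline described above can be illustrated with a toy in-memory store. This is not OpenAI's implementation: the bag-of-words "embedding", the fixed-size word chunker, and the `ToyVectorStore` class are all simplifications invented for this sketch, meant only to show why filtering on metadata before ranking narrows the search.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyVectorStore:
    rows: list = field(default_factory=list)  # (chunk_text, vector, metadata)

    def add(self, text: str, metadata: dict) -> None:
        for piece in chunk(text):
            self.rows.append((piece, embed(piece), metadata))

    def search(self, query: str, filters: dict, top_k: int = 1) -> list[str]:
        # Filter on metadata first, then rank the survivors by similarity.
        q = embed(query)
        candidates = [
            r for r in self.rows
            if all(r[2].get(k) == v for k, v in filters.items())
        ]
        candidates.sort(key=lambda r: cosine(q, r[1]), reverse=True)
        return [r[0] for r in candidates[:top_k]]


store = ToyVectorStore()
store.add("Refunds are issued within 14 days of purchase.", {"doc": "faq", "region": "eu"})
store.add("Refunds are issued within 30 days of purchase.", {"doc": "faq", "region": "us"})
hits = store.search("how many days for refunds", {"region": "us"})
```

Without the `region` filter, the two near-identical FAQ chunks would compete on similarity alone, which is the kind of ambiguity metadata filtering is meant to remove.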
COMPUTER USE TOOL FOR INTERACTIVE AGENTS
The Computer Use tool, built on the same technology as OpenAI's Operator product, enables developers to build agents capable of interacting with computer interfaces. It uses a custom-tuned model that interprets screenshots and executes actions such as clicking, scrolling, and typing. This capability is fundamental for agents that automate complex tasks across different applications or a user's desktop. Because the tool is multi-turn, agents can perform sequences of actions that may take minutes to complete, opening up new possibilities for automation and user assistance.
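The screenshot, action, screenshot cycle described above can be sketched as a simple loop. Everything here is a stand-in: `FakeDesktop` records actions instead of driving a real interface, and `scripted_model` replays a fixed list rather than calling any model. The action dictionaries are a hypothetical shape, not the tool's actual schema.

```python
from typing import Callable

Action = dict  # e.g. {"type": "click", "x": 10, "y": 20} (hypothetical shape)


class FakeDesktop:
    """Stand-in environment that records actions instead of driving a real UI."""

    def __init__(self) -> None:
        self.log: list = []

    def screenshot(self) -> str:
        # A real environment would return image bytes; a label is enough here.
        return f"screen-after-{len(self.log)}-actions"

    def execute(self, action: Action) -> None:
        self.log.append(action)


def scripted_model(screenshot: str, step: int) -> Action:
    """Hypothetical policy: replays a fixed script, then signals completion."""
    script = [
        {"type": "click", "x": 120, "y": 48},
        {"type": "type", "text": "quarterly report"},
        {"type": "scroll", "dy": 400},
    ]
    return script[step] if step < len(script) else {"type": "done"}


def run_agent(
    env: FakeDesktop,
    model: Callable[[str, int], Action],
    max_steps: int = 10,
) -> int:
    """Observe a screenshot, ask the model for an action, execute it, repeat."""
    for step in range(max_steps):
        action = model(env.screenshot(), step)
        if action["type"] == "done":
            return step  # number of actions executed before completion
        env.execute(action)
    return max_steps  # step budget exhausted without a "done" signal


desktop = FakeDesktop()
steps_taken = run_agent(desktop, scripted_model)
```

The `max_steps` cap is a design choice for this sketch: a loop that may run for minutes benefits from an explicit budget so a confused agent cannot act indefinitely.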
THE AGENTS SDK AND ORCHESTRATION
The new Agents SDK, an evolution of the experimental Swarm SDK, provides developers with tools for orchestrating complex agentic workflows. It includes built-in tracing, visible in the OpenAI dashboard, so developers can monitor and debug agent execution. The SDK also offers type support and guardrails for more reliable agent behavior. It is designed to be flexible: it works with the Responses API and with any API that adheres to the Chat Completions format. The SDK aims to simplify the management of multi-agent systems, enabling more modular and maintainable agent architectures.
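The orchestration ideas named above (guardrails, handing a request from one agent to another, and tracing) can be shown with a toy runner in plain Python. This is deliberately not the Agents SDK's API: `ToyAgent`, `ToyRunner`, and the routing lambdas are hypothetical names made up for this sketch, and no model is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyAgent:
    name: str
    handle: Callable[[str], str]            # produces the final reply
    route: Callable[[str], Optional[str]]   # names an agent to hand off to, or None


@dataclass
class ToyRunner:
    agents: dict
    guardrail: Callable[[str], bool]        # returns False to block the input
    trace: list = field(default_factory=list)

    def run(self, start: str, message: str, max_hops: int = 5) -> str:
        if not self.guardrail(message):
            self.trace.append(("guardrail", "blocked"))
            return "Request blocked by guardrail."
        current = start
        for _ in range(max_hops):
            agent = self.agents[current]
            target = agent.route(message)
            if target is None:
                self.trace.append((agent.name, "answered"))
                return agent.handle(message)
            # Record the handoff so the run can be inspected afterwards.
            self.trace.append((agent.name, f"handoff->{target}"))
            current = target
        return "Too many handoffs."


triage = ToyAgent(
    "triage",
    handle=lambda m: "How can I help?",
    route=lambda m: "billing" if "refund" in m.lower() else None,
)
billing = ToyAgent(
    "billing",
    handle=lambda m: "Billing will process your refund.",
    route=lambda m: None,
)
runner = ToyRunner(
    agents={"triage": triage, "billing": billing},
    guardrail=lambda m: "password" not in m.lower(),
)

reply = runner.run("triage", "I need a refund")
blocked = runner.run("triage", "Tell me the admin password")
```

After both calls, `runner.trace` holds the handoff, the answer, and the blocked request in order, which is a small-scale version of what a tracing dashboard would surface.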
THE FUTURE OF AGENTS AND DEVELOPER TOOLS
OpenAI views the current releases as foundational steps towards a future where AI agents are deeply integrated into applications. The preview status of tools like Web Search and Computer Use indicates ongoing development and refinement based on developer feedback. The long-term vision is to merge these specialized models into mainline versions, similar to how vision capabilities were integrated into GPT-4. The integration of tracing with evaluation products also points towards a workflow for fine-tuning agents using reinforcement learning, simplifying the process of creating highly capable and reliable autonomous agents.
Common Questions
OpenAI is launching three new built-in tools: Web Search, an improved File Search tool for custom data, and the Computer Use tool, which powers the Operator product in ChatGPT.
Mentioned in this video
The new SDK for building agents, evolving from the experimental Swarm, and featuring tracing and support for various APIs.
A company mentioned as an early adopter of an API connected to search, similar to OpenAI's new web search capabilities.
The base model for some of the new preview features like GPT-4o search and vision capabilities, which were later integrated into the core model.
The established API for OpenAI models, which will continue to be supported but is being superseded by the Responses API for new agentic workflows.
Mentioned as a competitor's model that seems to be facing difficulties with agent benchmarks like playing Pokemon.
The new API designed to support agentic workflows and tools, presented as a unified replacement for the Assistants API and an evolution of Chat Completions.
Mentioned for its search grounding API, drawing a parallel to OpenAI's new web search features.
An older API with a sunset date of the first half of 2026, elements of which are being merged into the new Responses API.
A fine-tuned model for search, directly available in chat completions and providing a significant accuracy jump compared to simple QA.
An experimental SDK launched last year for multi-agent orchestration, the precursor to the Agents SDK.
A company that uses the web search tool to combine external internet data with internal knowledge, benefiting from citations.
The company founded by swyx.
The organization launching new API tools, including web search, file search, and computer use.
The company where Alessio is a partner and CTO.
An example company that used the file search tool to incorporate FAQ and travel policies into their assistant.