smol agents are all you need
Key Moments
smolagents: Building simple, powerful AI agents with code-based actions and a focus on clarity and accessibility.
Key Insights
AI agents enhance LLM capabilities by providing access to the outside world and control over application execution flow.
smolagents prioritizes simplicity and uses code-based actions rather than JSON for more efficient and flexible agent development.
The GAIA benchmark evaluates general AI assistants in open-world environments, simulating real-world tasks that take humans time to solve.
GUI agents, capable of interacting with any graphical user interface, represent the next frontier in agent development, moving beyond browser-based interactions.
Code execution and sandboxing are critical for agent security, with solutions like e2b and custom Python interpreters being employed.
The development of AI agents is advancing rapidly, with predictions that significant productivity gains could be realized by 2026.
DEFINING AI AGENTS AND THEIR IMPORTANCE
AI agents are applications where Large Language Models (LLMs) have control over the execution flow. Instead of just relying on their compressed training data, agents can access the outside world to perform tasks. This agency can manifest in various ways, from simple if-else statements determined by LLM output to complex multi-step processes involving loops and function calls. The concept of agency is presented as a continuum rather than a binary state, allowing for nuanced levels of LLM control within applications.
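As a concrete illustration of the low end of that continuum, here is a minimal sketch in which an LLM's output steers a simple if/else; `call_llm` is a hypothetical stub standing in for any chat-completion API.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real implementation would query an LLM endpoint.
    return "search" if "capital" in prompt else "answer"

def route(question: str) -> str:
    # The LLM's reply decides which branch of the program runs:
    # the simplest possible form of agency.
    decision = call_llm(f"Reply 'search' or 'answer': {question}")
    if decision == "search":
        return f"[web search for: {question}]"
    return f"[direct answer to: {question}]"

print(route("What is the capital of France?"))
```

More agency simply means handing the LLM more of the control flow: which tool to call, whether to loop, when to stop.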
THE ORIGIN AND PHILOSOPHY OF SMOLAGENTS
smolagents was created to address the perceived complexity of existing agent frameworks. The core philosophy is simplicity: the main agents.py file runs to roughly a thousand lines of code, a brevity meant to keep the library easily understandable and its logic apparent. A key design choice is the emphasis on code agents, which act by writing code rather than emitting JSON, an approach the authors believe offers greater power and flexibility for agent actions.
CODE AGENTS VERSUS JSON AGENTS
The library distinguishes between code agents and tool calling (JSON) agents. While tool calling agents typically output dictionaries for tool names and arguments, code agents allow LLMs to directly execute code. This is exemplified by a paper comparing workflows where code agents can manage complex structures like for loops and parallel execution, tasks that would require numerous JSON actions and be difficult to parallelize. Code agents also facilitate easier variable management and more precise control over execution.
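To make the contrast concrete, here is a toy sketch (not smolagents' actual implementation) using a hypothetical `get_population` tool: the JSON style needs one dict-shaped action, and one LLM round trip, per call, while the code style folds a loop and variable handling into a single action.

```python
def get_population(city: str) -> int:
    # Hypothetical tool with canned values, for illustration only.
    return {"Paris": 2_100_000, "Lyon": 520_000, "Nice": 340_000}[city]

# JSON-style agent: one dict per tool call, so three cities
# mean three separate actions that are hard to parallelize.
json_action = {"tool": "get_population", "arguments": {"city": "Paris"}}

# Code-style agent: the LLM emits a snippet that loops over all
# three cities in one action and keeps results in variables.
code_action = """
populations = {city: get_population(city) for city in ["Paris", "Lyon", "Nice"]}
biggest = max(populations, key=populations.get)
"""
scope = {"get_population": get_population}
exec(code_action, scope)
print(scope["biggest"])  # → Paris
```

Executing raw `exec` like this is of course unsafe with untrusted LLM output, which is exactly why the sandboxing discussed below matters.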
DEVELOPMENT AND SECURITY CONSIDERATIONS
The development of Small Agents builds upon prior work like transformers.agents, which introduced a custom Python interpreter to handle the security challenges of code execution. For sandboxing, the current implementation utilizes e2b, with plans to integrate Docker support. The selection of e2b was based on its quick startup, robust support, and the team's preference for adopting the simplest, most solid solution for rapid development and iteration.
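The custom-interpreter idea can be sketched as an allow-list check over the parsed code before execution. This is an illustrative first line of defense only (it does not catch attribute-based escapes); real sandboxes like e2b or Docker isolate at the process or VM level instead.

```python
import ast

ALLOWED_CALLS = {"print", "len", "range", "sum"}

def run_restricted(code: str) -> None:
    # Walk the AST and reject imports and non-allow-listed calls
    # before anything is executed.
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_CALLS:
                raise ValueError(f"call to {node.func.id!r} not allowed")
    # Execute with only the allow-listed builtins visible.
    safe_builtins = {"print": print, "len": len, "range": range, "sum": sum}
    exec(compile(tree, "<agent>", "exec"), {"__builtins__": safe_builtins})

run_restricted("print(sum(range(5)))")   # allowed → prints 10
try:
    run_restricted("import os")          # blocked
except ValueError as e:
    print("blocked:", e)
```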
THE HUGGING FACE AGENTS ECOSYSTEM
Small Agents is part of a broader effort at Hugging Face. The 'Agents Course' has seen significant interest, even overwhelming the sign-up website at times. This course aims to teach agents from the ground up, covering basic concepts and coding from scratch before exploring various frameworks. This approach contrasts with learning frameworks directly without understanding the fundamental principles, with certification offered for participants.
THE GAIA BENCHMARK FOR GENERAL ASSISTANCE
GAIA (the General AI Assistants benchmark) evaluates AI agents on their ability to perform general assistance tasks in an open-world environment, including internet browsing. The tasks are designed to be complex, requiring multiple steps (five to ten on average) and careful calculation, mirroring human tasks that take 10 to 60 minutes. A key feature of GAIA is its hidden test set, which aims to prevent contamination and ensure fair evaluation of agent capabilities.
PERFORMANCE AND TRENDS IN AGENT DEVELOPMENT
Results on the GAIA validation set show competitive performance, with smolagents placing third, especially when paired with advanced models like o1. While some submissions leverage majority voting to improve scores, this comes at higher cost. The trend line for GAIA scores suggests that hitting 90% completion, which would signify a doubling of productivity for many computer-based tasks, could be achieved by or before 2026.
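The trend-line claim amounts to simple linear extrapolation. The sketch below assumes two illustrative (year, average-score) points — these dates and values are assumptions, not data from the episode — and solves for when the line crosses 90%.

```python
def year_reaching(target, p1, p2):
    # Fit a line through two (year, score) points and
    # return the year at which it hits `target`.
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return x1 + (target - y1) / slope

# e.g. assuming ~30% average in early 2023 vs ~65% in early 2025:
print(round(year_reaching(90, (2023.0, 30.0), (2025.0, 65.0)), 1))  # → 2026.4
```

Whether the real curve stays linear (or, as an S-curve, flattens before 90%) is exactly the open question.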
THE EVOLUTION TOWARDS GUI AGENTS
The next significant step in agent development is GUI agents, which can interact with any graphical user interface, not just browsers. This requires agents to 'see' the screen and act upon it, potentially using vision models and point-and-click actions. While current browser-based agents benefit from DOM access, the broader goal is to create agents that can navigate and operate any GUI, mimicking human interaction more closely. Solutions like e2b and tools for VM management are crucial for this advancement.
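The observe-act loop of a GUI agent can be sketched as below. Every helper here (`take_screenshot`, `vision_model`, `click`) is a hypothetical stub; a real agent would drive a VM screenshot API and a vision-language model.

```python
def take_screenshot() -> bytes:
    return b"<png bytes>"                                        # stub

def vision_model(image: bytes, goal: str) -> dict:
    # Stub: a real VLM would localize the next UI element to act on.
    return {"action": "click", "x": 120, "y": 48, "done": True}

def click(x: int, y: int) -> None:
    print(f"clicked at ({x}, {y})")                              # stub: would drive the VM

def run_gui_agent(goal: str, max_steps: int = 5) -> list:
    # Observe the screen, ask the model for an action, act, repeat.
    trace = []
    for _ in range(max_steps):
        step = vision_model(take_screenshot(), goal)
        if step["action"] == "click":
            click(step["x"], step["y"])
            trace.append((step["x"], step["y"]))
        if step.get("done"):
            break
    return trace

print(run_gui_agent("open the settings menu"))
```

The structural difference from browser agents is the observation: raw pixels and coordinates instead of a DOM tree.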
ADVANCEMENTS IN COMPUTER USE AND DATA GENERATION
The concept of 'computer use' agents involves operating within virtual machines, with tools like Morph Labs' Infinibranch enabling 'time travel' (snapshotting and branching) on spawned VMs. This capability is valuable for generating synthetic data by having thousands of parallel agents perform tasks and interactions. The ability to merge human control with VM operations points towards more sophisticated agent environments and data-collection methods for training.
FUNDAMENTAL LIMITATIONS AND FUTURE BREAKTHROUGHS
Despite rapid progress, fundamental limitations still exist, particularly in how agents perceive and interact with visual environments like VMs. While text-based reasoning has seen significant leaps, visual models require further advancement to match this progress. It's anticipated that improvements in base visual models will lead to similar 'jump' advancements on the S-curve for GUI agents, potentially making 2025 the year these capabilities mature significantly.
GAIA Benchmark Scores (Validation Set)
Data extracted from this episode
| Model | Level 1 | Level 2 | Level 3 | Average |
|---|---|---|---|---|
| smolagents (with o1) | 76 | 79 | 63 | 73+ |
| Meta (with Claude 3.5) | 76 | 79 | 51 | 69 |
| OpenAI (with GPT-4) | 74 | 78 | 55 | 69 |
| OpenAI Deep Research (with o1) | 76 | 79 | 63 | 73+ |
GAIA Benchmark Scores (Test Set)
Data extracted from this episode
| Model | Average Score |
|---|---|
| smolagents (with o1) | 65 |
| Meta (with Claude 3.5 Sonnet) | 66 |
| OpenAI (with GPT-4) | 67 |
Common Questions
What is an AI agent?
An AI agent is an application where a Large Language Model (LLM) has control over the execution flow of the app. This agency can range from simple routing decisions to complex multi-step functions and tool execution.
Topics
Mentioned in this video
Mentioned for her equation regarding AI agents, which includes memory, planning, and action.
Associated with OpenHands and OpenDevin; discussed the 'Executable Code Actions Elicit Better LLM Agents' paper.
Author of the essay 'What is an AI Agent?', influential in defining AI agents and their agency.
Head of Open Source at Hugging Face, involved in the initial development of Transformers.agents.
The General AI Assistants benchmark, designed to evaluate the generality and capability of AI agents in real-world tasks.
An AI agent or tool from OpenAI, noted for its strong web browsing capabilities, which is seen as a key component for GUI agents.
An organization associated with Graham Neubig, involved in discussions about LLM agents.
A framework mentioned in relation to the essay on AI agents written by Harrison Chase.
A sandboxing tool and execution environment currently used by Small Agents, noted for its quick start and robustness.
A large language model that performed well with Small Agents on agent tasks, as mentioned during the GAIA benchmark discussion.
An open-source library for building AI agents, focused on simplicity and code-based actions, with the core file running to roughly a thousand lines of code.
An experimental precursor to smolagents, which featured a custom Python interpreter for code agents.
Mentioned as a potential model for improving scores on the GAIA benchmark, especially when used with techniques like majority voting.
A company that developed Infinibranch, an API for computer use that allows 'time travel' (snapshotting and branching) on VMs.
Mentioned in the context of their Deep Research initiative and their models' performance on the GAIA benchmark.
Mentioned as a company that uses the term 'computer use' (CUA) for their agent initiatives.
Mentioned as a key player that significantly impacted the AI landscape in January, contributing to excitement around agent development.
The company where the speaker, Aymeric, works and leads initiatives like smolagents and the Agents Course.
More from Latent Space