smol agents are all you need
Key Moments
smolagents: Building simple, powerful AI agents with code-based actions and a focus on clarity and accessibility.
Key Insights
AI agents enhance LLM capabilities by providing access to the outside world and control over application execution flow.
smolagents prioritizes simplicity and uses code-based actions rather than JSON for more efficient and flexible agent development.
The GAIA benchmark evaluates general AI assistants in open-world environments, simulating real-world tasks that take humans time to solve.
GUI agents, capable of interacting with any graphical user interface, represent the next frontier in agent development, moving beyond browser-based interactions.
Code execution and sandboxing are critical for agent security, with solutions like e2b and custom Python interpreters being employed.
The development of AI agents is advancing rapidly, with predictions that significant productivity gains could be realized by 2026.
DEFINING AI AGENTS AND THEIR IMPORTANCE
AI agents are applications where Large Language Models (LLMs) have control over the execution flow. Instead of just relying on their compressed training data, agents can access the outside world to perform tasks. This agency can manifest in various ways, from simple if-else statements determined by LLM output to complex multi-step processes involving loops and function calls. The concept of agency is presented as a continuum rather than a binary state, allowing for nuanced levels of LLM control within applications.
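As a concrete illustration of the low end of that continuum, here is a minimal sketch in which an LLM's output steers a simple if/else; `call_llm` is a hypothetical stub standing in for any chat-completion API.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real implementation would query an LLM endpoint.
    return "search" if "capital" in prompt else "answer"

def route(question: str) -> str:
    # The LLM's reply decides which branch of the program runs:
    # the simplest possible form of agency.
    decision = call_llm(f"Reply 'search' or 'answer': {question}")
    if decision == "search":
        return f"[web search for: {question}]"
    return f"[direct answer to: {question}]"

print(route("What is the capital of France?"))
```

More agency simply means handing the LLM more of the control flow: which tool to call, whether to loop, when to stop.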
THE ORIGIN AND PHILOSOPHY OF SMOLAGENTS
smolagents was created to address the perceived complexity of existing agent frameworks. The core philosophy is simplicity: the main agents.py file runs to roughly a thousand lines of code, a brevity meant to keep the library easily understandable and its logic apparent. A key design choice is the emphasis on code agents, which act by writing code rather than emitting JSON, an approach the authors believe offers greater power and flexibility for agent actions.
CODE AGENTS VERSUS JSON AGENTS
The library distinguishes between code agents and tool calling (JSON) agents. While tool calling agents typically output dictionaries for tool names and arguments, code agents allow LLMs to directly execute code. This is exemplified by a paper comparing workflows where code agents can manage complex structures like for loops and parallel execution, tasks that would require numerous JSON actions and be difficult to parallelize. Code agents also facilitate easier variable management and more precise control over execution.
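To make the contrast concrete, here is a toy sketch (not smolagents' actual implementation) using a hypothetical `get_population` tool: the JSON style needs one dict-shaped action, and one LLM round trip, per call, while the code style folds a loop and variable handling into a single action.

```python
def get_population(city: str) -> int:
    # Hypothetical tool with canned values, for illustration only.
    return {"Paris": 2_100_000, "Lyon": 520_000, "Nice": 340_000}[city]

# JSON-style agent: one dict per tool call, so three cities
# mean three separate actions that are hard to parallelize.
json_action = {"tool": "get_population", "arguments": {"city": "Paris"}}

# Code-style agent: the LLM emits a snippet that loops over all
# three cities in one action and keeps results in variables.
code_action = """
populations = {city: get_population(city) for city in ["Paris", "Lyon", "Nice"]}
biggest = max(populations, key=populations.get)
"""
scope = {"get_population": get_population}
exec(code_action, scope)
print(scope["biggest"])  # → Paris
```

Executing raw `exec` like this is of course unsafe with untrusted LLM output, which is exactly why the sandboxing discussed below matters.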
DEVELOPMENT AND SECURITY CONSIDERATIONS
The development of Small Agents builds upon prior work like transformers.agents, which introduced a custom Python interpreter to handle the security challenges of code execution. For sandboxing, the current implementation utilizes e2b, with plans to integrate Docker support. The selection of e2b was based on its quick startup, robust support, and the team's preference for adopting the simplest, most solid solution for rapid development and iteration.
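The custom-interpreter idea can be sketched as an allow-list check over the parsed code before execution. This is an illustrative first line of defense only (it does not catch attribute-based escapes); real sandboxes like e2b or Docker isolate at the process or VM level instead.

```python
import ast

ALLOWED_CALLS = {"print", "len", "range", "sum"}

def run_restricted(code: str) -> None:
    # Walk the AST and reject imports and non-allow-listed calls
    # before anything is executed.
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_CALLS:
                raise ValueError(f"call to {node.func.id!r} not allowed")
    # Execute with only the allow-listed builtins visible.
    safe_builtins = {"print": print, "len": len, "range": range, "sum": sum}
    exec(compile(tree, "<agent>", "exec"), {"__builtins__": safe_builtins})

run_restricted("print(sum(range(5)))")   # allowed → prints 10
try:
    run_restricted("import os")          # blocked
except ValueError as e:
    print("blocked:", e)
```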
THE HUGGING FACE AGENTS ECOSYSTEM
Small Agents is part of a broader effort at Hugging Face. The 'Agents Course' has seen significant interest, even overwhelming the sign-up website at times. This course aims to teach agents from the ground up, covering basic concepts and coding from scratch before exploring various frameworks. This approach contrasts with learning frameworks directly without understanding the fundamental principles, with certification offered for participants.
THE GAIA BENCHMARK FOR GENERAL ASSISTANCE
GAIA (the General AI Assistants benchmark) evaluates AI agents on their ability to perform general assistance tasks in an open-world environment, including internet browsing. The tasks are designed to be complex, requiring multiple steps (five to ten on average) and careful calculation, mirroring human tasks that take 10 to 60 minutes. A key feature of GAIA is its hidden test set, which aims to prevent contamination and ensure fair evaluation of agent capabilities.
PERFORMANCE AND TRENDS IN AGENT DEVELOPMENT
Results on the GAIA validation set show competitive performance, with smolagents placing third, especially when paired with advanced models like o1. While some submissions leverage majority voting to improve scores, this comes at higher cost. The trend line for GAIA scores suggests that hitting 90% completion, which would signify a doubling of productivity for many computer-based tasks, could be achieved by or before 2026.
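The trend-line claim amounts to simple linear extrapolation. The sketch below assumes two illustrative (year, average-score) points — these dates and values are assumptions, not data from the episode — and solves for when the line crosses 90%.

```python
def year_reaching(target, p1, p2):
    # Fit a line through two (year, score) points and
    # return the year at which it hits `target`.
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return x1 + (target - y1) / slope

# e.g. assuming ~30% average in early 2023 vs ~65% in early 2025:
print(round(year_reaching(90, (2023.0, 30.0), (2025.0, 65.0)), 1))  # → 2026.4
```

Whether the real curve stays linear (or, as an S-curve, flattens before 90%) is exactly the open question.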
THE EVOLUTION TOWARDS GUI AGENTS
The next significant step in agent development is GUI agents, which can interact with any graphical user interface, not just browsers. This requires agents to 'see' the screen and act upon it, potentially using vision models and point-and-click actions. While current browser-based agents benefit from DOM access, the broader goal is to create agents that can navigate and operate any GUI, mimicking human interaction more closely. Solutions like e2b and tools for VM management are crucial for this advancement.
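The observe-act loop of a GUI agent can be sketched as below. Every helper here (`take_screenshot`, `vision_model`, `click`) is a hypothetical stub; a real agent would drive a VM screenshot API and a vision-language model.

```python
def take_screenshot() -> bytes:
    return b"<png bytes>"                                        # stub

def vision_model(image: bytes, goal: str) -> dict:
    # Stub: a real VLM would localize the next UI element to act on.
    return {"action": "click", "x": 120, "y": 48, "done": True}

def click(x: int, y: int) -> None:
    print(f"clicked at ({x}, {y})")                              # stub: would drive the VM

def run_gui_agent(goal: str, max_steps: int = 5) -> list:
    # Observe the screen, ask the model for an action, act, repeat.
    trace = []
    for _ in range(max_steps):
        step = vision_model(take_screenshot(), goal)
        if step["action"] == "click":
            click(step["x"], step["y"])
            trace.append((step["x"], step["y"]))
        if step.get("done"):
            break
    return trace

print(run_gui_agent("open the settings menu"))
```

The structural difference from browser agents is the observation: raw pixels and coordinates instead of a DOM tree.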
ADVANCEMENTS IN COMPUTER USE AND DATA GENERATION
The concept of 'computer use' agents involves operating within virtual machines, with tools like Morph Labs' Infinibranch enabling 'time travel' (snapshotting and branching) on spawned VMs. This capability is valuable for generating synthetic data by having thousands of parallel agents perform tasks and interactions. The ability to merge human control with VM operations points towards more sophisticated agent environments and data-collection methods for training.
FUNDAMENTAL LIMITATIONS AND FUTURE BREAKTHROUGHS
Despite rapid progress, fundamental limitations still exist, particularly in how agents perceive and interact with visual environments like VMs. While text-based reasoning has seen significant leaps, visual models require further advancement to match this progress. It's anticipated that improvements in base visual models will lead to similar 'jump' advancements on the S-curve for GUI agents, potentially making 2025 the year these capabilities mature significantly.
GAIA Benchmark Scores (Validation Set)
Data extracted from this episode
| Model | Level 1 | Level 2 | Level 3 | Average |
|---|---|---|---|---|
| smolagents (with o1) | 76 | 79 | 63 | 73+ |
| Meta (with Claude 3.5) | 76 | 79 | 51 | 69 |
| OpenAI (with GPT-4) | 74 | 78 | 55 | 69 |
| OpenAI Deep Research (with o1) | 76 | 79 | 63 | 73+ |
GAIA Benchmark Scores (Test Set)
Data extracted from this episode
| Model | Average Score |
|---|---|
| smolagents (with o1) | 65 |
| Meta (with Claude 3.5 Sonnet) | 66 |
| OpenAI (with GPT-4) | 67 |
Common Questions
What is an AI agent?
An AI agent is an application where a Large Language Model (LLM) has control over the execution flow of the app. This agency can range from simple routing decisions to complex multi-step functions and tool execution.
Topics
Mentioned in this video
Mentioned for her equation regarding AI agents, which includes memory, planning, and action.
Associated with OpenHands and OpenDevin; discussed the 'Executable Code Actions Elicit Better LLM Agents' paper.
Author of the essay 'What is an AI Agent?', influential in defining AI agents and their agency.
Head of Open Source at Hugging Face, involved in the initial development of Transformers.agents.
The General AI Assistants benchmark, designed to evaluate the generality and capability of AI agents in real-world tasks.
An AI agent or tool from OpenAI, noted for its strong web browsing capabilities, which is seen as a key component for GUI agents.
An organization associated with Graham Neubig, involved in discussions about LLM agents.
A framework mentioned in relation to the essay on AI agents written by Harrison Chase.
A sandboxing tool and execution environment currently used by Small Agents, noted for its quick start and robustness.
A large language model that performed well with Small Agents on agent tasks, as mentioned during the GAIA benchmark discussion.
An open-source library for building AI agents, focused on simplicity and code-based actions, with the core file running to roughly a thousand lines of code.
An experimental precursor to smolagents, which featured a custom Python interpreter for code agents.
Mentioned as a potential model for improving scores on the GAIA benchmark, especially when used with techniques like majority voting.
A company that developed Infinibranch, an API for computer use that allows 'time travel' (snapshotting and branching) on VMs.
Mentioned in the context of their Deep Research initiative and their models' performance on the GAIA benchmark.
Mentioned as a company that uses the term 'computer use' (CUA) for their agent initiatives.
Mentioned as a key player that significantly impacted the AI landscape in January, contributing to excitement around agent development.
The company where the speaker, Aymeric, works and leads initiatives like smolagents and the Agents Course.
More from Latent Space