Key Moments

AI Dev 26 x SF | Eda Zhou & Mahdi Ghodsi: Building Personal AI Agents with Open Source Models

DeepLearning.AIDeepLearning.AI
Education8 min read34 min video
May 20, 2026|243 views|7
Save to Pod

Want to know something specific about what's covered?

We've already dissected every moment. Ask and we will deliver (with timestamps).

TL;DR

AMD's Open Claw application allows users to deploy open-source AI models on GPUs and build personal AI agents capable of memory, planning, and API calls, but requires careful configuration of tools and agent personalities.

Key Insights

1

An LLM alone is a text generation system that predicts the next token and cannot act on external systems, maintain persistent memory, or manage multi-step processes.

2

Agents utilize a 'reason, action, observation' (react) loop to continuously think through a problem, call tools, test solutions, and iterate until the task is verified as complete.

3

Open Claw is presented as a persistent application, unlike framework-based agent libraries that require additional scheduling logic for tasks like cron jobs.

4

Deploying the 'Quinn 3.5 20B' model using VLM on AMD GPUs is demonstrated to be straightforward, requiring only the 'VLM serve model name' command and optional parameters for tool calling.

5

Within Open Claw, agent behavior is primarily defined by `.md` files such as `soul.md` (defines personality and behavior) and `agents.md` (defines rules and policies), overriding initial configurations.

6

Multi-agent systems in Open Claw are beneficial because LLMs have limited attention spans and context lengths, making it better to dedicate separate agents to specific tasks rather than overwhelming a single agent.

Bridging the gap between LLMs and actionable agents

The foundational concept discussed is the distinction between a Large Language Model (LLM) and an AI agent. While LLMs are trained to predict the next token and excel at text generation, answering questions, and even coding, they operate in isolation. They lack the ability to interact with external systems, retain memory across interactions, or manage complex, multi-step processes. This is where AI agents come in, addressing these limitations by incorporating memory, planning capabilities, and the ability to execute actions. The core mechanism enabling this is the 'Reason, Action, Observation' (react) loop. This iterative process allows an agent to first reason about a problem, then take an action (such as calling an API or executing code), and finally observe the result. If the task isn't complete or the action was unsuccessful, the agent repeats the loop, refining its approach until the problem is solved and verified. This closed-loop system is crucial for building agents that can reliably perform tasks and test their own solutions, moving beyond static responses to dynamic problem-solving. The discussion highlights that while the base LLM is important, a robust execution loop is equally critical for agent performance. The three key components for building an agent are identified as the model (the LLM itself), the runtime (which manages the loop, context, retries, and state), and the tools (external capabilities like APIs and CLIs).

Open Claw: A persistent application for agent deployment

The workshop introduces Open Claw as a distinct approach to building AI agents compared to common agent frameworks. While many frameworks are libraries that require users to implement their own loops and schedulers, Open Claw is presented as a persistent, standalone application. This architectural difference makes it suitable for tasks that require continuous operation, such as cron jobs or heartbeat monitoring, without the need for additional user-built infrastructure. The persistent nature means Open Claw runs in the background, ready to execute tasks as needed. The presentation emphasizes that Open Claw is designed to work seamlessly with other open-source components like VLM (for model serving) and various LLMs available on platforms like Hugging Face. The hands-on portion leveraged AMD GPUs, providing participants with access to dedicated instances to deploy models and connect them to their Open Claw agents.

Deploying open-source models with VLM on AMD GPUs

A practical demonstration showcased how easy it is to deploy open-source LLMs on AMD GPUs using the VLM serving framework. The core command provided is `VLM serve model_name`, pulling models directly from Hugging Face. For instance, deploying the 'Quinn 3.5 20B' model involves simply specifying its Hugging Face identifier. VLM is highlighted as a versatile framework that works with various GPU brands but is presented as a first-class citizen on AMD hardware. To enable agent functionality, specific parameters like `enable_tool_calling` and a `tool_call_parser` are necessary. These parameters are often model-dependent and provided by the model's creators. An API key can also be configured for security, especially when integrating with applications like Open Claw. The process involves loading model weights onto the GPU and configuring VLM to serve the model, with success indicated by a 'model is ready' message. The ability to serve multimodal models, which can process both text and images, is also mentioned, with a note that initial configuration might default to text-only, requiring explicit enabling of image capabilities.

Configuring agent personality and behavior

Open Claw agents are highly personalized through Markdown (`.md`) files. The `soul.md` file defines the core personality and behavioral guidelines, acting as the agent's 'soul'. The `identity.md` file sets its name and potentially visual cues, while `agents.md` contains more formal rules and policies for its operation. During the initial setup, an interactive onboarding process asks users about their preferences, hard boundaries, and desired demeanor for the agent. The agent then uses this information to dynamically generate or overwrite these `.md` files. This means that even if the underlying LLM has certain capabilities, the agent's behavior is ultimately governed by these configuration files. For example, if a user sets a strict rule in `soul.md` about verification, the agent will adhere to it, potentially overriding other instructions. This file-based configuration system allows for deep customization and control over how the agent interacts and performs tasks.

Empowering agents with tools and custom skills

The utility of an agent is significantly enhanced by the tools it can access and the skills it can develop. Open Claw supports integration with various external capabilities, such as MCP servers, APIs, and CLIs. Tools are typically abstracted from the model, allowing for flexibility. The workshop demonstrates how agents can be given access to user files, bash commands, and potentially even emails or calendars, though security implications are acknowledged. A powerful feature is the ability to create and reuse custom skills. If an agent performs a complex or repetitive task, it can be instructed to encapsulate that functionality into a reusable skill. Later, the agent can be prompted to use this skill for similar problems, even on new projects it hasn't encountered before. This is achieved by saving the skill logic, often in a `skills.py` file within the Open Claw directory, making it persistently available for the agent. The process involves identifying a task, instructing the agent to create a skill for it, and then invoking that skill by name or through natural language commands within Open Claw.

Debugging and problem-solving with AI agents

An example task involved debugging a Python application with an intentional bug. The agent was provided with the GitHub repository and instructed to install and run the project. After identifying a functional issue (e.g., zero accuracy in a word-per-minute counter), the agent could be prompted to investigate and fix the bug. This showcases the agent's ability to read code, understand error messages, and potentially propose or implement solutions. While the agent's debugging capabilities depend on the underlying LLM's coding proficiency, the demonstration suggests that even open-source models can be effective for such tasks. The agent's process would involve cloning the repo, installing dependencies, running the code, analyzing errors, and then modifying the source code to resolve the issue based on its reasoning and available tools.

The benefits and mechanics of multi-agent systems

Open Claw supports multi-agent systems, where several agents can collaborate or specialize in different tasks. This architecture is particularly useful because LLMs have limitations in context length and attention. Trying to load too many instructions or responsibilities into a single agent can dilute its effectiveness. By creating specialized agents, each with its own focus and tools, users can achieve better performance and manage complexity. An example provided is the creation of a 'morning brief' agent designed to gather specific information daily, such as news on particular GitHub projects (e.g., SG Lang, Hugging Face Transformers) or AI hardware news. Adding a new agent is straightforward using commands like `Open Claw agents add [agent_name]`. These agents can communicate and operate within the same environment, but maintaining separate contexts and tool access for each agent ensures they can focus on their designated purpose without interfering with or overwhelming each other. This modular approach allows for sophisticated workflows, such as a chain of agents: one for benchmarking, another for analysis, an optimizer, and a skeptic to validate results.

Interacting with and controlling agents

Users interact with Open Claw agents through a command-line interface. Basic commands allow for setting up agents, configuring their personalities, and assigning tasks. Natural language prompts are the primary method of communication, but specific commands and shortcuts exist for more precise control. For instance, typing a forward slash (`/`) can bring up a menu of available commands, including switching between agents or invoking specific skills. The system also includes control mechanisms like 'stop' or 'abort' keywords. However, the effectiveness of these interruption commands can be inconsistent, sometimes requiring multiple attempts to halt a running process. The developers acknowledge this as a known weakness. The challenge presented to the workshop participants hints at the multimodal capabilities of some models, suggesting that agents might need explicit instructions to leverage their image-processing abilities if initially configured only for text.

Common Questions

An LLM predicts the next token and is primarily a text generation system. A chatbot injects conversation history into the prompt to simulate memory. An agent adds capabilities like persistent memory, planning, API calls, and the ability to take actions within a loop until a task is completed.

Topics

Mentioned in this video

More from DeepLearningAI

View all 80 summaries

Ask anything from this episode.

Save it, chat with it, and connect it to Claude or ChatGPT. Get cited answers from the actual content — and build your own knowledge base of every podcast and video you care about.

Get Started Free