Key Moments

AI Dev 26 x SF | Vlad Luzin: Herding Cats—The Hidden Challenges of Multi-Agent Autonomy

DeepLearning.AI
Education | 6 min read | 31 min video
May 21, 2026|18 views|1

TL;DR

Coordinating a multi-agent system is as chaotic as herding cats. Agents are moving toward communicating with one another in natural language, but hard engineering problems such as message ordering and error recovery remain.

Key Insights

1. The future of business-to-consumer, business-to-business, and intra-business interactions will be dominated by agents talking to agents using natural language APIs, not traditional JSON schemas.

2. Current autonomous agents can operate for up to a day, and this duration is expected to increase significantly.

3. Connecting remote, distributed agents presents significant engineering challenges, including real-time transport, message sequencing, error recovery, and managing multiple agent interactions within a 'room' or 'channel'.

4. Existing agent communication protocols like Google's A2A only define a transport layer and do not support multi-peer conversations or registries.

5. Enterprises require a governance layer for multi-agent systems, including auditability, observability, agent identity, and multi-tenancy, all of which are critical for secure deployment.

6. The presented platform provides an AI mesh as a planet-scale communication layer for agents, alongside capabilities like registries, channels, persistence, and a control plane for enterprise management.

Herding cats: The chaotic nature of multi-agent systems

The speaker likens creating agents to training dogs, where initial expectations are that they will obediently follow commands. However, the reality is that agents are more like cats, often pursuing their own agenda. Multi-agent systems (MAS), involving multiple interacting agents, are then compared to herding cats, highlighting the extreme difficulty in coordinating them to achieve a desired outcome. This analogy sets the stage for the complex engineering challenges involved in developing truly autonomous and collaborative AI systems. The presentation aims to demystify these concepts, providing a shared vocabulary and outlining the evolution and future of agentic technology. The core challenge is making these independent entities work together cohesively towards a common goal, moving beyond simple task execution to complex collaboration.

The evolution from single agents to distributed networks

The AI journey began with standalone Large Language Models (LLMs) performing single tasks, often wrapped in simple web applications. This was followed by the development of graph-based workflows, where tasks were broken down into nodes, each with specific roles and tools, an approach exemplified by platforms like LangGraph. The next phase involved protocol-driven interactions, with initiatives like Anthropic's Model Context Protocol (MCP) and Google's Agent-to-Agent (A2A) protocol attempting to standardize communication. However, these protocols often address only specific aspects like transport layers or context sharing, not full agent-to-agent conversation. Recently, concepts like 'skills' and 'sub-agents' (teams) have emerged, aiming to enhance agent capabilities. The current trajectory is towards 'managed agents' and distributed networks where agents operate more autonomously and collaboratively across different environments.

The future: Agents talking to agents via natural language

The future landscape of both consumer-to-business and intra-business interactions will be characterized by agents communicating directly with each other. Instead of rigid, hard-coded JSON APIs that require constant maintenance and backward compatibility considerations, the communication layer between these agents will increasingly be natural language. This shift allows for greater flexibility and leverages advancements in powerful LLMs. Furthermore, agents are becoming more autonomous, capable of performing tasks for extended periods, with some current systems already demonstrating operational capabilities for up to a day. This increasing autonomy implies a future where personal assistant agents can interact with enterprise agents on our behalf, and internal enterprise agents can collaborate to fulfill complex requests without direct human intervention. This forms the basis of a truly interconnected agentic ecosystem.

Bridging the gap: Real-time distributed agent collaboration

A key demonstration showcased a platform enabling distributed, remote agents to communicate and collaborate in real-time. In the example, a 'treasure hunter' agent tasked with booking a trip to the Greek islands autonomously discovered and invited other agents capable of providing weather information and ship availability. This agent then divided the task, delegated sub-tasks to the invited agents, and aggregated their responses to deliver a final report. Crucially, this process occurred without a predefined workflow or graph; the agents dynamically discovered and interacted with each other. This illustrates how agents will engage in back-and-forth messaging as a group, facilitating complex tasks across organizational boundaries. Such real-time, group-based conversations are a critical step towards realizing the vision of autonomous agent collaboration that has been largely theoretical or confined to single-agent interactions until now.
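The discover, delegate, and aggregate pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's API: `AgentRegistry`, `RemoteAgent`, the capability strings, and the stub reply functions are all hypothetical, and the sub-task split is hard-coded here, whereas in the demo the agent decides it on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RemoteAgent:
    # Hypothetical stand-in for a remote agent reachable over the network.
    name: str
    capabilities: List[str]
    handle: Callable[[str], str]  # takes a sub-task, returns a reply


@dataclass
class AgentRegistry:
    # Hypothetical registry; a real one would be a network service.
    agents: List[RemoteAgent] = field(default_factory=list)

    def register(self, agent: RemoteAgent) -> None:
        self.agents.append(agent)

    def discover(self, capability: str) -> List[RemoteAgent]:
        return [a for a in self.agents if capability in a.capabilities]


def plan_trip(registry: AgentRegistry, destination: str) -> Dict[str, str]:
    """Discover helpers at run time, delegate sub-tasks, aggregate replies."""
    subtasks = {
        "weather": f"Forecast for {destination}",
        "ship_availability": f"Available ships to {destination}",
    }
    report: Dict[str, str] = {}
    for capability, request in subtasks.items():
        helpers = registry.discover(capability)
        if not helpers:
            report[capability] = "no agent found"
            continue
        # Delegate to the first matching agent and keep its answer.
        report[capability] = helpers[0].handle(request)
    return report


registry = AgentRegistry()
registry.register(RemoteAgent("weather-bot", ["weather"], lambda q: f"sunny ({q})"))
registry.register(
    RemoteAgent("ferry-bot", ["ship_availability"], lambda q: f"2 ships ({q})")
)
report = plan_trip(registry, "Greek islands")
```

No workflow graph is declared anywhere: the coordinating function only knows which capabilities it needs, and the registry decides at run time which agents can serve them.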

Developer perspective: Automating agent-to-agent workflows

For developers, the current use of coding agents like Claude and Codex often involves manual intervention to orchestrate their interactions. This typically includes copying and pasting outputs between different terminal windows, effectively acting as a manual message bus. The presented platform offers a solution by allowing these agents to connect and converse directly in real-time. For instance, a developer could connect Claude for planning and Codex for review, enabling them to exchange information and feedback autonomously. This eliminates the tedious manual copy-pasting process, allowing agents to manage shared files and socket connections, thus freeing up developer time and enhancing productivity by automating the orchestration of agentic workflows.
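To make the "manual message bus" concrete, the sketch below replaces the copy-and-paste step with a small relay loop. The `planner` and `reviewer` functions are stubs standing in for real coding agents, and the `"APPROVED"` stop signal is an assumption made for illustration; none of this reflects the platform's actual interface.

```python
from typing import Callable, List, Tuple


def planner(feedback: str) -> str:
    # Stub for a planning agent: revises the plan when asked for tests.
    return "plan v2 with tests" if "add tests" in feedback else "plan v1"


def reviewer(plan: str) -> str:
    # Stub for a reviewing agent: approves only plans that mention tests.
    return "APPROVED" if "tests" in plan else "add tests"


def relay(
    plan_fn: Callable[[str], str],
    review_fn: Callable[[str], str],
    max_rounds: int = 5,
) -> List[Tuple[str, str]]:
    """Pass messages between two agents until approval or the round limit."""
    transcript: List[Tuple[str, str]] = []
    feedback = ""
    for _ in range(max_rounds):
        plan = plan_fn(feedback)
        transcript.append(("planner", plan))
        feedback = review_fn(plan)
        transcript.append(("reviewer", feedback))
        if feedback == "APPROVED":
            break
    return transcript


transcript = relay(planner, reviewer)
```

The `max_rounds` cap is a deliberate design choice: two non-deterministic agents can disagree indefinitely, so an automated relay needs a bounded exit.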

Complex challenges in multi-agent communication

Connecting remote agents introduces significant distributed-systems engineering problems. Agents behave like non-deterministic microservices and need robust communication mechanisms. The first requirement is real-time transport with guaranteed message ordering, because incorrect sequencing can degrade agent performance. Message queues are needed for back pressure and flow control, especially since agents may re-send messages or tool calls. For continuity, those queues must be persisted so that an agent can be rehydrated after a crash. Runtime binding is also critical: different agent frameworks use different session and conversation IDs, which must be normalized. Finally, once more than two agents are involved, peer-to-peer communication becomes insufficient, and abstractions such as channels or rooms are needed to manage multiple participants and support dynamic agent discovery.
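Two of these concerns, message ordering and re-sent messages, can be illustrated with a receiver-side reorder buffer. The sketch assumes each message carries a per-channel sequence number starting at 0; `OrderedInbox` is a hypothetical name, and the example leaves out persistence, back pressure, and ID normalization.

```python
from typing import Dict, List


class OrderedInbox:
    """Releases messages in sequence order and drops re-sent duplicates."""

    def __init__(self) -> None:
        self.next_seq = 0                  # next sequence number to deliver
        self.pending: Dict[int, str] = {}  # messages that arrived early

    def receive(self, seq: int, body: str) -> List[str]:
        # Ignore anything already delivered or already buffered.
        if seq < self.next_seq or seq in self.pending:
            return []
        self.pending[seq] = body
        ready: List[str] = []
        # Deliver the longest contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


inbox = OrderedInbox()
delivered: List[str] = []
# Messages arrive out of order, and message 1 is sent twice.
for seq, body in [(1, "b"), (0, "a"), (1, "b"), (3, "d"), (2, "c")]:
    delivered.extend(inbox.receive(seq, body))
```

After the loop, `delivered` holds the four bodies in sequence order with the duplicate dropped. A production version would also bound `pending` so that one missing message cannot grow the buffer without limit.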

The enterprise imperative: Governance and observability

For enterprises to adopt multi-agent systems, a comprehensive governance layer is indispensable. This includes robust auditability and observability, allowing for detailed tracking of all agent activities, including messages, tool calls, and tool results. Each agent must have a clearly defined identity, linked to a human owner, which governs its permissions and scope. Multi-tenancy support is also vital, enabling different departments or teams within an enterprise to operate with distinct sets of agents and access controls. Furthermore, the platform must enable secure connectivity between agents operating across different tenants. Without these foundational elements, the widespread deployment of autonomous agents in enterprise environments, as envisioned in popular media, remains a distant prospect. The platform addresses these needs by providing a centralized control plane for management and oversight.
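A minimal sketch of how identity, tenancy, and auditability fit together is shown below. All names (`AgentIdentity`, `AuditLog`, `authorize`, the scope strings) are hypothetical, and the policy is an assumption: cross-tenant access is denied unless it has been granted explicitly, and every decision is logged whether or not it is allowed.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: str                  # human accountable for the agent
    tenant: str                 # home tenant, e.g. a department
    scopes: FrozenSet[str]      # actions the agent may perform
    granted_tenants: FrozenSet[str] = frozenset()  # explicit cross-tenant grants


@dataclass
class AuditLog:
    events: List[dict] = field(default_factory=list)

    def record(
        self, agent: AgentIdentity, action: str, target_tenant: str, allowed: bool
    ) -> None:
        self.events.append(
            {
                "agent_id": agent.agent_id,
                "owner": agent.owner,
                "action": action,
                "target_tenant": target_tenant,
                "allowed": allowed,
            }
        )


def authorize(
    agent: AgentIdentity, action: str, target_tenant: str, log: AuditLog
) -> bool:
    """Check scope and tenancy, and audit the decision either way."""
    in_scope = action in agent.scopes
    tenant_ok = (
        target_tenant == agent.tenant or target_tenant in agent.granted_tenants
    )
    allowed = in_scope and tenant_ok
    log.record(agent, action, target_tenant, allowed)
    return allowed


log = AuditLog()
bot = AgentIdentity(
    "billing-bot", "alice@example.com", "finance", frozenset({"send_message"})
)
ok = authorize(bot, "send_message", "finance", log)
denied = authorize(bot, "send_message", "hr", log)
```

Recording denied attempts alongside permitted ones is what makes the log useful for audit: it shows what an agent tried to do, not only what it did.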

The AI mesh platform and SDK

The presented platform is built upon an 'AI mesh,' a planet-scale communication layer enabling any agent, regardless of its framework or location, to interact with any other agent, individually or in groups. This mesh is supported by various capabilities, including a registry of remote agents, channels for structured communication, persistence for state management, and message filtering, all handled internally. A control plane provides a UI for enterprises to debug, manage safe interactions, and ensure observability. The primary integration point is an SDK that allows major agent development frameworks to participate in the platform. This SDK abstracts five layers: transport (WebSockets, REST APIs), framework adapters, preprocessing, history conversion, and contact handling, all culminating in a wrapper that enables agents to run and communicate within the platform ecosystem. The platform facilitates easy integration for frameworks like Pydantic AI and Claude, allowing them to become participants in inter-agent conversations.
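The framework-adapter layer can be pictured as a thin translation step between each framework's native message shape and one shared format. The sketch below is an assumption about how such a layer might look, not the SDK's real interface: `MeshMessage`, `FrameworkAdapter`, both adapter classes, and every dictionary key are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class MeshMessage:
    # Hypothetical shared format used on the mesh.
    conversation_id: str
    sender: str
    text: str


class FrameworkAdapter(Protocol):
    def to_mesh(self, raw: dict) -> MeshMessage: ...


class ThreadStyleAdapter:
    """For a hypothetical framework that identifies conversations by thread_id."""

    def to_mesh(self, raw: dict) -> MeshMessage:
        return MeshMessage(f"mesh:{raw['thread_id']}", raw["agent"], raw["content"])


class SessionStyleAdapter:
    """For a hypothetical framework that identifies conversations by session."""

    def to_mesh(self, raw: dict) -> MeshMessage:
        return MeshMessage(f"mesh:{raw['session']}", raw["from"], raw["message"])


def normalize(adapter: FrameworkAdapter, raw: dict) -> MeshMessage:
    # Everything downstream sees one message shape and one ID scheme.
    return adapter.to_mesh(raw)


a = normalize(ThreadStyleAdapter(), {"thread_id": "42", "agent": "planner", "content": "hi"})
b = normalize(SessionStyleAdapter(), {"session": "42", "from": "reviewer", "message": "hello"})
```

Both messages end up with the same `conversation_id` even though the source frameworks name and structure that identifier differently, which is the kind of normalization that runtime binding requires.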

Common Questions

The title 'Herding Cats' is an analogy for multi-agent systems. While agents might seem like dogs that follow direct commands, they are more like cats – independent and prone to doing what they want. Managing a 'herd of cats' (a multi-agent system) to achieve a specific goal is extremely difficult.
