How can game theory help in developing robots that understand interaction?

Game theory provides concepts like 'equilibria' (e.g., Nash equilibria) that mathematically formalize the outcome of joint prediction and planning. By seeking these equilibria, robots can make decisions that are optimal in response to the anticipated actions of others, leading to more coordinated and predictable behavior.

What are potential games and why are they useful in robotics?

Potential games are a specific, well-studied class of games where finding equilibria is significantly simplified. Instead of solving complex coupled optimal control problems, one can minimize a single potential function, making the planning process much more efficient for robots.

Why is learning from human interactions more effective than learning from humans in isolation?

Agent decisions are interdependent. Learning solely from a human in isolation might not reveal their behavior in interactive contexts. Demonstrations of interactions between humans provide richer data, better reflecting their preferences and how they adapt to others' actions or social norms.

What are the key challenges with imitation learning in multi-agent systems?

Multi-agent imitation learning faces challenges like 'exploitability,' where learned policies can be exploited by other agents reacting to suboptimal behavior. It also requires strict assumptions like full state support and exact matching of state-action occupancy measures, which are hard to achieve, especially with multimodal interaction behaviors.
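A small numeric sketch makes "exploitability" concrete. It uses the standard definition: for each agent, take how much it could lower its own expected cost by switching alone to a best response, then sum over agents. The total is zero exactly at a Nash equilibrium. The two-agent merging game, its cost table, and the "cloned" policy are hypothetical illustrations, not taken from the talk.

```python
# Hypothetical merging game: action 0 = "go", action 1 = "yield".
# COST[i][a0][a1] is agent i's cost when agent 0 plays a0 and agent 1 plays a1.
COST = [
    [[10.0, 0.0], [1.0, 2.0]],  # agent 0: crash, go first, wait, both wait
    [[10.0, 1.0], [0.0, 2.0]],  # agent 1
]


def expected_cost(i, p0, p1):
    """Agent i's expected cost when the agents play mixed policies p0 and p1."""
    return sum(p0[a0] * p1[a1] * COST[i][a0][a1] for a0 in range(2) for a1 in range(2))


def best_response_cost(i, p0, p1):
    """Lowest expected cost agent i can reach by switching alone to a pure action."""
    if i == 0:
        return min(sum(p1[a1] * COST[0][a0][a1] for a1 in range(2)) for a0 in range(2))
    return min(sum(p0[a0] * COST[1][a0][a1] for a0 in range(2)) for a1 in range(2))


def deviation_gains(p0, p1):
    """Per-agent gain from a unilateral best response (all zero at a Nash equilibrium)."""
    return [expected_cost(i, p0, p1) - best_response_cost(i, p0, p1) for i in range(2)]


expert = ([0.0, 1.0], [1.0, 0.0])  # agent 0 yields, agent 1 goes
cloned = ([0.0, 1.0], [0.1, 0.9])  # imitation error: agent 1 now mostly yields
print(deviation_gains(*expert))    # [0.0, 0.0]
print(sum(deviation_gains(*cloned)))
```

Against the cloned policy, agent 0 gains about 0.9 by switching from yielding to going, and the cloned agent is about 1.8 worse off than its own best response. The first term is the kind of exploitation of suboptimal behavior described above.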

How can diffusion policies help robots coordinate?

Diffusion policies are effective at capturing multi-modal behaviors, allowing robots to learn and implicitly coordinate with others based on the chosen interaction mode. This is particularly useful for tasks requiring decentralized control and handling multiple ways of achieving a goal.

Can large language models (LLMs) act as coaches for robots?

Yes, LLMs can function as coaches by breaking down complex tasks into simpler curricula, generating reward functions, and providing feedback. This approach has shown promise in teaching robots complex behaviors like running and collaborating, often without requiring extensive retraining of the LLM itself.

How do LLM critics help in multi-agent reinforcement learning?

LLM critics can act as a 'credit assigner' by evaluating robot actions and determining which agent deserves credit for success or failure. This significantly improves learning performance in multi-agent RL, outperforming other state-of-the-art algorithms by helping agents learn more effectively from team efforts.

What role does distributed optimization play in robot perception?

Techniques like ADMM (the alternating direction method of multipliers) for distributed optimization and consensus allow multiple robots to collaboratively map an environment. Combined with modern scene representations such as NeRFs (neural radiance fields) and Gaussian splats, this approach enables more efficient and comprehensive scene mapping.
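A minimal sketch of consensus ADMM, assuming each robot's local objective is a simple quadratic around its own estimate of a shared quantity (here, a landmark position with a confidence weight). In real collaborative mapping the local objective would be each robot's reconstruction loss over shared map parameters, which this sketch does not model; all names and numbers are made up. Each round alternates a local solve, an averaging step, and a dual update, and the consensus variable converges to the confidence-weighted average. The averaging is done centrally here for brevity; decentralized variants exchange values only between neighbors.

```python
# Hypothetical: robot i estimated a landmark at A[i] with confidence W[i], so its
# local objective is f_i(x) = 0.5 * W[i] * ||x - A[i]||^2.
A = [(1.0, 2.0), (1.4, 1.8), (0.8, 2.3)]  # local estimates (made-up numbers)
W = [1.0, 2.0, 4.0]                        # confidence weights (made-up numbers)
RHO = 1.0                                  # ADMM penalty parameter


def consensus_admm(iters=500):
    n, dim = len(A), len(A[0])
    x = [[0.0] * dim for _ in range(n)]  # each robot's local copy of the variable
    u = [[0.0] * dim for _ in range(n)]  # scaled dual variables, one per robot
    z = [0.0] * dim                      # shared consensus variable
    for _ in range(iters):
        # 1) Local step: argmin_x f_i(x) + (RHO/2)||x - z + u_i||^2, closed form here.
        for i in range(n):
            x[i] = [(W[i] * A[i][d] + RHO * (z[d] - u[i][d])) / (W[i] + RHO) for d in range(dim)]
        # 2) Consensus step: average local copies plus duals (the communication step).
        z = [sum(x[i][d] + u[i][d] for i in range(n)) / n for d in range(dim)]
        # 3) Dual step: accumulate each robot's disagreement with the consensus.
        for i in range(n):
            u[i] = [u[i][d] + x[i][d] - z[d] for d in range(dim)]
    return z


print(consensus_admm())  # approaches the confidence-weighted average, about (1.0, 2.114)
```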

How can robots adapt to changing environments without forgetting past learning?

A consensus-based approach similar to ADMM can be applied in the neural network's weight space. This lets a robot adapt to new information (like a moved object) while preserving knowledge from earlier states, effectively treating continual learning as a multi-agent problem in which the current model reaches consensus with its past versions.

What are the key factors for effectively using LLMs as robot coaches?

Effective LLM coaching relies on clear task breakdown, providing understandable input (like training curves or sequences of images), and iterating on the advice given. The focus should be on high-level decisions rather than low-level actions for best results.

Can current LLM coaching methods be used for high-precision robotic tasks?

The speaker suggests that current LLM coaching methods are not yet suitable for high-precision tasks that require very accurate reward generation and consistent, low-variance policies. While effective for general coordination and skill learning, high-precision tasks may require different approaches due to the inherent randomness in RL policies.


Stanford Robotics Seminar ENGR319 | Spring 2026 | Interactive Autonomy

Stanford Online | May 20, 2026 | 72 min video

TL;DR

Robots struggle to interact safely with humans and other robots because each agent's best decision depends on how the others will react. New approaches leverage game theory and AI coaching to enable more intelligent and coordinated robotic behaviors.

Key Insights

Directly imitating expert behavior (behavioral cloning) is challenging in multi-agent systems because of 'exploitability': other agents can react to and exploit the suboptimal parts of a learned policy.

To address the difficulty of human-robot interaction, researchers are exploring the use of large language models (LLMs) as coaches that break down complex tasks, generate rewards, and assign credit in multi-agent reinforcement learning.

Potential games simplify multi-agent coordination by reducing the search for an equilibrium to a single optimization problem, with one reported approach computing equilibria up to 20 times faster than existing solvers.

Learning human cost functions (Inverse Reinforcement Learning) from demonstrations of interactions, rather than isolated actions, is crucial for robots to understand and adapt to human behavior, especially in complex scenarios.

Multi-agent RL methods have struggled to achieve coordination, with one student spending three years on reward shaping for a simple basket-lifting task without success, highlighting the need for advanced coaching techniques.

Using LLMs or VLMs as coaches for robots, even without fine-tuning, has shown promising results in teaching complex behaviors like running backwards for humanoids and coordinating two robots to lift a pot.

The challenge of multi-agent coordination and interaction

The increasing presence of robots in human environments, from warehouses to manufacturing and even homes, highlights a critical challenge: ensuring safe and intelligent interactions between robots and with humans. Real-world incidents, such as robots malfunctioning in restaurants or autonomous vehicles getting stuck in traffic due to coordination failures, underscore the difficulty of multi-agent decision-making. Even seemingly simple scenarios, like Amazon robots in a standoff or autonomous cars unable to navigate around each other, demonstrate that coordinating multiple agents, whether robots or humans, is far from trivial. This complexity arises because an agent's optimal decision depends heavily on predicting and accounting for the reactions of other agents. Humans navigate this by developing a 'theory of mind,' a capability that robots currently lack, making inter-agent coordination a significant hurdle.

Leveraging game theory for joint prediction and planning

To tackle the complexity of multi-agent interactions, researchers are turning to game theory. The core idea is that agents need to engage in 'joint prediction and planning,' meaning they must consider how others will react to their actions when making their own decisions. Game theory provides formalisms for this, particularly through the concept of equilibria. In a Nash equilibrium, for example, each agent's chosen strategy is the best possible response to the strategies of all other agents, so no agent has an incentive to unilaterally change its action. This elegant concept allows for modeling the interdependent decision-making required for coordination. However, computing these equilibria, especially for complex, nonlinear robotic systems that must act in real time, is computationally demanding and in general intractable. This computational bottleneck has historically limited the practical application of game-theoretic solutions in robotics, despite their theoretical power.
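To make the definition concrete, the sketch below enumerates the pure-strategy Nash equilibria of a hypothetical two-pedestrian passing game (the costs are made up): a joint choice is an equilibrium if no agent can lower its own cost by changing only its own action. It finds two equilibria, one per passing convention, which echoes the yield-left versus yield-right norms mentioned in the talk. Brute-force enumeration only works for tiny finite games; the number of joint choices grows exponentially with the number of agents, and continuous trajectories rule it out entirely, which is the bottleneck described above.

```python
from itertools import product

# Hypothetical passing game: each pedestrian steps to the left or right of its own path.
# COSTS[(a0, a1)] = (cost to pedestrian 0, cost to pedestrian 1).
ACTIONS = ("left", "right")
COSTS = {
    ("left", "left"): (0.0, 0.0),    # same convention: they pass cleanly
    ("left", "right"): (1.0, 1.0),   # mismatch: they step into each other
    ("right", "left"): (1.0, 1.0),
    ("right", "right"): (0.0, 0.0),
}


def is_pure_nash(profile):
    """True if no agent can lower its own cost by changing only its own action."""
    for i in range(len(profile)):
        for alt in ACTIONS:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if COSTS[deviated][i] < COSTS[profile][i]:
                return False
    return True


equilibria = [p for p in product(ACTIONS, repeat=2) if is_pure_nash(p)]
print(equilibria)  # [('left', 'left'), ('right', 'right')]
```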

Potential games as a simplification for coordination

A significant breakthrough in applying game theory to robotics comes from identifying specific classes of games that are easier to solve. 'Potential games' are a well-studied category in which the coupled optimal control problems that define an equilibrium can be reduced to a single optimization problem: minimizing a 'potential function.' This simplification dramatically reduces computational complexity. Research has shown that many real-world robotic interactions, particularly those involving collision avoidance with symmetric costs, can be modeled as potential games. By transforming the multi-agent coordination problem into the minimization of a single potential function, robots can find equilibria much faster. For instance, one approach demonstrated a 20-fold speedup in computing equilibria for two- and four-agent systems compared to existing solvers, making real-time coordination more feasible. This opens the door to more reliable and efficient robotic navigation and interaction.
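A minimal sketch of the potential-game idea, under assumed costs: each of two agents picks a 2-D waypoint (a static stand-in for a full trajectory), pays a private squared distance to its own goal, and both pay the same symmetric collision penalty. Because the coupling term is shared, "private costs plus the shared term" is an exact potential: any unilateral change in one agent's cost equals the change in the potential. Plain gradient descent on that single function therefore lands on a (local) Nash equilibrium, which the last function spot-checks by trying small unilateral moves. Goals, weights, and step sizes are illustrative, not from the cited work.

```python
import math

GOALS = [(0.2, 0.0), (-0.2, 0.0)]  # each agent's preferred position (made up)
W, SIGMA = 2.0, 0.5                # collision weight and length scale (made up)


def own_cost(i, p):
    """Private cost of agent i: squared distance from its own goal."""
    gx, gy = GOALS[i]
    return (p[0] - gx) ** 2 + (p[1] - gy) ** 2


def shared_cost(p, q):
    """Symmetric collision penalty that both agents pay identically."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return W * math.exp(-d2 / SIGMA ** 2)


def agent_cost(i, pos):
    return own_cost(i, pos[i]) + shared_cost(pos[0], pos[1])


def potential(pos):
    """Sum of private costs plus the shared term counted once."""
    return own_cost(0, pos[0]) + own_cost(1, pos[1]) + shared_cost(pos[0], pos[1])


def grad_potential(pos):
    (ax, ay), (bx, by) = pos
    s = shared_cost(pos[0], pos[1])
    # Gradient of the shared term w.r.t. agent 0's position; agent 1 gets the negative.
    cx = -2.0 * s * (ax - bx) / SIGMA ** 2
    cy = -2.0 * s * (ay - by) / SIGMA ** 2
    g0 = (2 * (ax - GOALS[0][0]) + cx, 2 * (ay - GOALS[0][1]) + cy)
    g1 = (2 * (bx - GOALS[1][0]) - cx, 2 * (by - GOALS[1][1]) - cy)
    return [g0, g1]


def minimize_potential(steps=2000, lr=0.05):
    """Plain gradient descent on the single potential, starting at the goals."""
    pos = [GOALS[0], GOALS[1]]
    for _ in range(steps):
        grads = grad_potential(pos)
        pos = [(p[0] - lr * g[0], p[1] - lr * g[1]) for p, g in zip(pos, grads)]
    return pos


def max_unilateral_gain(pos, radius=0.05, n_dirs=16):
    """Largest cost drop any single agent finds by moving alone (0.0 if none)."""
    best = 0.0
    for i in range(2):
        base = agent_cost(i, pos)
        for k in range(n_dirs):
            a = 2 * math.pi * k / n_dirs
            moved = list(pos)
            moved[i] = (pos[i][0] + radius * math.cos(a), pos[i][1] + radius * math.sin(a))
            best = max(best, base - agent_cost(i, moved))
    return best


eq = minimize_potential()
print(eq, max_unilateral_gain(eq))
```

Gradient descent only guarantees a local minimizer of the potential, so in general this yields a local equilibrium. In this example the agents settle symmetrically, pushed apart beyond their goals by the shared penalty.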

Learning human intentions through inverse reinforcement learning

While game theory provides frameworks for robot-robot coordination, understanding and interacting with humans requires robots to infer human objectives. This is the domain of inverse reinforcement learning (IRL). Traditional IRL assumes learning from a single agent's demonstrations, inferring a cost function that explains their behavior. However, in interactive scenarios, an agent's decisions are interdependent. Therefore, learning from demonstrations of *interactions* between agents, rather than isolated actions, is far more informative. This approach acknowledges that human behavior is often influenced by social norms and conventions, like yielding to the left or right in pedestrian traffic. By observing humans interacting, robots can better learn the underlying preferences and potentially irrational behaviors that shape human decision-making. This is crucial for robots to adapt to diverse human behaviors and social norms, moving beyond simple imitation to genuine understanding.
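The sketch below shows only the single-agent building block, under stated assumptions: a one-step "go or yield" choice with a Boltzmann (maximum-entropy) choice model, where going costs theta times the proximity of another agent and yielding costs a fixed delay of 1. Theta is recovered by maximizing the likelihood of synthetic demonstrations. It is not the multi-agent, game-theoretic IRL described in the talk; the cost form, TRUE_THETA, and the data generator are hypothetical. It does illustrate the paragraph's point: in isolated demonstrations no other agent is nearby, the proximity feature is always zero, the likelihood is flat in theta, and nothing about the interaction preference can be learned.

```python
import math
import random

TRUE_THETA = 2.0  # hypothetical ground-truth weight, used only to synthesize data


def p_go(theta, prox):
    """Boltzmann choice: P(go) is proportional to exp(-cost_go), with cost_go = theta * prox
    and cost_yield = 1, which simplifies to sigmoid(1 - theta * prox)."""
    return 1.0 / (1.0 + math.exp(-(1.0 - theta * prox)))


def synthesize(n, interactive, seed=0):
    """Synthetic (proximity, went_go) demos; isolated demos have no other agent nearby."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        prox = 1.0 / rng.uniform(0.5, 3.0) if interactive else 0.0
        demos.append((prox, rng.random() < p_go(TRUE_THETA, prox)))
    return demos


def fit_theta(demos, lr=1.0, iters=200):
    """Maximum-likelihood theta via gradient ascent (the log-likelihood is concave)."""
    theta = 0.0
    for _ in range(iters):
        # d/dtheta of the average log-likelihood: mean of prox * (P(go) - went_go)
        grad = sum(prox * (p_go(theta, prox) - went) for prox, went in demos) / len(demos)
        theta += lr * grad
    return theta


print(fit_theta(synthesize(2000, interactive=True)))   # close to TRUE_THETA
print(fit_theta(synthesize(2000, interactive=False)))  # 0.0: the gradient is always zero
```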

Entropic cost equilibria for boundedly rational agents

To better model human decision-making, which is often not perfectly rational, researchers have extended game-theoretic concepts. 'Quantal response equilibria' and 'entropic cost equilibria' incorporate bounded rationality by modeling agents as probabilistically choosing actions based on expected costs. This approach, akin to a noisy version of Nash equilibrium, acknowledges that humans don't always make the mathematically optimal choice. The entropic cost equilibrium formulation is particularly powerful as it extends the maximum entropy principle to multi-agent settings. It allows for modeling how agents interact rationally or irrationally, with a parameter controlling the level of irrationality. This framework has revealed that bounded rationality can lead to emergent interaction modes that are not apparent when assuming perfect rationality. By learning these cost functions, robots can predict human motion more accurately than state-of-the-art imitation learning algorithms, as demonstrated in campus pedestrian trajectory prediction.
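Logit quantal response equilibrium (QRE) is the classic single-stage version of the "noisy best response" idea: each agent plays an action with probability proportional to exp(-lam times its expected cost), which is also the policy minimizing expected cost minus (1/lam) times policy entropy. The sketch computes it by fixed-point iteration for a hypothetical two-pedestrian passing game (cost 0 if both step to the same side of their own path, 1 otherwise). It does not implement the entropic cost equilibrium for dynamic games from the talk, and plain fixed-point iteration is not guaranteed to converge in every game.

```python
import math


def logit_qre(lam, p_start=0.6, q_start=0.6, iters=500):
    """Logit QRE of a hypothetical 2x2 passing game via fixed-point iteration.

    p, q: probability that pedestrian 0 / pedestrian 1 steps to the left.
    Cost is 0 when both step to the same side of their own path, 1 otherwise,
    so for pedestrian 0: E[cost | left] = 1 - q and E[cost | right] = q.
    lam: rationality; lam -> 0 gives uniform play, large lam approaches best responses.
    """
    p, q = p_start, q_start
    for _ in range(iters):
        # P(left) = exp(-lam*(1-q)) / (exp(-lam*(1-q)) + exp(-lam*q)) = sigmoid(lam*(2q-1))
        p, q = (1.0 / (1.0 + math.exp(-lam * (2 * q - 1))),
                1.0 / (1.0 + math.exp(-lam * (2 * p - 1))))
    return p, q


print(logit_qre(1.0))   # noisy agents: both stay near 50/50
print(logit_qre(10.0))  # near-rational agents: both settle on "left"
```

In this particular game, for lam below 2 the update is a contraction, so the 50/50 mix is the only QRE; for larger lam, equilibria close to the two pure conventions appear, and the second call reaches one of them from a slightly biased start.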

The limitations of pure behavioral cloning in multi-agent settings

While directly imitating expert policies, known as behavioral cloning, seems straightforward, it faces significant challenges in multi-agent systems. In single-agent learning, the main concern is suboptimality relative to the expert policy. However, in multi-agent settings, an additional problem arises: exploitability. Learned policies can be exploited by other agents who deviate from their expert strategies, leading to unforeseen and potentially catastrophic outcomes. Achieving non-exploitable policies requires strong assumptions, such as exact matching of state-action occupancy measures, which are rarely met in practice. Furthermore, if multiple interaction modes (equilibria) exist, standard behavioral cloning tends to average them, collapsing the learned behavior into a suboptimal compromise. This suggests that simply mimicking observed actions is insufficient for complex, coordinated multi-agent behaviors. Diffusion policies are being explored as a way to capture multiple interaction modes, but they still require careful data collection of interactions.
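A toy illustration of the averaging problem, under assumed data: hypothetical demonstrations of the lateral offset chosen when passing another person, half passing on one side (around -1) and half on the other (around +1). A single deterministic output trained with squared error predicts the mean, about 0, which matches neither demonstrated mode. The two_modes helper is a crude 1-D 2-means stand-in for a multimodal policy class; it is not a diffusion policy, but it shows what such a model needs to preserve.

```python
import random
import statistics

# Hypothetical demonstrations: lateral offset chosen when passing another person.
rng = random.Random(0)
demos = [rng.gauss(-1.0, 0.1) for _ in range(50)] + [rng.gauss(1.0, 0.1) for _ in range(50)]

# The squared-error-optimal single prediction is the mean: roughly 0, i.e. walking
# straight at the other person, which nobody demonstrated.
bc_action = statistics.fmean(demos)


def two_modes(xs, iters=20):
    """1-D 2-means: a crude multimodal stand-in (not a diffusion policy)."""
    centers = [min(xs), max(xs)]
    for _ in range(iters):
        left = [x for x in xs if abs(x - centers[0]) <= abs(x - centers[1])]
        right = [x for x in xs if abs(x - centers[0]) > abs(x - centers[1])]
        centers = [statistics.fmean(left), statistics.fmean(right)]
    return centers


print(round(bc_action, 2), [round(c, 2) for c in two_modes(demos)])
```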

AI as a coach for complex robotic learning

The inherent difficulties in multi-agent reinforcement learning (MARL) and the limitations of imitation learning have led researchers to explore novel approaches, particularly leveraging the power of large foundation models as 'coaches.' Instead of directly learning policies or rewards, these models can guide the learning process. This coaching can take several forms: 1) **Curriculum Generation:** Breaking down complex tasks into a sequence of simpler subtasks, starting with basic stability and progressing to more complex maneuvers, much like how humans learn. 2) **Reward Generation:** Providing guidance on defining reward functions for these subtasks, which is notoriously difficult in MARL. 3) **Credit Assignment:** Helping to determine which agent deserves credit for successful actions in a team effort, a crucial but challenging aspect of MARL. This coaching approach has shown remarkable success, enabling a bipedal robot to learn to run by breaking down the task and generating appropriate rewards, even teaching it to run backward. The framework has also enabled two robots to coordinate in lifting a heavy pot, a task that proved intractable with traditional RL and reward shaping.
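A skeleton of the coaching loop described above, under heavy assumptions: stub_coach stands in for an LLM call (no real model or API is shown), the stage names follow the "basic stability first" progression, and the reward terms, weights, and thresholds are made up. Each stage carries its own reward weights, and training moves to the next stage only when the current one meets its success threshold. The train_and_evaluate argument is a placeholder for whatever RL loop is used.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    reward_weights: dict       # reward-term name -> weight (terms are hypothetical)
    success_threshold: float   # move on once the success rate reaches this


def stub_coach(task):
    """Stand-in for asking an LLM to break `task` into stages; returns a fixed plan."""
    return [
        Stage("stand", {"upright": 1.0, "energy": -0.1}, 0.9),
        Stage("walk", {"upright": 0.5, "forward_vel": 1.0, "energy": -0.1}, 0.8),
        Stage("run", {"upright": 0.3, "forward_vel": 2.0, "energy": -0.05}, 0.7),
    ]


def shaped_reward(stage, features):
    """Weighted sum of the stage's reward terms; terms missing from features count as 0."""
    return sum(w * features.get(k, 0.0) for k, w in stage.reward_weights.items())


def run_curriculum(stages, train_and_evaluate, max_rounds=100):
    """Train stage by stage; return the names of stages that met their threshold."""
    completed = []
    for stage in stages:
        for _ in range(max_rounds):
            # train_and_evaluate is a placeholder for an RL loop that optimizes
            # shaped_reward(stage, .) and returns a success rate in [0, 1].
            if train_and_evaluate(stage) >= stage.success_threshold:
                completed.append(stage.name)
                break
        else:
            # Threshold never reached; a real coach might revise the stage here.
            return completed
    return completed


def make_fake_trainer(step):
    """Toy trainer whose success rate on each stage rises by `step` per round."""
    progress = {}

    def train_and_evaluate(stage):
        progress[stage.name] = min(1.0, progress.get(stage.name, 0.0) + step)
        return progress[stage.name]

    return train_and_evaluate


print(run_curriculum(stub_coach("run forward"), make_fake_trainer(0.25)))
# ['stand', 'walk', 'run']
```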

LLM-based critics for enhanced MARL performance

Beyond coaching, large language models (LLMs) and vision-language models (VLMs) are being used as 'critics' within the MARL training loop. After a set of agents execute actions, the LLM can analyze the resulting trajectory and provide feedback on credit assignment. This has led to significant performance improvements; one study reported reward increases of orders of magnitude on multi-robot warehouse tasks compared with state-of-the-art MARL algorithms such as MPO and QMIX. The LLM's ability to interpret high-level decisions and provide nuanced feedback, without requiring retraining, is a key advantage. Preliminary results even suggest that VLMs can help train two humanoids to work together. This 'zero-shot' or 'one-shot' application of foundation models, leveraging their pre-existing reasoning capabilities, offers a promising path to overcoming the challenges of complex multi-agent coordination and learning.
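One simple way a critic's feedback could be turned into per-agent learning signals, assuming the critic returns a nonnegative contribution score per agent: split the shared team reward in proportion to those scores. The stub critic, its scoring rule, and the proportional split are illustrative assumptions; this summary does not specify how the LLM's feedback enters the update.

```python
def stub_critic(trajectory_summary):
    """Stand-in for prompting an LLM with a trajectory description; returns scores."""
    # Made-up rule: credit each agent by how many packages it actually moved.
    return {agent: float(n) for agent, n in trajectory_summary["packages_moved"].items()}


def assign_credit(team_reward, scores):
    """Split team_reward across agents in proportion to their (clipped) scores."""
    clipped = {a: max(s, 0.0) for a, s in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        # Critic gave no usable signal: fall back to an even split.
        return {a: team_reward / len(scores) for a in scores}
    return {a: team_reward * s / total for a, s in clipped.items()}


summary = {"packages_moved": {"robot_a": 3, "robot_b": 1, "robot_c": 0}}
print(assign_credit(10.0, stub_critic(summary)))
# {'robot_a': 7.5, 'robot_b': 2.5, 'robot_c': 0.0}
```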


Common Questions

The primary challenge is enabling robots to perform 'joint prediction and planning.' This means robots need to reason about and predict the likely reactions of other agents (humans or robots) to their own decisions and incorporate that understanding into their actions. Without this, interactions can lead to unexpected and potentially dangerous outcomes.


Mentioned in this video

Software & Apps

LIA

A state-of-the-art multi-agent reinforcement learning algorithm used as a baseline, showing lower reward performance compared to LLM-critic based methods.

ADMM

The alternating direction method of multipliers, an algorithm for consensus and distributed optimization; applied in the lab to map scenes collaboratively and, for continual learning, to reach consensus on model parameters.

LLM

Large Language Models used as coaches for robots, breaking down tasks, generating rewards, and providing feedback without retraining.

VLM

Vision-language models used in coaching robots, found to be effective when analyzing training curves or sequences of images.

MPO

A multi-agent reinforcement learning algorithm mentioned as a baseline that struggled with complex coordination tasks, contrasted with methods that use coaching.

QMIX

A state-of-the-art multi-agent reinforcement learning algorithm used as a baseline, showing lower reward performance compared to LLM-critic based methods.

NeRFs

Neural radiance fields, a scene representation used with distributed optimization for collaborative scene mapping by robots.

Organizations

UC Berkeley

The institution where the speaker's lab, Icon Lab, is located.

UT Austin

Collaborated on a study using campus cameras to collect pedestrian trajectories and predict human motion.

Companies

Waymo

A company whose autonomous cars were observed stuck in traffic in San Francisco, illustrating multi-robot coordination challenges.

Amazon

Mentioned in relation to robots getting stuck in a standoff, highlighting coordination difficulties.

Books

Cinderella

A movie scene inspiring the idea of quadrupeds helping with tasks, akin to birds helping Cinderella.

Locations

Singapore

The speaker used an experience in Singapore, where the convention is to yield to the left, to illustrate the importance of robots adapting to social norms and different equilibria.

Concepts

Nash equilibria

A game-theory concept in which each agent's strategy is a best response to the strategies of the others, so no agent can benefit by changing its strategy unilaterally. It is central to modeling joint prediction and planning in multi-agent systems, but computing it is challenging.

Gaussian splats

A modern scene representation used alongside distributed optimization and NeRFs for collaboratively mapping a scene.

People

Dorsa Shahabi

The speaker mentions Dorsa having a baby during the development of the humanoid running project, drawing inspiration from observing infant learning.
