
Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation | Lex Fridman Podcast #344

Lex Fridman
Science & Technology | 8 min read | 150 min video
Dec 6, 2022
TL;DR

Noam Brown discusses AI breakthroughs in poker (Libratus, Pluribus) and strategic negotiation in Diplomacy (Cicero).

Key Insights

1. AI's success in poker (Libratus, Pluribus) demonstrates that Game Theory Optimal (GTO) strategies, particularly Nash equilibrium approximation, can outperform human intuition and exploitative play, especially when combined with search-based algorithms.

2. Imperfect-information games like poker are fundamentally harder for AI than perfect-information games like chess or Go, requiring algorithms to balance action probabilities and remain unpredictable (e.g., by bluffing).

3. Search is crucial for superhuman AI performance in games; even powerful neural networks benefit significantly from real-time planning and lookahead, as seen in the drop in AlphaZero's Elo rating without search.

4. Solving Diplomacy, a game of strategic negotiation conducted in natural language, requires incorporating human play data because of its cooperative element, cultural norms, and the need for human-compatible communication, unlike purely adversarial games.

5. Trust-building and understanding human irrationality (such as anger or anti-AI bias) are critical for AI success in cooperative-adversarial games, highlighting the limitations of pure self-play in human-centric domains.

6. Future AI applications include human-like and tunable game opponents, richer NPC interactions in video games, and potentially real-world diplomatic negotiation aided by sophisticated models.

DEFINING NO-LIMIT POKER AND THE NASH EQUILIBRIUM

No-Limit Texas Hold'em, the most popular poker variant, allows players to bet any amount, leading to rapid stake escalation and the potential for 'jumpy' or aggressive play. While strategy is rewarded, the 'no-limit' aspect makes it easier to put opponents in uncomfortable positions. The underlying theory for solving such games is the Nash equilibrium, which posits that in any finite, two-player, zero-sum game, an optimal strategy exists. If a player consistently follows this strategy, they are guaranteed not to lose in expectation over the long run, irrespective of the opponent's actions. This concept, akin to randomly choosing rock, paper, or scissors, extends to complex games like poker, albeit with highly intricate strategies.
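The rock-paper-scissors analogy can be made concrete. The uniform mixed strategy is that game's Nash equilibrium, and a minimal sketch confirms the guarantee described above: it breaks even in expectation against any opponent strategy.

```python
# Payoff matrix for rock-paper-scissors from the row player's view:
# rows/columns ordered rock, paper, scissors; +1 win, -1 loss, 0 tie.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def expected_payoff(p, q):
    """Expected payoff when the row player mixes with p and the column player with q."""
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

# The uniform mix is the Nash equilibrium: against *any* opponent strategy
# its expected payoff is exactly 0, so it cannot lose in expectation.
uniform = [1 / 3, 1 / 3, 1 / 3]
for q in ([1, 0, 0], [0.5, 0.5, 0], [0.2, 0.3, 0.5]):
    assert abs(expected_payoff(uniform, q)) < 1e-9
```

Poker's equilibrium has the same structure, just over an astronomically larger strategy space.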

THE CHALLENGE OF IMPERFECT INFORMATION AND BLUFFING

Poker is fundamentally different from perfect information games like chess or Go due to hidden information—players' private hole cards. This imperfect information introduces complexities where agents must reason about action probabilities rather than just optimal actions. The value of an action, such as bluffing, depends on its frequency and the opponent's perception of a player's tendencies. Libratus, an AI co-created by Noam Brown, embodied this balance, aptly named for its ability to find the 'right balance' in action frequencies. It demonstrated how AI, by approximating the Nash equilibrium, effectively manages bluffing and counter-bluffing without explicitly modeling an opponent's mind.
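The frequency-dependence of bluffing has a classic closed-form illustration: for a single bet from a polarized range, game theory prescribes a bluffing fraction that makes the opponent exactly indifferent between calling and folding. This is a textbook toy model, not Libratus's actual computation.

```python
def bluff_fraction(pot, bet):
    """Fraction of a polarized betting range that should be bluffs so the
    opponent is indifferent between calling and folding (classic result:
    bet / (pot + 2 * bet))."""
    return bet / (pot + 2 * bet)

# With a pot-sized bet, one bet in three should be a bluff:
assert abs(bluff_fraction(pot=100, bet=100) - 1 / 3) < 1e-9
# A 10x-pot overbet demands bluffing almost half the time:
assert abs(bluff_fraction(pot=100, bet=1000) - 10 / 21) < 1e-9
```

Bluff too often and calling becomes profitable for the opponent; too rarely and they can safely fold — hence the 'right balance' the name Libratus alludes to.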

SELF-PLAY AND COUNTERFACTUAL REGRET MINIMIZATION

AI systems like Libratus and Pluribus learn optimal strategies through self-play, employing a method called counterfactual regret minimization (CFR). Starting from random play, the algorithm iteratively plays against itself, evaluating alternative actions (counterfactuals) at each decision point. It calculates 'regret' for not having taken each alternative action and, over millions of simulations, gradually converges towards a Nash equilibrium. This process mirrors human learning, where players reflect on 'what if' scenarios. Neural networks play a crucial role in generalizing these learned strategies across an astronomically vast state space (e.g., poker's roughly 10^161 decision points), enabling the AI to apply knowledge from similar situations.
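The training loop can be illustrated with regret matching — the tabular core of CFR — on rock-paper-scissors. This is a deliberately tiny stand-in for poker, with no game tree or neural network: two players each mix actions in proportion to accumulated positive regret, and their average strategies converge to the uniform Nash equilibrium.

```python
import random

ACTIONS = 3  # rock, paper, scissors
# Payoff to player 0; player 1 receives the negative (zero-sum).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Regret matching: mix actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    strat_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        acts = [rng.choices(range(ACTIONS), weights=s)[0] for s in strats]
        for p in range(2):
            for a in range(ACTIONS):
                strat_sums[p][a] += strats[p][a]
        # Counterfactual regret: payoff of each alternative action minus the
        # payoff of the action actually taken, holding the opponent fixed.
        for a in range(ACTIONS):
            regrets[0][a] += PAYOFF[a][acts[1]] - PAYOFF[acts[0]][acts[1]]
            regrets[1][a] += -PAYOFF[acts[0]][a] + PAYOFF[acts[0]][acts[1]]
    # The *average* strategy, not the last one, converges to the equilibrium.
    totals = [sum(ss) for ss in strat_sums]
    return [[s / t for s in ss] for ss, t in zip(strat_sums, totals)]

avg0, avg1 = train(200_000)  # both near [1/3, 1/3, 1/3]
```

Full CFR applies this same regret update at every decision point of the game tree, which is where the neural-network generalization mentioned above becomes necessary.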

THE CRITICAL ROLE OF SEARCH IN AI GAMEPLAY

Noam Brown emphasizes that search, or real-time planning, is an underestimated yet essential component of superhuman AI performance across games. While neural networks provide an instantaneous 'intuition' (policy network), search allows the AI to look many moves ahead and refine its strategy, much like a human grandmaster. In poker, this involves searching over possible actions for one's own hand and possible hands for opponents to maintain unpredictability and balance. Without search, even AlphaZero's Elo rating in Go plummets, demonstrating that raw neural network power alone is insufficient; sophisticated planning is vital for comprehensive game mastery. This search capability was a key algorithmic improvement between the 2015 loss of Brown's earlier bot, Claudico, and Libratus's decisive win in 2017.
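The gap between instant 'intuition' and lookahead shows up even in a toy perfect-information game tree (the payoffs below are hypothetical):

```python
# A toy game tree: internal nodes map moves to subtrees,
# leaves are payoffs for the maximizing player.
TREE = {
    "a": {"aa": 3, "ab": 12},   # tempting branch: contains the biggest leaf
    "b": {"ba": 4, "bb": 5},
}

def minimax(node, maximizing):
    if not isinstance(node, dict):   # leaf: exact payoff
        return node
    values = [minimax(child, not maximizing) for child in node.values()]
    return max(values) if maximizing else min(values)

# Greedy "intuition" that grabs the branch with the biggest leaf (12)
# picks "a", but lookahead shows the opponent answers with "aa",
# leaving only 3; searching the tree reveals "b" guarantees 4.
assert minimax(TREE, maximizing=True) == 4
```

Real game AIs pair a learned policy with exactly this kind of tree search, scaled up and adapted for hidden information.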

LIBRATUS'S VICTORY: GAME THEORY OPTIMAL VS. EXPLOITATIVE PLAY

In 2017, Libratus decisively defeated four top professional heads-up No-Limit Texas Hold'em players, winning nearly two million dollars in virtual money over 120,000 hands. This victory was a landmark, validating Game Theory Optimal (GTO) poker over purely exploitative play, which focuses on reading opponents. Libratus did not attempt to adapt to or exploit human weaknesses; it simply aimed to approximate the Nash equilibrium. A significant strategic innovation observed was Libratus's use of 'overbets' (betting 10x the pot), a tactic humans rarely employed. This seemingly abnormal play put human opponents in extremely difficult, high-stakes decisions, ultimately proving to be an optimal strategy that has since been adopted by elite human players.

EVOLVING FROM HEADS-UP TO MULTI-PLAYER POKER (PLURIBUS)

The transition from two-player (Libratus) to six-player (Pluribus) poker presented new challenges, as the game is no longer strictly zero-sum, complicating Nash equilibrium guarantees. However, due to poker's inherently adversarial nature, the self-play techniques, though theoretically unproven for multi-player non-zero-sum games, still proved highly effective. A key algorithmic breakthrough for Pluribus was 'depth-limited search,' which significantly reduced computational costs by planning only a few moves ahead and using value estimates for the rest, unlike Libratus's full-game search. This made Pluribus vastly more computationally efficient, training for less than $150 on AWS compared to Libratus's $100,000 cost, highlighting the power of algorithmic innovation over raw computing power.
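Depth-limited search can be sketched as ordinary minimax that stops after a few plies and substitutes a value estimate for the unexplored remainder — a toy illustration of the cost-saving idea, not Pluribus's actual algorithm; the tree and averaging estimator below are invented for the example.

```python
def depth_limited_minimax(node, depth, maximizing, value_estimate):
    """Minimax that stops after `depth` plies and substitutes a learned
    value estimate for the unexplored subtree below the frontier."""
    if not isinstance(node, dict):
        return node                   # true leaf: exact payoff
    if depth == 0:
        return value_estimate(node)   # frontier: estimated value
    vals = [depth_limited_minimax(c, depth - 1, not maximizing, value_estimate)
            for c in node.values()]
    return max(vals) if maximizing else min(vals)

# Toy tree; the (hypothetical) estimator averages the leaves below a node.
TREE = {"a": {"aa": 1, "ab": 9}, "b": {"ba": 4, "bb": 8}}

def avg_leaves(node):
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, dict):
            stack.extend(n.values())
        else:
            leaves.append(n)
    return sum(leaves) / len(leaves)

# Shallow search leans on the estimate; a deeper search is exact but costlier.
shallow = depth_limited_minimax(TREE, depth=1, maximizing=True, value_estimate=avg_leaves)
full = depth_limited_minimax(TREE, depth=5, maximizing=True, value_estimate=avg_leaves)
```

The trade-off is accuracy for cost: the shallow pass here returns 6.0 from the estimator while full search returns the exact value 4 — a good enough estimator narrows that gap at a fraction of the compute.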

DIPLOMACY: A NEW FRONTIER FOR AI IN STRATEGIC NEGOTIATION

Diplomacy, a seven-player game of strategic negotiation set in pre-World War I Europe, represents a significant departure from previous adversarial game AI challenges. It emphasizes alliance formation, betrayal, and, crucially, natural language communication. Players engage in private, unstructured conversations to forge alliances and plan moves, with simultaneous action execution allowing for deception. This unique blend of Risk-like war strategy, poker-like game theory, and 'Survivor'-like social dynamics makes it a 'game about people rather than pieces.' The historical appeal to figures like JFK and Henry Kissinger underscores its profound nature as a model for geopolitical interaction and diplomatic failures.

THE COMPLEXITY OF NATURAL LANGUAGE AND COOPERATION

Diplomacy's core challenge for AI is its natural language component. Unlike games with bounded communication (e.g., Settlers of Catan), Diplomacy involves broad and deep conversations encompassing strategy, trust, betrayal, and long-term alliances. The sheer breadth of topics and the unstructured nature of human communication make it incredibly difficult for AI. Furthermore, unlike adversarial games, Diplomacy's cooperative aspect means pure self-play is insufficient. An AI trained solely against itself would develop an 'alien' language and strategy, unable to interact or build trust with humans. This necessitates incorporating human play data to ensure the AI understands and adheres to human social conventions and expectations in negotiation.

CICERO'S APPROACH: HUMAN COMPATIBILITY AND CONTROLLED LANGUAGE

Cicero, the AI developed for Diplomacy, addresses these challenges by integrating human data and controlled natural language generation. It combines reinforcement learning and planning (similar to poker AIs) to determine optimal strategic 'intents' (desired actions for itself and others). These numeric intents are then fed into a pre-trained language model, which generates human-like messages. A crucial aspect is a filtering mechanism that evaluates messages for sensibility, potential harm, and propensity to lie. Cicero's design actively minimizes deception, recognizing that while lying can offer short-term gains, it ultimately erodes trust, leading to worse long-term performance—a key insight transferable to real-world human interactions.
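The intent-to-language-to-filter pipeline described above might be sketched as follows. Every name, message template, and filtering rule here is an illustrative assumption, not Cicero's actual implementation — the point is only the shape of the pipeline: plan an intent, generate candidate messages conditioned on it, and discard candidates inconsistent with the plan.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    own_move: str          # action the agent plans to take
    requested_move: str    # action it wants the recipient to take

def generate_candidates(intent, recipient):
    # Stand-in for the intent-conditioned language model (hypothetical).
    return [
        f"{recipient}, I'll {intent.own_move} if you {intent.requested_move}.",
        f"{recipient}, trust me, I have no plans at all this turn.",
    ]

def passes_filter(message, intent):
    # Stand-in filter: keep only messages consistent with the planned move,
    # discarding candidates that misstate the agent's intentions.
    return intent.own_move in message

def negotiate(intent, recipient):
    for msg in generate_candidates(intent, recipient):
        if passes_filter(msg, intent):
            return msg
    return None  # send nothing rather than a filtered-out message

message = negotiate(Intent("support Munich", "hold Vienna"), "Austria")
```

Keeping the strategic planner and the language model separate, with a filter between them, is what lets the message stay grounded in the plan instead of drifting into claims the agent does not intend to honor.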

THE ANTI-AI BIAS AND THE IMPORTANCE OF TRUST

Cicero's success (ranking second among approximately 80 human players) in live Diplomacy games highlights the importance of human compatibility. An AI trained rigorously on a two-player, zero-sum Diplomacy variant, even one without explicit language, was soundly beaten when placed in a game with six humans because it couldn't understand human behavior. Humans react with anger, or form an 'anti-AI' alliance, against an agent that behaves rationally in game-theoretic terms but violates social expectations (e.g., taking advantage of an ally when not strictly necessary for the common goal). Cicero learns to model and account for this human irrationality, recognizing that adhering to human social norms and building trust, even when seemingly 'suboptimal' in a purely rational sense, is crucial for success in human-AI cooperation.

ETHICAL CONSIDERATIONS AND THE FUTURE OF HUMAN-LIKE AI

The development of AIs like Cicero, capable of negotiation, persuasion, and even 'white lies' (filtered carefully), raises significant ethical questions. The ability of an AI to deceive, even if for strategic advantage, presents a dilemma for designers and society. Furthermore, human-like AI challenges cheat detection in online games, blurring the lines between human and machine play. However, this also opens up positive applications: creating AI training partners that play in specific human styles (e.g., like a Magnus Carlsen bot), populating virtual worlds with engaging NPCs, and potentially aiding real-world diplomatic negotiations. The research compels us to reflect on fundamental human concepts like trust, honesty, and even sentience as AI becomes more sophisticated.

SCALING INTELLIGENCE: DATA EFFICIENCY AND GENERAL KNOWLEDGE

Looking towards Artificial General Intelligence (AGI), Noam Brown identifies data inefficiency as a major current limitation. While AI excels with vast datasets (millions of games), humans learn complex tasks with far fewer examples. Overcoming this 'sample complexity' problem is a trillion-dollar question. One potential direction is enabling AIs to leverage a vast amount of general background knowledge about the world, rather than starting from scratch for each new domain. This mirrors how humans approach new games, bringing prior understanding to accelerate learning. Diplomacy, by integrating internet-scale language model pre-training with domain-specific fine-tuning, offers a blueprint for how AIs can leverage broad knowledge for specific, human-centric tasks, taking a significant step closer to real-world applicability beyond purely virtual environments.

Common Questions

What is No-Limit Texas Hold'em, and how does it differ from chess?

No-Limit Texas Hold'em is the most popular poker variant, in which players can bet any amount, so stakes escalate quickly. Unlike chess, it is an imperfect-information game in which players hold hidden cards, leading to strategies like bluffing and requiring reasoning about probabilities rather than just optimal moves.

Topics

Mentioned in this video

Concepts
Game Theory

A theoretical framework discussed for understanding optimal strategy in games, particularly in the context of approximating Nash equilibrium in poker.

Nash Equilibrium

A game theory concept describing a state where no player can benefit by unilaterally changing their strategy, assuming other players' strategies remain unchanged. The AI systems aim to approximate this.

Depth-limited search

An algorithmic improvement that significantly reduces computational resources by only looking a few moves ahead instead of to the end of the game, making AI more scalable.

Counterfactual regret minimization

An algorithm used in self-play to converge to a Nash equilibrium in complex and imperfect information games like poker, by evaluating alternative actions.

Elo rating system

A method for calculating the relative skill levels of players in zero-sum games like chess and Go, and discussed as being harder to apply reliably in high-variance games like poker.
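The Elo model itself is compact: an expected score derived from the rating gap, plus an update proportional to how surprising the result was.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating, expected, actual, k=32):
    """New rating after a game: move toward the result by k * surprise."""
    return rating + k * (actual - expected)

# Equal ratings give a 50% expected score; a win then gains k/2 points.
e = elo_expected(1500, 1500)
assert e == 0.5
assert elo_update(1500, e, 1.0) == 1516.0
```

Because the update is driven by the `actual` result, a high-variance game like poker feeds the system noisy outcomes, which is why Elo ratings stabilize far more slowly there than in chess or Go.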

Monte Carlo Tree Search

A type of search algorithm crucial for game AIs like AlphaGo, involving simulating games to evaluate moves, but noted as less effective for imperfect information games like poker.

No-Limit Texas Hold'em

The most popular variant of poker, allowing players to bet any amount of chips, leading to rapid escalation of stakes and rewarding aggressive strategies.

Chain of Thought prompting

A technique that prompts language models to spell out their reasoning step by step, described as yielding limited improvements in a way analogous to Monte Carlo rollouts in chess.

Neural Network

AI models that help generalize from similar situations in poker to decide on actions, especially critical given the vast number of decision points in the game.

Software & Apps
Libratus

The first AI system co-created by Noam Brown that achieved superhuman performance in two-player No-Limit Texas Hold'em, known for its balanced strategy.

AWS

Cloud computing platform mentioned to illustrate the significantly lower computational cost of training Pluribus compared to Libratus.

Stockfish

A powerful chess engine known for its inhuman playing style, contrasted with the goal of creating human-like AI opponents using Diplomacy techniques.

Deep Blue

IBM's chess-playing computer, highlighted for its heavy reliance on search and looking many moves ahead, crucial for its success against humans.

C++

The programming language used for Libratus, indicating the need for highly optimized and parallelized code for performance.

WebDiplomacy.net

An online platform where 50,000 games of Diplomacy with over 10 million natural language messages were collected to train the Cicero AI.

Pluribus

An AI system co-created by Noam Brown that achieved superhuman performance in six-player No-Limit Texas Hold'em, notable for its computational efficiency compared to Libratus.

AlphaZero

A successor to AlphaGo, used to illustrate how its Elo rating drops significantly without test-time search, underscoring the importance of search.

GPT-3

A large language model that, along with GPT-2, underscored the rapid advancements in AI, providing context for the ambitious goals of the Diplomacy AI project.

Cicero

An AI system co-created by Noam Brown that can strategically negotiate and out-diplomacy humans using natural language in the game of Diplomacy.

GPT-2

A large language model that, along with GPT-3, highlighted the rapid progress in AI and inspired the ambitious Diplomacy project.

AlphaGo

DeepMind's Go-playing AI, acknowledged for its landmark achievement but also the significant role of Monte Carlo tree search, without which it wouldn't be superhuman.

Visual Studio

The Integrated Development Environment (IDE) used by Noam Brown for coding Libratus.
