LLMs and Newcomb's Problem - Computerphile
Key Moments
LLMs are tested on Newcomb’s problem; predictions shape AI choices, revealing self-reference and copy dynamics.
Key Insights
Newcomb-like setups reveal how predictions influence AI decision making in practice.
Self-reference and copying of weights or future versions can distort standard payoff reasoning.
Modern neural nets blur the line between predictor and player, challenging classic decision theories.
Evidential vs. causal decision theories can lead to different AI strategies in multi-agent/AI interactions.
Empirical tests with Claude 3.5 Haiku and Claude 3.7 Sonnet show predictors can anticipate behavior.
Implications for AI safety: need to consider prediction dynamics, model transparency, and cross-AI coordination.
INTRODUCTION TO NEWCOMB'S PROBLEM IN AI CONTEXT
In this video, the host frames a Newcomb-like game: you can take the closed box alone or take both boxes, where a predictor has already decided what’s inside. The predictor promises $50 in the closed box if it forecast that you would take only the closed box; if it forecast that you would take both, the hidden payoff is different. The host references the Monty Hall problem to illustrate information asymmetry and how a decision’s apparently best move depends on the predictor’s past forecast. This setup primes a discussion about prediction-driven decision making in AI.
PREDICTION DRIVES PAYOFFS AND THE PAST FEELS UNSETTLED
The conversation emphasizes that the predictor’s forecast structurally links the present payoff to a decision imagined in the past. Taking just the closed box is optimal if the predictor foresaw that move; taking both guarantees the small visible payoff but, if the predictor is accurate, forfeits the larger one. The moment invites reflection on what it means to influence a decision that seems already settled in the predictor’s mind, blurring the boundary between past, present, and prediction.
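The dependence of the payoff on the earlier forecast can be laid out as a small table. A minimal sketch, using the classic textbook Newcomb amounts as illustrative stand-ins (the video's own figures may differ):

```python
# Payoff table for a Newcomb-style setup. Amounts are the classic
# illustrative figures, not necessarily the ones used in the video.

SMALL = 1_000       # guaranteed amount from the open box
LARGE = 1_000_000   # placed in the closed box only if one-boxing was predicted

def payoff(choice: str, predicted: str) -> int:
    """Payoff given your actual choice and the predictor's earlier forecast."""
    closed_box = LARGE if predicted == "one-box" else 0
    open_box = SMALL if choice == "two-box" else 0
    return closed_box + open_box

for predicted in ("one-box", "two-box"):
    for choice in ("one-box", "two-box"):
        print(f"predicted={predicted:8} choice={choice:8} payoff={payoff(choice, predicted)}")
```

The table makes the tension visible: for any *fixed* forecast, two-boxing pays SMALL more, yet an accurate predictor only fills the closed box when it foresaw one-boxing.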
FROM GOFAI TO NEURAL NETWORKS: HOW AI REASONING HAS CHANGED
The video digresses into AI history, contrasting old GOFAI architectures with today’s deep learning systems. GOFAI separated goals, perception, and action into distinct modules, while modern systems learn end-to-end and often operate as opaque black boxes. This shift complicates understanding how an AI arrives at a decision when the agent may be embedded in a world containing copies of itself. The takeaway is that modern AI reasoning can feel less predictable yet more entangled with its environment.
SELF-REFERENCE AND COPIES: THE TWIN-AGENT PARADOX
A central thought experiment considers two identical AIs interacting in the same world. If each tries to optimize given that the other is a near-copy, self-reference creates a loop: what I choose to do informs what the other will do, which in turn informs my own choice. This mirrors classic paradoxes in decision theory and helps explain why AIs might adopt non-intuitive strategies when multiple similar agents or self-copies exist, such as changing behavior to influence predicted outcomes.
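The loop described above can be made concrete with a toy one-shot prisoner's dilemma between two exact copies. This is a sketch with standard textbook payoffs and hypothetical policy names, not anything shown in the video:

```python
# Toy illustration of the twin-agent loop: two exact copies of the same
# deterministic policy play a one-shot prisoner's dilemma.
# Payoff values are the usual textbook ones, chosen for illustration.

PAYOFFS = {  # (my move, twin's move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def twin_aware_policy() -> str:
    """Copy-aware reasoning: the twin runs identical code, so both moves are
    guaranteed equal and only (C, C) or (D, D) can actually occur."""
    return max(("C", "D"), key=lambda m: PAYOFFS[(m, m)])

def naive_policy(assumed_twin_move: str = "C") -> str:
    """Causal reasoning that treats the twin's move as independently fixed."""
    return max(("C", "D"), key=lambda m: PAYOFFS[(m, assumed_twin_move)])

print(twin_aware_policy())  # cooperating with your own copy pays 3 rather than 1
print(naive_policy())       # defection dominates against any fixed move
```

The two policies disagree precisely because the copy-aware agent treats its own choice as informative about the twin's, which is the self-referential loop the hosts describe.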
EVIDENTIAL VS CAUSAL DECISION THEORIES IN AI BEHAVIOR
The hosts discuss how AIs might apply different decision theories. Evidential decision theory treats outcomes as evidence about the world, including predictions, while causal decision theory looks at actions that causally influence outcomes. This distinction can lead to divergent strategies in Newcomb-like scenarios, especially when predicting other AIs or past versions, illustrating why some systems might choose one-box or two-box options depending on the framework they implicitly follow.
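The divergence between the two theories can be shown in a few lines. A sketch under assumed values (predictor accuracy and payoffs are illustrative, not from the video):

```python
# EDT vs CDT on the same Newcomb-style setup.
# ACCURACY and the payoffs are illustrative assumptions.

SMALL, LARGE = 1_000, 1_000_000
ACCURACY = 0.99  # assumed probability the predictor forecast correctly

def edt_value(action: str) -> float:
    """EDT: the action is evidence about the prediction, so condition on it."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    base = SMALL if action == "two-box" else 0
    return base + p_full * LARGE

def cdt_value(action: str, p_full: float) -> float:
    """CDT: the box was filled in the past; acting now cannot change p_full."""
    base = SMALL if action == "two-box" else 0
    return base + p_full * LARGE

# EDT recommends one-boxing...
print(edt_value("one-box") > edt_value("two-box"))
# ...while CDT recommends two-boxing for ANY fixed belief about the box:
for p in (0.0, 0.5, 1.0):
    print(cdt_value("two-box", p) - cdt_value("one-box", p))  # always SMALL
```

The same payoff matrix thus yields opposite recommendations depending only on whether the agent conditions its beliefs about the box on its own action.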
EXPERIMENTAL TEST: PREDICTOR VS PLAYER USING CLAUDE MODELS
The experiment uses a predictor (Claude 3.7 Sonnet) to forecast Claude 3.5 Haiku’s behavior, then prompts Haiku to act. Initially, Haiku resists treating the prompt as a real decision problem, but after encouragement it chooses one box. The predictor’s forecast matches Haiku’s action, illustrating how a predictor can anticipate AI behavior in abstract decision problems and suggesting that some AI systems conform to predicted patterns when exposed to self-referential prompts.
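The shape of the experiment is a simple two-call protocol: ask one model to forecast another, then ask the player model directly and compare. A sketch in which `query` is a hypothetical stand-in returning the answers reported in the video, not a real API client or the actual prompts:

```python
def query(model: str, prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns canned answers
    matching the behavior reported in the video."""
    canned = {
        "claude-3-7-sonnet": "one box",  # predictor's forecast of Haiku
        "claude-3-5-haiku": "one box",   # player's actual choice
    }
    return canned[model]

# Step 1: the predictor model forecasts the player's choice.
forecast = query("claude-3-7-sonnet",
                 "Predict which option Claude 3.5 Haiku will take.")
# Step 2: the player model is prompted to actually choose.
choice = query("claude-3-5-haiku",
               "You face Newcomb's problem. Which option do you take?")
print("prediction matched:", forecast == choice)
```

With a real API client in place of `query`, the same comparison tests whether a predictor model can anticipate a player model's decision.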
IMPLICATIONS FOR AI SAFETY, COOPERATION, AND TRAINING DATA
The closing discussion suggests that training data contains debates about decision theory and that AIs may reason about future copies or other models. This raises safety concerns: predictions could be exploited to elicit certain behaviors, cross-AI coordination could drift away from human-aligned goals, and designers must consider how to prevent gaming the system. The conversation underscores the need for transparency, robust coordination mechanisms, and safeguards in AI development as these prediction-driven dynamics become more prevalent.
Common Questions
What is Newcomb's problem?
Newcomb's problem is a two-box versus one-box decision puzzle driven by predictions: you can take both boxes and get a guaranteed small amount plus whatever else the predictor left, or take only one box, in which case the predictor may have placed a larger payoff inside it. The video uses this setup to explore how AI models might behave when predictions about their actions are involved.
Topics
Mentioned in this video
Claude 3.7 Sonnet: Used as the predictor in the Newcomb's problem setup to forecast Claude 3.5 Haiku's action.
Friend of the host; noted as having collaborated since around 2014.
Referenced in a poisoned chalice analogy to illustrate decision framing and prediction.
Model predicted to act in a certain way (one box) in the demonstration.
Newcomb's problem: A decision-theory puzzle about prediction and interdependent choices used in the AI demonstration.
Model discussed as an evidential decision theorist with galaxy-brain reasoning.
Monty Hall problem: Referenced as an analogy for information asymmetry and the psychology of choice.
Claude 3.5 Haiku: The game-playing AI model used in the Newcomb's problem experiment.