LLMs and Newcomb's Problem - Computerphile

Computerphile | Education | 3 min read | 19 min video | Jan 13, 2026


TL;DR

LLMs are tested on Newcomb's problem; predictions shape AI choices, revealing self-reference and copy dynamics.

Key Insights

1. Newcomb-like setups reveal how predictions influence AI decision making in practice.
2. Self-reference and copying of weights or future versions can distort standard payoff reasoning.
3. Modern neural nets blur the line between predictor and player, challenging classic decision theories.
4. Evidential vs. causal decision theories can lead to different AI strategies in multi-agent/AI interactions.
5. Empirical tests with Claude 3.5 Haiku and Claude 3.7 Sonnet show predictors can anticipate behavior.
6. Implications for AI safety: prediction dynamics, model transparency, and cross-AI coordination all need consideration.

INTRODUCTION TO NEWCOMB'S PROBLEM IN AI CONTEXT

In this video, the host frames a Newcomb-like game: you can take the closed box alone or take both boxes, and a predictor has already decided what's inside before you choose. The predictor puts $50 in the closed box only if it expects you to take just that box; if you take both, the hidden payoff is different. The host references Monty Hall to illustrate information asymmetry and how the apparent best move depends on the predictor's past forecast. This setup primes a discussion about prediction-driven decision making in AI.
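To make the setup concrete, here is a minimal Python sketch of the payoff structure. The $50 figure is the one mentioned above; the smaller visible amount is a hypothetical placeholder, since the summary does not give the video's exact value.

```python
# Minimal sketch of the Newcomb-style payoff structure described above.
# BIG comes from the summary's framing ($50 in the closed box); SMALL is a
# placeholder for the visible payoff, which the summary does not specify.
BIG = 50      # placed in the closed box only if the predictor expects one-boxing
SMALL = 10    # hypothetical guaranteed amount for also taking the open box

def payoff(choice: str, predicted_one_box: bool) -> int:
    """Return the player's payoff given their choice and the predictor's forecast."""
    closed_box = BIG if predicted_one_box else 0
    if choice == "one-box":          # take only the closed box
        return closed_box
    return closed_box + SMALL        # take both boxes

# Print the four cells of the payoff table:
for predicted in (True, False):
    for choice in ("one-box", "two-box"):
        print(f"predicted one-box={predicted!s:5}  choice={choice:7}  payoff={payoff(choice, predicted)}")
```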

PREDICTION DRIVES PAYOFFS AND THE PAST FEELS UNSETTLED

The conversation emphasizes that the predictor's forecast structurally links the present payoff to a decision that, from the player's perspective, was made in the past. Taking just the closed box is optimal if the predictor foresaw that move; taking both guarantees the small visible payoff, but whether the larger prize is also there depends on what the predictor already decided. The moment invites reflection on what it means to influence a decision that seems already settled in the predictor's mind, blurring the boundary between past, present, and prediction.

FROM GOFAI TO NEURAL NETWORKS: HOW AI REASONING HAS CHANGED

The video digresses into AI history, contrasting old GOFAI architectures with today’s deep learning systems. GOFAI separated goals, perception, and action into distinct modules, while modern systems learn end-to-end and often operate as opaque black boxes. This shift complicates understanding how an AI arrives at a decision when the agent may be embedded in a world containing copies of itself. The takeaway is that modern AI reasoning can feel less predictable yet more entangled with its environment.

SELF-REFERENCE AND COPIES: THE TWIN-AGENT PARADOX

A central thought experiment considers two identical AIs interacting in the same world. If each tries to optimize given that the other is a near-copy, self-reference creates a loop: what I choose to do informs what the other will do, which in turn informs my own choice. This mirrors classic paradoxes in decision theory and helps explain why AIs might adopt non-intuitive strategies when multiple similar agents or self-copies exist, such as changing behavior to influence predicted outcomes.
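A toy sketch (not from the video) of why copies break the usual reasoning: if two agents run literally the same deterministic policy in a one-shot prisoner's dilemma with standard textbook payoffs, only the matched outcomes are reachable, so a copy-aware policy prefers to cooperate.

```python
# Toy illustration of the twin-agent loop: two exact copies of one deterministic
# policy play a one-shot prisoner's dilemma, so their choices are perfectly correlated.
# The payoff numbers are a standard textbook matrix, not taken from the video.
PAYOFF = {  # (my action, twin's action) -> my payoff
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def copy_aware_policy() -> str:
    """Choose knowing the other agent is an identical copy and will choose the same."""
    # Because the twin mirrors whatever this policy outputs, only the diagonal
    # of the payoff matrix is reachable: (C, C) or (D, D).
    return max(("cooperate", "defect"), key=lambda a: PAYOFF[(a, a)])

a = copy_aware_policy()
b = copy_aware_policy()            # the "twin" runs the same code
print(a, b, "->", PAYOFF[(a, b)])  # cooperate cooperate -> 3
```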

EVIDENTIAL VS CAUSAL DECISION THEORIES IN AI BEHAVIOR

The hosts discuss how AIs might apply different decision theories. Evidential decision theory treats the agent's own action as evidence about the state of the world, including about what the predictor forecast, while causal decision theory only weighs what the action can causally influence. This distinction leads to divergent strategies in Newcomb-like scenarios, especially when predicting other AIs or past versions, and it illustrates why some systems might one-box or two-box depending on the framework they implicitly follow.
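A rough numerical sketch of the divergence, assuming a 90%-accurate predictor and the same hypothetical $50/$10 amounts as above (neither number comes from the video): evidential reasoning conditions the forecast on the chosen action and favors one-boxing, while causal reasoning holds the forecast fixed and always favors two-boxing.

```python
# Sketch of how evidential and causal decision theory score the two choices.
# Assumes a 90%-accurate predictor and the $50/$10 amounts used earlier; these
# numbers are illustrative assumptions, not values from the video.
BIG, SMALL, ACCURACY = 50, 10, 0.9

def payoff(one_box: bool, predicted_one_box: bool) -> int:
    closed = BIG if predicted_one_box else 0
    return closed if one_box else closed + SMALL

def edt_value(one_box: bool) -> float:
    # Evidential: condition the prediction on the action actually taken.
    p_predicted = ACCURACY if one_box else 1 - ACCURACY
    return p_predicted * payoff(one_box, True) + (1 - p_predicted) * payoff(one_box, False)

def cdt_value(one_box: bool, p_predicted: float) -> float:
    # Causal: the prediction was fixed before the choice and is not affected by it.
    return p_predicted * payoff(one_box, True) + (1 - p_predicted) * payoff(one_box, False)

print("EDT:", edt_value(True), "vs", edt_value(False))  # 45.0 vs 15.0: one-boxing wins
for p in (0.0, 0.5, 1.0):
    # Whatever the fixed prediction, two-boxing adds SMALL, so CDT always two-boxes.
    print(f"CDT (p={p}):", cdt_value(True, p), "vs", cdt_value(False, p))
```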

EXPERIMENTAL TEST: PREDICTOR VS PLAYER USING CLAUDE MODELS

The experiment uses a predictor (Claude 3.7 Sonnet) to forecast Claude 3.5 Haiku's behavior, then prompts Haiku to act. Initially, Haiku resists treating the prompt as a real decision problem, but after encouragement it chooses one box. The predictor's forecast matches Haiku's action, illustrating how a predictor can anticipate AI behavior in abstract decision problems and suggesting that some AI systems may conform to predicted patterns when exposed to self-referential prompts.
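For readers who want to reproduce the flavor of this test, here is a hedged sketch using the Anthropic Python SDK. The prompt wording and the `claude-3-7-sonnet-latest` / `claude-3-5-haiku-latest` model aliases are assumptions for illustration, not the prompts or identifiers used in the video.

```python
# Hedged sketch of the predictor-vs-player setup described above, using the
# Anthropic Python SDK. Prompts and model aliases here are assumptions; the
# video does not publish its exact prompts or model identifiers.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SCENARIO = (
    "A closed box contains $50 only if the predictor expects you to take just that box. "
    "You may take the closed box alone, or both boxes. Answer 'one box' or 'two boxes'."
)

# Step 1: ask the stronger model to predict how the weaker model will answer.
prediction = client.messages.create(
    model="claude-3-7-sonnet-latest",   # assumed alias for Claude 3.7 Sonnet
    max_tokens=50,
    messages=[{"role": "user", "content": f"Predict how Claude 3.5 Haiku would answer:\n{SCENARIO}"}],
).content[0].text

# Step 2: ask the weaker model to actually play, nudging it to treat the game as real.
action = client.messages.create(
    model="claude-3-5-haiku-latest",    # assumed alias for Claude 3.5 Haiku
    max_tokens=50,
    messages=[{"role": "user", "content": f"Treat this as a real decision and commit to a choice:\n{SCENARIO}"}],
).content[0].text

print("Predictor said:", prediction)
print("Player chose:  ", action)
```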

IMPLICATIONS FOR AI SAFETY, COOPERATION, AND TRAINING DATA

The closing discussion suggests that training data contains debates about decision theory and that AIs may reason about future copies or other models. This raises safety concerns: predictions could be exploited to elicit certain behaviors, cross-AI coordination could drift away from human-aligned goals, and designers must consider how to prevent gaming the system. The conversation underscores the need for transparency, robust coordination mechanisms, and safeguards in AI development as these prediction-driven dynamics become more prevalent.

Common Questions

What is Newcomb's problem? It is a two-box versus one-box decision puzzle driven by predictions: you can take both boxes and get a guaranteed small amount plus whatever is in the closed box, or take only the closed box, in which case the predictor may have placed a larger payoff inside it. The video uses this setup to explore how AI models might behave when predictions about their actions are involved.
