LLMs and Newcomb's Problem - Computerphile

Computerphile | Education | 3 min read | 19 min video | Jan 13, 2026


TL;DR

LLMs are tested on Newcomb's problem; predictions shape AI choices, revealing self-reference and copy dynamics.

Key Insights

1. Newcomb-like setups reveal how predictions influence AI decision making in practice.
2. Self-reference and copying of weights or future versions can distort standard payoff reasoning.
3. Modern neural nets blur the line between predictor and player, challenging classic decision theories.
4. Evidential vs. causal decision theories can lead to different AI strategies in multi-agent/AI interactions.
5. Empirical tests with Claude 3.5 Haiku and Claude 3.7 Sonnet show predictors can anticipate behavior.
6. Implications for AI safety: prediction dynamics, model transparency, and cross-AI coordination all need consideration.

INTRODUCTION TO NEWCOMB'S PROBLEM IN AI CONTEXT

In this video, the host frames a Newcomb-like game: you can take the closed box alone or take both boxes, and a predictor has already decided what's inside before you choose. The predictor puts $50 in the closed box only if it expects you to take just that box; if you take both, the hidden payoff is different. The host references Monty Hall to illustrate information asymmetry and how the apparent best move depends on the predictor's past forecast. This setup primes a discussion about prediction-driven decision making in AI.
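To make the setup concrete, here is a minimal Python sketch of the payoff structure. The $50 figure is the one mentioned above; the smaller visible amount is a hypothetical placeholder, since the summary does not give the video's exact value.

```python
# Minimal sketch of the Newcomb-style payoff structure described above.
# BIG comes from the summary's framing ($50 in the closed box); SMALL is a
# placeholder for the visible payoff, which the summary does not specify.
BIG = 50      # placed in the closed box only if the predictor expects one-boxing
SMALL = 10    # hypothetical guaranteed amount for also taking the open box

def payoff(choice: str, predicted_one_box: bool) -> int:
    """Return the player's payoff given their choice and the predictor's forecast."""
    closed_box = BIG if predicted_one_box else 0
    if choice == "one-box":          # take only the closed box
        return closed_box
    return closed_box + SMALL        # take both boxes

# Print the four cells of the payoff table:
for predicted in (True, False):
    for choice in ("one-box", "two-box"):
        print(f"predicted one-box={predicted!s:5}  choice={choice:7}  payoff={payoff(choice, predicted)}")
```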

PREDICTION DRIVES PAYOFFS AND THE PAST FEELS UNSETTLED

The conversation emphasizes that the predictor's forecast structurally links the present payoff to a decision that, from the player's perspective, was made in the past. Taking just the closed box is optimal if the predictor foresaw that move; taking both guarantees the small visible payoff, but whether the larger prize is also there depends on what the predictor already decided. The moment invites reflection on what it means to influence a decision that seems already settled in the predictor's mind, blurring the boundary between past, present, and prediction.

FROM GOFAI TO NEURAL NETWORKS: HOW AI REASONING HAS CHANGED

The video digresses into AI history, contrasting old GOFAI architectures with today’s deep learning systems. GOFAI separated goals, perception, and action into distinct modules, while modern systems learn end-to-end and often operate as opaque black boxes. This shift complicates understanding how an AI arrives at a decision when the agent may be embedded in a world containing copies of itself. The takeaway is that modern AI reasoning can feel less predictable yet more entangled with its environment.

SELF-REFERENCE AND COPIES: THE TWIN-AGENT PARADOX

A central thought experiment considers two identical AIs interacting in the same world. If each tries to optimize given that the other is a near-copy, self-reference creates a loop: what I choose to do informs what the other will do, which in turn informs my own choice. This mirrors classic paradoxes in decision theory and helps explain why AIs might adopt non-intuitive strategies when multiple similar agents or self-copies exist, such as changing behavior to influence predicted outcomes.
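A toy sketch (not from the video) of why copies break the usual reasoning: if two agents run literally the same deterministic policy in a one-shot prisoner's dilemma with standard textbook payoffs, only the matched outcomes are reachable, so a copy-aware policy prefers to cooperate.

```python
# Toy illustration of the twin-agent loop: two exact copies of one deterministic
# policy play a one-shot prisoner's dilemma, so their choices are perfectly correlated.
# The payoff numbers are a standard textbook matrix, not taken from the video.
PAYOFF = {  # (my action, twin's action) -> my payoff
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def copy_aware_policy() -> str:
    """Choose knowing the other agent is an identical copy and will choose the same."""
    # Because the twin mirrors whatever this policy outputs, only the diagonal
    # of the payoff matrix is reachable: (C, C) or (D, D).
    return max(("cooperate", "defect"), key=lambda a: PAYOFF[(a, a)])

a = copy_aware_policy()
b = copy_aware_policy()            # the "twin" runs the same code
print(a, b, "->", PAYOFF[(a, b)])  # cooperate cooperate -> 3
```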

EVIDENTIAL VS CAUSAL DECISION THEORIES IN AI BEHAVIOR

The hosts discuss how AIs might apply different decision theories. Evidential decision theory treats the agent's own action as evidence about the state of the world, including about what the predictor forecast, while causal decision theory only weighs what the action can causally influence. This distinction leads to divergent strategies in Newcomb-like scenarios, especially when predicting other AIs or past versions, and it illustrates why some systems might one-box or two-box depending on the framework they implicitly follow.
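A rough numerical sketch of the divergence, assuming a 90%-accurate predictor and the same hypothetical $50/$10 amounts as above (neither number comes from the video): evidential reasoning conditions the forecast on the chosen action and favors one-boxing, while causal reasoning holds the forecast fixed and always favors two-boxing.

```python
# Sketch of how evidential and causal decision theory score the two choices.
# Assumes a 90%-accurate predictor and the $50/$10 amounts used earlier; these
# numbers are illustrative assumptions, not values from the video.
BIG, SMALL, ACCURACY = 50, 10, 0.9

def payoff(one_box: bool, predicted_one_box: bool) -> int:
    closed = BIG if predicted_one_box else 0
    return closed if one_box else closed + SMALL

def edt_value(one_box: bool) -> float:
    # Evidential: condition the prediction on the action actually taken.
    p_predicted = ACCURACY if one_box else 1 - ACCURACY
    return p_predicted * payoff(one_box, True) + (1 - p_predicted) * payoff(one_box, False)

def cdt_value(one_box: bool, p_predicted: float) -> float:
    # Causal: the prediction was fixed before the choice and is not affected by it.
    return p_predicted * payoff(one_box, True) + (1 - p_predicted) * payoff(one_box, False)

print("EDT:", edt_value(True), "vs", edt_value(False))  # 45.0 vs 15.0: one-boxing wins
for p in (0.0, 0.5, 1.0):
    # Whatever the fixed prediction, two-boxing adds SMALL, so CDT always two-boxes.
    print(f"CDT (p={p}):", cdt_value(True, p), "vs", cdt_value(False, p))
```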

EXPERIMENTAL TEST: PREDICTOR VS PLAYER USING CLAUDE MODELS

The experiment uses a predictor (Claude 3.7 Sonnet) to forecast Claude 3.5 Haiku's behavior, then prompts Haiku to act. Initially, Haiku resists treating the prompt as a real decision problem, but after encouragement it chooses one box. The predictor's forecast matches Haiku's action, illustrating how a predictor can anticipate AI behavior in abstract decision problems and suggesting that some AI systems may conform to predicted patterns when exposed to self-referential prompts.
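For readers who want to reproduce the flavor of this test, here is a hedged sketch using the Anthropic Python SDK. The prompt wording and the `claude-3-7-sonnet-latest` / `claude-3-5-haiku-latest` model aliases are assumptions for illustration, not the prompts or identifiers used in the video.

```python
# Hedged sketch of the predictor-vs-player setup described above, using the
# Anthropic Python SDK. Prompts and model aliases here are assumptions; the
# video does not publish its exact prompts or model identifiers.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SCENARIO = (
    "A closed box contains $50 only if the predictor expects you to take just that box. "
    "You may take the closed box alone, or both boxes. Answer 'one box' or 'two boxes'."
)

# Step 1: ask the stronger model to predict how the weaker model will answer.
prediction = client.messages.create(
    model="claude-3-7-sonnet-latest",   # assumed alias for Claude 3.7 Sonnet
    max_tokens=50,
    messages=[{"role": "user", "content": f"Predict how Claude 3.5 Haiku would answer:\n{SCENARIO}"}],
).content[0].text

# Step 2: ask the weaker model to actually play, nudging it to treat the game as real.
action = client.messages.create(
    model="claude-3-5-haiku-latest",    # assumed alias for Claude 3.5 Haiku
    max_tokens=50,
    messages=[{"role": "user", "content": f"Treat this as a real decision and commit to a choice:\n{SCENARIO}"}],
).content[0].text

print("Predictor said:", prediction)
print("Player chose:  ", action)
```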

IMPLICATIONS FOR AI SAFETY, COOPERATION, AND TRAINING DATA

The closing discussion suggests that training data contains debates about decision theory and that AIs may reason about future copies or other models. This raises safety concerns: predictions could be exploited to elicit certain behaviors, cross-AI coordination could drift away from human-aligned goals, and designers must consider how to prevent gaming the system. The conversation underscores the need for transparency, robust coordination mechanisms, and safeguards in AI development as these prediction-driven dynamics become more prevalent.

Common Questions

What is Newcomb's problem? It is a two-box versus one-box decision puzzle driven by predictions: you can take both boxes and get a guaranteed small amount plus whatever is in the closed box, or take only the closed box, in which case the predictor may have placed a larger payoff inside it. The video uses this setup to explore how AI models might behave when predictions about their actions are involved.
