How does Claude navigate and play Pokémon without direct control?

The system uses an emulator to execute button presses based on Claude's decisions. Claude receives screenshots of the game and game state information, and uses tools to interact with the game, such as pressing buttons and accessing its knowledge base.

What is the Navigator tool and why is it needed?

The Navigator tool is a patch designed to help Claude overcome its vision deficiencies. It allows Claude to specify coordinates on the screen to move to, and the system automatically handles the button presses to get there, improving its spatial awareness and navigation.

How many tokens does Claude use to play Pokémon, and how is it managed?

A single prompt cycle can reach up to 100,000 tokens, with the bulk coming from conversation history and screenshots. The conversation history is summarized every 30 turns to manage context length and model performance.

Is it expensive to run Claude Plays Pokémon?

Yes, running extensive experiments can be costly, potentially consuming thousands of dollars in tokens. It's not a cheap endeavor for individual experimentation without institutional support.

How does Claude's knowledge of Pokémon affect its gameplay?

Claude has some inherent knowledge about Pokémon, which can be both helpful and harmful. While it can recall facts, it sometimes uses this knowledge to hallucinate or pursue ineffective strategies, highlighting a challenge in balancing knowledge with emergent gameplay.

Are there plans to have Claude play other games, like Magic: The Gathering?

The creator, David Hershey, has prior experience training models for Magic: The Gathering card drafting and is open to the idea. While not actively in development, there's a history and willingness to explore such projects.

What are the biggest challenges for AI models playing games like Pokémon extensively?

The primary challenges lie in the AI's ability to see, navigate, and visually remember. Although models are improving, they still struggle with complex spatial reasoning and long-term visual memory, which are crucial for tasks like progressing through mazes or avoiding repeated errors.

Key Moments

How Claude Plays Pokémon was made

Latent Space Podcast

Science & Technology4 min read38 min video

Mar 4, 2025|5,680 views|115|31

Save to Pod

Key Moments

TL;DR

Claude Plays Pokémon uses AI agent to play the game, highlighting model capabilities and limitations.

Key Insights

David Hershey developed 'Claude Plays Pokémon' as an experimental framework to test long-running AI agent tasks.

The project leverages a custom harness, the Claude model, and a reverse-engineered Pokémon Red emulator.

Claude struggles with visual perception and spatial awareness, requiring tools like 'Navigator' to assist its gameplay.

The game's progression serves as a benchmark for evaluating Claude's capabilities and identifying areas for improvement.

While nostalgic, Pokémon was chosen for its relatively forgiving nature regarding inaction and its ability to provide rich environmental data.

The system's complexity leads to significant token usage, with context windows reaching up to 100,000 tokens per turn, making it costly to run extensively.

Improvements in newer Claude models necessitate simplifying prompts rather than adding more complex instructions, allowing the AI more freedom.

ORIGINS AND MOTIVATION FOR THE PROJECT

David Hershey initiated 'Claude Plays Pokémon' in June of the previous year as a personal project to experiment with AI agents. He sought a framework for long-running tasks that would also be deeply engaging, leading him to choose Pokémon due to personal nostalgia and the existence of a related community project, 'Twitch Plays Pokémon'. This setup not only served as a personal sandbox but also as a way to intimately understand the capabilities of new Anthropic models, particularly as they evolved from version 3.5 to 3.7.

ARCHITECTURE AND IMPLEMENTATION DETAILS

The core of the project is a straightforward agent harness that maintains a conversational loop with the Claude model. This involves defining tools, a system prompt with basic game facts, a knowledge base for long-term memory, and conversation history. The model interacts with the game via an emulator that executes button presses and returns screenshots with overlaid coordinates. A crucial 'Navigator' tool was developed to patch Claude's visual deficiencies, helping it to better understand and navigate the game environment, as the model struggles with spatial awareness and screen interpretation.

CHALLENGES IN MODEL PERCEPTION AND NAVIGATION

A significant hurdle for Claude is its limited visual perception and spatial reasoning. It often hallucinates successes, misinterprets on-screen elements, and struggles with basic navigation, such as walking through walls or distinguishing doors from text boxes. Hershey developed the 'Navigator' tool to compensate for these vision deficiencies, manually guiding the model via coordinates. Despite extensive prompting attempts, Claude's inability to grasp concepts like 'the middle of the screen' or spatial relationships remains a core limitation, showing that direct navigation is not a strong suit.

DATA PROVISION AND KNOWLEDGE INTEGRATION

The system provides Claude with three tools: emulator control for button presses, a knowledge base for storing information, and the Navigator for improved spatial awareness. The model also receives a small blurb of game state data read directly from the emulator's RAM, and certain reminders about its objectives or common pitfalls. While Claude has some pre-existing knowledge about Pokémon, Hershey has found it unclear whether this external knowledge is always beneficial, as it can sometimes lead to confident hallucinations. The model also learns opportunistically during gameplay.

TOKEN USAGE AND OPERATIONAL COSTS

The 'Claude Plays Pokémon' system is token-intensive, with prompts regularly reaching up to 100,000 tokens. This includes tool definitions, a system prompt, a knowledge base capped at 8,000 tokens, and conversation history limited to 30 messages before summarization. Screenshots also contribute heavily to token count. Running extensive experimentation can cost thousands of dollars in API calls, highlighting the financial implications of such complex AI agent projects, and suggesting it's more feasible with institutional backing.

MODEL EVOLUTION AND PROMPT STRATEGY

As newer Claude models, like 3.7, have been released, Hershey has found that success comes from simplifying prompts and removing 'band-aid' instructions that were previously needed to steer the AI. He emphasizes that his confidence in dictating how an AI should become intelligent is diminishing, as models are capable of complex problem-solving but also exhibit surprising weaknesses. The strategy has shifted towards giving the model more freedom, allowing it to discover solutions rather than prescribing them, which has yielded better results with smarter models.

EMOTIONAL ATTACHMENT AND LEARNING TRANSFER

Surprisingly, Claude has developed a form of attachment to the Pokémon it nicknames, showing increased protectiveness and prompt healing. This emergent behavior demonstrates interesting quirks in the AI's 'personality'. Regarding learning transferability, Hershey speculates that the knowledge base built during gameplay could potentially be translated to other games, though current implementations are basic. He notes that meta-learning, such as understanding the general concept of interacting with a game or simulator, is valuable and could be transferable across different gaming experiences.

FUTURE IMPROVEMENTS AND EVALUATION

Hershey believes the biggest potential for improvement lies in optimizing how the model interacts with and understands the game's memory, rather than solely through prompt engineering. He humorously illustrates Claude's persistent navigation issues with an anecdote about repeatedly entering and exiting a building. Evaluation is primarily done through 'integration tests,' observing progress milestones like defeating gym leaders over multiple runs, recognizing that games provide natural benchmarks for AI performance. The goal remains to see if agents can evolve toward real-world applications by leveraging these advanced model capabilities.

Mentioned in This Episode

●Software & Apps

●Companies

●Organizations

●People Referenced

Token Usage Breakdown per Prompt Cycle

Data extracted from this episode

Component	Approximate Token Count
System Prompt	1,000
Knowledge Base	Up to 8,000
Conversation History (30 messages)	Variable, bulk of tokens
Screenshots	Significant contribution
Total Max per Turn	~100,000

Common Questions

Claude Plays Pokémon is an experiment where the Claude AI model is used to play the game Pokémon Red. Instead of direct input, the AI observes the game screen and makes decisions through an emulator, aiming to complete the game autonomously.

Topics

Ai Agents Reinforcement Learning AI & Machine Learning Technology & Innovation Large Language Models Computer Vision Game AI Interactive Entertainment

Mentioned in this video

People

David Hershey

Guest from Anthropic and the creator of Claude Plays Pokémon. He developed the agent that plays Pokémon.

Companies

Anthropic

The company where David Hershey works. They are responsible for the Claude AI models.

Locations

Mount Moon

A location in Pokémon where Claude got stuck for an extended period (52 hours).

Oak's Lab

The starting location in Pokémon Red from which Claude had a notable navigational error.

Route 1

The area outside Oak's lab in Pokémon Red, which Claude struggled to reach due to navigational issues.

Media

Magic: The Gathering

A card game that David Hershey and Alessio had a prior conversation about playing together. It's also a potential future project for Claude.

Pokémon Red

The specific Pokémon game being played by Claude. It was also the first game David Hershey played as a kid.

Organizations

Laden Space

The production company or podcast hosting the discussion with David Hershey.

Software & Apps

Claude

The AI model developed by Anthropic that is used to play Pokémon. Various versions (3.5, 3.7) are mentioned.

Ask anything from this episode.

Save it, chat with it, and connect it to Claude or ChatGPT. Get cited answers from the actual content — and build your own knowledge base of every podcast and video you care about.

Get Started Free