
World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Latent Space Podcast
People & Blogs | 5 min read | 65 min video
Dec 6, 2025|4,559 views|83|13
TL;DR

General Intuition raises $134M seed for world models trained on game data, aiming for AI agents in simulation and robotics.

Key Insights

1. World models, a successor to LLMs, aim to understand and predict outcomes based on actions in a given state, going beyond simple sequence prediction.

2. General Intuition (GI) leverages a massive, unique dataset of 3.8 billion game clips from its Medal platform to train highly human-like AI agents.

3. GI's agents are trained purely through imitation learning and demonstrate impressive navigation, problem-solving, and even superhuman capabilities within games.

4. The company has successfully transferred its models from games to real-world video, indicating broad applicability beyond simulated environments.

5. GI's business model focuses on providing API access to its models and custom solutions for game developers, game engines, and robotics and manufacturing companies whose hardware is controlled with gaming inputs.

6. The long-term vision is for GI to become the gold standard in intelligence, particularly in spatial-temporal reasoning, by powering 80% of AI-driven interactions in the 'atoms to atoms' stage, with simulation as the larger initial market.

THE EMERGENCE OF WORLD MODELS

World models represent the next frontier in AI, evolving beyond traditional video models that predict the next frame. They aim to understand the full spectrum of possibilities and outcomes from a current state, generating the subsequent state based on actions taken. This requires a deeper comprehension of causality and interaction, making it a significantly more complex challenge than sequence prediction. The development of world models is seen as crucial for advancing spatial intelligence and enabling embodied robotics.
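The distinction between predicting the next frame and predicting the next state given an action can be sketched with a toy example. The grid world, the action names, and the function names below are all hypothetical and chosen only for illustration; they are not GI's architecture, which the summary does not describe.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (x, y) position on a small grid

# Hypothetical action set, used only for this illustration.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
    "stay": (0, 0),
}


def world_model_step(state: State, action: str, size: int = 5) -> State:
    """Return the next state for a given state AND action.

    A plain next-frame predictor would take only `state`; conditioning on
    `action` is what makes this a (toy) world model.
    """
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), size - 1)  # walls clamp movement
    y = min(max(state[1] + dy, 0), size - 1)
    return (x, y)


def rollout(state: State, actions: List[str], size: int = 5) -> List[State]:
    """Simulate a sequence of actions and return every visited state."""
    trajectory = [state]
    for action in actions:
        state = world_model_step(state, action, size)
        trajectory.append(state)
    return trajectory


def possible_next_states(state: State, size: int = 5) -> Dict[str, State]:
    """Enumerate the outcome of each available action from one state."""
    return {action: world_model_step(state, action, size) for action in MOVES}
```

Calling `possible_next_states((2, 2))` lists one outcome per action, which is the "full spectrum of possibilities" idea in miniature. A learned world model would replace the hand-written transition rule with a network trained on data.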

GENERAL INTUITION'S UNIQUE DATA ADVANTAGE

General Intuition (GI) builds on a unique dataset of 3.8 billion game clips gathered from its 'Medal' platform. Medal, a game clipping tool with 12 million users, lets players retroactively save highlight moments without constant recording. The result is a large collection of peak human gameplay, rich in interesting actions and outcomes. GI also addresses privacy by mapping player inputs to the actions they produce, alongside visual inputs and game results, rather than logging raw keystrokes.

VISION-BASED AGENTS AND IMITATION LEARNING

GI's core technology is a vision-based AI agent that learns to predict actions solely from pixel inputs, mimicking human players. Trained through imitation learning, these agents show strong navigation and problem-solving (such as getting unstuck), and exhibit behaviors that are either distinctly human or, because they learn from game highlights, beyond typical human play. The models run in real time against real players and can infer goals without explicit game state information.
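Imitation learning in its simplest form (behavior cloning) treats demonstrations as supervised pairs of observation and action. The sketch below is a tabular stand-in under that assumption: observations are tiny tuples of pixel values, and the "policy" just memorizes the most common human action per observation. A real vision-based agent would use a neural network over frames; the function names here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def fit_behavior_cloning(
    demos: Iterable[Tuple[Hashable, str]]
) -> Dict[Hashable, str]:
    """Learn a policy from (observation, action) demonstration pairs.

    For each observation, keep the action humans chose most often.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for observation, action in demos:
        counts[observation][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}


def act(policy: Dict[Hashable, str], observation: Hashable, default: str = "noop") -> str:
    """Pick the cloned action, falling back to a default for unseen inputs."""
    return policy.get(observation, default)


# Toy "frames": 2-pixel tuples. 1 = wall ahead, 0 = open space.
demonstrations = [
    ((1, 0), "turn_right"),
    ((1, 0), "turn_right"),
    ((1, 0), "turn_left"),
    ((0, 0), "forward"),
]
policy = fit_behavior_cloning(demonstrations)
```

Note that no reward or game state appears anywhere: the policy only ever sees pixels and the actions humans took, which matches the pixels-to-actions framing above.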

TRANSFER LEARNING AND REAL-WORLD APPLICABILITY

A significant breakthrough for GI is the demonstrated ability to transfer models trained in less realistic games to more complex ones, and crucially, to real-world video. This capability means that any video on the internet can potentially serve as pre-training data. The models can predict actions as if a human were controlling them with a keyboard and mouse, making them adaptable beyond gaming contexts. This broad transferability suggests potential applications in robotics, manufacturing, and other areas requiring spatial-temporal reasoning.
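One way to picture how unlabeled video could become training data is pseudo-labeling: infer which action most plausibly explains the change between consecutive frames, then treat the result as a labeled pair. The summary does not say how GI does this; the sketch below assumes a toy setting where each frame is reduced to a grid position, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]

# Hypothetical mapping from observed displacement to a keyboard-style action.
DELTA_TO_ACTION: Dict[Tuple[int, int], str] = {
    (0, -1): "up",
    (0, 1): "down",
    (-1, 0): "left",
    (1, 0): "right",
    (0, 0): "stay",
}


def infer_action(prev: Position, nxt: Position) -> Optional[str]:
    """Guess the action that explains the move from `prev` to `nxt`.

    Returns None when no single known action accounts for the change.
    """
    delta = (nxt[0] - prev[0], nxt[1] - prev[1])
    return DELTA_TO_ACTION.get(delta)


def pseudo_label(track: List[Position]) -> List[Tuple[Position, Optional[str]]]:
    """Turn an unlabeled sequence of positions into (state, action) pairs."""
    return [
        (track[i], infer_action(track[i], track[i + 1]))
        for i in range(len(track) - 1)
    ]
```

The output pairs have the same shape as imitation-learning demonstrations, so video without recorded inputs could in principle feed the same training pipeline.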

ADVANCED WORLD MODEL CAPABILITIES

GI's world models exhibit advanced features such as inheritance of physical world dynamics like camera shake, which are absent in many game engines. They also demonstrate sophisticated spatial memory, rapid camera motion handling, and the ability to perform actions like hiding or sniping with impressive accuracy. The models can cope with partial observability, such as smoke, maintaining their position and consistency even in challenging conditions. This level of detail and robustness goes beyond typical simulation capabilities.

DISTILLATION, DATA VALUE, AND BUSINESS MODEL

GI is also working on distilling these large models into smaller, more efficient versions to reduce computational cost. The company emphasizes that the true value of data can only be assessed by modeling it, highlighting the proprietary nature of their insights. Their business model is based on providing API access for frame-to-action prediction, custom model development for companies, and licensing models. They are also exploring applications within their Medal platform, potentially leading to novel entertainment experiences derived from game clips.
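The summary does not say which distillation method GI uses. As a general illustration, a common formulation trains the small "student" to match the large "teacher" model's softened output distribution, measured by KL divergence. The sketch below assumes that standard soft-target setup; the logits are made-up numbers.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) over temperature-softened action distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Terms with p_i == 0 contribute nothing to the KL sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up logits over three actions, for illustration only.
teacher = [4.0, 1.0, 0.5]
good_student = [3.9, 1.1, 0.4]
poor_student = [0.5, 1.0, 4.0]
```

A student whose action preferences track the teacher's gets a loss near zero, while one that ranks the actions differently is penalized, so minimizing this loss pushes the smaller model toward the larger model's behavior.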

THE STRATEGY BEHIND INDEPENDENCE AND FUNDING

General Intuition notably turned down a $500 million offer from OpenAI, opting instead to raise a $134 million seed round led by Khosla Ventures. This strategic decision reflects a belief in building an independent world model lab. The founder's background in gaming and infrastructure, combined with strong co-founders and the early recruitment of top researchers from projects like DIAMOND and GAIA, positions GI to lead in this nascent field. Their approach is a foundational bet on spatial-temporal agents, aiming to capture significant market share ahead of competitors.

APPLICATIONS IN GAMING AND BEYOND

GI's technology has immediate applications in the gaming industry, particularly for enhancing non-player characters (NPCs) or bots in games. By providing highly realistic and adaptable AI opponents, game developers can improve player retention, especially during off-peak hours. This technology can also extend to a wide array of simulation environments, including those in automotive (like self-driving car training), robotics, and manufacturing. The ability to control robots using game inputs makes GI's models applicable to industries with existing gaming-centric hardware and processes.

THE VALUE OF HUMAN ACTION DATA

The core of GI's advantage lies in its massive dataset of human actions within games. Unlike simply recording gameplay, Medal captures the precise actions a player takes in relation to visual inputs and game states. This granular data, crucial for training sophisticated world models, was painstakingly collected and labeled by humans. The company's privacy-first approach of converting inputs to general actions, rather than logging specific keystrokes, allows for broad applicability without compromising user privacy.
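Converting inputs to general actions can be pictured as a lookup through each player's key bindings, so that only the resulting action is stored. The sketch below is an assumption about how such a mapping might look, not a description of Medal's implementation; the binding tables and function name are hypothetical.

```python
from typing import Dict, List


def to_general_actions(raw_keys: List[str], keybinds: Dict[str, str]) -> List[str]:
    """Translate raw key presses into game actions via a player's bindings.

    Keys with no binding (for example, text typed into chat) are dropped,
    so the stored stream contains actions but not the keystrokes themselves.
    """
    return [keybinds[key] for key in raw_keys if key in keybinds]


# Two hypothetical players with different control schemes.
player_a_binds = {"w": "move_forward", "space": "jump", "mouse1": "fire"}
player_b_binds = {"up": "move_forward", "ctrl": "jump", "enter": "fire"}

stream_a = to_general_actions(["w", "space", "h", "i", "mouse1"], player_a_binds)
stream_b = to_general_actions(["up", "ctrl", "enter"], player_b_binds)
```

Both players end up with the same action stream despite pressing different keys, which is what lets the data generalize across control schemes while the unbound keystrokes never enter the dataset.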

LONG-TERM VISION: THE GOLD STANDARD OF INTELLIGENCE

GI's ultimate ambition is to become the 'gold standard' for intelligence by mastering spatial-temporal reasoning, which they see as fundamental to intelligence itself. Their goal is to power 80% of AI-driven interactions in the 'atoms to atoms' stage, particularly in simulation, which is expected to be the larger initial market. This vision extends to applying AI to complex scientific problems, mimicking the intuition of human experts to solve challenges in fields like biology, further solidifying GI's role as a leader in the AI frontier.

Common Questions

World models aim to understand the full range of possibilities and outcomes from a current state, generating the next state based on actions taken. This is more complex than traditional video models which might just predict the next likely sequence or frame.

Topics

Mentioned in this video

Concepts
Tope

Mentioned as a book by Flavio Flores on deep learning.

Imitation Learning

The method used to train the initial agent, learning by observing human actions.

Action Embeddings

A future goal for GI, to predict actions in a general action space that can transfer to other inputs.

Scientific Problems

A long-term ambition for GI is to represent scientific problems in 3D space for agents to work on.

Player Retention

A key metric for game developers, influenced by the quality of bots and overall game engagement.

Supply Chains

Mentioned as converging on gaming inputs due to AI intelligence bottlenecks.

Embodied Robotics

A use case where world models are seen as a key advancement.

Computer Inputs

How actions are represented in games, which GI can map from to actual actions using their dataset.

Bots

AI-controlled players in games, highlighted as crucial for player retention when human player liquidity is low.

Virtual Biology

A field from the Chaz Institute that aligns with GI's vision of simulation for rapid advancement.

Vision-Based Agent

An AI agent that operates solely by processing visual input (pixels).

Human-like Behaviors

The ultimate goal of GI's 'General Intuition' model: mimicking human intuition in any situation.

World-Based Entertainment

A potential future application area for GI, leveraging Medal's platform for video consumption.

Simulation

Seen as a larger initial market for world models, with fewer constraints and easier safety considerations.

World Models

The core technology General Intuition is developing, aiming to understand and generate outcomes based on actions.

Spatial Intelligence

An area that world models are expected to improve, particularly for robotics.

Spatial Temporal Agents

The core focus of GI's work, aiming for agents that understand space and time dynamics.

Foundation Model

GI's world models are positioned as a strong foundation for various applications and future AI development.

Episodic Memory of Humanity in Simulation

How Medal's clip data is described, representing the most memorable and shareable moments of playtime.

Steerability

The ability to guide or control the output of AI models, a key feature GI aims for.

API
