World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Latent Space Podcast
People & Blogs | 5 min read | 65 min video
Dec 6, 2025 | 4,492 views

TL;DR

General Intuition raises $134M seed for world models trained on game data, aiming for AI agents in simulation and robotics.

Key Insights

1. World models, positioned as a successor to LLMs, predict the outcome of an action taken in a given state, which goes beyond simple sequence prediction.

2. General Intuition (GI) draws on a unique dataset of 3.8 billion game clips from its Medal platform to train highly human-like AI agents.

3. GI's agents are trained purely by imitation learning and demonstrate strong navigation, problem-solving, and in some cases superhuman capabilities within games.

4. The company has transferred its models from games to real-world video, indicating applicability beyond simulated environments.

5. GI's business model focuses on API access to its models and custom solutions for game developers, game engines, and robotics and manufacturing companies whose systems are operated with gaming inputs.

6. The long-term vision is for GI to become the gold standard in intelligence, particularly in spatial-temporal reasoning, by powering 80% of AI-driven interactions in the 'atoms to atoms' stage, with simulation expected to be the larger market at first.

THE EMERGENCE OF WORLD MODELS

World models represent the next frontier in AI, evolving beyond traditional video models that predict the next frame. They aim to understand the full spectrum of possibilities and outcomes from a current state, generating the subsequent state based on actions taken. This requires a deeper comprehension of causality and interaction, making it a significantly more complex challenge than sequence prediction. The development of world models is seen as crucial for advancing spatial intelligence and enabling embodied robotics.
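The distinction between an action-conditioned world model and a plain next-frame predictor can be sketched with a toy example. Everything below is hypothetical and chosen only for illustration: the grid world, the names `world_model_step`, `next_frame_predictor`, and `possible_outcomes`, and the four-action set. It is not GI's architecture, only a minimal picture of "next state from current state plus action" versus "next frame from past frames".

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]   # toy world state: an (x, y) position on a grid
Action = str              # one of "up", "down", "left", "right"

# Hypothetical action set and its effect on the position.
MOVES: Dict[Action, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}


def world_model_step(state: State, action: Action, size: int = 5) -> State:
    """Action-conditioned transition: the next state depends on the action."""
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), size - 1)  # walls clamp the movement
    y = min(max(state[1] + dy, 0), size - 1)
    return (x, y)


def next_frame_predictor(history: List[State]) -> State:
    """Action-free baseline: extrapolate the most recent observed motion."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (x1 + (x1 - x0), y1 + (y1 - y0))


def possible_outcomes(state: State) -> Dict[Action, State]:
    """Enumerate the outcome of every available action from one state."""
    return {action: world_model_step(state, action) for action in MOVES}


def rollout(start: State, actions: List[Action]) -> List[State]:
    """Apply a sequence of actions, collecting every intermediate state."""
    states = [start]
    for action in actions:
        states.append(world_model_step(states[-1], action))
    return states


if __name__ == "__main__":
    print(possible_outcomes((0, 0)))
    print(rollout((0, 0), ["right", "right", "down"]))
    print(next_frame_predictor([(0, 0), (1, 0)]))
```

The next-frame predictor can only continue the motion it has already seen, while the action-conditioned step can answer "what happens if I do this instead?" for each available action.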

GENERAL INTUITION'S UNIQUE DATA ADVANTAGE

General Intuition (GI) builds on a dataset of 3.8 billion game clips gathered from its 'Medal' platform. Medal, a game clipping tool with 12 million users, lets players retroactively save highlight moments without recording continuously. Because players choose what to keep, the result is a large collection of peak human gameplay, rich in interesting actions and outcomes. On privacy, GI records general in-game actions aligned with the visual input and game results rather than logging specific keystrokes.

VISION-BASED AGENTS AND IMITATION LEARNING

GI's core technology is a vision-based AI agent that predicts actions solely from pixel inputs, mimicking human players. Trained through imitation learning, these agents show strong navigation and problem-solving (such as getting unstuck), and because they learn from game highlights, they exhibit behaviors that are either distinctly human or beyond typical human capability. The models run in real time against real players and infer goals without explicit game state information.
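Imitation learning from pixels can be illustrated with a very small behavior-cloning sketch: fit a classifier that maps a frame to the action a human took on that frame. The three-pixel "frames", the action names, and the functions `expert_demos`, `train_policy`, and `act` are all assumptions made for this example, and the model is a plain softmax regression rather than anything GI has described.

```python
import math
from typing import List, Sequence, Tuple

ACTIONS = ["left", "stay", "right"]  # hypothetical action space


def expert_demos() -> List[Tuple[List[float], int]]:
    """Hypothetical demonstrations: a 3-pixel frame with one bright target
    pixel, paired with the index of the action the human took."""
    return [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]


def scores(weights: List[List[float]], frame: Sequence[float]) -> List[float]:
    """One linear score per action."""
    return [sum(w * p for w, p in zip(row, frame)) for row in weights]


def softmax(zs: Sequence[float]) -> List[float]:
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(demos, epochs: int = 50, lr: float = 0.5) -> List[List[float]]:
    """Behavior cloning: batch gradient descent on the cross-entropy between
    the policy's action distribution and the demonstrated action."""
    n_pixels = len(demos[0][0])
    weights = [[0.0] * n_pixels for _ in ACTIONS]
    for _ in range(epochs):
        grad = [[0.0] * n_pixels for _ in ACTIONS]
        for frame, action in demos:
            probs = softmax(scores(weights, frame))
            for k in range(len(ACTIONS)):
                err = probs[k] - (1.0 if k == action else 0.0)
                for i, pixel in enumerate(frame):
                    grad[k][i] += err * pixel
        for k in range(len(ACTIONS)):
            for i in range(n_pixels):
                weights[k][i] -= lr * grad[k][i]
    return weights


def act(weights: List[List[float]], frame: Sequence[float]) -> str:
    """Pick the highest-scoring action for a frame."""
    s = scores(weights, frame)
    return ACTIONS[max(range(len(s)), key=s.__getitem__)]


if __name__ == "__main__":
    policy = train_policy(expert_demos())
    print(act(policy, [0.1, 0.9, 0.2]))  # a noisy frame not seen in training
```

The policy never sees a reward or the game state. It only learns to reproduce the demonstrated frame-to-action mapping, which is the defining property of imitation learning.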

TRANSFER LEARNING AND REAL-WORLD APPLICABILITY

A significant breakthrough for GI is the demonstrated ability to transfer models trained on less realistic games to more realistic ones and, crucially, to real-world video. This means that any video on the internet can potentially serve as pre-training data. For a given video, the models predict the actions a human would have taken had they been controlling the scene with a keyboard and mouse, which makes them adaptable beyond gaming contexts. This transferability suggests potential applications in robotics, manufacturing, and other areas that require spatial-temporal reasoning.

ADVANCED WORLD MODEL CAPABILITIES

GI's world models exhibit advanced features such as inheritance of physical world dynamics like camera shake, which are absent in many game engines. They also demonstrate sophisticated spatial memory, rapid camera motion handling, and the ability to perform actions like hiding or sniping with impressive accuracy. The models can cope with partial observability, such as smoke, maintaining their position and consistency even in challenging conditions. This level of detail and robustness goes beyond typical simulation capabilities.

DISTILLATION, DATA VALUE, AND BUSINESS MODEL

GI is also distilling these large models into smaller, more efficient versions to reduce compute costs. The company argues that the true value of a dataset can only be assessed by training models on it, which keeps those insights proprietary. The business model combines API access for frame-to-action prediction, custom model development for companies, and model licensing. GI is also exploring applications within its Medal platform, which could lead to new entertainment experiences derived from game clips.
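As background on distillation, a common generic recipe trains the small model to match the large model's softened output distribution. The sketch below shows that standard temperature-scaled loss. The function names, the example logits, and the choice of temperature are assumptions for illustration, and nothing here is confirmed as GI's method.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Log-probabilities of temperature-scaled logits, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = math.log(sum(math.exp(z - m) for z in scaled))
    return [z - m - log_total for z in scaled]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's relative preferences among unlikely actions.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]        # hypothetical large-model action logits
    student_far = [-1.0, 0.5, 2.0]    # student that disagrees with the teacher
    student_near = [1.8, 0.6, -0.9]   # student that roughly agrees
    print(distillation_loss(teacher, student_far))
    print(distillation_loss(teacher, student_near))
```

The loss is zero when the student's softened distribution equals the teacher's and grows as the two diverge, so minimizing it pulls the smaller model toward the larger model's behavior.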

THE STRATEGY BEHIND INDEPENDENCE AND FUNDING

General Intuition notably turned down a $500 million offer from OpenAI, opting instead to raise a $134 million seed round led by Khosla Ventures. This strategic decision reflects a belief in building an independent world model lab. The founder's background in gaming and infrastructure, combined with a team of strong co-founders and early recruitment of top researchers from impactful projects like Diamond and Gaia, positions GI to lead in this nascent field. Their approach prioritizes a foundational bet on spatial-temporal agents, aiming to capture a significant market share ahead of competitors.

APPLICATIONS IN GAMING AND BEYOND

GI's technology has immediate applications in the gaming industry, particularly for enhancing non-player characters (NPCs) or bots in games. By providing highly realistic and adaptable AI opponents, game developers can improve player retention, especially during off-peak hours. This technology can also extend to a wide array of simulation environments, including those in automotive (like self-driving car training), robotics, and manufacturing. The ability to control robots using game inputs makes GI's models applicable to industries with existing gaming-centric hardware and processes.

THE VALUE OF HUMAN ACTION DATA

The core of GI's advantage is its massive dataset of human actions within games. Rather than simply recording gameplay, Medal captures the actions a player takes in relation to visual inputs and game states. This granular data, crucial for training sophisticated world models, was painstakingly collected and labeled by humans. The company's privacy-first approach converts inputs to general actions instead of logging specific keystrokes, which allows broad applicability without compromising user privacy.
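One way to picture "general actions instead of keystrokes" is a per-game binding table that translates raw inputs into a shared action vocabulary and discards everything else. The binding tables, game names, and the function `to_general_actions` below are invented for illustration. The source does not describe the actual scheme.

```python
from typing import Dict, List

# Hypothetical per-game key bindings mapped to a shared action vocabulary.
BINDINGS: Dict[str, Dict[str, str]] = {
    "game_a": {
        "w": "move_forward",
        "s": "move_backward",
        "space": "jump",
        "mouse1": "primary_action",
    },
    "game_b": {
        "up": "move_forward",
        "down": "move_backward",
        "x": "jump",
        "ctrl": "primary_action",
    },
}


def to_general_actions(game: str, raw_inputs: List[str]) -> List[str]:
    """Translate raw inputs into general actions.

    Inputs with no binding in the game (for example, text typed into chat)
    are dropped, so the stored record contains no raw keystrokes.
    """
    bindings = BINDINGS[game]
    return [bindings[key] for key in raw_inputs if key in bindings]


if __name__ == "__main__":
    # Different physical keys in two games yield the same action record.
    print(to_general_actions("game_a", ["w", "h", "i", "space"]))
    print(to_general_actions("game_b", ["up", "x"]))
```

Under this assumption the same action sequence is recorded regardless of which keys a game uses, which is also what lets data from many games share one action space.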

LONG-TERM VISION: THE GOLD STANDARD OF INTELLIGENCE

GI's ultimate ambition is to become the 'gold standard' for intelligence by mastering spatial-temporal reasoning, which they see as fundamental to intelligence itself. Their goal is to power 80% of AI-driven interactions in the 'atoms to atoms' stage, particularly in simulation, which is expected to be the larger initial market. This vision extends to applying AI to complex scientific problems, mimicking the intuition of human experts to solve challenges in fields like biology, further solidifying GI's role as a leader in the AI frontier.

Common Questions

World models aim to understand the full range of possibilities and outcomes from a current state, generating the next state based on actions taken. This is more complex than traditional video models which might just predict the next likely sequence or frame.

Mentioned in this video

Tope (tool)

Mentioned as a book on deep learning by François Fleuret.

Imitation Learning (concept)

The method used to train the initial agent, learning by observing human actions.

SIMA 3 (software)

Mentioned as a DeepMind model with significant impact, possibly an extension of previous SIMA versions.

American Truck Simulator (media)

A game mentioned as an example where crashes can be used for training RL models.

Action Embeddings (concept)

A future goal for GI, to predict actions in a general action space that can transfer to other inputs.

Scientific Problems (concept)

A long-term ambition for GI is to represent scientific problems in 3D space for agents to work on.

Player Retention (concept)

A key metric for game developers, influenced by the quality of bots and overall game engagement.

Supply Chains (concept)

Mentioned as converging on gaming inputs due to AI intelligence bottlenecks.

Euro Truck Simulator (media)

A game mentioned as an example where crashes can be used for training RL models and where players use steering wheels.

Embodied Robotics (concept)

A use case where world models are seen as a key advancement.

Computer Inputs (concept)

How actions are represented in games. GI can use its dataset to map these inputs to the underlying actions.

Bots (concept)

AI-controlled players in games, highlighted as crucial for player retention when human player liquidity is low.

Virtual Biology (concept)

A field associated with the Chan Zuckerberg Initiative that aligns with GI's vision of simulation for rapid advancement.

Video Labeling Models (software)

Models developed by GI that can be used for custom data labeling for companies.

Medal (company)

A 10-year-old game clipping company with 12 million users, from which General Intuition originated.

GTA V (media)

A game mentioned as an example where players role-play real life and as a potential data source for self-driving companies.

Omniverse (software)

A platform used for simulating human behavior, mentioned in the context of GI's applicability beyond games.

Vision-Based Agent (concept)

An AI agent that operates solely by processing visual input (pixels).

Human-like Behaviors (concept)

The ultimate goal of GI's 'General Intuition' model: mimicking human intuition in any situation.

World-Based Entertainment (concept)

A potential future application area for GI, using the Medal platform for video consumption.

Simulation (concept)

Seen as a larger initial market for world models, with fewer constraints and easier safety considerations.

SIMA (software)

Another world model developed by DeepMind, with versions 1 and 2 mentioned.

Gaia 2 (software)

A research project whose approaches heavily inspired GI's work, with Anthony leading research on it.

SIMA 2 (software)

A DeepMind model discussed for its steerability and text conditioning capabilities.

Kyutai (organization)

An open science lab in Paris that GI has partnered with for open research on their data.

World Models (concept)

The core technology General Intuition is developing, aiming to understand and generate outcomes based on actions.

Spatial Intelligence (concept)

An area that world models are expected to improve, particularly for robotics.

Spatial Temporal Agents (concept)

The core focus of GI's work, aiming for agents that understand space and time dynamics.

Physics Engine (tool)

A rudimentary PyTorch-based physics engine the guest built to understand simulation complexity.

Autoencoders (software)

Used in video models, allowing world models to predict at smaller resolutions which can then be enriched.

Foundation Model (concept)

GI's world models are positioned as a strong foundation for various applications and future AI development.

Episodic Memory of Humanity in Simulation (concept)

How Medal's clip data is described, representing the most memorable and shareable moments of playtime.

Khosla Ventures (company)

Led the $134 million seed round for General Intuition.

François Fleuret (person)

Author of "The Little Book of Deep Learning" and creator of a deep learning course recommended by the guest.

Steerability (concept)

The ability to guide or control the output of AI models, a key feature GI aims for.

Behavior Trees (software)

A deterministic method of coding AI behavior in game engines that GI aims to replace with its API.

Meta (VR) (company)

Mentioned regarding its VR platform (Quest) and potential for data collection, compared to PC gaming.

Diamond (product)

API (concept)
