How does Medal collect its data, and why is it valuable?

Medal collects data through its retroactive clipping software. The software runs in the background, and users can clip the last 30 seconds after something interesting happens. This has produced 3.8 billion clips of peak human gameplay, a unique dataset for training AI.
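Retroactive clipping of this kind is commonly built on a bounded ring buffer: every frame is appended, the oldest frames fall off the end, and "clip" simply snapshots whatever is left. The sketch below is a minimal illustration under that assumption; the class name `RetroactiveClipper`, its methods, and the tiny buffer sizes are hypothetical and do not describe the actual product's implementation.

```python
from collections import deque


class RetroactiveClipper:
    """Keeps only the most recent frames so a clip can be saved after the fact."""

    def __init__(self, fps: int = 30, clip_seconds: int = 30):
        # A bounded deque silently drops the oldest frame once it is full.
        self.buffer = deque(maxlen=fps * clip_seconds)

    def on_frame(self, frame) -> None:
        """Called for every captured frame while the game is running."""
        self.buffer.append(frame)

    def clip(self) -> list:
        """Return a snapshot of the buffered frames, oldest first."""
        return list(self.buffer)


# Tiny sizes for illustration: 2 fps * 3 s = room for 6 frames.
clipper = RetroactiveClipper(fps=2, clip_seconds=3)
for frame_id in range(10):
    clipper.on_frame(frame_id)
saved = clipper.clip()  # [4, 5, 6, 7, 8, 9]
```

Because nothing older than the window is retained, the recorder never needs to write a continuous recording to disk.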

Why did General Intuition turn down OpenAI's $500 million offer?

The CEO, Pim, turned down OpenAI's offer in order to build an independent world model lab, raising $134 million in seed funding led by Khosla Ventures instead.

What are the key capabilities demonstrated by GI's vision-based agent?

The agent, trained with imitation learning, plays in real time and can navigate environments, interact with game elements, get unstuck using a 4-second memory, and produce both human-like and superhuman actions.

How can GI's models be used to label videos on the internet?

By predicting actions from frames, GI's models can label any video on the internet as if controlling it with a keyboard and mouse, allowing for pre-training on diverse real-world video data.

What makes GI's world models unique, especially concerning physical interaction?

GI's world models exhibit mouse sensitivity for rapid movements, inherit physical world phenomena like camera shake (absent in games), and handle partial observability, even with smoke, maintaining position and consistency.

What inspiration did the guest draw from François Fleuret for learning?

The guest followed François Fleuret's deep learning course, which covers fundamentals from history and topology to linear algebra and building neural networks, and recommends it for developing an intuitive understanding.

How does GI plan to monetize its technology?

The primary business model will be an API, similar to Anthropic's. They also offer custom model distillation and are developing video labeling models for companies.

What is GI's long-term vision for 2030?

GI aims to be the gold standard of intelligence, responsible for 80% of AI-driven interactions in the atoms-to-atoms stage, with a 100x larger market share in simulation.

How does GI's approach differ from Fei-Fei Li's world model research?

While Fei-Fei Li's approach reuses splats in game engines and stays in verifiable domains, GI's models are designed to be interactive, which GI believes is the core purpose of world models.

Why are video games a better starting point for spatial reasoning than YouTube videos?

Video games simulate optical dynamics with player actions, providing a richer, more direct understanding of spatial reasoning compared to YouTube videos, which require solving multiple layers of information loss like pose estimation and inverse dynamics first.

Key Moments

World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Q: What is General Intuition (GI) and what is its origin?

General Intuition is a startup focused on world models, spun out of Medal, a 10-year-old game clipping company with 12 million users. Medal's extensive dataset of game moments provided a unique foundation for GI.

Latent Space Podcast

People & Blogs | 5 min read | 65 min video

Dec 6, 2025 | 4,559 views

TL;DR

General Intuition raises $134M seed for world models trained on game data, aiming for AI agents in simulation and robotics.

Key Insights

World models, a successor to LLMs, aim to understand and predict outcomes based on actions in a given state, going beyond simple sequence prediction.

General Intuition (GI) leverages a massive, unique dataset of 3.8 billion game clips from its Medal platform to train highly human-like AI agents.

GI's agents are trained on pure imitation learning, demonstrating impressive navigation, problem-solving, and even superhuman capabilities within games.

The company has successfully transferred its models from games to real-world video, indicating broad applicability beyond simulated environments.

GI's business model focuses on providing API access to its models and custom solutions for game developers, game engines, and robotics and manufacturing companies whose systems are controlled with gaming inputs.

The long-term vision is for GI to become the gold standard in intelligence, particularly in spatial-temporal reasoning, by powering 80% of AI-driven interactions in the 'atoms to atoms' stage, with an even larger focus on simulation.

THE EMERGENCE OF WORLD MODELS

World models represent the next frontier in AI, evolving beyond traditional video models that predict the next frame. They aim to understand the full spectrum of possibilities and outcomes from a current state, generating the subsequent state based on actions taken. This requires a deeper comprehension of causality and interaction, making it a significantly more complex challenge than sequence prediction. The development of world models is seen as crucial for advancing spatial intelligence and enabling embodied robotics.
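The distinction drawn above, predicting the next state from the current state and the action taken, can be made concrete with a deliberately tiny model. The sketch below is an illustrative assumption, not GI's architecture: a count-based lookup table over discrete states, with the hypothetical names `TabularWorldModel`, `observe`, `predict`, and `rollout`. Real world models operate on pixels with learned networks, but the interface is the same in spirit.

```python
from collections import Counter, defaultdict


class TabularWorldModel:
    """Toy action-conditioned model: estimates the next state from (state, action)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state) -> None:
        """Record one observed transition."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Most frequently observed next state, or None if the pair is unseen."""
        seen = self.counts.get((state, action))
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def rollout(self, state, actions):
        """Imagine a trajectory by feeding each prediction back in as the next state."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            if state is None:
                break
            trajectory.append(state)
        return trajectory


# Train on a 4-cell corridor where walls clamp movement at both ends.
model = TabularWorldModel()
for pos in range(4):
    model.observe(pos, "right", min(pos + 1, 3))
    model.observe(pos, "left", max(pos - 1, 0))

imagined = model.rollout(0, ["right", "right", "left"])  # [0, 1, 2, 1]
```

A plain next-frame predictor would have no `action` argument; conditioning on the action is what lets the same starting state branch into different futures.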

GENERAL INTUITION'S UNIQUE DATA ADVANTAGE

General Intuition (GI) has built a substantial foundation on a unique dataset of 3.8 billion game clips gathered from its 'Medal' platform. Medal, a game clipping tool with 12 million users, enables players to retroactively save highlight moments without constant recording. This process has yielded an unparalleled collection of peak human gameplay, offering a diverse dataset rich in interesting actions and outcomes. GI also addresses privacy concerns by pairing visual inputs and game results with the general in-game actions that player inputs map to, rather than logging specific keystrokes.

VISION-BASED AGENTS AND IMITATION LEARNING

GI's core technology involves vision-based AI agents that learn to predict actions solely from pixel inputs, mimicking human players. Trained through imitation learning, these agents demonstrate remarkable abilities in navigation and problem-solving (like getting unstuck), and they exhibit behaviors that are either distinctly human or, because they learn from game highlights, beyond typical human capability. The models operate in real time against real players and can infer goals without explicit game state information, showcasing a sophisticated understanding of gameplay dynamics.
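In its simplest form, imitation learning (behaviour cloning) treats recorded play as supervised data: observation in, the human's action out. The sketch below illustrates that idea with a nearest-neighbour lookup over three-value "frames". The names `NearestNeighbourPolicy`, `fit`, and `act`, the action labels, and the toy observations are assumptions for illustration only; a production agent would train a neural network on real pixels.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestNeighbourPolicy:
    """Behaviour cloning at its simplest: copy the action of the closest demo."""

    def __init__(self):
        self.demos = []  # list of (observation, action) pairs

    def fit(self, demonstrations):
        """Store the human demonstrations; there is no other training step."""
        self.demos = list(demonstrations)
        return self

    def act(self, observation):
        """Return the action recorded with the most similar observation."""
        if not self.demos:
            raise ValueError("policy has no demonstrations")
        _, action = min(
            self.demos, key=lambda demo: squared_distance(demo[0], observation)
        )
        return action


# Each observation is a 3-pixel brightness row; the bright pixel marks a target.
demos = [
    ((1.0, 0.0, 0.0), "turn_left"),
    ((0.0, 1.0, 0.0), "forward"),
    ((0.0, 0.0, 1.0), "turn_right"),
]
policy = NearestNeighbourPolicy().fit(demos)
action = policy.act((0.1, 0.2, 0.9))  # "turn_right": closest to the third demo
```

Note that the policy never sees a reward or the game's internal state; everything it knows comes from what human players did when they saw similar frames.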

TRANSFER LEARNING AND REAL-WORLD APPLICABILITY

A significant breakthrough for GI is the demonstrated ability to transfer models trained in less realistic games to more complex ones, and crucially, to real-world video. This capability means that any video on the internet can potentially serve as pre-training data. The models can predict actions as if a human were controlling them with a keyboard and mouse, making them adaptable beyond gaming contexts. This broad transferability suggests potential applications in robotics, manufacturing, and other areas requiring spatial-temporal reasoning.
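One way to picture how unlabeled video becomes training data is pseudo-labeling: a model fitted on clips where the inputs are known is used to infer the most plausible input for each transition in a clip where they are not. The sketch below assumes frames can be reduced to small coordinate tuples and uses a lookup keyed on frame-to-frame change. `ActionLabeler`, `frame_delta`, and the WASD labels are hypothetical names chosen for this example, not GI's method.

```python
from collections import Counter, defaultdict


def frame_delta(prev_frame, next_frame):
    """Crude motion feature: per-coordinate change between two frames."""
    return tuple(b - a for a, b in zip(prev_frame, next_frame))


class ActionLabeler:
    """Learns which action usually explains a given frame-to-frame change."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def fit(self, labeled_clips):
        for frames, actions in labeled_clips:
            # actions[i] is the input that produced frames[i] -> frames[i + 1]
            for prev_f, next_f, action in zip(frames, frames[1:], actions):
                self.votes[frame_delta(prev_f, next_f)][action] += 1
        return self

    def label(self, frames):
        """Assign a pseudo-label to every transition in an unlabeled clip."""
        labels = []
        for prev_f, next_f in zip(frames, frames[1:]):
            votes = self.votes.get(frame_delta(prev_f, next_f))
            labels.append(votes.most_common(1)[0][0] if votes else "unknown")
        return labels


# A game clip with known inputs, then a "web video" with none.
game_clip = ([(0, 0), (1, 0), (1, 1), (0, 1)], ["D", "W", "A"])
labeler = ActionLabeler().fit([game_clip])
web_video = [(5, 5), (6, 5), (6, 6), (6, 4)]
pseudo_labels = labeler.label(web_video)  # ["D", "W", "unknown"]
```

The last transition has no counterpart in the labeled data, so it is marked "unknown" rather than guessed.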

ADVANCED WORLD MODEL CAPABILITIES

GI's world models exhibit advanced features such as inheritance of physical world dynamics like camera shake, which are absent in many game engines. They also demonstrate sophisticated spatial memory, rapid camera motion handling, and the ability to perform actions like hiding or sniping with impressive accuracy. The models can cope with partial observability, such as smoke, maintaining their position and consistency even in challenging conditions. This level of detail and robustness goes beyond typical simulation capabilities.

DISTILLATION, DATA VALUE, AND BUSINESS MODEL

GI is also distilling these large models into smaller, more efficient versions to reduce computational costs. The company emphasizes that the true value of data can only be assessed by modeling it, highlighting the proprietary nature of their insights. Their business model is based on providing API access for frame-to-action prediction, custom model development for companies, and licensing models. They are also exploring applications within their Medal platform, potentially leading to novel entertainment experiences derived from game clips.
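The source does not say which distillation method GI uses. A common generic formulation trains the small "student" model to match the large "teacher" model's temperature-softened output distribution, measured with KL divergence. The sketch below shows only that loss, in plain Python, over made-up action logits; `softmax`, `distillation_loss`, and all numbers are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened action distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [4.0, 1.0, 0.5]  # hypothetical logits over three actions
matched = distillation_loss(teacher, [4.0, 1.0, 0.5])     # 0.0: student agrees
mismatched = distillation_loss(teacher, [0.5, 1.0, 4.0])  # positive: student disagrees
```

Minimising this loss over many frames pushes the student toward the teacher's full preference ordering over actions, not just its top choice.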

THE STRATEGY BEHIND INDEPENDENCE AND FUNDING

General Intuition notably turned down a $500 million offer from OpenAI, opting instead to raise a $134 million seed round led by Khosla Ventures. This strategic decision reflects a belief in building an independent world model lab. The founder's background in gaming and infrastructure, combined with a team of strong co-founders and early recruitment of top researchers from impactful projects like Diamond and Gaia, positions GI to lead in this nascent field. Their approach prioritizes a foundational bet on spatial-temporal agents, aiming to capture a significant market share ahead of competitors.

APPLICATIONS IN GAMING AND BEYOND

GI's technology has immediate applications in the gaming industry, particularly for enhancing non-player characters (NPCs) or bots in games. By providing highly realistic and adaptable AI opponents, game developers can improve player retention, especially during off-peak hours. This technology can also extend to a wide array of simulation environments, including those in automotive (like self-driving car training), robotics, and manufacturing. The ability to control robots using game inputs makes GI's models applicable to industries with existing gaming-centric hardware and processes.

THE VALUE OF HUMAN ACTION DATA

The core of GI's advantage lies in its massive dataset of human actions within games. Unlike simply recording gameplay, Medal captures the precise actions a player takes in relation to visual inputs and game states. This granular data, crucial for training sophisticated world models, was painstakingly collected and labeled by humans. The company's privacy-first approach of converting inputs to general actions, rather than logging specific keystrokes, allows for broad applicability without compromising user privacy.
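Converting inputs to general actions can be pictured as a filter through the game's key bindings: only keys bound to an in-game action are kept, and they are stored as the action rather than the key. The sketch below is a guess at the general shape of such a step, not a description of the real pipeline; the `BINDINGS` table, the action names, and `to_actions` are all hypothetical.

```python
# Hypothetical per-game key bindings; real bindings vary by game and by player.
BINDINGS = {
    "w": "move_forward",
    "s": "move_backward",
    "a": "strafe_left",
    "d": "strafe_right",
    "space": "jump",
}


def to_actions(raw_keys, bindings=BINDINGS):
    """Keep only keys bound to an in-game action; drop everything else."""
    return [bindings[key] for key in raw_keys if key in bindings]


# "h" and "i" might be chat typing; they never reach the stored record.
events = ["w", "w", "h", "i", "space", "d"]
actions = to_actions(events)
# ["move_forward", "move_forward", "jump", "strafe_right"]
```

Storing actions instead of keys also makes records comparable across players who use different bindings for the same move.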

LONG-TERM VISION: THE GOLD STANDARD OF INTELLIGENCE

GI's ultimate ambition is to become the 'gold standard' for intelligence by mastering spatial-temporal reasoning, which they see as fundamental to intelligence itself. Their goal is to power 80% of AI-driven interactions in the 'atoms to atoms' stage, particularly in simulation, which is expected to be the larger initial market. This vision extends to applying AI to complex scientific problems, mimicking the intuition of human experts to solve challenges in fields like biology, further solidifying GI's role as a leader in the AI frontier.

Common Questions

World models aim to understand the full range of possibilities and outcomes from a current state, generating the next state based on actions taken. This is more complex than traditional video models which might just predict the next likely sequence or frame.

Topics

General Intuition, Video Generation, Spatial Reasoning

Mentioned in this video

Concepts

Tope

Mentioned as a book by François Fleuret on deep learning.

Imitation Learning

The method used to train the initial agent, learning by observing human actions.

Action Embeddings

A future goal for GI, to predict actions in a general action space that can transfer to other inputs.

Scientific Problems

A long-term ambition for GI is to represent scientific problems in 3D space for agents to work on.

Player Retention

A key metric for game developers, influenced by the quality of bots and overall game engagement.

Supply Chains

Mentioned as converging on gaming inputs due to AI intelligence bottlenecks.

Embodied Robotics

A use case where world models are seen as a key advancement.

Computer Inputs

How actions are represented in games, which GI can map from to actual actions using their dataset.

Bots

AI-controlled players in games, highlighted as crucial for player retention when human player liquidity is low.

Virtual Biology

A field from the Chaz Institute that aligns with GI's vision of simulation for rapid advancement.

Vision-Based Agent

An AI agent that operates solely by processing visual input (pixels).

Human-like Behaviors

The ultimate goal of GI's 'General Intuition' model: mimicking human intuition in any situation.

World-Based Entertainment

A potential future application area for GI, leveraging Medal's platform for video consumption.

Simulation

Seen as a larger initial market for world models, with fewer constraints and easier safety considerations.

World Models

The core technology General Intuition is developing, aiming to understand and generate outcomes based on actions.

Spatial Intelligence

An area that world models are expected to advance, particularly for robotics.

Spatial Temporal Agents

The core focus of GI's work, aiming for agents that understand space and time dynamics.

Foundation Model

GI's world models are positioned as a strong foundation for various applications and future AI development.

Episodic Memory of Humanity in Simulation

How Medal's clip data is described, representing the most memorable and shareable moments of playtime.

Steerability

The ability to guide or control the output of AI models, a key feature GI aims for.

API

Software & Apps

SIMA 3

Mentioned as a DeepMind model with significant impact, possibly an extension of previous SIMA versions.

Video Labeling Models

Models developed by GI that can be used for custom data labeling for companies.

Omniverse

A platform used for simulating human behavior, mentioned in the context of GI's applicability beyond games.

SIMA

Another world model developed by DeepMind, with versions 1 and 2 mentioned.

Gaia 2

A research project whose approaches heavily inspired GI's work, with Anthony leading research on it.

SIMA 2

A DeepMind model discussed for its steerability and text conditioning capabilities.

Physics Engine

A rudimentary PyTorch-based physics engine the guest built to understand simulation complexity.

Autoencoders

Used in video models, allowing world models to predict at smaller resolutions which can then be enriched.

Behavior Trees

A deterministic method of coding AI behavior in game engines that GI aims to replace with its API.

Media

American Truck Simulator

A game mentioned as an example where crashes can be used for training RL models.

Euro Truck Simulator

A game mentioned as an example where crashes can be used for training RL models and where players use steering wheels.

GTA V

A game mentioned as an example where players role-play real life and as a potential data source for self-driving companies.

Companies

Medal

A 10-year-old game clipping company with 12 million users, forming the origin of General Intuition.

Khosla Ventures

Led the $134 million seed round for General Intuition.

Meta (VR)

Mentioned regarding its VR platform (Quest) and potential for data collection, compared to PC gaming.

Organizations

Kyutai

An open science lab in Paris that GI has partnered with for open research on their data.

People

François Fleuret

Author of "The Little Book of Deep Learning" and creator of a deep learning course recommended by the guest.

Products

Diamond