World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI
Key Moments
General Intuition raises $134M seed for world models trained on game data, aiming for AI agents in simulation and robotics.
Key Insights
World models, pitched as a successor to LLMs, aim to predict the outcomes of actions taken in a given state, going beyond simple sequence prediction.
General Intuition (GI) leverages a massive, unique dataset of 3.8 billion game clips from its Medal platform to train highly human-like AI agents.
GI's agents are trained purely through imitation learning and show impressive navigation, problem-solving, and even superhuman capabilities within games.
The company has successfully transferred its models from games to real-world video, indicating broad applicability beyond simulated environments.
GI's business model centers on API access to its models and custom solutions for game developers, game engines, and robotics and manufacturing companies whose machines already use gaming inputs.
The long-term vision is for GI to become the gold standard in intelligence, particularly in spatial-temporal reasoning, by powering 80% of AI-driven interactions in the 'atoms to atoms' stage, with simulation expected to be the larger initial market.
THE EMERGENCE OF WORLD MODELS
World models represent the next frontier in AI, evolving beyond traditional video models that predict the next frame. They aim to understand the full spectrum of possibilities and outcomes from a current state, generating the subsequent state based on actions taken. This requires a deeper comprehension of causality and interaction, making it a significantly more complex challenge than sequence prediction. The development of world models is seen as crucial for advancing spatial intelligence and enabling embodied robotics.
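The difference between next-frame prediction and an action-conditioned world model can be illustrated with a toy sketch: the model maps (state, action) to the next state, so an agent can "imagine" the outcomes of several candidate plans before acting. The grid, the action names and the hand-written dynamics are assumptions for illustration only; a real world model learns this transition from video and predicts frames, not grid cells.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]  # toy state: an agent's (x, y) cell on a grid

# Hypothetical discrete action set for this toy example.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0), "noop": (0, 0),
}


def world_model_step(state: State, action: str, size: int = 5) -> State:
    """Action-conditioned transition: predict the next state from (state, action).

    A learned world model would approximate this function from data; here the
    dynamics are hand-written so the example stays runnable.
    """
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), size - 1)  # grid edges clamp movement
    y = min(max(state[1] + dy, 0), size - 1)
    return (x, y)


def imagine(state: State, plan: List[str]) -> List[State]:
    """Roll the model forward under a candidate action sequence."""
    trajectory = [state]
    for action in plan:
        state = world_model_step(state, action)
        trajectory.append(state)
    return trajectory


def best_plan(start: State, goal: State, plans: List[List[str]]) -> List[str]:
    """Compare imagined outcomes of several plans; keep the one ending nearest the goal."""
    def final_distance(plan: List[str]) -> int:
        x, y = imagine(start, plan)[-1]
        return abs(x - goal[0]) + abs(y - goal[1])
    return min(plans, key=final_distance)
```

For example, `imagine((0, 0), ["right", "right", "up"])` returns `[(0, 0), (1, 0), (2, 0), (2, 1)]`. A plain next-frame predictor has no `action` argument, so it cannot answer "what happens if I do X instead of Y".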
GENERAL INTUITION'S UNIQUE DATA ADVANTAGE
General Intuition (GI) is built on a unique dataset of 3.8 billion game clips gathered from its Medal platform. Medal, a game clipping tool with 12 million users, lets players retroactively save highlight moments instead of recording entire sessions. This has produced an unusually rich collection of peak human gameplay, dense with interesting actions and outcomes. To address privacy concerns, GI records actions as general actions mapped to visual inputs and game outcomes rather than as raw keystrokes.
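Retroactive clipping is commonly implemented with a fixed-size rolling buffer: recent frames are held in memory and continuously overwritten, and only a hotkey press writes them out as a clip. This is an assumption about how such tools typically work, not a description of the platform's actual implementation; `ReplayBuffer`, `fps` and `seconds` are hypothetical names.

```python
from collections import deque
from typing import Deque, List


class ReplayBuffer:
    """Keep only the most recent frames in memory so a clip can be saved after the fact.

    Hypothetical sketch: frame rate, buffer length and the frame type are assumptions.
    """

    def __init__(self, fps: int = 60, seconds: int = 30) -> None:
        # A deque with maxlen silently drops the oldest frame once it is full,
        # so memory use stays bounded no matter how long the session runs.
        self.frames: Deque[object] = deque(maxlen=fps * seconds)

    def push(self, frame: object) -> None:
        self.frames.append(frame)

    def save_clip(self) -> List[object]:
        """Called when the player presses the clip hotkey: snapshot the last N seconds."""
        return list(self.frames)
```

With `fps=2` and `seconds=3`, pushing frames `0` through `9` and then calling `save_clip()` yields `[4, 5, 6, 7, 8, 9]`: only the moments just before the hotkey survive, which is why such clips skew toward highlights.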
VISION-BASED AGENTS AND IMITATION LEARNING
GI's core technology is vision-based AI agents that predict actions solely from pixel inputs, imitating human players. Trained through imitation learning, these agents show strong abilities in navigation and problem-solving (such as getting unstuck), and because they learn from game highlights, some of their behaviors look distinctly human while others exceed human capabilities. The models run in real time against real players and can infer goals without access to explicit game state, suggesting a sophisticated understanding of gameplay dynamics.
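In its simplest form, imitation learning from pixels is behavior cloning: supervised learning from observed frames to the action a human took. The sketch below is a deliberately tiny stand-in, assuming 4-pixel frames, two hypothetical actions (0 = turn left, 1 = turn right) and a linear softmax policy trained with SGD; GI's agents are large vision models, and nothing here reflects their architecture.

```python
import math
import random
from typing import List, Sequence, Tuple

Frame = List[float]  # toy stand-in for pixels: a flattened, normalized grayscale frame


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights: List[List[float]], frame: Frame) -> List[float]:
    x = frame + [1.0]  # append a constant bias feature
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_bc(data: List[Tuple[Frame, int]], n_actions: int,
             epochs: int = 200, lr: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Behavior cloning: fit a linear softmax policy to (frame, human action) pairs
    by minimizing cross-entropy with plain stochastic gradient descent."""
    dim = len(data[0][0]) + 1
    weights = [[0.0] * dim for _ in range(n_actions)]
    rng = random.Random(seed)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for frame, action in samples:
            probs = softmax(logits_for(weights, frame))
            x = frame + [1.0]
            for a in range(n_actions):
                # d(cross-entropy)/d(logit_a) = p_a - 1[a == human action]
                grad = probs[a] - (1.0 if a == action else 0.0)
                for i, xi in enumerate(x):
                    weights[a][i] -= lr * grad * xi
    return weights


def act(weights: List[List[float]], frame: Frame) -> int:
    """Greedy policy: pick the action the cloned model rates most likely."""
    logits = logits_for(weights, frame)
    return max(range(len(logits)), key=logits.__getitem__)
```

Trained on frames where a bright pixel on the left was followed by "turn left" and one on the right by "turn right", `act` reproduces those choices. The agent never sees game state, only pixels and the human's recorded action.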
TRANSFER LEARNING AND REAL-WORLD APPLICABILITY
A significant breakthrough for GI is the demonstrated ability to transfer models trained in less realistic games to more complex ones, and crucially, to real-world video. This capability means that any video on the internet can potentially serve as pre-training data. The models can predict actions as if a human were controlling them with a keyboard and mouse, making them adaptable beyond gaming contexts. This broad transferability suggests potential applications in robotics, manufacturing, and other areas requiring spatial-temporal reasoning.
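One common way to turn raw video into action-labeled pre-training data is pseudo-labeling: run a trained frame-to-action model over unlabeled footage and keep its confident predictions as labels. The summary does not say GI uses this exact recipe; the predictor signature, the action strings and the `min_confidence` threshold below are assumptions.

```python
from typing import Callable, List, Tuple

Frame = object
Action = str
# Hypothetical trained frame-to-action model: given two consecutive frames, it returns
# the keyboard/mouse action that would explain the change, plus a confidence in [0, 1].
Predictor = Callable[[Frame, Frame], Tuple[Action, float]]


def pseudo_label(video: List[Frame], predict: Predictor,
                 min_confidence: float = 0.9) -> List[Tuple[Frame, Action]]:
    """Turn an unlabeled clip into (frame, action) training pairs.

    Low-confidence guesses are discarded so noisy labels do not dominate training.
    """
    labeled = []
    for before, after in zip(video, video[1:]):
        action, confidence = predict(before, after)
        if confidence >= min_confidence:
            labeled.append((before, action))
    return labeled
```

The confidence filter is a design choice: it trades label coverage for label quality, which matters when the footage (for example, real-world video) looks unlike the games the predictor was trained on.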
ADVANCED WORLD MODEL CAPABILITIES
GI's world models exhibit advanced features, such as inheriting real-world dynamics like camera shake that many game engines do not model. They also demonstrate sophisticated spatial memory, handle rapid camera motion, and can carry out actions like hiding or sniping with impressive accuracy. The models cope with partial observability, such as smoke, maintaining their position and consistency even in challenging conditions. This level of detail and robustness goes beyond typical simulation capabilities.
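Staying consistent under partial observability amounts to carrying state forward when the visual input goes dark. A learned model does this implicitly in its memory; the hand-written dead-reckoning sketch below is only an analogy, assuming 2D positions and known per-step displacements from the agent's own actions, and is not how GI's models work internally.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def track_position(observations: List[Optional[Vec]], moves: List[Vec], start: Vec) -> List[Vec]:
    """Keep a position estimate through frames where vision is blocked.

    observations[t] is a visually observed position, or None when occluded (e.g. by smoke);
    moves[t] is the displacement the agent's own action produced at step t.
    """
    estimate = start
    history = []
    for seen, (dx, dy) in zip(observations, moves):
        if seen is not None:
            estimate = seen  # trust the observation whenever one is available
        else:
            estimate = (estimate[0] + dx, estimate[1] + dy)  # integrate own actions
        history.append(estimate)
    return history
```

If the agent sees itself at `(1, 0)`, then loses vision for two steps while moving one unit right each step, the estimate continues to `(2, 0)` and `(3, 0)` and snaps back to the observation once the smoke clears.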
DISTILLATION, DATA VALUE, AND BUSINESS MODEL
GI is also distilling these large models into smaller, more efficient versions to cut computational costs. The company argues that the true value of a dataset can only be assessed by modeling it, which keeps those insights proprietary. Its business model rests on API access for frame-to-action prediction, custom model development for companies, and model licensing. GI is also exploring applications within Medal, potentially leading to new entertainment experiences built from game clips.
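The summary does not describe GI's distillation method. As an assumption, the sketch below shows the standard knowledge-distillation objective (Hinton et al., 2015), in which a small student policy is trained to match a large teacher's temperature-softened action distribution; the logits and temperature are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]  # higher temperature -> softer distribution
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float], student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened action distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's action preferences and grows as they diverge; softening with a temperature lets the student also learn which "second-best" actions the teacher considered plausible.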
THE STRATEGY BEHIND INDEPENDENCE AND FUNDING
General Intuition notably turned down a $500 million offer from OpenAI, opting instead to raise a $134 million seed round led by Khosla Ventures. This strategic decision reflects a belief in building an independent world model lab. The founder's background in gaming and infrastructure, combined with a team of strong co-founders and early recruitment of top researchers from impactful projects like Diamond and Gaia, positions GI to lead in this nascent field. Their approach prioritizes a foundational bet on spatial-temporal agents, aiming to capture a significant market share ahead of competitors.
APPLICATIONS IN GAMING AND BEYOND
GI's technology has immediate applications in the gaming industry, particularly for enhancing non-player characters (NPCs) or bots. Highly realistic and adaptable AI opponents can help game developers improve player retention, especially during off-peak hours when few human players are online. The technology can also extend to a wide range of simulation environments, including automotive (such as self-driving car training), robotics, and manufacturing. Because GI's models act through game-style inputs, they can control robots that already use such inputs, making them applicable to industries whose hardware and processes are built around gaming controls.
THE VALUE OF HUMAN ACTION DATA
The core of GI's advantage lies in its massive dataset of human actions within games. Unlike simple gameplay recordings, Medal clips capture the precise actions a player takes in relation to visual inputs and game states. This granular data, crucial for training sophisticated world models, was painstakingly collected and labeled by humans. The company's privacy-first approach of converting inputs to general actions, rather than logging specific keystrokes, allows broad applicability without compromising user privacy.
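The privacy-first idea of storing general actions instead of keystrokes can be sketched as a mapping from raw inputs to a device-agnostic action record. The `GeneralAction` fields, the key bindings and the `sensitivity` value are hypothetical; the summary does not describe GI's actual action schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class GeneralAction:
    """Device-agnostic action: movement intent, camera motion and abstract buttons."""
    move: Tuple[int, int]       # (strafe, forward), each in {-1, 0, 1}
    look: Tuple[float, float]   # (yaw, pitch) camera delta
    buttons: FrozenSet[str]     # abstract buttons such as "primary" or "jump"


# Hypothetical binding table; real games let players remap keys.
MOVE_KEYS = {"w": (0, 1), "s": (0, -1), "a": (-1, 0), "d": (1, 0)}
BUTTON_KEYS = {"space": "jump", "mouse1": "primary"}


def to_general_action(pressed: Iterable[str], mouse_dx: float, mouse_dy: float,
                      sensitivity: float = 0.1) -> GeneralAction:
    keys = set(pressed)
    strafe = sum(MOVE_KEYS[k][0] for k in keys if k in MOVE_KEYS)
    forward = sum(MOVE_KEYS[k][1] for k in keys if k in MOVE_KEYS)
    # Keys outside the binding tables (e.g. typing in chat) are never recorded.
    buttons = frozenset(BUTTON_KEYS[k] for k in keys if k in BUTTON_KEYS)
    return GeneralAction(move=(strafe, forward),
                         look=(mouse_dx * sensitivity, mouse_dy * sensitivity),
                         buttons=buttons)
```

Because only movement intent, camera motion and abstract buttons are kept, the same record could in principle describe a gamepad or another controller, which is what makes a general action space transferable.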
LONG-TERM VISION: THE GOLD STANDARD OF INTELLIGENCE
GI's ultimate ambition is to become the 'gold standard' for intelligence by mastering spatial-temporal reasoning, which they see as fundamental to intelligence itself. Their goal is to power 80% of AI-driven interactions in the 'atoms to atoms' stage, particularly in simulation, which is expected to be the larger initial market. This vision extends to applying AI to complex scientific problems, mimicking the intuition of human experts to solve challenges in fields like biology, further solidifying GI's role as a leader in the AI frontier.
Common Questions
World models aim to understand the full range of possibilities and outcomes from a current state, generating the next state based on actions taken. This is more complex than traditional video models which might just predict the next likely sequence or frame.
Mentioned in this video
Mentioned as a book by François Fleuret on deep learning.
The method used to train the initial agent, learning by observing human actions.
Mentioned as a DeepMind model with significant impact, possibly an extension of previous SIMA versions.
A game mentioned as an example where crashes can be used for training RL models.
A future goal for GI, to predict actions in a general action space that can transfer to other inputs.
A long-term ambition for GI is to represent scientific problems in 3D space for agents to work on.
A key metric for game developers, influenced by the quality of bots and overall game engagement.
Mentioned as converging on gaming inputs due to AI intelligence bottlenecks.
A game mentioned as an example where crashes can be used for training RL models and where players use steering wheels.
A use case where world models are seen as a key advancement.
How actions are represented in games, which GI can map to actual actions using its dataset.
AI-controlled players in games, highlighted as crucial for player retention when human player liquidity is low.
A field from the Chaz Institute that aligns with GI's vision of simulation for rapid advancement.
Models developed by GI that can be used for custom data labeling for companies.
A 10-year-old game clipping company with 12 million users, forming the origin of General Intuition.
A game mentioned as an example where players role-play real life and as a potential data source for self-driving companies.
A platform used for simulating human behavior, mentioned in the context of GI's applicability beyond games.
An AI agent that operates solely by processing visual input (pixels).
The ultimate goal of GI's 'General Intuition' model: mimicking human intuition in any situation.
A potential future application area for GI, leveraging Medal's platform for video consumption.
Seen as a larger initial market for world models, with fewer constraints and easier safety considerations.
Another world model developed by DeepMind, with versions 1 and 2 mentioned.
A research project whose approaches heavily inspired GI's work, with Anthony leading research on it.
A DeepMind model discussed for its steerability and text conditioning capabilities.
An open science lab in Paris that GI has partnered with for open research on their data.
The core technology General Intuition is developing, aiming to understand and generate outcomes based on actions.
An area where world models are expected to improve upon, particularly for robotics.
The core focus of GI's work, aiming for agents that understand space and time dynamics.
A rudimentary PyTorch-based physics engine the guest built to understand simulation complexity.
Used in video models, allowing world models to predict at smaller resolutions which can then be enriched.
GI's world models are positioned as a strong foundation for various applications and future AI development.
How Medal's clip data is described, representing the most memorable and shareable moments of playtime.
Led the $134 million seed round for General Intuition.
Author of "The Little Book of Deep Learning" and creator of a deep learning course recommended by the guest.
The ability to guide or control the output of AI models, a key feature GI aims for.
A deterministic method of coding AI behavior in game engines that GI aims to replace with its API.
Mentioned regarding its VR platform (Quest) and potential for data collection, compared to PC gaming.
More from Latent Space