No Regrets - What Happens to AI Beyond Generative? - Computerphile
Key Moments
Moving beyond generative AI's text focus to agentic AI using simulated environments and trial-and-error learning.
Key Insights
Generative AI excels at text prediction but is ill-suited for real-world actions and decisions requiring trial-and-error.
Future AI needs to learn from experience in simulated environments, leveraging increasing computational power (compute-only scaling) rather than limited human data.
Adversarial or diverse task distributions are crucial for training robust agents, but current regret approximation methods often fail outside narrow domains.
Optimizing directly for 'learnability' (where agents sometimes succeed but not always) proves more effective than optimizing for regret approximation.
Reinforcement Learning training is significantly bottlenecked by the CPU-GPU architecture; placing environments on the GPU ('RL at hyperscale') offers massive speedups.
The 'Kinetics' simulator provides a versatile 2D physics engine for creating diverse tasks, enabling agentic foundation models analogous to LLMs.
THE LIMITATIONS OF GENERATIVE AI
Current generative AI models, primarily trained via self-supervised learning to predict future data, demonstrate remarkable proficiency in text-based tasks like question answering and chatbots. However, these systems are fundamentally ill-equipped for real-world applications that demand active decision-making, long-term planning, and learning through trial and error. Their reliance on predicting from existing data corpora limits their ability to adapt and learn from novel experiences, necessitating a shift towards AI capable of interacting with and learning from the environment.
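The predict-the-next-item training signal described above can be illustrated with a toy bigram model (a minimal sketch for intuition only; production generative models use neural networks, not counts): it scans a corpus, tallies which token follows which, and predicts the most frequent successor.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, how often each successor follows it."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Predict the most frequent successor seen in training."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Tiny made-up corpus for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

Note the model can only reproduce statistics of its corpus; it has no mechanism for acting, observing consequences, and revising itself, which is exactly the gap the rest of the article addresses.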
TRANSITIONING TO TRIAL-AND-ERROR LEARNING
The next frontier for AI involves developing systems that can learn through active engagement with their surroundings, much like humans do. This requires AI agents that can perform actions, experience the consequences, and refine their strategies based on that feedback. The challenge lies in the inherent risks associated with trial-and-error learning in the real world, which can lead to unpredictable and potentially undesirable outcomes. Consequently, training these agents effectively and safely will likely occur within simulated environments.
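The act-observe-refine loop can be sketched with a minimal epsilon-greedy bandit (an illustrative toy, not a method from the episode; the payout probabilities are invented): the agent tries actions, experiences the reward, and gradually shifts towards what worked.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    reward_probs: chance each arm pays out 1 (unknown to the agent).
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n    # times each arm was pulled
    values = [0.0] * n  # running average reward per arm
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: values[a])
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return values

estimates = run_bandit([0.2, 0.5, 0.8])
print(max(range(3), key=lambda a: estimates[a]))  # the agent discovers arm 2 pays best
```

The thousands of exploratory pulls this needs are harmless in simulation but costly or dangerous in the real world, which motivates the simulated-environment approach in the next section.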
THE ROLE OF SIMULATED ENVIRONMENTS AND COMPUTE-ONLY SCALING
To overcome the limitations of real-world trial and error and the scarcity of human data, future AI development will rely heavily on training within virtual, simulated environments. This approach leverages the continuing acceleration of computing power, a concept termed 'compute-only scaling': as computers become exponentially faster while the supply of human data grows only slowly, simulated worlds offer a scalable platform for AI agents to perform an immense number of trials, learn complex behaviors, and develop sophisticated decision-making capabilities.
ADDRESSING DISTRIBUTION SHIFTS AND REGRET
A significant challenge in training AI agents for real-world tasks is ensuring their robustness to unseen scenarios. Agents trained in a specific distribution of simulated environments must generalize effectively to different, potentially novel, tasks encountered in reality. 'Regret' measures the difference between an agent's performance and the optimal possible performance in an environment. While methods have aimed to approximate regret to guide training, research has shown these approximations often fail to capture true learnability when agents encounter out-of-distribution tasks, particularly in more complex, realistic simulations.
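Under the definition above, regret is simply the gap between the optimal return and the agent's achieved return. A minimal sketch (the environment names and return values below are invented for illustration):

```python
def regret(optimal_return, agent_return):
    """Regret: how far the agent's return falls short of the optimal policy's."""
    return optimal_return - agent_return

# Hypothetical (optimal, achieved) returns per environment.
envs = {"maze_a": (10.0, 7.5), "maze_b": (10.0, 10.0), "maze_c": (10.0, 1.0)}
for name, (opt, agent) in envs.items():
    print(name, regret(opt, agent))
```

The catch, as the article notes, is that in practice the optimal return is unknown, so curriculum methods must approximate it, and those approximations are exactly what breaks down outside narrow domains.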
OPTIMIZING FOR LEARNABILITY OVER REGRET
Empirical findings suggest that directly optimizing for 'learnability'—a metric reflecting tasks where an agent has a moderate chance of success and failure—is more effective than relying on regret approximations. By measuring learnability as a proxy for tasks that are challenging but not insurmountable, researchers found that optimizing this metric led to agents that generalized better to novel, human-designed tasks. This highlights the importance of revisiting fundamental assumptions and grounding AI development in first principles when standard methods prove inadequate.
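One simple way to score learnability, used in some curriculum-learning work, is p·(1−p), where p is the agent's current success rate on a task: the score peaks at p = 0.5 (sometimes succeeds, sometimes fails) and vanishes for tasks that are trivially easy (p ≈ 1) or hopeless (p ≈ 0). A minimal sketch with made-up task names and success rates:

```python
def learnability(success_rate):
    """p * (1 - p): peaks at 0.25 when the agent succeeds half the time."""
    return success_rate * (1.0 - success_rate)

# Hypothetical success rates on candidate training tasks.
tasks = {"too_easy": 0.98, "promising": 0.55, "hopeless": 0.02}
# Prioritise tasks the agent can sometimes, but not always, solve.
curriculum = sorted(tasks, key=lambda t: learnability(tasks[t]), reverse=True)
print(curriculum)  # 'promising' ranks first
```

Unlike regret, this score needs nothing but rollouts of the current agent, which is part of why it transfers better to messy, out-of-distribution task spaces.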
ACCELERATING REINFORCEMENT LEARNING WITH HYPERSCALE
Traditional reinforcement learning (RL) architectures, separating CPU-based environments from GPU-based policies, create significant communication bottlenecks, limiting training efficiency. The 'RL at hyperscale' initiative addresses this by moving the environment execution onto the GPU. This integration eliminates inter-processor communication overhead, allowing GPUs to be fully utilized for number crunching and enabling massive speedups, ranging from 100x to 10,000x. This advancement removes computational barriers, making extensive evaluation across diverse environments more feasible.
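The speedup comes from keeping simulation and learning on the same device and stepping many environments in lockstep as array operations rather than one Python loop per environment. A CPU-side NumPy sketch of the batched-stepping idea (the point-mass dynamics are invented for illustration; on a GPU the same array program runs as parallel kernels, e.g. via JAX):

```python
import numpy as np

def step_batched(positions, velocities, actions, dt=0.1):
    """Step N point-mass environments at once with pure array ops.

    No per-environment Python loop and no CPU<->GPU round trip per env:
    one batched update advances every simulation in lockstep.
    """
    velocities = velocities + actions * dt   # apply each env's action as a force
    positions = positions + velocities * dt  # integrate every env simultaneously
    return positions, velocities

n_envs = 4096                    # thousands of environments in one array
pos = np.zeros((n_envs, 2))
vel = np.zeros((n_envs, 2))
acts = np.ones((n_envs, 2))      # same dummy action everywhere, for simplicity
for _ in range(100):
    pos, vel = step_batched(pos, vel, acts)
print(pos.shape)  # (4096, 2)
```

Because every environment is a row in the same arrays, adding more environments costs almost nothing until the accelerator saturates, which is where the quoted 100x to 10,000x speedups come from.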
THE KINETICS SIMULATOR: A UNIVERSE OF TASKS
To combat the lack of diverse training tasks, the Kinetics simulator has been developed. It offers an end-to-end, GPU-accelerated system featuring an editor for task creation, a physics engine, and RL training code. Kinetics allows for the simulation of a vast array of 2D physics-based tasks, where the common goal involves manipulating objects (e.g., bringing a green object into contact with a blue one) while avoiding obstacles. This versatility enables the creation of tasks resembling classic games and robotic challenges, all within a unified, parameterizable framework.
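The shared goal format makes success easy to check programmatically. Assuming circular objects for simplicity (a sketch of the idea, not the actual Kinetics code), success is just an overlap test between the green and blue objects, with failure on touching a red obstacle:

```python
import math

def touching(a, b):
    """Two circles (x, y, radius) touch when centre distance <= radius sum."""
    dist = math.hypot(a[0] - b[0], a[1] - b[1])
    return dist <= a[2] + b[2]

def task_outcome(green, blue, red_obstacles):
    """+1 if green reaches blue, -1 if it hits any red obstacle, else 0."""
    if any(touching(green, r) for r in red_obstacles):
        return -1
    if touching(green, blue):
        return 1
    return 0

# Green near blue, red obstacle far away: the task is solved.
print(task_outcome((0.0, 0.0, 0.5), (0.6, 0.0, 0.2), [(5.0, 5.0, 1.0)]))  # 1
```

Because every task shares this one goal predicate, wildly different layouts (game-like levels, robot-arm puzzles) can all be generated, scored, and mixed in a single curriculum.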
TOWARDS AGENTIC FOUNDATION MODELS
Kinetics, combined with learnability-based curricula, has demonstrated success in training generalist agents. These agents exhibit significant zero-shot improvements on unseen, human-designed tasks and more efficient fine-tuning compared to training from scratch, mirroring the paradigm shift brought by large language models (LLMs). This work represents a crucial step towards developing 'agentic foundation models'—AI systems pre-trained for decision-making and action, rather than solely for next-token prediction, akin to the early stages of LLM development.
SCALING TO 3D AND BEYOND
The principles demonstrated in the 2D Kinetics simulator, such as GPU acceleration and learnability-based curriculum design, are conceptually transferable to more complex 3D environments. While implementing full 3D simulations is computationally intensive, the groundwork laid by these advancements provides a scalable foundation. The ultimate goal is to create AI systems capable of robust and transferable training in increasingly rich and realistic simulated worlds, paving the way for agents that can navigate and act effectively in the complexities of the real world, potentially mirroring the evolution seen in computer graphics from 2D to 3D.
Common Questions
What is generative AI good at, and where does it fall short?
Generative AI models excel at text-based tasks like Q&A and chatbots, but are not well suited to real-world actions that require trial-and-error learning, long-term planning, or complex reasoning.
Mentioned in this video
Refers to the vast amount of text data available online, which has been crucial for training text-based AI models.
A simulated 2D environment designed to be a step closer to the real world, featuring robots with Lidar sensors that perform continuous navigation tasks, avoiding obstacles and each other.
Kinetics: A recently released end-to-end, GPU-accelerated system that includes an editor for generating tasks, a GPU-accelerated physics engine (Box2D), a UI for human play, and RL training code with curriculum methods.
Box2D: A 2D rigid-body physics engine used within the 'Kinetics' system to simulate a wide variety of tasks.
CPUs: Processors typically used for running reinforcement learning environments; they traditionally create a communication bottleneck when interacting with the GPUs used for policy training.
A proposed concept for training agentic AI models, involving a wide array of simulated environments for trial-and-error learning and decision-making.
Policy: In reinforcement learning, the instruction set for how an agent acts, mapping observations or trajectories to a distribution over actions.
A significant development in deep reinforcement learning from 2013; the field has since grown to more than 40,000 papers.
Regret: In reinforcement learning, the difference between the performance of an optimal policy and that of the agent's current policy on a given environment; it indicates the potential for learning.
Reward function: A function used in reinforcement learning to define which outcomes are desirable or undesirable for an agent, guiding its learning process.
A type of learning where AI models are fed a data corpus to predict the future. This is the primary method used in generative AI.
Learnability: An intuitive notion of how well an AI agent can learn in a given environment, often characterized by tasks that are difficult but not impossible, where success is achieved sometimes but not always.
Grid world: A simplified, discrete 2D environment used in reinforcement learning research, often featuring an agent navigating from a start position to a goal. It is a basic testbed for RL algorithms.
Large language models (LLMs): Models pre-trained on vast text data that can be fine-tuned for specific downstream tasks, analogous to how agentic models can be fine-tuned after pre-training.
Mentioned as a real-world scenario where multiple independent robots with sensors would need to navigate complex, occluded environments.
GPUs: Hardware well suited to deep learning's matrix-vector operations, which significantly boosted the field's success; also used to accelerate RL training.
AlexNet: A deep convolutional neural network that was one of the early successes demonstrating the power of deep learning on GPUs.
Google Brain: An AI research group where early deep learning efforts were conducted using tens of thousands of CPUs, prior to the widespread adoption of GPUs.
Self-supervised learning: A type of learning, similar to supervised learning, used in generative AI, in which models learn from a data corpus by predicting parts of the data.
Optimal policy: The policy that yields the highest expected reward for a given task in reinforcement learning.
Agentic foundation model: A foundation model pre-trained for decision-making and acting in the world, rather than solely for predicting the next token; a step towards more capable AI.