Chelsea Finn: Building Robots That Can Do Anything
Key Moments
Robots need general-purpose intelligence, not just scale, for real-world tasks. Diverse data and pre-training are key.
Key Insights
Developing general-purpose robots requires a foundation model approach, similar to language models, rather than task-specific solutions.
While scale is necessary, diverse and realistic data is more crucial than sheer volume for robots to generalize to real-world conditions.
A pre-training and curated post-training recipe is vital for complex robotic tasks like folding laundry, significantly improving performance.
Robots can generalize to unseen environments and tasks by training on diverse data, even if that data constitutes a small percentage of the total pre-training mix.
Open-ended prompts and interjections can be handled by robots through hierarchical models and synthetic data generation using language models.
Integrating world models and improving real-time inference infrastructure are key challenges for deploying robust robotic systems.
THE CHALLENGE OF SPECIALIZED ROBOTICS
The current paradigm for robotics applications often requires building an entire company for each specific task, from logistics to surgery. This involves developing custom hardware, software, movement primitives, and handling numerous edge cases. This highly specialized approach is difficult and has historically limited the success and widespread adoption of robots in daily life. Physical Intelligence aims to overcome this by developing a general-purpose model capable of enabling any robot to perform any task in any environment, mirroring the success of foundation models in language.
THE NECESSITY OF DIVERSE AND REALISTIC DATA
While scale is important for training generalizable models, simply scaling up data from industrial automation, YouTube, or simulation is insufficient. Industrial data lacks the diversity needed for real-world applications like disaster response or grocery bagging. YouTube data, while abundant, does not provide the embodied experience a robot needs to learn from. Simulated data often lacks realism. The lesson is that scale is necessary but not sufficient: solving the problem requires diverse, realistic, and relevant data that captures the complexity of the physical world.
PRE-TRAINING AND POST-TRAINING FOR COMPLEX TASKS
For highly complex tasks such as folding laundry, a two-stage recipe is crucial: pre-train on all available robot data, then fine-tune on a curated, high-quality set of demonstrations. This recipe, inspired by language model development, significantly improves robotic performance. Starting with simpler subtasks, like folding a single shirt, and gradually increasing complexity, combined with this pre-training and post-training strategy, unlocks capabilities that simpler methods could not reach.
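The two-stage recipe described above can be sketched as a data-selection step. This is a minimal illustration, not the method used in the talk: the `Episode` type, the `quality` score, the `min_quality` threshold, and the task names are all hypothetical stand-ins for however curation is actually done.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Episode:
    """One robot demonstration (hypothetical schema)."""
    task: str
    quality: float  # assumed curation score in [0, 1]


def build_training_stages(
    episodes: List[Episode], post_task: str, min_quality: float = 0.8
) -> List[Tuple[str, List[Episode]]]:
    """Return (stage name, data) pairs: broad pre-training, then curated post-training."""
    # Stage 1: pre-train on everything available, regardless of task or quality.
    pretrain = list(episodes)
    # Stage 2: post-train only on high-quality demonstrations of the target task.
    post_train = [
        e for e in episodes if e.task == post_task and e.quality >= min_quality
    ]
    return [("pretrain", pretrain), ("post_train", post_train)]


episodes = [
    Episode("fold_shirt", 0.9),
    Episode("fold_shirt", 0.4),
    Episode("bus_table", 0.95),
]
stages = build_training_stages(episodes, post_task="fold_shirt")
for name, data in stages:
    print(name, len(data))
```

The point of the sketch is the asymmetry between the stages: the first keeps all data for breadth, while the second deliberately discards most of it in favor of consistency and quality.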
GENERALIZATION TO UNSEEN ENVIRONMENTS AND TASKS
A significant advancement is enabling robots to succeed in environments they have never encountered. This is achieved by training on highly diverse datasets that include mobile manipulation data from various homes, kitchens, and bedrooms, even if this data represents a small fraction of the total pre-training mix. The key is that this diverse data, along with static manipulation and web data, allows the model to build a general understanding, leading to improved performance in novel situations. Preserving the capabilities of pre-trained vision-language models is also vital for effective language following.
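One way to picture a pre-training mix in which diverse home data is only a small fraction is weighted sampling over data sources. The sketch below is an assumption-laden illustration: the source names and the weights in `MIXTURE` are invented for the example and are not figures from the talk.

```python
import random
from collections import Counter
from typing import Dict


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Convert raw mixture weights into fractions that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def sample_sources(weights: Dict[str, float], n: int, seed: int = 0) -> Counter:
    """Draw n training examples' source labels according to the mixture."""
    fractions = normalize(weights)
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(fractions)
    drawn = rng.choices(names, weights=[fractions[k] for k in names], k=n)
    return Counter(drawn)


# Illustrative weights only (hypothetical, not from the source).
MIXTURE = {
    "static_manipulation": 60.0,
    "web_data": 35.0,
    "mobile_manipulation_homes": 5.0,
}

counts = sample_sources(MIXTURE, n=1000)
for source, count in counts.most_common():
    print(source, count)
```

Under this framing, raising the diversity of the small slice (more distinct homes) changes what the model sees without changing how much of the mix that slice occupies.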
RESPONDING TO OPEN-ENDED PROMPTS AND INTERJECTIONS
To allow robots to handle user-defined tasks and dynamic interactions, hierarchical vision-language-action models are employed. A high-level policy breaks an open-ended prompt into intermediate subtasks, and a low-level policy executes each one. Synthetic data, generated by language models that re-label existing robot data with hypothetical human prompts, plays a crucial role in training the high-level policy. This approach enables robots to understand and respond to complex instructions, modifications, and real-time corrections, going beyond a fixed set of commands.
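The division of labor between the two levels can be sketched with a toy example. Everything here is a stand-in: in the system described above both levels are learned models, whereas this sketch uses a lookup table (`PLANS`) for the high-level policy and a string formatter for the low-level policy, and it handles only interjections of the hypothetical form "no X".

```python
from typing import Dict, List, Optional

# Hypothetical prompt-to-subtask table standing in for a learned high-level policy.
PLANS: Dict[str, List[str]] = {
    "make a sandwich": [
        "pick up bread",
        "place bread on plate",
        "pick up cheese",
        "place cheese on bread",
    ],
}


def high_level_policy(prompt: str, interjection: Optional[str] = None) -> List[str]:
    """Map an open-ended prompt (plus an optional correction) to subtasks."""
    steps = list(PLANS.get(prompt.lower().strip(), []))
    if interjection and interjection.lower().startswith("no "):
        # Toy handling of a correction: drop every subtask mentioning the item.
        banned = interjection.lower()[3:].strip()
        steps = [s for s in steps if banned not in s]
    return steps


def low_level_policy(subtask: str) -> str:
    """Stand-in for a policy that would emit motor commands for one subtask."""
    return f"executing: {subtask}"


def run(prompt: str, interjection: Optional[str] = None) -> List[str]:
    """Plan with the high-level policy, then execute each subtask in order."""
    return [low_level_policy(s) for s in high_level_policy(prompt, interjection)]


for line in run("make a sandwich", interjection="no cheese"):
    print(line)
```

The useful property of the hierarchy is that a correction only has to change the subtask list; the low-level policy is unchanged and simply receives different language commands.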
FUTURE CHALLENGES AND OPPORTUNITIES
Despite significant progress, challenges remain, including improving reliability and speed and handling partial observability and long-term planning. Key opportunities lie in developing better robotic infrastructure, contributing to open-source models and datasets, and exploring synthetic data for evaluation and reinforcement learning. Integrating world models, ensuring safety, and scaling real-time inference are critical next steps toward truly deployable, general-purpose robots in the open world.
Comparison of Pre-training and Post-training Strategies for Robot Task Performance
Data extracted from this episode
| Strategy | Performance (Task Progress) | Evaluation Context |
|---|---|---|
| Pre-training and Post-training (Combined) | High Performance (Reliably flatten and fold objects) | Evaluated robot task performance |
| No Pre-training (Only Curated Data) | Minimal Progress (Only able to get item out of bin) | Evaluated robot task performance |
| No Post-training (All Data) | Minimal Progress (Only able to get item out of bin) | Evaluated robot task performance |
| Full Pre-training Mixture | Higher Performance (>20% increase) | Evaluated in novel homes |
| Excluding Static Robot Data | Significantly Reduced Performance (<60%) | Evaluated in novel homes |
| Increased Diversity of Homes | Performance Increases (Closes generalization gap) | Evaluated in novel homes |
Common Questions
The primary challenge in robotics today is that solving a specific application often requires building an entire company around it. This involves creating new hardware, custom software, and unique movement primitives for each task, which is difficult and has led to limited success for many robotics companies.