Robots Are Finally Starting to Work
Key Moments
Robots are now performing complex tasks like folding laundry and packaging orders with near-human capabilities, but each successful deployment still requires significant human oversight and is costly to scale.
Key Insights
Physical Intelligence's cross-embodiment approach, training across multiple robot platforms, resulted in a 50% improvement in performance compared to single-embodiment specialists.
Tasks that previously required hundreds of hours of data collection for robots can now be performed zero-shot, meaning without prior specific training for that task.
Even with advanced AI, a 'mixed autonomy system' with human oversight is currently necessary for robot deployment, with the goal of gradually increasing autonomy.
Cloud-hosted models are enabling real-time robotic control, overcoming the limitations of on-robot compute power and the rapid obsolescence of hardware.
The upfront cost of building a robotics company has significantly decreased, with the focus shifting from proprietary autonomy stacks to identifying use cases and collecting relevant data.
Physical Intelligence has open-sourced its foundation models (π0 and π0.5) to accelerate progress and foster a 'Cambrian explosion' of robotics startups.
Robots are achieving near-human capabilities, performing complex tasks with surprising efficiency.
The field of robotics is on the cusp of a significant breakthrough, moving beyond specialized industrial applications to tackle complex, everyday tasks. Companies like Physical Intelligence (PI) are developing foundation models that can control any robot to perform, with high performance, any task it is physically capable of. This progress is exemplified by demonstrations of robots folding diverse laundry items in real-world laundromats and packaging consumer goods in e-commerce warehouses. These are not simple tasks: laundry items are deformable and infinitely varied, and packaging requires precise manipulation, including nudging items into pouches through narrow openings. Such tasks were long considered a 'Turing test' for robotics, because the deterministic programming approaches of the past struggled with the infinite variability of the real world. For many, the possibility of robots performing them became conceivable only after the advances demonstrated by models like ChatGPT.
Language models are unlocking common sense and planning for robots.
A key enabler of the recent progress in robotics is the integration of large language models (LLMs) and vision-language models. Semantics, one of the three pillars considered crucial for robotics (alongside planning and real-time control), has advanced significantly thanks to LLMs. These models bring common-sense knowledge into robotics, drastically reducing the need for task-specific data collection. For instance, an LLM can break down a task like 'record a podcast' into actionable steps and plans. Models like RT-2 (Robotic Transformer 2) and PaLM-E have demonstrated the ability to transfer knowledge from vision-language models to low-level robotic actions. This allows robots to perform tasks involving concepts not explicitly present in their robot training data, such as identifying and interacting with specific celebrities in images or performing spatial reasoning over unseen objects.
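The decomposition step described above can be sketched as a thin planning layer over any chat-style LLM. This is a minimal illustration, not PI's actual pipeline: `query_llm` is a hypothetical stand-in that returns a canned reply so the sketch runs offline.

```python
# Hedged sketch: an LLM-backed planner that decomposes a high-level
# instruction into robot-executable steps. `query_llm` is a placeholder
# for any hosted chat-completion API; here it returns a canned plan.

def query_llm(prompt: str) -> str:
    # A deployed system would send `prompt` to a hosted model and
    # return its reply; this stub keeps the sketch self-contained.
    return ("1. pick up microphone\n"
            "2. place microphone on stand\n"
            "3. press record button")

def plan_task(instruction: str) -> list[str]:
    prompt = f"Break the task '{instruction}' into short, concrete robot steps."
    reply = query_llm(prompt)
    # Each numbered line becomes one step for the low-level controller.
    return [line.split(". ", 1)[1] for line in reply.splitlines()]

steps = plan_task("record a podcast")
print(steps)  # ['pick up microphone', 'place microphone on stand', 'press record button']
```

The value of the LLM here is common sense: the high-level model supplies the plan, while a separate vision-language-action policy handles each concrete step.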
Cross-embodiment training yields a 50% performance boost over specialists.
A significant challenge in robotics has been scaling data collection and ensuring models generalize across different hardware. Physical Intelligence's 'open cross-embodiment' approach trains models on data from multiple robot platforms simultaneously. This strategy leads to models that learn a more abstract understanding of robot control, rather than optimizing for a single, specific robot. Surprisingly, research from this approach, specifically the RT-X ('Robotic Transformer X') paper, showed that a generalist model trained across 10 different robot platforms performed 50% better than specialist models optimized for individual platforms. This is a paradigm shift: traditionally, every new robot platform added years to development because of the effort required to get it operational and collect data. The implication is that by leveraging data from a diverse fleet, models become more robust and adaptable to variations and changes in hardware over time.
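One practical ingredient of cross-embodiment training is mapping heterogeneous action spaces into a shared one so a single policy can consume all platforms' data. The sketch below is a simplified illustration under assumed conventions (zero-padding to a fixed width, round-robin mixing); real pipelines also normalize scales and align observation formats.

```python
# Hedged sketch of cross-embodiment data mixing: episodes from robots
# with different action dimensions are padded to a shared action space
# so one generalist policy can train on all of them. Names illustrative.

import itertools

MAX_ACTION_DIM = 7  # e.g. a 7-DoF arm sets the shared action width

def to_shared_action_space(action: list[float]) -> list[float]:
    # Zero-pad lower-DoF platforms (a mobile base, a gripper-only robot)
    # up to the shared width; a real system would also normalize scales.
    return action + [0.0] * (MAX_ACTION_DIM - len(action))

def interleave(*platform_streams):
    # Round-robin over per-platform streams so every training batch
    # mixes embodiments instead of optimizing for a single robot.
    for group in itertools.zip_longest(*platform_streams):
        for sample in group:
            if sample is not None:
                yield to_shared_action_space(sample)

arm = [[0.1] * 7, [0.2] * 7]      # 7-DoF manipulator actions
base = [[0.5, -0.5], [0.0, 1.0]]  # 2-DoF mobile-base actions

mixed = list(interleave(arm, base))
print(len(mixed), len(mixed[0]))  # 4 7
```

Training on the interleaved stream is what forces the model toward an embodiment-agnostic representation of control rather than memorizing one robot's kinematics.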
Cloud-based AI powers robots, overcoming hardware limitations.
Contrary to the expectation that robots need substantial on-board computing power, PI's models, even for complex tasks, are largely hosted in the cloud. This approach circumvents the high cost, rapid obsolescence, and power demands of on-robot compute hardware. Robots query API endpoints in the cloud, sending sensor data and receiving action commands in real time. PI has developed algorithmic improvements, such as 'real-time chunking' and pre-computation within the control loop, to manage the inherent latency of cloud inference. This allows for smooth, continuous action with as little as 50 milliseconds' worth of actions remaining before the next set is requested. This strategy simplifies robot hardware significantly, potentially enabling even basic 'dumb' computers and cameras to stream data for sophisticated AI control, a crucial factor for mass deployment.
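The buffering idea behind real-time chunking can be sketched in a few lines. This is a toy model of the loop described above, not PI's implementation: the endpoint, chunk size, and timing constants are all assumptions, and a real controller would overlap the network request with execution instead of blocking.

```python
# Hedged sketch of a cloud-driven control loop: the robot holds a chunk
# of future actions locally and requests the next chunk before the
# buffer runs dry, hiding cloud-inference latency behind execution.

CONTROL_DT = 0.02          # 50 Hz control loop -> 20 ms per action
REFRESH_THRESHOLD = 0.05   # re-query with ~50 ms of actions remaining

def query_cloud_policy(observation):
    # Stand-in for a network call to a hosted model; returns a chunk
    # of 10 future actions for a 1-D command (purely illustrative).
    return [observation * 0.1 + i for i in range(10)]

def control_loop(observations):
    buffer, executed = [], []
    for obs in observations:
        remaining = len(buffer) * CONTROL_DT
        if remaining <= REFRESH_THRESHOLD:
            # Prefetch: in a real system this request runs while the
            # actions still left in the buffer are being executed.
            buffer = query_cloud_policy(obs)
        executed.append(buffer.pop(0))
    return executed

actions = control_loop(observations=[1.0] * 12)
print(len(actions))  # 12: one action executed per control tick
```

The design point is that latency tolerance comes from the chunk, not from fast inference: as long as a new chunk arrives before the last ~50 ms of buffered actions run out, motion stays continuous.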
A new playbook emerges for vertical robotics startups.
The landscape for starting robotics companies has been dramatically reshaped. Historically, robotics required immense vertical integration, encompassing customer relationships, hardware development, autonomy stacks, and safety certifications, creating a high barrier to entry. PI aims to lower this barrier by providing a foundational intelligence layer that the community can build upon. The new playbook involves understanding existing workflows, carefully identifying opportunities where robots can provide maximum impact, and being scrappy with hardware and data collection. The reactive nature of modern AI models can compensate for inaccuracies in less expensive robot hardware. The key is to reach economic break-even through mixed autonomy systems (where humans provide oversight and corrections) before scaling robot deployment, historically a challenge due to long payback periods. This unbundling allows startups to focus on differentiation rather than reinventing the entire robotics stack.
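The break-even logic of mixed autonomy reduces to simple arithmetic: margin improves as autonomy rises and the hours of human oversight per robot fall. The numbers below are entirely hypothetical, chosen only to illustrate the crossover, and are not from the episode.

```python
# Illustrative back-of-envelope for the mixed-autonomy break-even point.
# All figures are hypothetical placeholders, not data from the episode.

def monthly_margin(robot_cost_month, human_rate, oversight_hours, revenue_month):
    # Total cost = amortized robot cost + paid human oversight time.
    cost = robot_cost_month + human_rate * oversight_hours
    return revenue_month - cost

# As autonomy improves, one remote operator supervises more robots,
# so oversight hours attributed to each robot shrink.
for oversight in (160, 80, 40):  # hours of oversight per robot-month
    margin = monthly_margin(robot_cost_month=2000, human_rate=25,
                            oversight_hours=oversight, revenue_month=5000)
    print(oversight, margin)  # 160 -> -1000, 80 -> 1000, 40 -> 2000
```

Under these placeholder figures the deployment crosses break-even somewhere between 160 and 80 oversight hours, which is the point at which scaling the fleet starts to make sense.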
The promise of a 'Cambrian explosion' in robotics.
Physical Intelligence's mission is to enable a widespread proliferation of robotics companies, akin to a 'Cambrian explosion.' By providing open-source foundation models (π0 and π0.5) and publishing their research, they aim to accelerate progress across the industry. The belief is that robots can bring abundance to the physical world, mirroring the impact of digital computing. The success of PI is not solely defined by its own models on its robots, but by the broader adoption and application of its technology by other companies and researchers. This vision addresses the argument that the fundamental problem of general-purpose robotics might take decades longer to solve, emphasizing community enablement and collaborative progress. The hope is to inspire a new wave of startups that can leverage these advancements to solve a myriad of real-world problems, especially in sectors with labor shortages.
The human element: a diverse team tackling a monumental challenge.
PI is an unconventional company with a larger than average founding team, many of whom previously worked together at Google's robotics team. This established synergy and shared experience, coupled with a passion for the mission, proved crucial for tackling the immense complexity of general-purpose robotics. Key members include Adnan, the hardware lead tasked with managing a fleet of heterogeneous robots, and Locky, who ensures the business aspects are sound. The team emphasizes a 'divide and conquer' strategy, leveraging individual strengths to accelerate progress. For many co-founders, this is their first startup experience, revealing a surprising lack of existing infrastructure for large-scale, general-purpose robotics. This has necessitated PI building much of its own software for data collection, management, annotation, and evaluation, highlighting an opportunity for future support services businesses in the robotics ecosystem.
Generalist vs. Specialist Robot Policy Performance
Data extracted from this episode
| Model Type | Relative Performance | Notes |
|---|---|---|
| Generalist (trained across 10 robot platforms) | 50% better than specialists | Outperformed specialist models optimized for a single platform. |
Common Questions
What is the 'GPT-1 moment' for robotics?
The 'GPT-1 moment' for robotics refers to a breakthrough analogous to GPT-1's impact on language models: a single general-purpose AI model that can control any robot to perform, with high performance, any task it is physically capable of.
Topics
Mentioned in this video
A transformative moment for robotics akin to the GPT-1 moment in language models, signifying a leap in general-purpose robotic capabilities.
The concept of training models across multiple robot hardware platforms, showing that generalist models perform better than specialists. This enabled scaling laws in robotics.
Used to illustrate the potential economic impact of solving advanced robotics, suggesting a 10% contribution to US GDP would be a massive number, justifying investment in data collection.
A large language model that serves as an analogy for the potential disruption in robotics, representing a significant advancement in AI.
A vision-language model adapted for robotics that shows significant transfer of knowledge from language and vision to low-level robotic actions.
Robotic Transformer 2 (RT-2), a model that leverages powerful vision-language models and robotic data to generate low-level actions, enabling tasks like picking specific objects (e.g., a Coke can near a picture of Taylor Swift) even without prior exposure to such concepts in robot data.
A significant paper demonstrating scaling laws in robotics by training models across multiple hardware platforms, leading to a generalist model that outperformed specialists.
A benchmark dataset that significantly impacted the vision community, used as a comparison to the Open-X dataset in terms of scale and impact.
A YC company partnered with Physical Intelligence to demonstrate robots folding diverse laundry items in a real laundromat, showcasing domestic applications.
Mentioned as an example of a concept that a robot trained with RT-2 could understand and interact with, demonstrating advanced recognition and manipulation capabilities.
Mentioned in the context of identifying the network inputs and outputs that gave rise to scaling laws, akin to the development of large language models.
Author of the poem 'All Watched Over by Machines of Loving Grace', which is referenced in the context of the ideal manifestation of technology automating abundance.
A YC company focused on logistics, partnered with Physical Intelligence to develop a robot capable of picking items and placing them into pouches for shipping, demonstrating industrial applications.
Mentioned as the source of soft pouches used in the logistics task demonstrated by Ultra, highlighting the need for robots in e-commerce fulfillment.
Mentioned as an example of robots that used to require a server on board, highlighting the shift towards cloud-based compute for modern robotics.
The company where several co-founders of Physical Intelligence previously worked in the robotics team, which provided a foundational environment for their collaboration and advancements.
Mentioned as the previous employer of Adnan, the hardware lead at Physical Intelligence.