Robots Are Finally Starting to Work

Y Combinator
Science & Technology · 6 min read · 50 min video
Apr 16, 2026
TL;DR

Robots are now performing complex tasks like folding laundry and packaging orders with near-human capabilities, but each successful deployment still requires significant human oversight and is costly to scale.

Key Insights

1. Physical Intelligence's cross-embodiment approach, training across multiple robot platforms, yielded a 50% performance improvement over single-embodiment specialists.

2. Tasks that previously required hundreds of hours of robot data collection can now be performed zero-shot, i.e. without any task-specific training.

3. Even with advanced AI, robot deployments currently need a 'mixed autonomy system' with human oversight, with the goal of gradually increasing autonomy over time.

4. Cloud-hosted models enable real-time robotic control, sidestepping the limits of on-robot compute and the rapid obsolescence of that hardware.

5. The upfront cost of building a robotics company has dropped sharply; the focus is shifting from proprietary autonomy stacks to identifying use cases and collecting relevant data.

6. Physical Intelligence has open-sourced its foundation models (π0 and π0.5) to accelerate progress and foster a 'Cambrian explosion' of robotics startups.

Robots are achieving near-human capabilities, performing complex tasks with surprising efficiency.

The field of robotics is on the cusp of a significant breakthrough, moving beyond specialized industrial applications to tackle complex, everyday tasks. Companies like Physical Intelligence (PI) are developing foundation models intended to control any robot to perform any physically feasible task with high performance. This progress is exemplified by demonstrations of robots folding diverse laundry items in real-world laundromats and packaging consumer goods in e-commerce warehouses. These are not simple tasks: laundry items are deformable and nearly infinitely varied, and packaging requires precise manipulation, including nudging items into pouches through narrow openings. Laundry folding in particular was long treated as a kind of 'Turing test' for robotics, because earlier deterministic programming approaches struggled with the variability of the real world. For many, robots performing such tasks only became conceivable after the advances demonstrated by models like ChatGPT.

Language models are unlocking common sense and planning for robots.

A key enabler of the recent progress in robotics is the integration of large language models (LLMs) and vision-language models (VLMs). Semantics, one of the three pillars considered crucial for robotics (alongside planning and real-time control), has been substantially unlocked by LLMs. These models bring common-sense knowledge into robotics, drastically reducing the need for task-specific data collection. For instance, an LLM can break down a task like 'record a podcast' into actionable steps and plans. Models like RT-2 (Robotic Transformer 2) and PaLM-E have demonstrated the ability to transfer knowledge from vision-language models to low-level robotic actions. This allows robots to handle concepts absent from their robot-specific training data, such as identifying and interacting with specific celebrities in images or performing spatial reasoning over unseen objects.
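As a minimal sketch of the decomposition step described above: the prompt builder and the canned plan table below are hypothetical stand-ins for a real LLM call, not PI's actual pipeline.

```python
# Sketch of LLM-based task decomposition for a robot planner.
# In a real system, `plan` would send the prompt to a language model;
# here a small lookup table stands in so the control flow is runnable.

def build_planning_prompt(task: str) -> str:
    """Format a high-level command as a step-decomposition request."""
    return (
        "You are a robot task planner. Break the task into short, "
        f"concrete manipulation steps.\nTask: {task}\nSteps:"
    )

# Stand-in for the model's response (invented example steps).
CANNED_PLANS = {
    "record a podcast": [
        "locate the microphone on the desk",
        "move the microphone in front of the speaker",
        "press the record button",
    ],
}

def plan(task: str) -> list[str]:
    prompt = build_planning_prompt(task)  # sent to the LLM in practice
    return CANNED_PLANS.get(task, [])

steps = plan("record a podcast")
```

Each returned step would then be handed to a lower-level policy for execution, which is where models like RT-2 come in.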

Cross-embodiment training yields a 50% performance boost over specialists.

A persistent challenge in robotics has been scaling data collection and getting models to generalize across different hardware. Physical Intelligence's cross-embodiment approach trains models on data from multiple robot platforms simultaneously. This strategy produces models that learn a more abstract understanding of robot control rather than overfitting to a single, specific robot. Surprisingly, research in this vein, notably the RT-X (Open X-Embodiment) work, showed that a generalist model trained across 10 different robot platforms performed 50% better than specialist models optimized for individual platforms. This is a paradigm shift: traditionally, every new robot platform added years to development because of the effort required to bring it up and collect data. The implication is that by leveraging data from a diverse fleet, models become more robust and adaptable to variations and changes in hardware over time.
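The data-mixing side of cross-embodiment training can be sketched roughly as follows. The platform names, action dimensions, and zero-padding scheme are illustrative assumptions, not details from the episode.

```python
# Sketch: mixing trajectories from multiple robot platforms into one
# training stream, with actions padded into a shared action space.
import random

# Hypothetical fleet; dims and episode counts are made up.
PLATFORMS = {
    "arm_6dof": {"action_dim": 6, "episodes": 120},
    "arm_7dof": {"action_dim": 7, "episodes": 80},
    "mobile_manipulator": {"action_dim": 10, "episodes": 40},
}
MAX_ACTION_DIM = max(p["action_dim"] for p in PLATFORMS.values())

def pad_action(action: list[float]) -> list[float]:
    """Zero-pad a platform-specific action to the shared dimension."""
    return action + [0.0] * (MAX_ACTION_DIM - len(action))

def sample_batch(batch_size: int, seed: int = 0) -> list[dict]:
    """Sample examples proportionally to each platform's data volume."""
    rng = random.Random(seed)
    names = list(PLATFORMS)
    weights = [PLATFORMS[n]["episodes"] for n in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=weights)[0]
        dim = PLATFORMS[name]["action_dim"]
        batch.append({"platform": name,
                      "action": pad_action([0.1] * dim)})
    return batch

batch = sample_batch(8)
```

Every batch element now has the same action shape regardless of which robot produced it, which is what lets one generalist policy train on all platforms at once.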

Cloud-based AI powers robots, overcoming hardware limitations.

Contrary to the expectation that robots need substantial on-board computing power, PI's models, even for complex tasks, are largely hosted in the cloud. This approach sidesteps the high cost, rapid obsolescence, and power demands of on-robot compute hardware. Robots query API endpoints in the cloud, streaming sensor data and receiving action commands in real time. PI has developed algorithmic improvements, such as 'real-time chunking' and pre-computation within the control loop, to manage the inherent latency of cloud inference. This allows for smooth, continuous action with as little as 50 milliseconds' worth of actions remaining in the buffer before the next chunk is requested. This strategy simplifies robot hardware significantly, potentially allowing even basic 'dumb' computers and cameras to stream data to a sophisticated AI controller, a crucial factor for mass deployment.
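A toy version of that buffer-refill logic might look like the sketch below. The tick rate, chunk size, and `fake_cloud_policy` endpoint are invented for illustration; PI's actual real-time chunking algorithm is certainly more involved.

```python
# Sketch of a "real-time chunking" control loop: the robot executes
# buffered actions and requests the next chunk from a cloud endpoint
# before the buffer runs dry.
from collections import deque

CONTROL_PERIOD_MS = 10    # one action executed per 10 ms control tick
REFILL_THRESHOLD_MS = 50  # re-query when <= 50 ms of actions remain

def fake_cloud_policy(observation: dict) -> list[float]:
    """Stand-in for a cloud inference call returning a 20-action chunk."""
    return [0.0] * 20

def run_control_loop(ticks: int) -> int:
    buffer: deque[float] = deque()
    refills = 0
    for _ in range(ticks):
        remaining_ms = len(buffer) * CONTROL_PERIOD_MS
        if remaining_ms <= REFILL_THRESHOLD_MS:
            # In practice this request would be issued asynchronously,
            # overlapping inference latency with ongoing execution.
            buffer.extend(fake_cloud_policy({"camera": None}))
            refills += 1
        buffer.popleft()  # execute one action this tick
    return refills

refills = run_control_loop(100)
```

The point of the threshold is that the next inference request is already in flight while the robot is still acting, so cloud latency never stalls the arm.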

A new playbook emerges for vertical robotics startups.

The landscape for starting robotics companies has been dramatically reshaped. Historically, robotics required immense vertical integration spanning customer relationships, hardware development, autonomy stacks, and safety certifications, creating a high barrier to entry. PI aims to lower this barrier by providing a foundational intelligence layer for the community to build on. The new playbook: understand existing workflows, carefully identify the opportunities where robots can deliver maximum impact, and be scrappy with hardware and data collection. Because modern AI models are reactive, they can compensate for the inaccuracies of cheaper, less precise hardware. The key is to reach economic break-even with mixed autonomy systems, in which humans provide oversight and corrections, before scaling up the robot fleet; long payback periods have historically made this difficult. This unbundling lets startups focus on differentiation rather than reinventing the entire robotics stack.
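The break-even logic can be made concrete with back-of-the-envelope arithmetic. Every number below is a made-up placeholder, not a figure from the episode.

```python
# Illustrative mixed-autonomy unit economics: a robot's hourly cost
# includes its share of one human supervisor, and that share shrinks
# as autonomy improves. All figures are invented placeholders.

def cost_per_robot_hour(robot_hourly: float,
                        supervisor_hourly: float,
                        robots_per_supervisor: float) -> float:
    """Robot operating cost plus its share of one human supervisor."""
    return robot_hourly + supervisor_hourly / robots_per_supervisor

human_labor_hourly = 25.0   # cost of the labor being replaced
robot_hourly = 8.0          # amortized hardware + cloud inference
supervisor_hourly = 30.0

# With 1 supervisor per robot, the system costs more than human labor;
# once one person oversees 10 robots, it is well past break-even.
early = cost_per_robot_hour(robot_hourly, supervisor_hourly, 1)   # 38.0
later = cost_per_robot_hour(robot_hourly, supervisor_hourly, 10)  # 11.0
```

The playbook above amounts to driving `robots_per_supervisor` up over time so the deployment crosses that break-even line before the fleet scales.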

The promise of a 'Cambrian explosion' in robotics.

Physical Intelligence's mission is to enable a widespread proliferation of robotics companies, akin to a 'Cambrian explosion.' By providing open-source foundation models (π0 and π0.5) and publishing its research, PI aims to accelerate progress across the industry. The belief is that robots can bring abundance to the physical world, mirroring the impact of digital computing. PI's success is not defined solely by its own models on its own robots, but by the broader adoption of its technology by other companies and researchers. This vision answers the argument that general-purpose robotics might take decades longer to solve, by emphasizing community enablement and collaborative progress. The hope is to inspire a new wave of startups that leverage these advances to solve a myriad of real-world problems, especially in sectors with labor shortages.

The human element: a diverse team tackling a monumental challenge.

PI is an unconventional company with a larger-than-average founding team, many of whom previously worked together on Google's robotics team. This established synergy and shared experience, coupled with passion for the mission, proved crucial for tackling the immense complexity of general-purpose robotics. Key members include Adnan, the hardware lead tasked with managing a fleet of heterogeneous robots, and Lachy, who keeps the business side sound. The team emphasizes a 'divide and conquer' strategy, leveraging individual strengths to accelerate progress. For many co-founders this is their first startup, and it has revealed a surprising lack of existing infrastructure for large-scale, general-purpose robotics. PI has therefore had to build much of its own software for data collection, management, annotation, and evaluation, highlighting an opportunity for future support-services businesses in the robotics ecosystem.

Generalist vs. Specialist Robot Policy Performance

Data extracted from this episode

Model Type                               | Performance Improvement | Notes
Generalist (trained across 10 platforms) | 50% better              | Outperformed specialists optimized for a single platform.

Common Questions

What is the 'GPT-1 moment' for robotics?

The 'GPT-1 moment' for robotics refers to a breakthrough akin to GPT-1's impact on language models: a general-purpose AI model that can control any robot to perform any physically feasible task with high performance.
