François Chollet: How We Get To AGI

Y Combinator
Science & Technology | 4 min read | 35 min video
Jul 3, 2025 | 136,352 views

Key Moments

TL;DR

Current AI scaling is insufficient for AGI; focus shifts to test-time adaptation and combining abstraction types.

Key Insights

1

The cost of compute has been a primary driver of AI progress, but scaling current models alone is not enough for AGI.

2

There's a critical distinction between memorized skills and fluid general intelligence, which involves adapting to novel situations.

3

Test-time adaptation (TTA) represents a significant shift, enabling models to learn and adapt during inference, showing promise for fluid intelligence.

4

Intelligence is best defined as the efficiency of operationalizing past information to deal with future novelty and uncertainty, not just skill acquisition.

5

Human intelligence relies on a combination of Type 1 (value-centric) and Type 2 (program-centric) abstractions, a synergy AI needs to replicate.

6

Future AGI development requires moving beyond deep learning's strength in Type 1 abstraction towards discrete program search for Type 2 capabilities and invention.

THE LIMITATIONS OF THE SCALING PARADIGM

For years, the dominant paradigm in AI has been scaling up deep learning models, particularly large language models, driven by the falling cost of compute and the availability of vast datasets. This approach, often referred to as 'pre-training scaling,' showed predictable improvements on benchmarks as models and data increased. However, this progress primarily reflected an enhancement of memorized skills and static inference, rather than true fluid general intelligence. The core issue was mistaking benchmark performance for genuine understanding and adaptability, a limitation highlighted by benchmarks like the Abstraction and Reasoning Corpus (ARC).

WHAT IS TRUE INTELLIGENCE?

François Chollet posits that intelligence is not merely the ability to perform tasks but rather the efficiency with which one operationalizes past information to navigate novelty and uncertainty. This contrasts with the traditional view of AI as achieving human-level task performance, often framed by corporate goals of automating economically valuable tasks. Chollet emphasizes that intelligence is a process of dealing with new situations and building new capabilities, akin to a road-building company rather than just a static road network. This definition moves beyond crystallized behavior and skills, focusing on the dynamic capacity to adapt and invent.

THE SHIFT TO TEST-TIME ADAPTATION

The AI research community has seen a significant pivot towards 'test-time adaptation' (TTA). This paradigm shift focuses on creating models capable of changing their own state and behavior dynamically during inference. Unlike pre-training, which loads knowledge statically, TTA involves learning and adapting on the fly. Techniques like test-time training and program synthesis fall under this umbrella, enabling AI systems to modify their responses based on specific encountered data. This approach has demonstrated significant progress on benchmarks like ARC, indicating a move towards more fluid and adaptive intelligence.
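The idea can be made concrete with a minimal, illustrative sketch: rather than answering from frozen pretrained knowledge, the model fits its own parameters to the few demonstration pairs it sees at inference time, then answers the query. The function name and the toy linear model below are invented for illustration, not taken from the talk.

```python
# Minimal sketch of test-time adaptation (TTA): the model updates its own
# parameters on the task's demonstration pairs during inference, instead
# of relying only on statically loaded pretrained knowledge.

def adapt_and_predict(demos, query, lr=0.1, steps=500):
    """Fit y = w*x + b to the demo pairs at inference time via gradient
    descent, then use the adapted parameters to answer the query."""
    w, b = 0.0, 0.0                      # start from an uninformed state
    for _ in range(steps):               # learn on the fly, from demos only
        gw = gb = 0.0
        for x, y in demos:
            err = (w * x + b) - y
            gw += 2 * err * x            # gradient of squared error w.r.t. w
            gb += 2 * err                # gradient of squared error w.r.t. b
        w -= lr * gw / len(demos)
        b -= lr * gb / len(demos)
    return w * query + b

# The "task" is defined entirely by its demonstrations, ARC-style:
demos = [(1, 3), (2, 5), (3, 7)]         # hidden rule: y = 2x + 1
print(round(adapt_and_predict(demos, 10)))  # → 21
```

The key point of the sketch is that nothing about the rule y = 2x + 1 is stored in advance; the system's state changes in response to the specific data it encounters at test time.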

REDEFINING AND MEASURING INTELLIGENCE: THE ARC BENCHMARK

To address the limitations of existing benchmarks, the Abstraction and Reasoning Corpus (ARC) was developed. Unlike traditional tests that can be 'gamed' through memorization, ARC tasks are unique and require on-the-fly problem-solving using core, implicit knowledge that even young children possess. ARC aims to measure fluid intelligence by presenting novel problems that cannot be solved by simply recalling stored patterns. While ARC1 initially served to highlight the inadequacy of scaling, ARC2 and the upcoming ARC3 are designed to be more sensitive, probing compositional generalization and even agency, providing a more nuanced measure of AI's progress towards AGI.
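To make the benchmark's shape concrete, here is a toy ARC-style task: grids of integers (colors), a few demonstration pairs, and a test input. The layout mimics ARC's published JSON format; the specific grids and the hidden rule (horizontal mirroring) are invented for illustration.

```python
# A toy ARC-style task. Each task supplies a handful of demonstration
# pairs and one or more test inputs; the solver must infer the rule from
# the demos alone -- memorized patterns cannot help on a novel rule.

task = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[5, 6], [7, 8]], "output": [[6, 5], [8, 7]]},
    ],
    "test": [{"input": [[9, 4], [0, 2]]}],
}

def mirror(grid):
    """Candidate program: reverse every row of the grid."""
    return [row[::-1] for row in grid]

# A solver must verify its candidate rule against every demonstration
# before committing to an answer for the test input.
assert all(mirror(p["input"]) == p["output"] for p in task["train"])
print(mirror(task["test"][0]["input"]))  # → [[4, 9], [2, 0]]
```

Because every task carries its own novel rule, a system can only score well by synthesizing the rule on the fly, which is exactly the fluid-intelligence behavior ARC is designed to measure.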

THE DUAL NATURE OF ABSTRACTION IN INTELLIGENCE

Human intelligence is characterized by the interplay of two types of abstraction: Type 1 (value-centric) and Type 2 (program-centric). Type 1, driven by continuous functions and comparisons, underlies perception, intuition, and pattern recognition—areas where modern machine learning excels. Type 2, involving discrete program comparison and structural matching, is crucial for human reasoning, planning, and invention. While current AI systems, particularly transformers, are adept at Type 1 abstraction, they struggle with Type 2 tasks such as sorting or arithmetic, indicating a critical gap on the path to general intelligence.
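The contrast can be sketched in a few lines of code. Type 1 compares instances by a continuous distance (value-centric, approximate); Type 2 applies an exact, discrete program (program-centric, correct on all inputs). Both functions and their names are illustrative, not from the talk.

```python
# Contrasting the two abstraction styles: continuous similarity (Type 1)
# versus an exact discrete program (Type 2).

import math

def type1_classify(x, prototypes):
    """Type 1: pick the label of the nearest prototype. Perception-like:
    a continuous distance in a vector space, interpolation between
    known points -- the regime where deep learning excels."""
    return min(prototypes, key=lambda p: math.dist(x, p[0]))[1]

def type2_sort(xs):
    """Type 2: an exact discrete program (insertion sort). Its structure,
    not similarity to training data, guarantees correctness on every
    input -- the regime where transformers still struggle."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

prototypes = [((0.0, 0.0), "cat"), ((1.0, 1.0), "dog")]
print(type1_classify((0.2, 0.1), prototypes))  # → cat  (approximate match)
print(type2_sort([3, 1, 2]))                   # → [1, 2, 3]  (exact rule)
```

The synergy the talk calls for is a system that uses Type 1 machinery to perceive and guess, and Type 2 machinery to reason exactly—rather than forcing one style to imitate the other.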

THE FUTURE: COMBINING ABSTRACTIONS AND SEARCH FOR INVENTION

Achieving AGI requires moving beyond current AI capabilities by effectively combining Type 1 and Type 2 abstractions. This involves leveraging discrete program search, guided by deep learning-driven intuition, to overcome the combinatorial explosion inherent in Type 2 reasoning. The goal is to create 'programmer-like' meta-learners that can synthesize novel programs by combining deep learning modules for perception (Type 1) and algorithmic modules for reasoning (Type 2). This approach, emphasizing reusability through a shared library of abstractions and efficient search, is the focus of new research labs aiming to build AI capable of independent invention and accelerating scientific discovery.
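A minimal sketch of that combination: enumerate discrete programs over a tiny DSL, but order candidates by a heuristic prior standing in for the deep-learning "intuition" that tames the combinatorial explosion. The primitives, the prior values, and the function names are all invented for illustration; a real system would learn the priors per task.

```python
# Discrete program search over a tiny DSL, with a heuristic prior
# ordering candidates -- a stand-in for learned, Type-1 intuition
# guiding Type-2 search.

from itertools import product

PRIMITIVES = {
    "inc":    lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg":    lambda x: -x,
}
# Illustrative priors; in a guided searcher a neural model supplies these.
PRIOR = {"inc": 0.5, "double": 0.4, "neg": 0.1}

def search(examples, max_len=3):
    """Return the highest-prior program (a tuple of primitive names)
    consistent with every input/output example, or None."""
    candidates = []
    for n in range(1, max_len + 1):
        for prog in product(PRIMITIVES, repeat=n):
            score = 1.0
            for name in prog:            # product of per-primitive priors
                score *= PRIOR[name]
            candidates.append((score, prog))
    candidates.sort(reverse=True)        # try "intuitive" programs first
    for _, prog in candidates:
        if all(_run(prog, x) == y for x, y in examples):
            return prog
    return None

def _run(prog, x):
    for name in prog:
        x = PRIMITIVES[name](x)
    return x

# Target behaviour f(x) = 2x + 1, specified only through examples:
print(search([(1, 3), (2, 5), (0, 1)]))  # → ('double', 'inc')
```

The prior does not change what is findable—exhaustive search would reach the same program—but it changes how quickly consistent programs are reached, which is the practical difference between tractable and intractable Type 2 search.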

Common Questions

What is the difference between static skills and fluid intelligence?

Static skills are memorized, task-specific abilities, while fluid intelligence is the ability to understand and adapt to entirely new problems on the fly, without prior preparation.
