The Real Surprise of the Last 3 Years in AI – Dario Amodei
Key Moments
AI progress is on track with exponential growth; RL scales like pretraining.
Key Insights
The underlying AI technology has continued its exponential growth largely as expected, with only minor variance in timing and trajectory.
The most surprising point is how little public recognition there is of how close we may be to the end of the current exponential growth phase.
The 'Big Blob of Compute' hypothesis from 2017 remains a useful framework: the main levers of progress are compute, data, data quality/distribution, training duration, scalable objectives, the reward signal in RL, and robust normalization.
Scaling trends seen in pretraining now extend into reinforcement learning (RL), suggesting a similar log-linear relationship between resources and performance in RL tasks (see the illustrative sketch after this list).
Public discussions often focus on political or social narratives, while technical progress—especially in RL scaling laws—continues largely under the radar.
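To make the "log-linear" claim concrete: it says a capability score grows roughly linearly in the logarithm of the resources spent, so each doubling of compute buys a roughly constant increment in score. The following minimal Python sketch fits that relationship to hypothetical numbers (the compute values and scores are invented for illustration, not data from the talk):

```python
import numpy as np

# Hypothetical (compute, score) pairs: each doubling of compute
# adds a roughly constant increment to the benchmark score.
compute = np.array([1e20, 2e20, 4e20, 8e20, 1.6e21])  # training FLOPs (made up)
score = np.array([31.0, 38.5, 45.2, 52.8, 59.9])      # benchmark score (made up)

# Log-linear model: score ~ a * log10(compute) + b
a, b = np.polyfit(np.log10(compute), score, deg=1)
print(f"points gained per 10x compute: {a:.1f}, intercept: {b:.1f}")

# Extrapolating one more doubling assumes the trend holds;
# real curves can bend or saturate.
next_compute = 3.2e21
print(f"predicted score at {next_compute:.1e} FLOPs: "
      f"{a * np.log10(next_compute) + b:.1f}")
```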
THE BIG QUESTION: WHAT CHANGED IN THREE YEARS
Three years into the conversation about AI progress, the central question remains: what has truly changed? The speaker argues that the underlying technology has kept advancing along an exponential curve, roughly as expected, with only modest deviations in timing. The direction of specific domains, notably code and programming capabilities, is harder to predict and appears uneven across frontier areas. Beyond the scale of models themselves, the broad trend is that AI is moving from novice-level to increasingly capable performance, with certain skills pushing into professional territory. However, there is a striking surprise: public perception has not caught up with how close we might be to the end of this exponential era.
EXPECTATIONS VS REALITY IN EXPONENTIAL GROWTH
The discussion reinforces an expectation that the trajectory of AI progress should resemble a steady march driven by compute, data, and training time. The models have progressed from simple demonstrations to more sophisticated capabilities that resemble college-level work and, in some cases, PhD-level tasks. The frontier is not perfectly uniform—some areas advance faster than others—yet the overall arc aligns with prior intuitions. What stands out is the sense that public narratives often misjudge the pace, direction, or limits of this exponential growth.
WHY THE DIRECTION OF CODE REMAINS UNCLEAR
One of the notable admissions is the unpredictability of the specific direction of code and reasoning capabilities. While the broad pattern of growth is understandable, the exact path for competencies like coding, reasoning, and meta-learning is less certain. This reflects a broader lesson: even with powerful models, isolating which architectural or training changes drive particular kinds of skill improvement remains challenging. The growth is real, but the mechanisms behind certain gains are still not fully transparent.
THE END OF THE EXPONENTIAL: A SURPRISING PERCEPTION
A core surprise is the apparent lack of public recognition that we may be near the end of the current exponential phase. The speaker notes that insiders and outsiders alike continue to debate political questions while the practical work of accelerating and scaling AI proceeds largely under the radar. There is a call to better understand what the current exponential looks like now, including how to interpret scaling signals in a world where public-facing scaling laws for RL have not yet emerged.
THE BIG BLOB OF COMPUTE: ORIGINS AND CONTINUITY
The conversation anchors itself in the 'Big Blob of Compute' hypothesis, first articulated in 2017. It argues that the cleverness of technique is secondary to a handful of fundamental levers: raw compute, data quantity, data quality and distribution, training duration, scalable objective functions, the reward signal in RL, and stability through normalization. This framing helped explain why pretraining scaling laws emerged and why RL scaling appears to follow similar principles. The hypothesis remains a useful lens for understanding where progress comes from and how to forecast future gains.
SEVEN KEY FACTORS THAT DRIVE SCALE
Within the Big Blob framework, seven factors matter most: how much raw compute you have; how much data you collect; the quality and distribution of that data; how long you train; the objective function’s scalability (pretraining vs RL); the nature of the reward signal and goals in RL; and the role of normalization and conditioning to maintain numerical stability. Each factor interacts with the others, and progress hinges on balancing them to enable smooth, robust learning at scale.
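One way to keep the seven levers straight is to write them down as a structured record. The sketch below is purely illustrative; the field names and example values are my own labels for the factors just listed, not notation from the talk or the internal document:

```python
from dataclasses import dataclass

@dataclass
class BigBlobFactors:
    """The seven levers from the 'Big Blob of Compute' framing (illustrative labels)."""
    raw_compute_flops: float   # 1. how much raw compute is available
    data_tokens: float         # 2. how much data is collected
    data_quality: str          # 3. quality and distribution of that data
    training_steps: int        # 4. how long training runs
    objective: str             # 5. scalable objective function (pretraining)
    reward_signal: str         # 6. reward signal and goals for the RL phase
    normalization: str         # 7. normalization/conditioning for numerical stability

run = BigBlobFactors(
    raw_compute_flops=1e24,    # hypothetical budget
    data_tokens=1e13,
    data_quality="deduplicated, domain-balanced web and code",
    training_steps=500_000,
    objective="next-token prediction",
    reward_signal="verifiable task rewards (math answers, unit tests)",
    normalization="layer norm with careful initialization",
)
print(run)
```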
PRE-TRAINING VS RL: TWO SIDES OF THE SAME COIN
A key shift noted is that RL is now being studied as an extension to pretraining rather than as a separate, standalone stage. This mirrors the behavior seen in language model scaling, where pretraining gains continue and an RL phase can be layered on top. The implication is that the same scaling intuition—more compute and more data yielding better performance—applies to RL tasks, reinforcing the idea that the major drivers of progress transcend specific learning paradigms.
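To make "RL layered on top of pretraining" concrete, here is a toy sketch that assumes nothing about any lab's actual pipeline: a tiny policy is first fit to demonstrations by maximum likelihood (standing in for pretraining), and then the same parameters continue updating with a REINFORCE-style reward gradient (standing in for the RL phase). The task, data, and hyperparameters are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_action(w, x):
    """Toy 'policy': probability of choosing action 1 given scalar input x."""
    return 1.0 / (1.0 + np.exp(-(w[0] * x + w[1])))

w = np.zeros(2)
lr = 0.5

# Phase 1: 'pretraining' = maximum likelihood on demonstration data.
x_demo = rng.normal(size=1000)
y_demo = (x_demo > 0).astype(float)  # demonstrators mostly act correctly
for _ in range(200):
    p = p_action(w, x_demo)
    grad = np.array([np.mean((y_demo - p) * x_demo), np.mean(y_demo - p)])
    w += lr * grad  # gradient ascent on the log-likelihood

# Phase 2: RL layered on top = REINFORCE with a task reward,
# updating the very same parameters.
for _ in range(200):
    x = rng.normal(size=256)
    p = p_action(w, x)
    a = (rng.random(256) < p).astype(float)     # sample actions from the policy
    reward = np.where(a == (x > 0), 1.0, -1.0)  # +1 for the 'right' action
    glogp = a - p                               # grad of log pi(a|x) w.r.t. the logit
    grad = np.array([np.mean(reward * glogp * x), np.mean(reward * glogp)])
    w += lr * grad

print("final weights:", w)
```

The point of the miniature is structural: both phases push the same parameters with the same gradient machinery; only the training signal changes, from likelihood of demonstrations to reward on outcomes.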
EVIDENCE OF RL SCALING: MATH CONTESTS AND BEYOND
Empirical examples show RL performance improving in a log-linear fashion with respect to training time and compute, similar to pretraining. Studies and company reports demonstrate that when you train models on tasks like math contests or other RL benchmarks, the rate of improvement tracks a predictable path as resources increase. This cross-domain consistency strengthens the argument that the same fundamental scaling laws apply across supervised and reinforcement learning tasks, even if the exact forms may differ by domain.
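A quick sanity check of log-linearity on contest-style benchmarks: if accuracy is linear in log(compute), each doubling of RL compute should buy a roughly constant number of accuracy points. The numbers below are hypothetical stand-ins, not results reported in the episode:

```python
import numpy as np

# Hypothetical accuracy on an AIME-style benchmark at successive
# doublings of RL training compute (illustrative numbers only).
rl_compute = np.array([1, 2, 4, 8, 16, 32])                # relative RL compute
accuracy = np.array([12.0, 19.0, 25.0, 33.0, 39.0, 46.0])  # % problems solved

# Under log-linear scaling, gains per doubling should be roughly constant.
gains_per_doubling = np.diff(accuracy) / np.diff(np.log2(rl_compute))
print("points per doubling:", gains_per_doubling)
print(f"mean {gains_per_doubling.mean():.1f}, std {gains_per_doubling.std():.1f}")
```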
THE CHALLENGE OF SCALING LAWS IN RL
Unlike well-documented pretraining scaling laws, there is no publicly known, universal scaling law for RL. The story is more complex due to the role of rewards, environment dynamics, and sample efficiency, which can vary dramatically across tasks. The lack of a single RL scaling law does not undermine the broader Big Blob view but does complicate the task of predicting RL progress with the same precision as supervised learning.
PUBLIC DISCUSSION VS TECHNICAL PROGRESS: A CULTURAL GAP
A notable theme is the disconnect between public discourse and technical progress. While policy debates and political issues capture public attention, the rapid, resource-intensive advances in AI—particularly RL scaling and the interplay with pretraining—often proceed quietly. The speaker urges a more nuanced public conversation about the technical frontiers and a willingness to engage with the mechanics of scaling rather than only the social and ethical dimensions.
IMPLICATIONS FOR RESEARCH FOCUS AND INVESTMENT
If the Big Blob framework remains valid, research and investment should prioritize the levers themselves: securing more compute and data, curating high-quality, well-distributed data, extending training durations where feasible, and refining scalable objective functions alongside robust normalization techniques. Understanding how RL objectives interact with pretraining, and identifying when to apply RL fine-tuning, could yield outsized gains. It also suggests a balanced approach that values both algorithmic innovation and data-centric improvements.
LOOKING AHEAD: CHALLENGES AND QUESTIONS FOR AI
Looking forward, the conversation points to a set of open questions: how exactly RL scaling laws will crystallize, how to quantify and compare progress across domains, and what signals best forecast near-term breakthroughs. The takeaway is that progress remains compute-driven, but our interpretive frameworks—and the public’s understanding—must evolve. The path ahead likely involves a tighter integration of pretraining and RL, clearer scaling narratives, and an emphasis on data stewardship alongside architectural exploration.
Common Questions
What does Dario Amodei say has actually changed in AI over the past three years?
He argues that the underlying technology has grown exponentially as expected, with the overall trajectory broadly aligning with his expectations. He notes the most surprising part is how little public recognition there is that we may be near the end of the exponential.
Mentioned in this video
Big Blob of Compute: an internal doc/hypothesis discussed in the talk outlining seven key factors that drive AI scaling.
AlphaStar: DeepMind's RL agent for StarCraft II, cited as an RL scaling example.
DeepMind reinforcement learning milestone: mentioned as part of RL scaling history.
Dota 2: RL benchmark game referenced in the RL scaling context.
AIME (American Invitational Mathematics Examination): used as an RL-style task example.