How Scaling Laws Will Determine AI's Future | YC Decoded

Y Combinator
Science & Technology · 4 min read · 11 min video
Jan 23, 2025 · 39,902 views
TL;DR

AI models improve predictably with more data, parameters, and compute, but this approach is hitting limits. A new 'test time compute' paradigm lets them think longer at inference, unlocking scaling potential beyond just bigger models.

Key Insights

1. AI model performance doubles roughly every 6 months when scaling parameters, data, and compute, mirroring Moore's Law but at a faster rate.

2. OpenAI's 2020 'Scaling Laws for Neural Language Models' paper established that performance depends more on scale than on the specific algorithm.

3. Google DeepMind's Chinchilla research (2022) found previous LLMs were undertrained, demonstrating that optimal training requires a balance of model size and data; smaller models trained on more data can outperform larger ones.

4. Recent LLMs show diminishing returns, with capabilities plateauing despite increased size and cost, fueling debate about whether scaling has hit its limits.

5. OpenAI's new reasoning models, like o3, show significant performance leaps by scaling 'test time compute' (allowing models to think longer on complex problems) instead of just scaling pre-training size.

The era of scaling laws has arrived

The past few years have seen AI labs adopt a 'more is more' approach to scaling Large Language Models (LLMs). By increasing the number of parameters, the amount of training data, and the compute power used, they have achieved predictable performance improvements. The trend is often compared to Moore's Law, but it is faster: AI performance potentially doubles every six months, versus the roughly 18 months seen in semiconductor advancements. This strategy, which gained traction with OpenAI's GPT-3 in summer 2020 (over 100 times larger than its predecessor, GPT-2), offered a clear path to more capable AI systems, moving beyond mere speculation about the benefits of increased scale.
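
To make the pace concrete, here is a quick back-of-envelope comparison of the two doubling rates. A minimal sketch assuming clean exponential doubling, which real progress only approximates:

```python
# Cumulative improvement after a fixed horizon under two doubling periods.
# Idealized: assumes perfectly steady exponential doubling.

def improvement_factor(years: float, doubling_period_months: float) -> float:
    """Multiplicative improvement after `years` of steady doubling."""
    doublings = years * 12 / doubling_period_months
    return 2 ** doublings

horizon_years = 3
print(improvement_factor(horizon_years, 6))   # 6-month doubling:  2**6 = 64x
print(improvement_factor(horizon_years, 18))  # 18-month doubling: 2**2 = 4x
```

Over three years, a six-month doubling compounds to a 64x gain versus 4x at the Moore's Law pace.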

Understanding the three ingredients of AI scaling

Training AI models can be understood as a recipe with three key ingredients: the model itself, the data it's trained on, and the compute power used. Model size is determined by parameters, which are the internal values adjusted during training to make predictions. Data is measured in tokens (often words or parts of words), and larger models are typically trained on vast datasets. Compute power refers to the GPUs and energy required to train these models. The pivotal 'Scaling Laws for Neural Language Models' paper by Jared Kaplan and colleagues at OpenAI in January 2020 revealed that by increasing all three of these factors—parameters, data, and compute—model performance improved smoothly and consistently, following a power-law relationship. This research fundamentally shifted the understanding of AI development, suggesting that scale itself was a more significant driver of performance than the specific algorithm used.
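
The paper expresses these trends as power laws. The sketch below uses the approximate constants Kaplan et al. report for loss as a function of parameter count alone; the numbers are illustrative, not a reproduction of the paper's full joint fits:

```python
# Kaplan-style power law: test loss falls smoothly as model size grows.
#   L(N) = (N_c / N) ** alpha_N
# Constants are roughly those reported in "Scaling Laws for Neural
# Language Models" (Kaplan et al., 2020) for the parameters-only fit.

ALPHA_N = 0.076   # approximate published exponent for parameter count
N_C = 8.8e13      # approximate published scale constant (parameters)

def predicted_loss(num_params: float) -> float:
    """Predicted cross-entropy loss (nats per token) from model size alone."""
    return (N_C / num_params) ** ALPHA_N

for n in [1e8, 1e9, 1e10, 1e11]:  # 100M to 100B parameters
    print(f"{n:.0e} params -> predicted loss ~{predicted_loss(n):.2f}")
```

Each tenfold increase in parameters lowers the predicted loss by the same constant factor, which is what makes the improvements smooth and consistent rather than dependent on algorithmic breakthroughs.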

The data equation: Chinchilla and optimal training

While OpenAI's initial research highlighted the importance of scaling parameters, data, and compute together, Google DeepMind's 2022 research added a crucial nuance: the optimal balance between model size and training data. Their extensive experiments, involving over 400 models of varying sizes and data amounts, revealed that many previous LLMs, including GPT-3, were likely undertrained. They trained a model named Chinchilla, which was less than half the size of GPT-3 but trained on four times more data. The results were striking: Chinchilla outperformed models more than twice its size. These 'Chinchilla scaling laws' emphasized that the most effective training regime involves not just increasing model size but, critically, supplying enough high-quality data to fully leverage that size. This insight has been instrumental in the development of current frontier models like GPT-4o and Claude 3.5 Sonnet.
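
A common simplification of the Chinchilla result is a roughly 20-tokens-per-parameter ratio, combined with the standard estimate that training costs about 6·N·D FLOPs. A minimal sketch of compute-optimal sizing under those two assumptions (the paper itself fits exponents rather than a fixed ratio):

```python
import math

# Chinchilla-style compute-optimal allocation under two simplifications:
#   C ~ 6 * N * D    (training FLOPs for N parameters on D tokens)
#   D ~ 20 * N       (rule-of-thumb tokens-per-parameter ratio)
# Solving for N: C = 6 * N * (20 * N)  =>  N = sqrt(C / 120).

TOKENS_PER_PARAM = 20  # simplified Chinchilla ratio, not the paper's exact fit

def compute_optimal(flops: float) -> tuple[float, float]:
    """Return (params, tokens) that roughly spend a FLOPs budget optimally."""
    n = math.sqrt(flops / (6 * TOKENS_PER_PARAM))
    return n, TOKENS_PER_PARAM * n

for c in [1e21, 1e23, 5.9e23]:
    n, d = compute_optimal(c)
    print(f"C={c:.1e} FLOPs -> ~{n:.1e} params, ~{d:.1e} tokens")
```

Plugging in roughly 5.9e23 FLOPs, about Chinchilla's own training budget, recovers approximately 70B parameters and 1.4T tokens, matching the model DeepMind actually trained.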

Are we hitting the limits of current scaling?

Despite the consistent gains achieved through scaling laws, there is a growing debate within the AI community about whether these laws are reaching their limits. Recent observations suggest that as models have become larger and more expensive to train, their performance improvements have started to plateau. Reports from major AI labs hint at failed training runs and diminishing returns. A significant bottleneck identified is the potential scarcity of high-quality data needed to train future generations of models. While some believe this data shortage is unlikely, others point out that we may not be as far from exhausting available data as initially assumed, raising questions about the sustainability of the current scaling paradigm.
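
To see why data could bind, extend the compute-optimal sketch above: token demand grows with the square root of compute, while the stock of high-quality text is finite. The 300-trillion-token figure below is a hypothetical placeholder (public estimates of usable text vary widely), used only to show the shape of the problem:

```python
import math

ASSUMED_TEXT_STOCK = 3e14  # hypothetical ~300T usable tokens; estimates vary

def optimal_tokens(flops: float) -> float:
    """Chinchilla-style token demand from the sketch above: D = 20 * sqrt(C / 120)."""
    return 20 * math.sqrt(flops / 120)

for c in [1e25, 1e27, 1e29]:  # each step is 100x more training compute
    d = optimal_tokens(c)
    print(f"C={c:.0e} FLOPs -> ~{d:.1e} tokens "
          f"({d / ASSUMED_TEXT_STOCK:.0%} of the assumed stock)")
```

Under these assumptions, demand stays small at today's budgets but crosses the assumed stock within a few more hundredfold jumps in compute, which is the crux of the debate.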

A new frontier: Scaling 'test time compute'

In light of potential plateaus in traditional scaling, a new paradigm is emerging, spearheaded by OpenAI's recent class of reasoning models: o1 and its successor, o3. These models demonstrate that intelligence can be scaled not just by increasing model size during pre-training, but by significantly increasing 'test time compute': allowing the model more time and computational resources to 'think' and reason through complex problems during inference. o3, for instance, has shattered existing benchmarks in areas like software engineering, mathematics, and PhD-level science by leveraging this capability. Instead of merely making models bigger, the focus shifts to enhancing their reasoning processes. Scaling test time compute, by enabling LLMs to think for longer on demand, may unlock entirely new levels of capability and potentially pave a path toward Artificial General Intelligence.
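
OpenAI has not published how o1 and o3 work internally, so the sketch below illustrates the general principle with a well-known stand-in technique: self-consistency, where the model samples many reasoning chains at inference time and majority-votes on the final answer. `sample_answer` is a hypothetical stub for any LLM call; more samples means more test time compute and, typically, higher accuracy:

```python
import random
from collections import Counter

def sample_answer(problem: str) -> str:
    """Hypothetical stub for one sampled reasoning chain from an LLM.
    Correct 40% of the time, so voting has signal to amplify."""
    if random.random() < 0.4:
        return "correct"
    return random.choice(["wrong_a", "wrong_b"])

def self_consistency(problem: str, n_samples: int) -> str:
    """Spend more inference compute: sample n chains, majority-vote the answer."""
    votes = Counter(sample_answer(problem) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

random.seed(0)
for n in [1, 5, 25, 125]:  # each step is 5x more test time compute
    trials = 200
    hits = sum(self_consistency("some hard problem", n) == "correct"
               for _ in range(trials))
    print(f"{n:>3} samples/query -> accuracy ~{hits / trials:.0%}")
```

Accuracy climbs toward 100% as samples increase, even though each individual chain is right only 40% of the time; that is the basic lever test-time-compute scaling pulls, whatever o3's exact mechanism turns out to be.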

Scaling laws beyond LLMs

The principles of scaling laws are not confined to Large Language Models; they appear to be fundamental across various AI domains. Similar scaling dynamics are observed in image diffusion models, protein folding simulations, chemical modeling, and even in 'world models' used for robotics and self-driving cars. While the midgame for LLMs might be in full swing, the application of scaling laws to these other modalities is still in its early stages, suggesting significant future advancements are possible.

Common Questions

What are scaling laws in AI?

Scaling laws in AI refer to the observation that increasing a model's size, the amount of data it's trained on, and the compute used for training leads to consistent improvements in performance, often following a power-law relationship.
