TL;DR

AI progress on programming tasks is accelerating thanks to better models and sophisticated 'coding harnesses,' but this does not signal an impending AI takeover: the gains reflect a specific application, not growth in general intelligence.

Key Insights

1. The METR chart tracks the longest-duration software tasks AI models, when combined with coding harnesses, can complete with at least 50% success, not general AI capability.

2. AI model improvement shifted from pre-training scaling in 2024 to post-training and tuning for specific tasks like programming, leading to recent performance gains.

3. The recent exponential-like increase on the METR chart is significantly driven by the development of complex, hand-coded 'coding harnesses' and expert systems, not just LLM advancements.

4. The METR chart's task durations are abstract measures of difficulty for 'low context' programmers, not precise indicators of what high-context professionals can achieve.

5. Progress in AI applications is better modeled as exploring navigable 'tributaries' (specific applications) rather than a general rise in the 'water level,' meaning progress in one area does not predict progress in others.

6. The transhumanist and existential risk communities, driven by extrapolating exponentials, have unduly influenced AI discourse, leading to exaggerated fears of an AI 'eating everything' scenario.

Understanding the METR time horizon chart

Recent online discourse, amplified by figures like Gary Marcus, has seized upon the METR (Model Evaluation and Threat Research) time horizon chart, interpreting its upward trend as evidence of an imminent "intelligence explosion" and of AI's tendency to "eat everything." The chart, whose data points rise sharply from 2025 onwards, has fueled sensationalist tweets claiming that AI power is doubling rapidly and that human input will soon become a liability. These interpretations often compare the METR chart to graphs predicting the rise of artificial superintelligence (ASI), creating a sense of urgency and unease. This summary critically examines what the METR chart actually measures and what its trends signify, debunking the more extreme claims.

What the METR chart actually measures

Cal Newport clarifies that the METR chart does not measure the general capability of AI models. Instead, it covers a specific suite of well-defined software tasks that can be solved by writing or analyzing computer code. For each task, human programmers were timed, and the geometric mean of their completion times was recorded as the task's 'human duration.' Large language models (LLMs) combined with 'coding harnesses' (programs that help the LLM solve challenges, similar to Claude Code or Cursor) are then evaluated. The chart plots each model against the *longest-duration task* it could complete successfully at least 50% of the time, correlated with the model's release date. A model plotting at '12 hours,' for instance, can complete, at least half the time, a specific coding task that took humans an average of 12 hours to finish. This is a specific benchmark for programming tasks, not a universal measure of AI's potential.
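As a concrete illustration of the measurement described above, here is a minimal sketch of how a model's score on such a chart could be computed: label each task with the geometric mean of human completion times, then take the longest-duration task the model passes at least half the time. The `time_horizon` function, the record layout, and the numbers are all hypothetical, not METR's actual code or data.

```python
from statistics import geometric_mean

# Hypothetical task records: human completion times (in minutes) and
# the model's pass/fail results over repeated attempts.
tasks = [
    {"human_times": [4, 6, 5],    "model_runs": [True, True, True, False]},
    {"human_times": [55, 70, 62], "model_runs": [True, False, True, True]},
    {"human_times": [700, 820],   "model_runs": [False, True, False, False]},
]

def time_horizon(tasks):
    """Longest human-duration task the model solves at least 50% of the time."""
    horizon = 0.0
    for task in tasks:
        # Each task is labeled with the geometric mean of human completion times.
        duration = geometric_mean(task["human_times"])
        success_rate = sum(task["model_runs"]) / len(task["model_runs"])
        if success_rate >= 0.5 and duration > horizon:
            horizon = duration
    return horizon

print(time_horizon(tasks))  # ~62 minutes: the hour-scale task passes, the 12-hour one fails
```

On this toy data the model clears the roughly one-hour task 75% of the time but fails the roughly 12-hour task, so its horizon lands around an hour, which is the kind of single number each dot on the chart represents.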

The limitations of the measured durations

Crucially, the specific numerical durations on the chart are not precise indicators of AI capability relative to human professionals. METR itself acknowledges the difficulty of assigning precise meaning to these times: the 'human time duration' can include significant overhead for understanding the task, learning new techniques, or researching unfamiliar concepts. METR specifies that its time horizon is closer to what a 'low context' person (like a new hire or remote contractor) can accomplish, rather than what a high-context professional can do in their daily job. These durations should therefore be viewed as abstract measures of programming task difficulty, indicating that a model can tackle a task of a certain complexity, not that it can perform any given X hours of a human professional's work.

The shift from pre-training to post-training and harnesses

The dramatic upturn on the METR chart, particularly from late 2024 onwards, reflects a fundamental shift in AI development strategy. For years, the focus was on pre-training LLMs: long, expensive runs over massive datasets that imbue models with general knowledge. This approach, while improving general capabilities (e.g., GPT-2 through GPT-4), hit a wall around the summer of 2024, when simply scaling up pre-training yielded diminishing returns in obvious new capabilities. Development pivoted toward post-training: taking pre-trained models and fine-tuning them on narrow, high-quality datasets using techniques like reinforcement learning. Computer programming emerged as a prime target for this post-training because of its structured nature, and the fine-tuning improved the LLMs' ability to generate longer, more coherent, and correct code. Concurrently, significant effort went into developing sophisticated 'coding harnesses': programs that integrate LLMs with tools for planning, execution, and verification, mirroring professional developer workflows. These harnesses often incorporate substantial amounts of hand-coded logic and 'expert systems,' drawing on decades of programming expertise.
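The plan/execute/verify workflow these harnesses implement can be sketched as a simple loop. This is a toy illustration of the scaffolding pattern, not Claude Code's or any real product's architecture: `harness`, `generate`, `apply_edit`, and `verify` are hypothetical stand-ins for the LLM call, the filesystem edit, and the test-suite run.

```python
def harness(task, generate, apply_edit, verify, max_iterations=5):
    """Minimal plan/edit/verify loop illustrating the scaffolding pattern.

    generate(prompt) -> str stands in for an LLM call; apply_edit and
    verify stand in for writing code to disk and running the test suite.
    """
    plan = generate(f"Plan the task: {task}")
    for attempt in range(1, max_iterations + 1):
        edit = generate(f"Following plan '{plan}', produce the next code edit.")
        apply_edit(edit)
        ok, log = verify()
        if ok:
            return f"solved after {attempt} attempt(s)"
        # Feed the failure log back to the model and revise the plan.
        plan = generate(f"Verification failed ({log}); revise plan '{plan}'.")
    return "unsolved"

# Toy demo with stand-ins: "verification" succeeds once two edits have landed.
state = {"edits": 0}
def fake_generate(prompt):
    return f"step-{state['edits']}"
def fake_apply(edit):
    state["edits"] += 1
def fake_verify():
    return (state["edits"] >= 2, "ok" if state["edits"] >= 2 else "missing fix")

print(harness("fix the failing test", fake_generate, fake_apply, fake_verify))
```

The loop's structure, not the model call, is where much of the hand-coded engineering the section describes lives: deciding what context to feed back, when to re-plan, and when to give up.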

The role of coding harnesses in recent gains

The exponential-like leap on the METR chart is not solely due to LLM improvements; it is heavily influenced by the advancement of these coding harnesses, especially from late 2025 and early 2026. The harnesses act as sophisticated scaffolding, enabling LLMs to tackle multi-step, complex programming tasks that require planning, debugging, and interaction with development environments. A leak of the source code for Anthropic's Claude Code revealed the extensive human effort and traditional AI techniques embedded in such harnesses. The combination of fine-tuned LLMs capable of better planning and code generation with these robust, hand-coded harnesses has produced a powerful synergistic effect. The breakthrough is a significant commercial success, demonstrating that specific, economically viable applications, such as professional-grade programming tools, can be built on AI technology.

The 'tributary' model versus 'rising water'

To counter the 'AI eats everything' narrative, Newport proposes a better mental model for AI progress: that of a river with navigable 'tributaries.' Instead of a general 'water level' rising to solve all problems (the 'rising water' model), AI progress is about identifying and exploring specific application areas (tributaries). Progress in one tributary, like software development where significant effort has been invested in custom tools and harnesses, does not automatically imply similar navigable pathways exist in unrelated areas (e.g., email management, which may prove to be much shallower or filled with rapids). This 'tributary' model highlights that the development of useful AI applications is a hard exploration process, requiring custom tools and significant effort, and success in one area is specific rather than generalizable.

The influence of transhumanism and existential risk communities

The exaggerated fears surrounding AI are also attributed to the influence of the transhumanist and existential risk (x-risk) communities. These groups, often intersecting with rationalists, tend to see the world through the lens of exponentials and their potential for radical societal transformation – either utopian or dystopian. They are drawn to the perceived exponential growth in AI capabilities, extrapolating current trends to predict inevitable AGI or ASI and significant societal upheaval. This worldview, rooted in eschatological thinking, shapes their interpretation of data like the METR chart as evidence of impending doom or salvation. This influential, albeit extreme, perspective has seeped into the discourse surrounding AI, contributing to widespread anxiety and the sensationalist narrative of AI 'eating everything'.

A call for a more grounded approach to AI

Newport argues that AI companies need to distance themselves from these cult-like communities and their extreme rhetoric. Instead of framing AI progress in terms of existential threats or utopian transcendence, companies should focus on clearly communicating the practical benefits and limitations of their tools. Just as the advent of electric cars was met with clear-eyed assessment of their utility, AI tools, including advanced programming assistants, should be discussed pragmatically. The METR chart, while impressive in its demonstration of progress in software development tools, says 'nothing about the fate of humanity or AI more generally.' The call is to treat AI as a technology, celebrating its useful applications without falling into the trap of wild extrapolation or succumbing to the anxieties fueled by fringe ideologies.

Common Questions

What does the METR time horizon chart measure?

The METR chart measures the duration of software tasks that large language models (LLMs), combined with coding harnesses, can complete successfully at least 50% of the time, using human task completion time as the benchmark.
