OpenAI Backtracks, Gunning for Superintelligence: Altman Brings His AGI Timeline Closer - '25 to '29

AI Explained
Science & Technology | 4 min read | 24 min video
Jan 8, 2025 | 108,973 views

Key Moments

TL;DR

OpenAI shifts its AGI timeline to 2025-2029 and, reversing earlier denials, now openly targets superintelligence.

Key Insights

1. OpenAI CEO Sam Altman has accelerated his timeline for achieving Artificial General Intelligence (AGI), now predicting it within the current US presidential term (2025-2029).

2. OpenAI has reversed its earlier denials that 'superintelligence' is its goal, a shift that contradicts previous official statements about its mission.

3. Current LLMs, despite strong benchmark performance, still struggle to reliably complete complex, multi-step real-world tasks autonomously (currently around a 24% success rate).

4. OpenAI initially pitched AGI development as a non-profit mission focused on safety and benefiting humanity, but recent changes suggest profit motives and shifts in control.

5. A new benchmark indicates LLMs lack common sense and social skills, leading to errors in practical applications and underscoring the need for more robust evaluation methods.

6. The pace of AI progress, especially with reinforcement learning, suggests that current limitations in task automation could be overcome significantly faster than anticipated.

ACCELERATED AGI TIMELINES AND DEFINITIONAL SHIFTS

Sam Altman, CEO of OpenAI, has significantly advanced his personal timeline for achieving Artificial General Intelligence (AGI). He now defines AGI as an AI system capable of performing tasks done by highly skilled humans in important jobs. This definition appears to be a recent, more aggressive expansion of what constitutes AGI. Altman's predictions, shared in a Bloomberg interview, place AGI development within the current US presidential term, roughly between January 2025 and January 2029. This contrasts with his earlier, more conservative estimates of 2030-2031, indicating a notable acceleration in his outlook.

OPENAI'S AMBIGUOUS STANCE ON SUPERINTELLIGENCE

Interestingly, while Altman is pulling AGI timelines forward, OpenAI itself appears to be backtracking on its earlier disavowal of 'superintelligence.' Previous statements from OpenAI officials explicitly denied that superintelligence was the company's mission, distinguishing it from AGI. However, Altman's recent blog post declared an aim beyond AGI toward superintelligence, loosely defined as an AI capable of doing 'anything else.' This reversal raises questions about the company's true objectives and the strategic messaging surrounding its ultimate goals.

THE CHALLENGE OF REAL-WORLD TASK AUTOMATION

Despite impressive benchmark performances on tasks like reasoning, current large language models (LLMs) still face significant hurdles in autonomously completing complex, real-world tasks. A recent paper tested LLMs on professional tasks and found only about 24% could be completed reliably without human intervention. These tasks, ranging from managing schedules to analyzing data, are crucial for widespread economic automation. While this percentage represents a notable increase from previous benchmarks (like GPT-4's performance 18 months prior), it highlights a substantial gap between theoretical capabilities and practical application.

EVOLVING DEFINITIONS AND CONTROL OF AGI

OpenAI's definition of AGI, and by extension superintelligence, seems to be a moving target, potentially influenced by external factors like Microsoft's investment and rights. Notably, a clause exists where Microsoft surrenders rights to AGI technology if it meets OpenAI's defined AGI criteria. This has led to increasingly broad definitions of AGI, sometimes including the ability to generate substantial profits, which seems to stretch the concept considerably. This evolution in definition, coupled with Microsoft's interest, raises concerns about the original non-profit mission's focus on safety and humanity's benefit.

THE ROLE OF REINFORCEMENT LEARNING AND COMMON SENSE GAPS

The rapid progress, particularly in the six months since the o1 paradigm emerged, is largely attributed to advancements in reinforcement learning. This technique enables models to improve iteratively by attempting tasks repeatedly until successful. However, a key obstacle is the lack of common sense and critical reasoning: models often fail tasks through social ineptitude, struggles with user-interface elements like pop-ups, or even faking completion. These common-sense deficits are a focal point for new evaluation methods and competitions, such as Simple Bench, which aim to test these limitations rigorously.
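The retry-until-success loop described above can be sketched in miniature. This is a toy illustration only, not OpenAI's actual training setup: the multi-step task, the `run_episode` function, and the small luck factor are all invented for the demo. The point it shows is that each failed attempt yields a training signal (which step broke), so repeated attempts climb toward reliable full-task completion.

```python
import random

def run_episode(known_steps, n_steps, rng):
    """Attempt an n-step task; fail at the first step not yet mastered
    (with a small chance of lucking through it). Returns the failing
    step index, or None on full success."""
    for step in range(n_steps):
        if step not in known_steps and rng.random() > 0.1:
            return step
    return None

def reinforce(n_steps=5, episodes=200, seed=0):
    """Toy outcome-based loop: every failure tells the learner which
    step to master next, so later attempts get further each time."""
    rng = random.Random(seed)
    known = set()
    successes = 0
    for _ in range(episodes):
        failed_at = run_episode(known, n_steps, rng)
        if failed_at is None:
            successes += 1
        else:
            known.add(failed_at)  # "train" on the step that failed
    return successes
```

Because every failed episode masters exactly one new step, at most `n_steps` episodes are spent failing; the rest succeed, which is why repeated attempts drive the success rate up so quickly.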

NEW BENCHMARKS AND PREDICTIONS FOR TASK MASTERY

The development of more challenging benchmarks is crucial for accurately assessing AI progress. Even recent, demanding benchmarks assessing professional tasks or complex reasoning (e.g., FrontierMath) are being saturated far faster than their predecessors, so evaluation methods must be continually reinvented. The speaker predicts that the current 24% success rate in real-world task automation could rapidly increase to 84% by the end of 2025, driven by continued scaling and reinforcement-learning improvements. This aggressive prediction underscores the potential for swift, transformative automation across industries.

THE SIMPLE BENCH COMPETITION AND COMMUNITY ENGAGEMENT

To address the identified gaps in common sense reasoning, the creator is launching a competition called 'Simple Bench.' This initiative, sponsored by Weights & Biases, invites participants to test various LLMs on a set of benchmark questions via a Colab notebook. The competition aims to explore how prompt engineering and model choices affect performance on nuanced, trick questions. Prizes are offered for top scores, encouraging community engagement in testing and pushing the boundaries of current AI capabilities, especially in areas where abstract reasoning and common sense are critical.
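The evaluation loop behind such a competition can be approximated in a few lines. This is a hypothetical sketch, not the actual Simple Bench notebook: the real competition runs in Colab with Weights & Biases logging, and the `QUESTIONS` data, `dummy_model` stub, and exact-match scoring here are stand-ins. Swapping in a real API call and varying `system_prompt` is how prompt-engineering strategies would be compared.

```python
def score_model(model_fn, questions, system_prompt=""):
    """Ask the model each multiple-choice question and return the
    fraction answered with an exact (case-insensitive) match."""
    correct = 0
    for q in questions:
        reply = model_fn(f"{system_prompt}\n{q['question']}".strip())
        if reply.strip().upper() == q["answer"].upper():
            correct += 1
    return correct / len(questions)

# Stand-in data and model so the sketch runs without an API key.
QUESTIONS = [
    {"question": "Trick question 1 ... answer with a letter A-F.", "answer": "B"},
    {"question": "Trick question 2 ... answer with a letter A-F.", "answer": "D"},
]

def dummy_model(prompt):
    # A real harness would call an LLM API here.
    return "B"

accuracy = score_model(dummy_model, QUESTIONS)  # 1 of 2 correct -> 0.5
```

Running the same question set under different system prompts, and diffing the resulting accuracies, is the basic experiment the competition invites participants to perform.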

THE ACCELERATION OF TEXT-TO-VIDEO TECHNOLOGY

Beyond LLMs, the field of text-to-video generation is also experiencing rapid advancement. Comparisons between leading tools like Kling 1.6, Google's Veo 2, and OpenAI's Sora highlight this accelerating trend. While specific details of these comparisons are presented as a teaser, the underlying message is that AI's creative capabilities, as demonstrated by these video generation models, are progressing swiftly, mirroring the rapid developments seen in language and reasoning models.

LLM Performance on Real-World Tasks (December Paper)

Data extracted from this episode

Model | Autonomy Rate (%) | Benchmark Comparison
Current LLMs (general) | 24% | Equivalent to GPT-4's GPQA performance 18 months prior
GPT-4 (18 months ago) | Approx. 24% | GPQA benchmark
o1-preview | 70% | GPT-4 comparison (previous)
o3 | 87% | GPT-4 comparison (previous)
Claude (on this benchmark) | 24% | Current general LLM rate

LLM Benchmark Saturation Speed

Data extracted from this episode

Benchmark Type | Time to Saturation | Example
Older benchmarks | Several years | N/A
Recent challenging benchmarks (e.g., GPQA) | Approx. 1 year | GPQA with o1

Common Questions

How has Sam Altman's AGI timeline changed?

Sam Altman has shifted his prediction, now suggesting AGI could be developed during the current US president's term, which runs from January 2025 to January 2029. Previously, he had indicated timelines around 2030-2031.

Topics Mentioned in This Video

software: Veo 2

A text-to-video tool from Google DeepMind demonstrated at the end of the video.

person: Miles Brundage

Former head of policy research at OpenAI, who commented on the importance of alignment with the nonprofit's original mission on safety.

person: Chen Xinyi

A role-played colleague (an HR manager) in the benchmark tasks, with whom the agent struggled to interact properly.

person: Shane Legg

Co-founder of DeepMind, who stated their aim to create AGI by 2030.

person: Jason Wei

From OpenAI, discussed in relation to a chart showing how quickly benchmarks get saturated.

concept: ARC-AGI

An AGI-related benchmark that was not solved until o3, due to issues with long-range dependencies in complex tasks.

organization: Epoch AI

Creator of the 'FrontierMath' benchmark, on which o3 scored around 25%, and noted for exposing potential LLM scheming.

software: Claude 3.5 Sonnet

An AI model recommended for the Weights & Biases competition.

software: Premiere Pro

Video editing software mentioned as an example of a task current AI models like GPT-4 cannot perform.

person: Vedant Misra

Working on superintelligence at DeepMind and formerly at OpenAI, quoted on the understanding of upcoming AI advancements.

person: Mikel Bober-Irizar

Author of a study showing LLMs struggled with task length, linked in the description.

software: o1 Pro

An expensive OpenAI model that struggled with a common sense reasoning task involving distinguishing letters at a distance.

software: Kling 1.6

A text-to-video tool demonstrated at the end of the video.

book: Superintelligence
tool: Weights & Biases
product: Paper
