2024 Year in Review: The Big Scaling Debate, the Four Wars of AI, Top Themes and the Rise of Agents

Latent Space Podcast
Science & Technology | 4 min read | 112 min video
Jan 1, 2025|3,832 views|89|16
TL;DR

2024 AI Recap: Scaling debate, AI wars (data, multi-modality), agents, and new benchmarks defined.

Key Insights

1. The AI engineering field has rapidly matured, with a growing demand for skilled engineers to productionize research.

2. A significant debate emerged regarding the limits of scaling large language models, with consensus shifting towards the need for new approaches beyond just larger pre-training.

3. The "Four Wars of AI" (data, autonomy, multimodality, and inference) characterized the competitive landscape, highlighting shifts in market share and emerging capabilities.

4. The rise of AI agents and their integration into workflows is a major theme, with significant progress and anticipation for their widespread adoption in 2025.

5. The cost of AI intelligence has dramatically decreased, with significant improvements in efficiency and pricing, especially for smaller models and optimized inference.

6. New benchmarks and evaluation metrics are constantly being developed to keep pace with the rapid advancements in AI capabilities, particularly in areas like reasoning and multimodal understanding.

THE EVOLUTION OF AI ENGINEERING AND THE SCALING DEBATE

The podcast celebrates its 100th episode, reflecting on the explosive growth of AI engineering as a field. Initially a niche concept, AI engineering has become a recognized discipline, evidenced by its placement on Gartner's hype cycle and its prominence in industry discussions. A major theme of 2024 was the "scaling debate," questioning whether simply increasing model size and compute is still the most effective path forward. Several prominent researchers, including Ilya Sutskever, suggested that pre-training and data scaling might be hitting a wall, signaling a potential shift towards more efficient training methods and inference optimization. This marked a departure from the previous year's focus on raw scaling.

THE FOUR WARS OF AI: DATA, MULTIMODALITY, AND MARKET SHIFTS

The year was defined by several key competitive fronts, dubbed the 'Four Wars of AI.' The "data war" saw debates around data quality, synthetic data generation, and the ethical implications of using copyrighted material for training. The "multimodality war" heated up with significant advancements in video generation (Sora, Gen-2), image editing, and the integration of various modalities within single models (like Gemini 2.0). Market share also saw a notable shift, with OpenAI's dominance being challenged by Anthropic and Google's Gemini, particularly in the lower-cost inference tiers. The rise of smaller, more efficient models from large labs also became a significant trend, countering expectations of open-source dominance in this area.

THE ASCENSION OF AI AGENTS AND THEIR INTEGRATION

AI agents emerged as a central focus for 2025, building on the discussions from 2024. While some predicted 2024 to be the year of agents, the consensus is that their widespread productionization and adoption will truly ramp up in the coming year. Key challenges and research areas for agents include learning from the environment, extracting implicit business processes, and developing better instruction-following capabilities. The development of robust agent tooling, such as specialized SDKs, memory systems, and code interpretation capabilities, is seen as crucial for their success. Companies like Stripe and DeepMind are actively developing agentic systems, signaling strong industry belief in their potential.

INFERENCE OPTIMIZATION AND THE "GPU RICH" ECONOMY

The economics of AI shifted significantly, with a dramatic decrease in the cost of inference. While startups that relied on massive GPU clusters (GPU Rich) faced funding challenges, the "GPU Ultra Rich" labs continued massive investments. The efficiency gains have made AI capabilities more accessible, driven by optimized models and hardware. This trend also impacts the GPU market, with prices stabilizing as demand shifts towards more efficient solutions. The debate also touched on how consumers and businesses will access and utilize these capabilities, with a growing emphasis on on-device models and cost-effective cloud solutions.

ADVANCEMENTS IN BENCHMARKING AND EVALUATION

As AI capabilities rapidly evolve, so too do the methods for evaluating them. The year saw a shift away from older benchmarks like MMLU towards more specialized and rigorous evaluations in areas like reasoning, coding, and multimodal understanding. Benchmarks like SWE-bench, LiveBench, and AIME came to the fore, reflecting the cutting edge of AI research. The discussion highlighted the saturation of some benchmarks and the need for continuous innovation in evaluation methodologies to accurately capture the progress of frontier models and the narrowing gap between open-source and closed-source AI.

EMERGING CAPABILITIES AND THE FUTURE FRONTIER

The discussion explored the landscape of emerging AI capabilities, categorizing them into 'mature,' 'emerging,' and 'frontier.' Mature capabilities include general knowledge, improved long-context windows, and robust code generation. Emerging areas like vision-language models, real-time transcription, and sophisticated thinking processes are becoming more integrated into products. Frontier capabilities, such as advanced real-time voice interaction, on-device models, and more complex multimodal integration (video-to-audio sync), are on the horizon. The year also saw continued progress in areas like synthetic data, state-space models, and the development of robust tooling for agents, setting the stage for further breakthroughs.

LLM Performance and Pricing Evolution (2023-2024)

Data extracted from this episode

| Model | Launch Period | Elo Score (Approx.) | Price per Million Tokens (Approx.) | Order of Magnitude Improvement (from Jan 2024) |
|---|---|---|---|---|
| GPT-4 2023 | Early 2023 | 1175 | $40-$50 | Baseline |
| Claude 3 Haiku | March 2024 | 1175 | $0.50 | 2 orders |
| Gemini 1.5 Pro | July/August 2024 (price cut) | 1250+ | $5 | 1 order |
| Amazon Nova (Pro, Lite, Micro) | Recent | 1200-1300 | $0.075 | 3 orders |
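
The "orders of magnitude" column can be sanity-checked from the approximate prices listed above. The sketch below is illustrative only: it assumes a baseline of $45 per million tokens (the midpoint of the $40-$50 range given for GPT-4, which is an assumption rather than a figure from the episode) and takes the base-10 logarithm of the price ratio. The names `BASELINE_PRICE` and `orders_of_magnitude` are hypothetical.

```python
import math

# Approximate prices in USD per million tokens, taken from the figures above.
# The baseline is the midpoint of the quoted $40-$50 range (an assumption).
BASELINE_PRICE = 45.0

prices = {
    "Claude 3 Haiku": 0.50,
    "Gemini 1.5 Pro": 5.0,
    "Amazon Nova": 0.075,
}


def orders_of_magnitude(baseline: float, price: float) -> float:
    """Return log10 of the price ratio: how many factors of 10 cheaper."""
    return math.log10(baseline / price)


improvements = {
    name: orders_of_magnitude(BASELINE_PRICE, price)
    for name, price in prices.items()
}

for name, value in improvements.items():
    print(f"{name}: {value:.2f} orders (rounds to {round(value)})")
```

Under that assumed baseline, the rounded values agree with the 2, 1, and 3 orders quoted for the three models; a different point in the $40-$50 range shifts the unrounded values only slightly.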

Common Questions

The 'AI engineer' role has gained significant traction, moving from a niche concept to a recognized field that now sits at the top of Gartner's hype cycle. It is defined by applied AI: using models in production, without necessarily needing a PhD, and integrating research findings into practical applications.

Mentioned in this video

People
Jeff Bezos

Backed Perplexity AI, an important endorsement from a tech luminary who historically backed Google, suggesting Perplexity's significant potential.

Jensen Huang

NVIDIA's CEO, who praised xAI for efficiently spinning up a large GPU cluster, showcasing the trend of GPU-rich investments.

Andy Konwinski

Launched the Konwinski Prize with a metric similar to SWE-bench but arguably more useful, showing attempts to create better evaluations for AGI.

Jürgen Schmidhuber

OG in AI and creator of the LSTM, cited as another prominent figure stating that pre-training scaling has hit a wall or run into a different kind of wall.

Daniel Gross

Seems to have become the full-time CEO of SSI, focusing on shipping a single path to superintelligence, contrasting with OpenAI's approach of intermediate products.

Ilya Sutskever

OpenAI's former chief scientist, who publicly stated that pre-training and data scaling have hit a wall, supporting the 'no scaling' argument in a debate.

Leopold Aschenbrenner

Author of an essay on the trillion-dollar cluster, contributing to the discussion on the massive GPU investments by major AI labs.

Karina Nguyen

The person reportedly responsible for Artifacts and ChatGPT Canvas, who moved from Anthropic to OpenAI, a rare 'reverse move' in the industry.

Sam Altman

CEO of OpenAI, mentioned discussing GPT-5 in January, but ultimately OpenAI shipped other models like GPT-4o instead that year.

Scarlett Johansson

Her voice was controversially mimicked by OpenAI's 'Sky Voice' in GPT-4o demos, leading to public backlash and the feature's removal.

Companies
Morph Labs

Launched a 'time travel VM,' addressing the need for statefulness in AI agents, allowing unwinding or forking to explore different execution paths.

Ramp

A financial software company whose data is used to attribute market share shifts among large language models, indicating OpenAI's initial dominance and subsequent decline in market share.

CodeSandbox

A platform for front-end development, acquired by Together AI, and now offering its capabilities as an API, further enabling code interpreting for AI.

Anthropic

A frontier AI lab, part of the three-horse race (with Gemini and OpenAI), which has aggressively gained market share, particularly with the launch of Claude 3 and 3.5 Sonnet.

Cerebras

A company happy to serve large models like LLaMA 405B on their super large chips, though custom use is constrained.

DeepMind

Highlighted for its extensive background work in video modeling, including Genie, Genie 2, and VideoPoet, giving it an advantage in world modeling compared to other labs.

Suno

A GPU-poor startup rated as one of the fastest-growing companies, going from 0 to $20 million ARR while training on Modal, showcasing success without owning massive GPU infrastructure.

MosaicML

Acquired by Databricks for stock, initially valued around $2 billion; its valuation is now estimated to be significantly higher.

Modal

The platform Suno uses for its training, demonstrating how startups leverage GPU clouds to become successful without direct GPU ownership.

Databricks

A company that raised the largest venture round in history, at roughly $10 billion, and had earlier acquired MosaicML in a deal valued at around $2 billion, demonstrating significant investment and consolidation in the AI/data space.

Software & Apps
Llama.cpp

An open-source project by Georgi Gerganov that created bottom-up standardization, demonstrating community-driven protocol development in AI.

Claude 3 Haiku

An Anthropic model that achieved the same Elo as GPT-4 2023 but at a drastically reduced price of 50 cents per million tokens, representing a two-order-of-magnitude improvement.

Suno

An AI music generation tool that was highly praised and used for podcast intro songs, highlighting the creative applications of AI.

Hacker News

Mentioned as a platform where the concept of AI engineering was initially met with skepticism, serving as an indicator of its growing acceptance elsewhere.

Sora

A text-to-video model that was recently released by OpenAI, generating significant excitement but also highlighting the challenges in access and stability.

LangChain

A framework for developing applications powered by language models, noted for its continued growth in downloads and usage, distinguishing itself from projects with high stars but low practical adoption.

AutoGPT

A rapidly growing GitHub project known for over-promising generality (e.g., 'make me money'), leading to broad interest but low usage due to lack of focus and execution challenges.

ChatGPT Canvas

A document editing environment by OpenAI, released as part of their 12 Days of Shipmas, which supersedes Code Interpreter by allowing code writing and execution with better AI integration.

Gemma

A small foundation model from Google, related to its Gemini family, cited as part of the focus on small models in the 1 to 5 billion parameter range.

Recraft V3

An image model that unexpectedly rose to be a top performer in the image arena by Artificial Analysis, surpassing established models like Flux 1.1.

CrewAI

An agent framework mentioned as having growing stars (GitHub likes) but flat usage, illustrating the gap between hype and practical adoption in some AI projects.

E2B

A company whose fundraising was soft-announced, operating in the code interpreting space by providing sandboxed environments for LLMs to run code.

Perplexity AI

An AI-powered answer engine, noted as a customer of E2B and for its maturing products that do complex tasks like producing financial charts.

Code Interpreter

An older OpenAI tool that was popular last year but has now been largely superseded by ChatGPT Canvas, which offers more advanced capabilities.

MMU Pro

A benchmark for multimodal AI models, highlighted as critical for evaluating frontier capabilities.

Gemini Nano

Google's on-device AI model, coming to Chrome with feature flags, indicating a push for GPU-poor friendly, local AI capabilities.

Magic.dev

Made waves teasing a 100 million-token model, contributing to the trend of expanding context windows in LLMs.

Stripe Agent Toolkit

An SDK wrapper on the Stripe API, intended to support agents, demonstrating that even non-AI companies are building tools for agent integration, indicating demand and belief in agents.

ChatGPT

A popular conversational AI model, increasingly viewed as a robust platform for AI agent development due to new features like Canvas and voice mode.

OpenRouter

A platform mentioned as an indicator of Gemini Flash's market share, with 50% of its requests going to Gemini Flash due to its aggressive pricing.

Apple Foundation Models

Foundation models from Apple, around 3 billion parameters, categorized as small models.

Grok's Aurora

Grok's own image generation model, launched after an initial partnership with Black Forest Labs; marks their foray into proprietary image generation.

LlamaIndex

A data framework for LLM applications, observed to be consistently growing in downloads, indicating real usage and commercial product stickiness.

Devin

An AI coding agent launched in March with a highly effective PR campaign but faced backlash over video realism and took 9 months to reach general availability, raising questions about its deliverability.

Jules

DeepMind's code agent, complementing their browser agent and other AI initiatives, signaling their focus on sophisticated agent development.

Anthropic's Model Context Protocol

A protocol from Anthropic, discussed here for its memory implementation of only about 300 lines of code, simple but effective, suggesting that core memory components might be handled by large labs.

ComfyUI

Another example of a community-driven standardization project from 'Comfy Anonymous,' highlighting alternative paths to standardization beyond large labs.

Apple Intelligence

Apple's AI rollout for iPhones, a 3B parameter foundation model running locally with hot-swappable LoRAs, demonstrating a focus on on-device AI.

RWKV

A type of state-space model that is being rolled out in Windows, contributing to the trend of on-device AI and alternative model architectures.

Claude 3

Anthropic's model release in March, which significantly shifted market share from OpenAI and established Claude as a strong frontier model family.

ChatGPT memory

Released in February, allowing ChatGPT to remember user conversations, indicating early steps toward better context retention.

Dropbox Dash

A search tool from Dropbox with Google Drive integration, highlighting how traditional cloud storage providers are evolving into AI-powered search solutions.

Amazon Nova

Amazon's series of LLMs (Pro, Lite, Micro), which sit on the efficient frontier for intelligence levels between 1200 and 1300 Elo, significantly driving down the cost of intelligence.

Artifacts App

Mentioned as a code-oriented equivalent to ChatGPT Canvas, designed for code interpretation and execution.

Udio

An AI music generation tool, similar to Suno, used to create songs and celebrated for its creative capabilities, demonstrating AI's impact on music production.

Llama 3

An open-source frontier model released in April, delivering on expectations with its 8B and 70B variants, making high-quality models accessible.

ChatGPT Voice Mode

A generally available feature in ChatGPT that allows users to converse with the AI using voice, highlighting the maturity of voice interaction in AI.

GPT-4o

Released in May, an 'omni-model' with native vision and voice capabilities, highly impactful for its efficiency and multimodal demos, despite the 'Sky Voice' controversy.

Project Mariner

DeepMind's browser agent, a 'computer use type thing,' demonstrating their active development in agentic capabilities.

NotebookLM

A product by Google; the podcast's episode on it is cited as one of its most popular, praised for its timeliness and excellent guests, and it drew widespread social media attention.
