Key Moments
2024 Year in Review: The Big Scaling Debate, the Four Wars of AI, Top Themes and the Rise of Agents
2024 AI Recap: the scaling debate, the Four Wars of AI (data, autonomy, multimodality, inference), the rise of agents, and new benchmarks.
Key Insights
The AI engineering field has rapidly matured, with a growing demand for skilled engineers to productionize research.
A significant debate emerged regarding the limits of scaling large language models, with consensus shifting towards the need for new approaches beyond just larger pre-training.
The "Four Wars of AI" (data, autonomy, multimodality, and inference) characterized the competitive landscape, highlighting shifts in market share and emerging capabilities.
The rise of AI agents and their integration into workflows is a major theme, with significant progress and anticipation for their widespread adoption in 2025.
The cost of AI intelligence has dramatically decreased, with significant improvements in efficiency and pricing, especially for smaller models and optimized inference.
New benchmarks and evaluation metrics are constantly being developed to keep pace with the rapid advancements in AI capabilities, particularly in areas like reasoning and multimodal understanding.
THE EVOLUTION OF AI ENGINEERING AND THE SCALING DEBATE
The podcast celebrates its 100th episode, reflecting on the explosive growth of AI engineering as a field. Initially a niche concept, AI engineering has become a recognized discipline, evidenced by its placement on Gartner's hype cycle and its prominence in industry discussions. A major theme of 2024 was the "scaling debate," questioning whether simply increasing model size and compute is still the most effective path forward. Several prominent researchers, including Ilya Sutskever, suggested that pre-training and data scaling might be hitting a wall, signaling a potential shift towards more efficient training methods and inference optimization. This marked a departure from the previous year's focus on raw scaling.
THE FOUR WARS OF AI: DATA, MULTIMODALITY, AND MARKET SHIFTS
The year was defined by several key competitive fronts, dubbed the 'Four Wars of AI.' The "data war" saw debates around data quality, synthetic data generation, and the ethical implications of using copyrighted material for training. The "multimodality war" heated up with significant advancements in video generation (Sora, Gen-2), image editing, and the integration of various modalities within single models (like Gemini 2.0). Market share also saw a notable shift, with OpenAI's dominance being challenged by Anthropic and Google's Gemini, particularly in the lower-cost inference tiers. The rise of smaller, more efficient models from large labs also became a significant trend, countering expectations of open-source dominance in this area.
THE ASCENSION OF AI AGENTS AND THEIR INTEGRATION
AI agents emerged as a central focus for 2025, building on the discussions from 2024. While some predicted 2024 to be the year of agents, the consensus is that their widespread productionization and adoption will truly ramp up in the coming year. Key challenges and research areas for agents include learning from the environment, extracting implicit business processes, and developing better instruction-following capabilities. The development of robust agent tooling, such as specialized SDKs, memory systems, and code interpretation capabilities, is seen as crucial for their success. Companies like Stripe and DeepMind are actively developing agentic systems, signaling strong industry belief in their potential.
INFERENCE OPTIMIZATION AND THE "GPU RICH" ECONOMY
The economics of AI shifted significantly, with a dramatic decrease in the cost of inference. While startups that relied on massive GPU clusters (GPU Rich) faced funding challenges, the "GPU Ultra Rich" labs continued massive investments. The efficiency gains have made AI capabilities more accessible, driven by optimized models and hardware. This trend also impacts the GPU market, with prices stabilizing as demand shifts towards more efficient solutions. The debate also touched on how consumers and businesses will access and utilize these capabilities, with a growing emphasis on on-device models and cost-effective cloud solutions.
ADVANCEMENTS IN BENCHMARKING AND EVALUATION
As AI capabilities rapidly evolve, so too do the methods for evaluating them. The year saw a shift away from older benchmarks like MMLU toward more specialized and rigorous evaluations in areas like reasoning, coding, and multimodal understanding. New benchmarks like SWE-bench, LiveBench, and AIME emerged, reflecting the cutting edge of AI research. The discussion highlighted the saturation of some benchmarks and the need for continuous innovation in evaluation methodologies to accurately capture the progress of frontier models and the narrowing gap between open-source and closed-source AI.
EMERGING CAPABILITIES AND THE FUTURE FRONTIER
The discussion explored the landscape of emerging AI capabilities, categorizing them into 'mature,' 'emerging,' and 'frontier.' Mature capabilities include general knowledge, improved long-context windows, and robust code generation. Emerging areas like vision-language models, real-time transcription, and sophisticated thinking processes are becoming more integrated into products. Frontier capabilities, such as advanced real-time voice interaction, on-device models, and more complex multimodal integration (video-to-audio sync), are on the horizon. The year also saw continued progress in areas like synthetic data, state-space models, and the development of robust tooling for agents, setting the stage for further breakthroughs.
LLM Performance and Pricing Evolution (2023-2024)
Data extracted from this episode
| Model | Launch Period | Elo Score (Approx.) | Price per Million Tokens (Approx.) | Order of Magnitude Improvement (from Jan 2024) |
|---|---|---|---|---|
| GPT-4 2023 | Early 2023 | 1175 | $40-$50 | Baseline |
| Claude 3 Haiku | March 2024 | 1175 | $0.50 | 2 orders |
| Gemini 1.5 Pro | July/August 2024 (price cut) | 1250+ | $5 | 1 order |
| Amazon Nova (Pro, Light, Micro) | Recent | 1200-1300 | $0.075 | 3 orders |
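The "order of magnitude" column follows directly from the price ratios. As a quick sanity check, the sketch below (assuming a ~$45 baseline, the midpoint of GPT-4's $40–$50 price) reproduces the table by rounding the base-10 logarithm of each ratio:

```python
import math

# Baseline: GPT-4 (early 2023) at roughly $45 per million tokens,
# the midpoint of the $40-$50 range quoted in the table.
BASELINE_PRICE = 45.0

def orders_of_magnitude(price: float, baseline: float = BASELINE_PRICE) -> int:
    """Round the log10 of the price ratio to the nearest whole order of magnitude."""
    return round(math.log10(baseline / price))

prices = {
    "Claude 3 Haiku": 0.50,   # ~90x cheaper  -> 2 orders
    "Gemini 1.5 Pro": 5.00,   # ~9x cheaper   -> 1 order
    "Amazon Nova": 0.075,     # ~600x cheaper -> 3 orders
}

for model, price in prices.items():
    print(f"{model}: ~{orders_of_magnitude(price)} orders of magnitude cheaper")
```

Running this prints 2, 1, and 3 orders of magnitude respectively, matching the table's final column.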
Common Questions
What is an "AI engineer"?
The "AI engineer" role has gained significant traction, moving from a niche concept to a recognized field now topping Gartner's hype cycle. It is defined by applied AI: using models in production without necessarily needing a PhD, and focusing on integrating research findings into practical applications.
Topics
Mentioned in this video
A company that announced $20 million ARR, another example of a GPU-poor startup achieving significant growth, focusing on web container technology.
NVIDIA's next-generation GPU series slated for release, expected to accelerate AI development and maintain NVIDIA's market dominance.
An open-source inference model from Fireworks, along with QwQ from the Qwen team, noted as leading contenders in the inference space.
Backed Perplexity AI, an important endorsement from a tech luminary who historically backed Google, suggesting Perplexity's significant potential.
NVIDIA's CEO, praised XAI for efficiently spinning up a large GPU cluster, showcasing the trend of GPU-rich investments.
Launched the Konwinski Prize with a metric similar to SWE-bench but arguably more useful, showing attempts to create better evaluations for AGI.
OG in AI and creator of the LSTM, cited as another prominent figure stating that pre-training scaling has hit a wall or run into a different kind of wall.
Seems to have become the full-time CEO of XAI, focusing on shipping a single path to superintelligence, contrasting with OpenAI's approach of intermediate products.
OpenAI's former chief scientist, who publicly stated that pre-training and data scaling have hit a wall, supporting the 'no scaling' argument in a debate.
Author of an essay on the trillion-dollar cluster, contributing to the discussion on the massive GPU investments by major AI labs.
The person reportedly responsible for Artifacts and ChatGPT Canvas, who moved from Anthropic to OpenAI, a rare 'reverse move' in the industry.
CEO of OpenAI, mentioned discussing GPT-5 in January, but ultimately OpenAI shipped other models like GPT-4o instead that year.
Her voice was controversially mimicked by OpenAI's 'Sky Voice' in GPT-4o demos, leading to public backlash and the feature's removal.
Launched a 'time travel VM,' addressing the need for statefulness in AI agents, allowing unwinding or forking to explore different execution paths.
A financial software company whose data is used to attribute market share shifts among large language models, indicating OpenAI's initial dominance and subsequent decline in market share.
A platform for front-end development, acquired by Netlify, and now offering its capabilities as an API, further enabling code interpreting for AI.
A frontier AI lab, part of the three-horse race (with Gemini and OpenAI), which has aggressively gained market share, particularly with the launch of Claude 3 and 3.5 Sonnet.
A company happy to serve large models like LLaMA 405B on their super large chips, though custom use is constrained.
Highlighted for its extensive background work in video modeling, including Genie, Gen 2, and VideoPoet, giving it an advantage in world modeling compared to other labs.
A GPU-poor startup rated as one of the fastest-growing companies, going from $0 to $20 million ARR by training on Modal, showcasing success without owning massive GPU infrastructure.
Acquired by Databricks for stock, initially valued around $2 billion; its valuation is now estimated to be significantly higher.
The platform Suno uses for its training, demonstrating how startups leverage GPU clouds to become successful without direct GPU ownership.
A company that raised the largest venture round in history, around $10 billion, and previously acquired Mosaic for over $2 billion, demonstrating significant investment and consolidation in the AI/data space.
An open-source project by Georgi Gerganov that created bottom-up standardization, demonstrating community-driven protocol development in AI.
An Anthropic model that achieved the same Elo as GPT-4 2023 but at a drastically reduced price of 50 cents per million tokens, representing a two-order-of-magnitude improvement.
An AI music generation tool that was highly praised and used for podcast intro songs, highlighting the creative applications of AI.
Mentioned as a platform where the concept of AI engineering was initially met with skepticism, serving as an indicator of its growing acceptance elsewhere.
A text-to-video model that was recently released by OpenAI, generating significant excitement but also highlighting the challenges in access and stability.
A framework for developing applications powered by language models, noted for its continued growth in downloads and usage, distinguishing itself from projects with high stars but low practical adoption.
A rapidly growing GitHub project known for over-promising generality (e.g., 'make me money'), leading to broad interest but low usage due to lack of focus and execution challenges.
A document editing environment by OpenAI, released as part of their 12 Days of Shipmas, which supersedes Code Interpreter by allowing code writing and execution with better AI integration.
A small foundation model from Google's Gemini, part of the 1 to 5 billion parameter size focus of small models.
An image model that unexpectedly rose to be a top performer in the image arena by Artificial Analysis, surpassing established models like Flux 1.1.
An agent framework mentioned as having growing stars (GitHub likes) but flat usage, illustrating the gap between hype and practical adoption in some AI projects.
A company whose fundraising was soft-announced, operating in the code interpreting space by providing sandboxed environments for LLMs to run code.
An AI-powered answer engine, noted as an E2B customer and for its maturing products that handle complex tasks like producing financial charts.
An older OpenAI tool that was popular last year but has now been largely superseded by ChatGPT Canvas, which offers more advanced capabilities.
A benchmark for multimodal AI models, highlighted as critical for evaluating frontier capabilities.
Google's on-device AI model, coming to Chrome with feature flags, indicating a push for GPU-poor friendly, local AI capabilities.
Made waves teasing a 100 million-token model, contributing to the trend of expanding context windows in LLMs.
An SDK wrapper on the Stripe API, intended to support agents, demonstrating that even non-AI companies are building tools for agent integration, indicating demand and belief in agents.
A popular conversational AI model, increasingly viewed as a robust platform for AI agent development due to new features like Canvas and voice mode.
A platform mentioned as an indicator of Gemini Flash's market share, with 50% of its requests going to Gemini Flash due to its aggressive pricing.
Foundation models from Apple, around 3 billion parameters, categorized as small models.
Grok's own image generation model, launched after an initial partnership with Black Forest Labs; marks their foray into proprietary image generation.
A data framework for LLM applications, observed to be consistently growing in downloads, indicating real usage and commercial product stickiness.
An AI coding agent launched in March with a highly effective PR campaign but faced backlash over video realism and took 9 months to reach general availability, raising questions about its deliverability.
DeepMind's code agent, complementing their browser agent and other AI initiatives, signaling their focus on sophisticated agent development.
A memory implementation released by Anthropic with only 300 lines of code, simple but effective, suggesting that core memory components might be handled by large labs.
Another example of a community-driven standardization project from 'Comfy Anonymous,' highlighting alternative paths to standardization beyond large labs.
Apple's AI rollout for iPhones, a 3B parameter foundation model running locally with hot-swappable LoRAs, demonstrating a focus on on-device AI.
A type of state-space model that is being rolled out in Windows, contributing to the trend of on-device AI and alternative model architectures.
Anthropic's model release in March, which significantly shifted market share from OpenAI and established Claude as a strong frontier model family.
Released in February, allowing ChatGPT to remember user conversations, indicating early steps toward better context retention.
A search tool from Dropbox with Google Drive integration, highlighting how traditional cloud storage providers are evolving into AI-powered search solutions.
Amazon's series of LLMs (Pro, Light, Micro) that are offering efficient frontiers for intelligence levels between 1200 and 1300 Elo, significantly driving down the cost of intelligence.
Mentioned as a code-oriented equivalent to ChatGPT Canvas, designed for code interpretation and execution.
An AI music generation tool, similar to Suno, used to create songs and celebrated for its creative capabilities, demonstrating AI's impact on music production.
An open-source frontier model released in April, delivering on expectations with its 8B and 70B variants, making high-quality models accessible.
A generally available feature in ChatGPT that allows users to converse with the AI using voice, highlighting the maturity of voice interaction in AI.
Released in May, an 'omni-model' with native vision and voice capabilities, highly impactful for its efficiency and multimodal demos, despite the 'Sky Voice' controversy.
DeepMind's browser agent, a 'computer use type thing,' demonstrating their active development in agentic capabilities.
A product by Google whose coverage became one of the podcast's most popular episodes, praised for its timeliness and excellent guests, achieving widespread social media attention.
Apple's announced secure cloud computing for its AI models, highlighting growing interest in secure and private AI inference for state-level interests.
A leaderboard or metric for language model performance, mentioned as significantly increasing across all models in 2023, indicating rapid innovation and competition.
A concept discussed for concentrating compute resources, emphasizing that one big training run is more valuable than many small ones.
A new benchmark mentioned alongside SWE-bench and MMLU-Pro as the latest metrics for evaluating frontier AI models.
A benchmark for AI models, especially for reasoning and coding tasks, indicating the shift in focus for evaluating advanced AI capabilities.