
Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Applications, Applied AI

Stanford Online
Education | 5 min read | 50 min video
Jun 5, 2026 | 1,707 views

TL;DR

AI inference demand is projected to grow a billion-fold from today's levels, and companies that do not build custom models and secure their own infrastructure risk falling behind frontier labs.

Key Insights

1. Inference demand is projected to increase by a billion times, necessitating massive growth in compute infrastructure.

2. Currently, 95% of inference spend is on frontier models, but Baseten believes profitable, defensible companies will leverage custom-trained open-source models.

3. Open-source models are approximately 90 days behind closed-source frontier models but can be 70-90% cheaper to run.

4. Compute access is identified as the primary strategic advantage for inference, leading Baseten to shift from renting to potentially owning compute resources.

5. Baseten's inference service currently handles around 30 trillion tokens per day, exceeding the volume of OpenAI's API and Google's Gemini.

6. Compute costs are rising, nearly doubling at renewal in some cases, and Baseten projects needing the equivalent of $7 billion in compute spend within two years.

The coming billion-fold increase in AI inference

The AI landscape is on the cusp of an unprecedented surge in inference demand, projected to be a billion times greater than current levels. Tuhin Srivastava, founder and CEO of Baseten, highlights this exponential growth as a fundamental shift that will redefine the technology sector. This expansion underscores the critical role of inference, described as the 'cogs of AI value being delivered,' in powering the fastest-growing AI companies globally. The trajectory suggests a need for immense scaling in infrastructure and strategic shifts in how AI models are developed and deployed to capitalize on this supercycle.

The strategic advantage of custom-trained open-source models

While 95% of current inference spending is directed towards frontier models, Baseten's core thesis is that true defensible and profitable businesses will be built on custom-trained, or post-trained, open-source models. Srivastava explains that although open-source models typically lag behind frontier models by about 90 days, they offer a cost reduction of 70% to 90%. This economic advantage becomes crucial for companies aiming for profitability and healthy gross margins, especially as they scale. Furthermore, relying solely on frontier models risks handing over valuable user data and proprietary workflows to model providers, potentially undermining a company's unique competitive edge. By owning their intelligence through custom models, companies can build a more sustainable and defensible business, particularly as their user base and operational volume grow, making this transition less of a choice and more of an existential necessity.
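The margin argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the revenue and frontier-model cost figures are hypothetical placeholders rather than numbers from the talk, and inference is treated as the only cost of goods sold. Only the 70-90% savings range comes from the summary above.

```python
def gross_margin(revenue: float, inference_cost: float) -> float:
    """Gross margin as a fraction of revenue, treating inference as the only cost of goods sold."""
    return (revenue - inference_cost) / revenue


def open_source_cost(frontier_cost: float, savings: float) -> float:
    """Inference cost after switching models, given fractional savings (0.70 means 70% cheaper)."""
    if not 0.0 <= savings <= 1.0:
        raise ValueError("savings must be between 0 and 1")
    return frontier_cost * (1.0 - savings)


# Hypothetical inputs (assumptions for illustration, not from the talk).
MONTHLY_REVENUE = 1_000_000.0
FRONTIER_INFERENCE_COST = 600_000.0

if __name__ == "__main__":
    base = gross_margin(MONTHLY_REVENUE, FRONTIER_INFERENCE_COST)
    print(f"frontier model: gross margin {base:.0%}")
    # The 70% and 90% bounds are the savings range quoted in the summary.
    for savings in (0.70, 0.90):
        cost = open_source_cost(FRONTIER_INFERENCE_COST, savings)
        margin = gross_margin(MONTHLY_REVENUE, cost)
        print(f"{savings:.0%} cheaper: cost ${cost:,.0f}, gross margin {margin:.0%}")
```

With these placeholder inputs, gross margin moves from 40% on the frontier model to between 82% and 94% on the cheaper model. The actual effect depends on how large a share of revenue inference consumes.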

Navigating the cloud and performance landscape

Baseten uses a multi-cloud strategy, stitching together compute from various providers (currently operating across 18-20 clouds and 87 clusters) to ensure access and resilience. This approach abstracts away the complexities of sourcing GPUs, which are notoriously scarce. While major cloud providers offer their own inference platforms, Baseten believes its specialized software stack provides significant value in performance optimization, multi-cloud reliability, and developer tooling. Many clients first try raw cloud providers or AI clouds like CoreWeave or Nebius but often find significant pain in building their own inference stack on top of raw compute, leading them to solutions like Baseten. Custom models, when optimized for specific use cases, are expected to improve rather than degrade the user experience, delivering lower latency and higher reliability.

The escalating compute scarcity and the shift to ownership

Compute scarcity is not a temporary issue but a compounding problem. Demand for AI inference is surging due to increasingly agentic applications and larger models. This persistent demand, combined with extended lead times for acquiring GPUs (potentially 12-15 months out), is driving Baseten to re-evaluate its strategy. A recent example highlighted a dramatic price increase for B200 Blackwell chips, with renewal costs nearly doubling. This signifies that renting compute may soon become infeasible for large-scale operations. Baseten's own inference service, processing approximately 30 trillion tokens daily, is projected to require the equivalent of 150,000 B200s in two years, translating to a staggering $7 billion in compute expenditure. To secure this necessary capacity and mitigate risks, Baseten is moving towards potentially owning compute infrastructure, a move that is also projected to be about 30% cheaper than renting at scale.
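As a rough check on these figures, the sketch below backs out the per-GPU hourly rate implied by the projection. It assumes the $7 billion is an annual figure and that all 150,000 GPUs are rented around the clock; the summary states neither, so treat the result as an order-of-magnitude estimate. It also converts 30 trillion tokens per day into a per-second rate and applies the roughly 30% ownership discount to the rented total.

```python
HOURS_PER_YEAR = 24 * 365        # 8,760 hours; ignores leap years
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

# Figures quoted in the summary.
PROJECTED_GPUS = 150_000         # B200-equivalents needed in two years
PROJECTED_SPEND_USD = 7e9        # assumed here to be an annual figure
TOKENS_PER_DAY = 30e12           # current daily token volume
OWNERSHIP_DISCOUNT = 0.30        # owning is described as about 30% cheaper


def implied_hourly_rate(total_spend: float, gpus: int, hours: int = HOURS_PER_YEAR) -> float:
    """Per-GPU hourly rate implied by a total spend at full utilization."""
    return total_spend / (gpus * hours)


def tokens_per_second(tokens_per_day: float) -> float:
    """Average token throughput per second for a given daily volume."""
    return tokens_per_day / SECONDS_PER_DAY


def owned_cost(rented_cost: float, discount: float = OWNERSHIP_DISCOUNT) -> float:
    """Cost of the same capacity if owning is cheaper by the given fraction."""
    return rented_cost * (1 - discount)


if __name__ == "__main__":
    rate = implied_hourly_rate(PROJECTED_SPEND_USD, PROJECTED_GPUS)
    print(f"implied rate: ${rate:.2f} per GPU-hour")
    print(f"throughput: {tokens_per_second(TOKENS_PER_DAY):,.0f} tokens per second")
    print(f"owned equivalent: ${owned_cost(PROJECTED_SPEND_USD) / 1e9:.1f} billion")
```

Under these assumptions the implied rate is about $5.33 per GPU-hour, today's volume averages roughly 347 million tokens per second, and the owned equivalent of the $7 billion spend would be about $4.9 billion.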

Hardware diversity: NVIDIA's ecosystem dominance

While Baseten acknowledges the promise of diverse hardware architectures like TPUs and newer 'neo' chips, NVIDIA's ecosystem remains dominant. The vast majority of Baseten's fleet runs on NVIDIA chips, largely due to the mature CUDA developer ecosystem, extensive supply chain, and strong relationships with manufacturers like TSMC. New architectures often struggle to compete with the 'all-in-one' advantage of NVIDIA's integrated hardware and software stack, particularly because inference frameworks like TensorRT-LLM, vLLM, and SGLang rely on CUDA. While heterogeneous architectures are likely the future, NVIDIA's current grip on the market, fueled by its established infrastructure and developer community, makes it the pragmatic choice for companies prioritizing speed and agility in the current AI race.

The future of open-source models and national security

A critical bet for Baseten is the continued viability and quality of open-source models. Currently, the leading open-source models originate from China, prompting concerns about America's position in AI development. This situation is framed not just as a competitive disadvantage but also as a potential national security issue. The large cost difference (70-90% cheaper) and the speed of open-source models compared to frontier models drive their adoption, especially for companies focused on profitability and defensibility. While companies like Meta have shifted away from open-sourcing some of their latest models, there's a recognition that robust open-source development in the U.S. is essential. Investments from companies like Google (with Gemma) and NVIDIA, alongside potential government involvement, suggest a growing effort to bolster domestic open-source AI capabilities, which Srivastava treats as inevitable given the national interests at stake.

Building a modular future for data centers

Looking beyond Baseten, Srivastava suggests that the next frontier in AI infrastructure lies in standardizing compute units through modular data centers. Drawing an analogy to shipping containers that revolutionized global trade by normalizing the unit of cargo, modular data centers aim to create a standardized 'unit of compute.' This approach could significantly accelerate data center construction and operation by simplifying design, deployment, and maintenance. The current process for building data centers is highly customized and inefficient. By creating a modular, consistent format, an 'API for compute' could emerge, fostering an industry that can scale rapidly and efficiently, addressing the immense demand for processing power in the AI era. This involves focusing on energy and power infrastructure to support the build-out.

Open Source vs. Frontier Models: Key Differences

Data extracted from this episode

Feature | Frontier Models | Open Source Models
Performance Lag | State-of-the-art | Approx. 90 days behind
Cost | Higher | 70-90% cheaper
Defensibility | Potentially lower (risk of data extraction) | Higher (owning own intelligence)
Development Origin | Large AI labs | Primarily China, some US investment

Baseten Compute Offering

Data extracted from this episode

Offering | Model | Pricing Example (per hour)
Rented Compute | NVIDIA B200s (Blackwell) | $2.63 (current rate)
Rented Compute Renewal (projected) | NVIDIA B200s (Blackwell) | $5.10 (projected rate)

Common Questions

Baseten provides production inference infrastructure for AI companies. Its core offering helps these companies run highly optimized, custom AI models efficiently, focusing on performance, reliability, and a strong developer platform.

Topics

Mentioned in this video

Companies
NVIDIA

A prominent technology company specializing in GPUs and AI hardware. Mentioned as a dominant player with a strong ecosystem (CUDA) and its significant investments in open-source initiatives.

Reflection AI

An AI company that reportedly aims to release good open-source models, contributing to the growing availability and advancement of open-source AI technology.

Crusoe

A company previously featured in the class, run by Chase, which discussed the economics of building and owning data centers, representing a different approach to compute infrastructure.

Nebius

An AI cloud provider mentioned as a competitor in the market for inference infrastructure, alongside others like CoreWeave.

Google

A technology giant mentioned for producing the open-source model 'Gemma' and investing in the AI ecosystem, supporting the development of open-source AI.

CoreWeave

An AI cloud provider mentioned as part of the competitive landscape founders might consider before choosing Baseten for their inference needs.

Alibaba

A Chinese multinational technology company, mentioned as a source of leading open-source AI models, contrasting with American contributions in this area.

Baseten

Tuhin Srivastava's company, focused on providing production inference infrastructure for AI companies, enabling them to run custom models efficiently and cost-effectively.

Meta

The parent company of Facebook and Instagram, mentioned for its historical shift towards not open-sourcing its latest models, impacting the open-source landscape.

OpenAI

A prominent AI research lab and deployment company, known for models like GPT. Mentioned in discussions about API pricing, open-source vs. frontier models, and its potential role in the AI ecosystem.

Oracle

A technology company with significant cloud infrastructure, mentioned in comparison to Baseten's potential cost savings from owning compute, highlighting Oracle's typical gross margins.

Anthropic

A leading AI research company developing large language models like Claude, mentioned in the context of pricing comparisons and the potential for cost savings by using post-trained open-source models.

DeepSeek

A Chinese AI company that received investment from the Chinese government, noted in the context of international efforts and investments in open-source AI.

Macquarie

An Australian investment bank where Tuhin started his career after university, working on privatizing toll roads and airports in New York.

Minimax

A Chinese AI company that has developed prominent open-source models, contributing to the global AI landscape and taking a distinct market position.

Software & Apps
Opus

Anthropic's Claude Opus model line, mentioned as a frontier model alongside GPT, highlighting the rapid development in the field and the considerations for using post-trained models.

Cursor

A company using AI, mentioned as a large-scale customer that has adopted post-trained models, highlighting the trend towards optimizing costs and performance.

Gemini

Google's family of AI models, mentioned in a comparison of token volume processed by Baseten's inference service, which is said to handle a larger volume than Gemini's API.

Abridge

A healthcare company providing an ambient scribe that integrates with EMRs, converting patient interactions into clinical notes using various AI models, all run on Baseten.

AWS

Amazon Web Services, a major cloud provider mentioned as a common first choice for founders before they realize the benefits of specialized inference stacks like the one Baseten offers.

Azure

Microsoft Azure, a cloud computing service provider alongside AWS and GCP, mentioned as a common choice for founders before adopting specialized inference solutions.

Wispr Flow

A company that runs custom speech-to-text models for voice typing, utilizing Baseten's optimizations and infrastructure for low latency.

GCP

Google Cloud Platform, another major cloud provider mentioned as a typical destination for founders seeking compute infrastructure.

GPT

A family of large language models developed by OpenAI, discussed in the context of why companies might choose to post-train open-source models rather than solely rely on frontier models like GPT-6.
