Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Infrastructure, Capstone Case

Stanford Online
Education | 6 min read | 47 min video
May 27, 2026 | 2,450 views

TL;DR

OpenAI aims for 30 GW of compute by 2030, equivalent to about 30% of the roughly 100 GW the US hyperscale sector plans to build, with inference projected to drive more than 80% of its compute needs.

Key Insights

1

OpenAI's compute capacity has tripled year-over-year for the past three years, and revenue has tracked it closely as a lagging indicator of compute utilization.

2

Inference workloads are projected to constitute over 80% of OpenAI's future compute needs, driven by synthetic data generation, model fine-tuning (RL), and product usage like ChatGPT and Codex.

3

A single gigawatt of AI compute requires approximately $70 billion in spend and involves orchestrating a complex supply chain including chips, memory, networking, power, cooling, and data center infrastructure.

4

Agentic workloads are significantly more complex than simple chatbots: each task becomes a directed acyclic graph of inference calls, tool calls, database queries, and environment simulations.

5

The US hyperscale sector is planning to build around 100 GW of compute, meaning AI compute will become a double-digit percentage of the US's total energy grid capacity.

6

The global supply chain for AI infrastructure is highly concentrated, with ASML's lithography machines being a critical choke point across wafer fabrication for logic and memory chips.

OpenAI's ambitious compute and revenue correlation

Sachin Katti, Head of Industrial Compute at OpenAI, highlights that OpenAI's revenue is a direct, though lagging, indicator of its compute capacity and utilization. For the past three years, the company has tripled its compute year-over-year, and revenue has grown in step. Katti anticipates this trend will continue, especially with the recent launch of GPT-4.5 and the expansion of its use cases beyond coding into general knowledge work. The strength of this correlation is why compute is treated as the primary growth driver for frontier AI labs. OpenAI's aspirational target is 30 gigawatts (GW) of compute by the end of the decade, split between research and product development, and current trends give it no reason to expect that capacity to sit underutilized.
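As a quick check on what "tripling year-over-year for three years" implies, the compounding can be worked out directly. This is a minimal sketch; the function name is illustrative, and it assumes an exact 3x multiple each year, which the summary states only approximately.

```python
def compound_multiple(yearly_factor: float, years: int) -> float:
    """Total growth multiple after compounding a yearly factor."""
    return yearly_factor ** years


# Tripling every year for three years compounds to 27x the starting capacity.
three_year_multiple = compound_multiple(3.0, 3)
print(f"Compute multiple after 3 years of tripling: {three_year_multiple:.0f}x")
```

If revenue tracks compute with a lag, as described above, the same multiple would apply to revenue shifted by that lag.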

The shift towards inference-driven compute

Initially, scaling laws were thought to apply primarily to pre-training. They are now understood to apply across the entire AI model lifecycle. Katti explains that pre-training now represents a smaller fraction of compute, with post-training (including reinforcement learning), synthetic data generation, and actual product inference (like ChatGPT and Codex) becoming dominant. These latter stages are primarily inference workloads. Consequently, inference demand is rising rapidly and is predicted to account for over 80% of OpenAI's future compute needs. While this shift could increase revenue per gigawatt due to higher token consumption, OpenAI's mission also calls for making tokens cheaper and models more efficient, which requires balancing hardware and software optimization.

The intricate supply chain of a gigawatt of compute

Building and sourcing compute at the scale of a gigawatt is an immense undertaking, as Katti illustrates. A single GW is equivalent to roughly half a million GPUs and represents a massive investment, estimated at around $70 billion. It requires orchestrating an entire supply chain, not just of chips, but also memory, networking, robust power generation and distribution, cooling systems, data center facilities, and land. The challenge extends beyond contract signing to ensuring suppliers deliver, engineering systems to work cohesively at scale, and maintaining operational uptime with high performance. Modern AI chips are described as 'brittle,' highly sensitive to power and cooling fluctuations, which can throttle performance significantly, adding another layer of complexity to operational management.
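The per-gigawatt figures above can be turned into rough per-GPU and whole-program numbers. The sketch below uses only the quoted figures (about 500,000 GPUs and $70 billion per GW, and the 30 GW target). Two assumptions are mine rather than the talk's: that cost per GW stays constant as the build scales to 30 GW, and that the per-GPU figures are all-in values covering cooling, networking, and facilities, not the chip alone.

```python
GPUS_PER_GW = 500_000             # "roughly half a million GPUs" per GW
COST_PER_GW_USD = 70_000_000_000  # "around $70 billion" per GW
WATTS_PER_GW = 1_000_000_000


def gigawatt_economics(target_gw: int) -> dict:
    """Back-of-envelope totals, assuming constant cost and GPU density per GW."""
    return {
        "all_in_watts_per_gpu": WATTS_PER_GW / GPUS_PER_GW,
        "all_in_cost_per_gpu_usd": COST_PER_GW_USD / GPUS_PER_GW,
        "total_gpus": target_gw * GPUS_PER_GW,
        "total_cost_usd": target_gw * COST_PER_GW_USD,
    }


estimate = gigawatt_economics(30)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, each GPU accounts for about 2 kW and $140,000 of all-in infrastructure, and a 30 GW build implies about 15 million GPUs and $2.1 trillion of spend.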

Agentic workloads necessitate a sophisticated compute graph

The evolution from simple chatbots to agentic AI represents a significant leap in compute requirements. While early chatbots involved a single inference call, reasoning capabilities added multiple inference nodes. Agentic AI, however, introduces a much more complex 'directed acyclic graph' (DAG). This involves a loop of operations: an inference call may trigger a tool call, a database or search query, or even spinning up virtual machines to test generated code. The AI then observes the output, reasons, iterates, and refines its approach, effectively 'closing the loop' to perform a task autonomously. This intricate process demands a more sophisticated compute infrastructure, capable of efficiently distributing and executing these interconnected steps.
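To make the graph structure concrete, here is a minimal sketch of one agent task expressed as a DAG and grouped into stages that could run in parallel. The step names and dependencies are hypothetical, not taken from the talk. The observe-and-iterate loop is modeled as a single pass; further iterations would be unrolled into additional nodes so the graph stays acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical agent task: each step maps to the set of steps it depends on.
agent_graph = {
    "plan": set(),                              # initial inference call
    "search_docs": {"plan"},                    # search query
    "query_db": {"plan"},                       # database query
    "write_code": {"search_docs", "query_db"},  # inference call
    "run_tests_in_vm": {"write_code"},          # environment simulation
    "observe_and_revise": {"run_tests_in_vm"},  # inference call closing the loop
}


def parallel_stages(graph: dict) -> list:
    """Group steps into stages; steps within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


for index, stage in enumerate(parallel_stages(agent_graph)):
    print(index, stage)
```

Here the two lookups depend only on the plan, so a scheduler could dispatch them concurrently, which is the kind of distribution the infrastructure has to support.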

Heterogeneous compute for optimized agent experiences

To achieve the desired user experience for agentic AI, where humans become the bottleneck rather than the AI's execution speed, heterogeneous compute is essential. This means moving beyond just GPUs and CPUs to a mix of accelerators tailored for specific parts of the workload. For instance, specialized accelerators could handle very fast inference for latency-sensitive tasks, while others might be designed for long-context memory to retain extensive state, crucial for tasks like coding entire projects. The goal is to match the right part of the complex agentic graph to the most efficient compute resource, optimizing both performance and cost. This push for heterogeneity also influences how compute is sourced and managed, moving towards more concentrated, large-scale deployments to leverage economies of scale.
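One way to picture "matching the right part of the graph to the right compute" is a simple placement rule. The sketch below is purely illustrative: the pool names, the `Step` fields, and the 100,000-token threshold are hypothetical and do not describe how OpenAI actually schedules work.

```python
from dataclasses import dataclass

LONG_CONTEXT_THRESHOLD = 100_000  # hypothetical cutoff, in tokens


@dataclass(frozen=True)
class Step:
    name: str
    latency_sensitive: bool
    context_tokens: int


def choose_pool(step: Step) -> str:
    """Pick a (hypothetical) accelerator pool for one step of an agent graph."""
    if step.context_tokens >= LONG_CONTEXT_THRESHOLD:
        return "long_context_accelerator"  # retains extensive state
    if step.latency_sensitive:
        return "low_latency_accelerator"   # very fast inference
    return "general_gpu"                   # default pool


steps = [
    Step("autocomplete", latency_sensitive=True, context_tokens=2_000),
    Step("refactor_project", latency_sensitive=False, context_tokens=400_000),
    Step("batch_summarize", latency_sensitive=False, context_tokens=8_000),
]
for step in steps:
    print(step.name, "->", choose_pool(step))
```

A real placement policy would also weigh cost, queue depth, and data locality; the point here is only that different nodes of the same graph can land on different hardware.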

The US energy grid and AI compute's massive demand

Katti highlights the substantial impact AI compute will have on energy infrastructure. The US hyperscale sector is planning to build approximately 100 GW of compute capacity, with OpenAI alone targeting 30 GW. This collective demand will constitute a double-digit percentage of the US's total energy grid capacity. Furthermore, the synchronized nature of large training jobs can cause rapid power swings of hundreds of megawatts, potentially destabilizing the grid. This unprecedented demand necessitates redesigning infrastructure to handle these loads and considering sources such as natural gas and nuclear power. The resulting innovations could benefit society beyond AI, since these are investments that might not otherwise have occurred.
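The two figures quoted here support a small sanity check. The sketch below computes OpenAI's target relative to the hyperscale plans, and the largest reference grid capacity for which 100 GW would still count as a double-digit share. The summary does not say which grid measure is meant (installed capacity, peak load, or average load), so the result is a bound, not a statement about the actual US grid.

```python
HYPERSCALE_PLAN_GW = 100  # planned US hyperscale build, per the talk
OPENAI_TARGET_GW = 30     # OpenAI's stated target


def share_percent(part_gw: float, whole_gw: float) -> float:
    """Share of `whole_gw` represented by `part_gw`, in percent."""
    return 100 * part_gw / whole_gw


def max_reference_capacity_gw(load_gw: float, min_share_percent: float) -> float:
    """Largest reference capacity for which `load_gw` is at least the given share."""
    return load_gw * 100 / min_share_percent


print(share_percent(OPENAI_TARGET_GW, HYPERSCALE_PLAN_GW))    # 30.0
print(max_reference_capacity_gw(HYPERSCALE_PLAN_GW, 10))      # 1000.0
```

In other words, the "double-digit percentage" claim holds for any reference capacity up to 1,000 GW.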

Supply chain resilience and the role of TSMC

A resilient compute supply chain is critical, and Katti emphasizes the danger of becoming 'single-threaded' on any one component or vendor. While NVIDIA currently leads, the market is expected to see greater diversity in chip suppliers, partly because workloads are becoming more complex and better suited to specialized accelerators. Crucially, the way TSMC, a key choke point for both logic and memory chips (alongside Samsung and Intel), allocates its wafer manufacturing capacity will inherently lead to multiple types of chips being produced to serve various customers. For entities like Google, Amazon, and OpenAI, learning to utilize this diversity of chips is not a choice but a necessity for scaling.

AI is generating its own infrastructure and future

A deeply misunderstood aspect of AI infrastructure is its recursive nature: AI is increasingly designing the next generation of AI infrastructure. Katti notes that current AI systems are architecturally simple, often a single compute unit with high-bandwidth memory, and are evolving towards multi-layered caching and storage, much as CPUs did historically. The more profound shift, however, is using current AI models to design future chips and the software that runs on them. This recursive feedback loop aims to dramatically shorten the compute cycle time, which currently runs about three years for chip design. Because model capabilities advance over months while chip design takes years, AI-driven design is seen as the only feasible way for compute development to keep pace with frontier models; a traditional 3-year design cycle feels like an eternity in the current AI landscape.

Common Questions

Revenue for frontier AI labs like OpenAI is largely a lagging indicator of compute utilization. Historically, when OpenAI has tripled its compute capacity year-over-year, its revenue has also tripled, a correlation they expect to continue.
