Has the focus of AI compute shifted from training to inference?

Yes, the shift is towards inference. While traditionally associated with pre-training, scaling laws now encompass the entire model lifecycle, including post-training, synthetic data generation, and actual product usage, all of which heavily rely on inference workloads.

What are the biggest challenges in sourcing and building large-scale AI compute infrastructure?

The challenges include sourcing not just chips but the entire supply chain (memory, networking, power, cooling, data centers, land) and orchestrating these components to align in time for operational readiness. Ensuring reliable delivery and engineering systems that function at scale are critical.

What are the societal implications of massive AI data centers?

Large data centers consume significant grid power, potentially causing energy fluctuations and blackouts. OpenAI focuses on designing systems to minimize collateral damage to existing infrastructure and exploring alternative energy sources like natural gas and nuclear power.

How is OpenAI ensuring it has a sufficient compute advantage?

OpenAI points to its ability to serve models like GPT-4.5 without token limits and to offer more generous token allowances to subscribers as evidence of this advantage. Users can experiment more freely, without artificial constraints.

What does a modern agentic workload look like compared to a chatbot interaction?

Unlike chatbots, which are one-off inference calls, agentic workloads involve a complex directed acyclic graph of operations. This includes inference, tool calls, search queries, and potentially spinning up virtual machines to perform tasks, iterate, and close the loop, delivering full value.

Why is heterogeneous compute (not just GPUs) needed for advanced AI workloads?

Agentic workloads require a mix of compute types. Specialized accelerators like Cerebras offer fast inference, while others might be designed for long context to hold entire projects in memory. Optimizing every part of the agentic graph requires matching the right workload component to the right compute type.

What is the biggest unsolved problem in AI infrastructure right now?

The primary bottleneck is insufficient fab capacity across logic and memory, which is concentrated in a few companies such as TSMC, Samsung, Intel, Micron, and SK Hynix. ASML, which produces the critical lithography machines, is also a key choke point.

Key Moments

Stanford MS&E435 Economics of the AI Supercycle | Spring 2026 | Infrastructure, Capstone Case

Stanford Online

Education | 6 min read | 47 min video

May 27, 2026|26,295 views|492|19

Stanford, Stanford Online, Artificial Intelligence, AI

TL;DR

OpenAI aims for 30 GW of compute by 2030, alongside roughly 100 GW planned across the US hyperscale sector, with inference projected to drive over 80% of its compute needs.

Key Insights

OpenAI's compute capacity has tripled year-over-year for the past three years, and revenue, a lagging indicator of compute utilization, has tracked that growth.

Inference workloads are projected to constitute over 80% of OpenAI's future compute needs, driven by synthetic data generation, model fine-tuning (RL), and product usage like ChatGPT and Codex.

A single gigawatt of AI compute requires approximately $70 billion in spend and involves orchestrating a complex supply chain including chips, memory, networking, power, cooling, and data center infrastructure.

Agentic workloads are significantly more complex than simple chatbots, involving a directed acyclic graph of inference calls, tool uses, database queries, and environment simulations.

The US hyperscale sector is planning to build around 100 GW of compute, meaning AI compute will become a double-digit percentage of the US's total energy grid capacity.

The global supply chain for AI infrastructure is highly concentrated, with ASML's lithography machines being a critical choke point across wafer fabrication for logic and memory chips.

OpenAI's ambitious compute and revenue correlation

Sachin Katti, Head of Industrial Compute at OpenAI, highlights that OpenAI's revenue is a direct, though lagging, indicator of its compute capacity and utilization. For the past three years, the company has tripled its compute year-over-year, with revenue mirroring this growth. Katti anticipates this trend will continue, especially with the recent launch of GPT-4.5 and its increasing use cases beyond coding to general knowledge work. This strong correlation underscores the fundamental role of compute as the primary driver for frontier AI labs. OpenAI's target is an aspirational 30 gigawatts (GW) of compute by the end of the decade, split between research and product development, with no expectation of underutilization based on current trends.
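The compounding behind "tripled year-over-year for three years" can be sketched in a few lines. This is an illustrative calculation only: the function names are hypothetical, and the 10x target in the last line is an arbitrary example, not a figure from the talk.

```python
import math


def compound_growth(factor_per_year: float, years: int) -> float:
    """Total growth multiple after compounding a yearly factor."""
    return factor_per_year ** years


def years_to_reach(multiple: float, factor_per_year: float) -> float:
    """Years of compounding needed to reach a given total multiple."""
    return math.log(multiple) / math.log(factor_per_year)


# Tripling three years in a row compounds to 27x the starting capacity.
print(compound_growth(3, 3))  # 27

# Hypothetical example: years of tripling needed for a 10x increase.
print(round(years_to_reach(10, 3), 2))
```

If revenue lags compute as described, the same curve would apply to revenue, shifted later in time.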

The shift towards inference-driven compute

Initially, scaling laws were thought to apply primarily to pre-training. However, this has evolved to encompass the entire AI model lifecycle. Katti explains that pre-training now represents a smaller fraction, with post-training (including Reinforcement Learning), synthetic data generation, and actual product inference (like ChatGPT and Codex) becoming dominant. These latter stages are primarily inference workloads. Consequently, inference demand is rapidly increasing, and it is predicted to account for over 80% of OpenAI's future compute needs. While this shift could increase revenue per gigawatt due to higher token consumption, OpenAI's mission also focuses on making tokens cheaper and models more efficient, requiring a complex balance of hardware and software optimization.

The intricate supply chain of a gigawatt of compute

Building and sourcing compute at the scale of a gigawatt is an immense undertaking, as Katti illustrates. A single GW is equivalent to roughly half a million GPUs and represents a massive investment, estimated at around $70 billion. It requires orchestrating an entire supply chain, not just of chips, but also memory, networking, robust power generation and distribution, cooling systems, data center facilities, and land. The challenge extends beyond contract signing to ensuring suppliers deliver, engineering systems to work cohesively at scale, and maintaining operational uptime with high performance. Modern AI chips are described as 'brittle,' highly sensitive to power and cooling fluctuations, which can throttle performance significantly, adding another layer of complexity to operational management.
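The per-gigawatt figures above imply some useful per-GPU numbers. The sketch below derives them from the two quoted estimates (about half a million GPUs and about $70 billion per GW). The class and method names are hypothetical, and scaling linearly to 30 GW is a simplifying assumption rather than a claim from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GigawattEstimate:
    # Both defaults are the rough figures quoted in the summary above.
    gpus_per_gw: float = 500_000
    cost_per_gw_usd: float = 70e9

    def watts_per_gpu(self) -> float:
        """All-in facility power per GPU (chip plus cooling, networking, etc.)."""
        return 1e9 / self.gpus_per_gw

    def cost_per_gpu_usd(self) -> float:
        """All-in spend per GPU, covering the whole supply chain."""
        return self.cost_per_gw_usd / self.gpus_per_gw

    def scale(self, gigawatts: float) -> tuple[float, float]:
        """GPU count and spend at a given capacity, assuming linear scaling."""
        return gigawatts * self.gpus_per_gw, gigawatts * self.cost_per_gw_usd


est = GigawattEstimate()
print(est.watts_per_gpu())     # 2000.0 watts per GPU, all-in
print(est.cost_per_gpu_usd())  # 140000.0 dollars per GPU, all-in

gpus, cost = est.scale(30)     # the 30 GW target
print(f"{gpus:,.0f} GPUs, ${cost / 1e12:.1f} trillion")
```

Under these assumptions, the 30 GW target corresponds to roughly 15 million GPUs and about $2.1 trillion in spend.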

Agentic workloads necessitate a sophisticated compute graph

The evolution from simple chatbots to agentic AI represents a significant leap in compute requirements. While early chatbots involved a single inference call, reasoning capabilities added multiple inference nodes. Agentic AI, however, introduces a much more complex 'directed acyclic graph' (DAG). This involves a loop of operations: an inference call may trigger a tool call, a database or search query, or even spinning up virtual machines to test generated code. The AI then observes the output, reasons, iterates, and refines its approach, effectively 'closing the loop' to perform a task autonomously. This intricate process demands a more sophisticated compute infrastructure, capable of efficiently distributing and executing these interconnected steps.
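A minimal sketch of such a graph, using Python's standard-library graphlib, is shown below. The node names are hypothetical and not taken from the talk. One way to reconcile the iterative loop with an acyclic graph, assumed here, is to append each retry as fresh nodes, so the graph keeps growing but never cycles.

```python
from graphlib import TopologicalSorter

# Hypothetical agent task graph: each key maps to the steps it depends on.
agent_graph = {
    "plan": set(),                                  # initial inference call
    "search_query": {"plan"},                       # tool call
    "db_query": {"plan"},                           # tool call
    "generate_code": {"search_query", "db_query"},  # inference call
    "run_tests_in_vm": {"generate_code"},           # virtual machine
    "review_output": {"run_tests_in_vm"},           # inference call
}


def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches whose members can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps with all dependencies met
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in parallel_batches(agent_graph):
    print(batch)
```

Steps in the same batch, such as the two queries here, have no dependency on each other, which is what lets an infrastructure layer distribute them across different machines.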

Heterogeneous compute for optimized agent experiences

To achieve the desired user experience for agentic AI, where humans become the bottleneck rather than the AI's execution speed, heterogeneous compute is essential. This means moving beyond just GPUs and CPUs to a mix of accelerators tailored for specific parts of the workload. For instance, specialized accelerators could handle very fast inference for latency-sensitive tasks, while others might be designed for long-context memory to retain extensive state, crucial for tasks like coding entire projects. The goal is to match the right part of the complex agentic graph to the most efficient compute resource, optimizing both performance and cost. This push for heterogeneity also influences how compute is sourced and managed, moving towards more concentrated, large-scale deployments to leverage economies of scale.
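Matching parts of the graph to compute types can be pictured as a simple routing rule. Everything in this sketch is hypothetical: the step attributes, the compute class names, and the 200,000-token threshold are illustrative assumptions, not details from the talk or from any real scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    is_inference: bool = True
    latency_sensitive: bool = False
    context_tokens: int = 0


def pick_compute(step: Step, long_context_threshold: int = 200_000) -> str:
    """Route one step of an agentic graph to a (hypothetical) compute class."""
    if not step.is_inference:
        return "cpu"  # tool calls, VMs, database queries
    if step.context_tokens >= long_context_threshold:
        return "long_context_accelerator"  # holds a whole project in memory
    if step.latency_sensitive:
        return "fast_inference_accelerator"  # interactive, low-latency replies
    return "gpu"  # general-purpose default


steps = [
    Step("run_tests_in_vm", is_inference=False),
    Step("autocomplete", latency_sensitive=True, context_tokens=4_000),
    Step("refactor_repo", context_tokens=500_000),
    Step("batch_summarize", context_tokens=20_000),
]
for s in steps:
    print(s.name, "->", pick_compute(s))
```

A production scheduler would also weigh cost and availability, but the core idea is the same: each node carries enough metadata to be placed on the hardware that suits it.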

The US energy grid and AI compute's massive demand

Katti highlights the substantial impact AI compute will have on energy infrastructure. The US hyperscale sector is planning to build approximately 100 GW of compute capacity, with OpenAI alone targeting 30 GW. This collective demand will constitute a significant double-digit percentage of the US's total energy grid capacity. Furthermore, the synchronized nature of large training jobs can cause substantial, rapid energy fluctuations (hundreds of megawatts), potentially destabilizing the grid. This unprecedented demand necessitates redesigning infrastructure to handle these loads, considering alternatives like natural gas and nuclear power, and driving innovations that could benefit society beyond AI, as these are investments that might not otherwise have occurred.

Supply chain resilience and the role of TSMC

A resilient compute supply chain is critical, and Katti emphasizes the danger of becoming 'single-threaded' on any one component or vendor. While NVIDIA currently leads, the market is expected to see greater diversity in chip suppliers, partly because workloads are becoming more complex and better suited to specialized accelerators. Crucially, wafer manufacturing capacity for both logic and memory chips is a choke point, and TSMC (alongside Samsung and Intel) allocates its share across many customers, which will inherently lead to multiple types of chips being produced. For entities like Google, Amazon, and OpenAI, learning to use this diversity of chips is not a choice but a necessity for scaling.

AI is generating its own infrastructure and future

A deeply misunderstood aspect of AI infrastructure is its recursive nature: AI is increasingly designing the next generation of AI infrastructure. Katti notes that current AI systems are architecturally simple, often a single compute unit paired with high-bandwidth memory. This is evolving towards multi-layered caching and storage, much as CPUs did historically. The more profound shift, however, is using current AI models to design future chips and the software that runs on them. This recursive feedback loop aims to dramatically shorten the compute cycle time: chip design is currently a roughly 3-year process, which feels like an eternity in the current AI landscape. Because model intelligence is moving months or years ahead of the hardware cycle, AI-driven design is seen as the only feasible way for compute development to keep pace with the accelerating demands of frontier models.

Common Questions

Revenue for frontier AI labs like OpenAI is largely a lagging indicator of compute utilization. Historically, when OpenAI has tripled its compute capacity year-over-year, its revenue has also tripled, a correlation they expect to continue.

Topics

Semiconductor Manufacturing, AI & Machine Learning, Technology & Innovation, AI Infrastructure, Data Center Economics, Agentic AI, Compute Supply Chain, AI Workloads, Heterogeneous Computing

Mentioned in this video

Companies

Google

Mentioned as one of the hyperscalers with significant compute build-out plans, contributing to the overall demand for energy and infrastructure.

Amazon

Mentioned as a hyperscaler with significant compute build-out plans and its own custom chips (Trainium), contributing to the demand for energy and infrastructure.

ASML

A company essential for manufacturing advanced chips, identified as a single choke point in the global supply chain due to its exclusive production of critical lithography machines.

Intel

A company where Katti previously served as CTO and head of AI. Discussed for its manufacturing capabilities and the resurgence of CPUs in AI.

Anthropic

Mentioned as a competitor lab to OpenAI. The discussion touches on OpenAI's potential compute advantage over labs like Anthropic.

TSMC

Taiwan Semiconductor Manufacturing Company. Discussed for its critical role in wafer allocation and its strategy of supporting multiple customers, which necessitates diverse chip varieties.

NVIDIA

The dominant player in the GPU market, discussed in the context of market share and the overall compute landscape. Also mentioned as a potential first company to reach $10 trillion market cap.

OpenAI

The organization where Katti currently works, focused on AI research and development, and a major consumer of compute resources.

Cerebras

A company whose specialized accelerators are discussed as being suited for fast inference, and whose performance improvements highlighted other system latencies.

Concepts

Transformer

A type of neural network architecture that is fundamental to current AI models. The discussion touches on its compute requirements and the possibility of reinventing or replacing it in the future.

Open Source Models

Models that are publicly available, contrasted with frontier models. They are seen as playing a role in distilling intelligence into more compact forms, though frontier models retain a significant lead.

People

Sam Altman

Mentioned in relation to OpenAI's vision of widespread GPU access, suggesting that every person should have a GPU, similar to mobile phones.

Sarah Frier

Author of an article about OpenAI's compute ambitions, which included a chart showing OpenAI's compute capacity over time.

Software & Apps

Codex

An OpenAI product that is seeing increased usage beyond coding for general-purpose knowledge work, contributing to compute demand.
