Stanford CS153 Frontier Systems | Jensen Huang from NVIDIA on the Compute Behind Intelligence
Key Moments
NVIDIA's Jensen Huang argues that computing is undergoing its first fundamental reinvention in 64 years, driven by AI's shift from pre-recorded execution to real-time generation, leading to unprecedented performance gains and a complete restructuring of the industry.
Key Insights
Computing has fundamentally changed for the first time in over 60 years, shifting from pre-recorded execution to real-time generation, making it contextually relevant and responsive to user intention.
NVIDIA's extreme co-design approach across chips, compilers, networks, and systems has yielded a roughly 1,000,000x performance increase over 10 years, compared to a potential 10x from traditional Moore's Law scaling.
Open-source models are crucial for AI safety and security, as transparent systems allow for interrogation and defense against potential threats, unlike opaque black boxes.
The focus in AI is shifting from training to inference, especially for agentic systems, requiring new architectures that support long-term memory access and low-latency tool execution.
Energy efficiency is paramount: NVIDIA has improved tokens per watt by 50x, and the future of computing will require as much as a thousand-fold increase in energy, necessitating investment in sustainable energy sources.
Building supercomputer-scale compute infrastructure at universities is essential, requiring a shift in budgeting and aggregation of resources, potentially involving billion-dollar investments.
The fundamental shift from pre-recorded to generated computing
Jensen Huang from NVIDIA posits that computing is undergoing its most significant transformation in 64 years, moving beyond the fixed architectural models established by systems like the IBM System/360. The core change lies in the shift from pre-recorded, static content to real-time generation. This allows for computation that is not only contextually consistent and relevant but also responsive to user intent rather than solely explicit instructions. This fundamental change impacts every layer of the technology stack, from software development methodologies and network infrastructure to the very nature of applications, as exemplified by advancements in areas like autonomous vehicles, which were previously intractable. This evolution has been propelled by breakthroughs in deep learning and artificial intelligence. Huang highlights that generative AI, beyond creating images and text, has unlocked the AI's ability to 'think' and reason. This progression, spurred by models like GPT, indicates that 'thinking' is essentially generating internal and external tokens. The implications are profound: computing is no longer just on-demand but increasingly continuous and agentic, demanding a re-evaluation of cloud services, personal computing, and the overall system architecture.
Extreme co-design as the engine of AI performance
Huang introduces the concept of 'extreme co-design' as NVIDIA's strategy to achieve unprecedented performance gains in the age of AI. This approach involves simultaneously optimizing hardware (CPUs, GPUs, networking, storage), compilers, frameworks, and algorithms. He contrasts this with the historical practice of specializing in individual components, which, while innovative, did not yield the same systemic improvements. The impact of extreme co-design is starkly illustrated by performance metrics. While traditional Moore's Law, underpinned by Dennard scaling, offered a 10x improvement in processing power over a decade, NVIDIA's co-design approach has resulted in a staggering 1,000,000x performance increase over the same period. This exponential leap in computational power allows AI researchers to consider processing vast amounts of global data, as seen with the ambition to feed the entire internet into AI models. This acceleration fundamentally reshapes what is possible in computing, opening up infinite opportunities and transforming societies, akin to the societal changes that would occur if travel speeds approached the speed of light.
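To make the gap concrete, here is a quick back-of-the-envelope sketch (illustrative arithmetic only; the per-year figures are derived from the decade-scale multiples above, not quoted in the episode) converting each ten-year multiple into an implied annual improvement rate:

```python
# Illustrative arithmetic only (derived, not figures quoted in the episode):
# the per-year improvement implied by each ten-year performance multiple.

def annual_multiplier(total_gain: float, years: int = 10) -> float:
    """Per-year multiplier implied by a total gain compounded over `years`."""
    return total_gain ** (1 / years)

moores_law_pace = annual_multiplier(10)           # ~1.26x per year (10x per decade)
codesign_pace = annual_multiplier(1_000_000)      # ~3.98x per year (1,000,000x per decade)

print(f"Moore's Law pace:       ~{moores_law_pace:.2f}x per year")
print(f"Extreme co-design pace: ~{codesign_pace:.2f}x per year")
```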
The evolution of education in the AI era
Huang emphasizes that education must adapt to the rapid pace of AI development. He argues that traditional textbooks, which require years to produce, are insufficient for keeping pace with real-time knowledge generation. Therefore, curricula should integrate AI not just as a subject but as a tool for learning. He shares his personal experience using AI as a 'super researcher' to read, summarize, and interact with academic papers, highlighting its potential to augment human learning. While acknowledging the enduring value of first principles and foundational knowledge, Huang stresses the importance of contemporary, contextually relevant learning. He likens this hybrid approach to his own experience at Stanford, balancing theoretical learning with practical industry work. The synergy between understanding fundamental principles and leveraging real-world AI tools offers a more effective educational pathway for students entering the AI-driven workforce.
Open source, AI safety, and the democratization of AI
The discussion on open source versus proprietary software delves into NVIDIA's stance on AI safety and accessibility. While NVIDIA utilizes cutting-edge proprietary models like those from OpenAI and Anthropic for its internal development due to their superior performance and continuously improving cloud infrastructure, Huang champions the development and use of open models. He asserts that for AI to be safe and secure, it must be open, allowing for interrogation and defense against potential threats, unlike opaque 'black box' systems that cannot be truly secured. NVIDIA's investment in open models is driven by a desire to democratize AI capabilities across various domains. They are developing foundation models in areas such as language (Nemotron), biology (BioNeMo), autonomous vehicles (Alpamayo), robotics (GR00T), and climate science. This initiative aims to provide scientists and developers in these fields with the foundational technology needed to build advanced AI applications, thereby activating entire industries and ensuring that AI advancements benefit a wider range of societies and languages, especially those with smaller user bases that might otherwise be overlooked by commercial ventures. The goal is to enable fine-tuning of these models for specific languages and applications, leading to more effective and efficient AI systems, like Alpamayo for self-driving cars, which requires less training data by incorporating human priors and reasoning.
Rethinking compute metrics and resource utilization
Huang addresses the concept of 'Model FLOPs Utilization' (MFU) and argues that, while widely tracked, it can be a misleading indicator of true efficiency. He explains that high MFU can sometimes result from over-provisioning resources to ensure performance during peak demand, leading to idle capacity at other times. The true measure of compute efficiency, he suggests, lies beyond raw FLOPs and should focus on aspects like tokens per watt, especially for large language model serving where decode, not just prefill, dominates the cost. He highlights that optimizing compute involves balancing various system resources such as memory bandwidth, memory capacity, and network interconnects. The challenge for open ecosystems, which lack the tight vertical integration of companies like NVIDIA, is to improve utilization. NVIDIA's focus on developing advanced interconnects, exemplified by NVLink 72 in the Grace Blackwell system, aims to provide the massive aggregate bandwidth essential for efficient token generation, even when MFU is low during decode. This shift in focus from raw computational power to efficient resource utilization and multi-domain performance is critical for developing future AI systems.
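As a rough illustration of the distinction (a minimal sketch with entirely hypothetical numbers, not measurements from the episode), MFU compares delivered model FLOPs against the hardware's peak FLOPs, while tokens per watt measures how much useful output each unit of energy buys:

```python
# A minimal sketch, with hypothetical numbers, of the two metrics discussed:
# Model FLOPs Utilization (MFU) and tokens per watt (equivalently, tokens per joule).

def mfu(flops_per_token: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Fraction of the hardware's peak FLOP/s actually spent on model math."""
    return (flops_per_token * tokens_per_sec) / peak_flops_per_sec

def tokens_per_joule(tokens_per_sec: float, power_watts: float) -> float:
    """Tokens generated per joule of energy consumed (tokens/sec divided by watts)."""
    return tokens_per_sec / power_watts

# Hypothetical decode-phase numbers: memory-bandwidth-bound generation can show
# low MFU even when the system is delivering tokens at a useful rate.
print(mfu(flops_per_token=2e12, tokens_per_sec=50, peak_flops_per_sec=2e15))  # 0.05
print(tokens_per_joule(tokens_per_sec=50, power_watts=1_000))                 # 0.05 tokens/J
```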
Architectural evolution for agentic systems and energy challenges
The conversation turns to the future of computing architectures designed for 'agents'—systems that continuously operate and perform tasks. Huang introduces the Vera Rubin system, designed for agentic workloads. This architecture prioritizes loading significant amounts of 'long-term memory' into storage that can directly communicate with the GPU, minimizing data copying. It also emphasizes low-latency CPUs for executing tools invoked by the AI, preventing the multi-billion dollar GPU system from being bottlenecked. Looking further ahead, the 'Feynman' architecture is hinted at as the next evolution, likely focused on systems of agents and sub-agents. Beyond architecture, energy consumption is identified as a major bottleneck. NVIDIA is addressing this through improved energy efficiency, achieving a 50x improvement in tokens per watt, and anticipates needing potentially a thousand times more energy for future computing needs. This necessitates a significant investment in sustainable energy sources, a market trend now strong enough to drive investment without subsidies. Huang also touches upon the critical need for universities to invest in large-scale, shared compute infrastructure, like billion-dollar supercomputers, to support research and innovation, suggesting that endowments could be reallocated for this purpose.
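Taken together, the two figures compound. The sketch below (a derived, order-of-magnitude illustration; the combined multiplier is not a number stated in the episode) shows why both efficiency and energy supply matter: if each watt yields 50x more tokens and the available energy grows a thousand-fold, total token throughput scales by both factors.

```python
# Back-of-the-envelope sketch combining the 50x tokens-per-watt improvement and
# the roughly thousand-fold energy growth mentioned in the episode.
# The combined multiplier is derived arithmetic, not a figure quoted by Huang.
efficiency_gain = 50      # tokens-per-watt improvement cited for NVIDIA systems
energy_growth = 1_000     # anticipated increase in available energy

token_throughput_multiplier = efficiency_gain * energy_growth
print(f"Implied total token-throughput multiplier: ~{token_throughput_multiplier:,}x")  # ~50,000x
```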
Strategic lessons from NVIDIA's journey and future forecasting
Huang reflects on NVIDIA's history, sharing lessons learned from early mistakes. He describes their first-generation products as having 'completely wrong' technical choices, yet this failure paradoxically led to strategic genius by forcing a re-evaluation of market approach and resource allocation. A significant strategic misstep identified was diverting resources to mobile device development, which, despite becoming a billion-dollar business, ultimately led to being locked out of the crucial 3G to 4G modem transition. However, the expertise gained in low-power efficiency from this venture was redirected to the then-nascent field of robotics. When forecasting the future, Huang advocates for a process of observing trends, reasoning back to first principles, and asking critical 'so what?' questions. This iterative process involves evaluating the significance of breakthroughs (like deep learning and AlexNet), their potential reach, and their implications for computing. This leads to building a mental model of the future, identifying where NVIDIA can best position itself, and working backward from there. He emphasizes managing opportunity cost and increasing optionality by making smart strategic decisions, acknowledging that while predictions may not be perfectly accurate, a clear direction based on rigorous reasoning is key to navigating uncertainty and building successful companies.
Moore's Law vs. NVIDIA's Co-design Performance Scaling
Data extracted from this episode
| Metric | Traditional Moore's Law (10 years) | NVIDIA Co-design (10 years) |
|---|---|---|
| Performance Increase | 10x | 1,000,000x (approx.) |
| Underlying Principle | Dennard Scaling | Extreme Co-design Across Stack |
Common Questions
What is extreme co-design?
Co-design refers to understanding algorithms, systems, compilers, frameworks, and chip architecture simultaneously so that all components can be optimized together. This approach, pioneered by NVIDIA, yields significantly greater performance gains than optimizing each component individually.
Mentioned in this video
NVIDIA's advanced GPU architecture, noted for its bandwidth and overall design, not just its floating-point performance.
NVIDIA's generation of rack-scale computers, featuring NVLink 72, designed for inference and large language models.
A chip described as the great-grandson of the processor NVIDIA originally developed for mobile devices, demonstrating that technological lineage.
A key provider of large language models used by NVIDIA, highlighting its importance in current AI development and engineering support.
One of the major providers of large language models that NVIDIA utilizes extensively for its engineers.
Mentioned as the leader in 3G to 4G modem technology, which blocked NVIDIA from the mobile phone market during that transition.
Company discussed extensively as a leader in computing, AI, and GPUs, focusing on their co-design approach and future architecture.
Mentioned as a company where Jensen Huang worked and designed microprocessors, offering insight into practical vs. theoretical design.
Mentioned as a platform where open-source software can be downloaded, contrasting it with the performance of frontier AI models.
Mentioned as a key development that enabled AI to think and generate tokens, marking a significant shift in computing.
A future system likely related to NVIDIA's agentic computing vision, possibly a successor or evolution of Vera Rubin.
NVIDIA's next-generation compute platform designed specifically for agents, featuring new CPU designs for low-latency tool use.
A neural network model that significantly advanced computer vision capabilities, cited as a 'big deal' in the history of deep learning.