Stanford CS153 Frontier Systems | Jensen Huang from NVIDIA on the Compute Behind Intelligence
Key Moments
NVIDIA's Jensen Huang argues that computing is undergoing its first fundamental reinvention in 64 years, driven by AI's shift from pre-recorded execution to real-time generation, leading to unprecedented performance gains and a complete restructuring of the industry.
Key Insights
Computing has fundamentally changed for the first time in over 60 years, shifting from pre-recorded execution to real-time generation, making it contextually relevant and responsive to user intention.
NVIDIA's extreme co-design approach across chips, compilers, networks, and systems has yielded a roughly 1,000,000x performance increase over 10 years, compared to a potential 10x from traditional Moore's Law scaling.
Open-source models are crucial for AI safety and security, as transparent systems allow for interrogation and defense against potential threats, unlike opaque black boxes.
The focus in AI is shifting from training to inference, especially for agentic systems, requiring new architectures that support long-term memory access and low-latency tool execution.
Energy efficiency is paramount: NVIDIA has improved tokens per watt by 50x, and the future of computing will require as much as a thousand-fold increase in energy, necessitating investment in sustainable energy sources.
Building supercomputer-scale compute infrastructure at universities is essential, requiring a shift in budgeting and aggregation of resources, potentially involving billion-dollar investments.
The fundamental shift from pre-recorded to generated computing
Jensen Huang from NVIDIA posits that computing is undergoing its most significant transformation in 64 years, moving beyond the fixed architectural models established by systems like the IBM System/360. The core change lies in the shift from pre-recorded, static content to real-time generation. This allows for computation that is not only contextually consistent and relevant but also responsive to user intent rather than solely explicit instructions. This fundamental change impacts every layer of the technology stack, from software development methodologies and network infrastructure to the very nature of applications, as exemplified by advancements in areas like autonomous vehicles, which were previously intractable. This evolution has been propelled by breakthroughs in deep learning and artificial intelligence. Huang highlights that generative AI, beyond creating images and text, has unlocked the AI's ability to 'think' and reason. This progression, spurred by models like GPT, indicates that 'thinking' is essentially generating internal and external tokens. The implications are profound: computing is no longer just on-demand but increasingly continuous and agentic, demanding a re-evaluation of cloud services, personal computing, and the overall system architecture.
Extreme co-design as the engine of AI performance
Huang introduces the concept of 'extreme co-design' as NVIDIA's strategy to achieve unprecedented performance gains in the age of AI. This approach involves simultaneously optimizing hardware (CPUs, GPUs, networking, storage), compilers, frameworks, and algorithms. He contrasts this with the historical practice of specializing in individual components, which, while innovative, did not yield the same systemic improvements. The impact of extreme co-design is starkly illustrated by performance metrics. While traditional Moore's Law, underpinned by Dennard scaling, offered a 10x improvement in processing power over a decade, NVIDIA's co-design approach has resulted in a staggering 1,000,000x performance increase over the same period. This exponential leap in computational power allows AI researchers to consider processing vast amounts of global data, as seen with the ambition to feed the entire internet into AI models. This acceleration fundamentally reshapes what is possible in computing, opening up infinite opportunities and transforming societies, akin to the societal changes that would occur if travel speeds approached the speed of light.
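To make the gap concrete, here is a quick back-of-the-envelope sketch (illustrative arithmetic only; the per-year figures are derived from the decade-scale multiples above, not quoted in the episode) converting each ten-year multiple into an implied annual improvement rate:

```python
# Illustrative arithmetic only (derived, not figures quoted in the episode):
# the per-year improvement implied by each ten-year performance multiple.

def annual_multiplier(total_gain: float, years: int = 10) -> float:
    """Per-year multiplier implied by a total gain compounded over `years`."""
    return total_gain ** (1 / years)

moores_law_pace = annual_multiplier(10)           # ~1.26x per year (10x per decade)
codesign_pace = annual_multiplier(1_000_000)      # ~3.98x per year (1,000,000x per decade)

print(f"Moore's Law pace:       ~{moores_law_pace:.2f}x per year")
print(f"Extreme co-design pace: ~{codesign_pace:.2f}x per year")
```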
The evolution of education in the AI era
Huang emphasizes that education must adapt to the rapid pace of AI development. He argues that traditional textbooks, which require years to produce, are insufficient for keeping pace with real-time knowledge generation. Therefore, curricula should integrate AI not just as a subject but as a tool for learning. He shares his personal experience using AI as a 'super researcher' to read, summarize, and interact with academic papers, highlighting its potential to augment human learning. While acknowledging the enduring value of first principles and foundational knowledge, Huang stresses the importance of contemporary, contextually relevant learning. He likens this hybrid approach to his own experience at Stanford, balancing theoretical learning with practical industry work. The synergy between understanding fundamental principles and leveraging real-world AI tools offers a more effective educational pathway for students entering the AI-driven workforce.
Open source, AI safety, and the democratization of AI
The discussion on open source versus proprietary software delves into NVIDIA's stance on AI safety and accessibility. While NVIDIA utilizes cutting-edge proprietary models like those from OpenAI and Anthropic for its internal development due to their superior performance and continuously improving cloud infrastructure, Huang champions the development and use of open models. He asserts that for AI to be safe and secure, it must be open, allowing for interrogation and defense against potential threats, unlike opaque 'black box' systems that cannot be truly secured. NVIDIA's investment in open models is driven by a desire to democratize AI capabilities across various domains. They are developing foundation models in areas such as language (Nemotron), biology (BioNeMo), autonomous vehicles (Alpamayo), robotics (GR00T), and climate science. This initiative aims to provide scientists and developers in these fields with the foundational technology needed to build advanced AI applications, thereby activating entire industries and ensuring that AI advancements benefit a wider range of societies and languages, especially those with smaller user bases that might otherwise be overlooked by commercial ventures. The goal is to enable fine-tuning of these models for specific languages and applications, leading to more effective and efficient AI systems, like Alpamayo for self-driving cars, which requires less training data by incorporating human priors and reasoning.
Rethinking compute metrics and resource utilization
Huang addresses the concept of 'Model FLOPs Utilization' (MFU) and argues that, while widely tracked, it can be a misleading indicator of true efficiency. He explains that high MFU can sometimes result from over-provisioning resources to ensure performance during peak demand, leading to idle capacity at other times. The true measure of compute efficiency, he suggests, lies beyond raw FLOPs and should focus on aspects like tokens per watt, especially for large language model serving where decode, not just prefill, dominates the cost. He highlights that optimizing compute involves balancing various system resources such as memory bandwidth, memory capacity, and network interconnects. The challenge for open ecosystems, which lack the tight vertical integration of companies like NVIDIA, is to improve utilization. NVIDIA's focus on developing advanced interconnects, exemplified by NVLink 72 in the Grace Blackwell system, aims to provide the massive aggregate bandwidth essential for efficient token generation, even when MFU is low during decode. This shift in focus from raw computational power to efficient resource utilization and multi-domain performance is critical for developing future AI systems.
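As a rough illustration of the distinction (a minimal sketch with entirely hypothetical numbers, not measurements from the episode), MFU compares delivered model FLOPs against the hardware's peak FLOPs, while tokens per watt measures how much useful output each unit of energy buys:

```python
# A minimal sketch, with hypothetical numbers, of the two metrics discussed:
# Model FLOPs Utilization (MFU) and tokens per watt (equivalently, tokens per joule).

def mfu(flops_per_token: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Fraction of the hardware's peak FLOP/s actually spent on model math."""
    return (flops_per_token * tokens_per_sec) / peak_flops_per_sec

def tokens_per_joule(tokens_per_sec: float, power_watts: float) -> float:
    """Tokens generated per joule of energy consumed (tokens/sec divided by watts)."""
    return tokens_per_sec / power_watts

# Hypothetical decode-phase numbers: memory-bandwidth-bound generation can show
# low MFU even when the system is delivering tokens at a useful rate.
print(mfu(flops_per_token=2e12, tokens_per_sec=50, peak_flops_per_sec=2e15))  # 0.05
print(tokens_per_joule(tokens_per_sec=50, power_watts=1_000))                 # 0.05 tokens/J
```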
Architectural evolution for agentic systems and energy challenges
The conversation turns to the future of computing architectures designed for 'agents'—systems that continuously operate and perform tasks. Huang introduces the Vera Rubin system, designed for agentic workloads. This architecture prioritizes loading significant amounts of 'long-term memory' into storage that can directly communicate with the GPU, minimizing data copying. It also emphasizes low-latency CPUs for executing tools invoked by the AI, preventing the multi-billion dollar GPU system from being bottlenecked. Looking further ahead, the 'Feynman' architecture is hinted at as the next evolution, likely focused on systems of agents and sub-agents. Beyond architecture, energy consumption is identified as a major bottleneck. NVIDIA is addressing this through improved energy efficiency, achieving a 50x improvement in tokens per watt, and anticipates needing potentially a thousand times more energy for future computing needs. This necessitates a significant investment in sustainable energy sources, a market trend now strong enough to drive investment without subsidies. Huang also touches upon the critical need for universities to invest in large-scale, shared compute infrastructure, like billion-dollar supercomputers, to support research and innovation, suggesting that endowments could be reallocated for this purpose.
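Taken together, the two figures compound. The sketch below (a derived, order-of-magnitude illustration; the combined multiplier is not a number stated in the episode) shows why both efficiency and energy supply matter: if each watt yields 50x more tokens and the available energy grows a thousand-fold, total token throughput scales by both factors.

```python
# Back-of-the-envelope sketch combining the 50x tokens-per-watt improvement and
# the roughly thousand-fold energy growth mentioned in the episode.
# The combined multiplier is derived arithmetic, not a figure quoted by Huang.
efficiency_gain = 50      # tokens-per-watt improvement cited for NVIDIA systems
energy_growth = 1_000     # anticipated increase in available energy

token_throughput_multiplier = efficiency_gain * energy_growth
print(f"Implied total token-throughput multiplier: ~{token_throughput_multiplier:,}x")  # ~50,000x
```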
Strategic lessons from NVIDIA's journey and future forecasting
Huang reflects on NVIDIA's history, sharing lessons learned from early mistakes. He describes their first-generation products as having 'completely wrong' technical choices, yet this failure paradoxically led to strategic genius by forcing a re-evaluation of market approach and resource allocation. A significant strategic misstep identified was diverting resources to mobile device development, which, despite becoming a billion-dollar business, ultimately led to being locked out of the crucial 3G to 4G modem transition. However, the expertise gained in low-power efficiency from this venture was redirected to the then-nascent field of robotics. When forecasting the future, Huang advocates for a process of observing trends, reasoning back to first principles, and asking critical 'so what?' questions. This iterative process involves evaluating the significance of breakthroughs (like deep learning and AlexNet), their potential reach, and their implications for computing. This leads to building a mental model of the future, identifying where NVIDIA can best position itself, and working backward from there. He emphasizes managing opportunity cost and increasing optionality by making smart strategic decisions, acknowledging that while predictions may not be perfectly accurate, a clear direction based on rigorous reasoning is key to navigating uncertainty and building successful companies.
Moore's Law vs. NVIDIA's Co-design Performance Scaling
Data extracted from this episode
| Metric | Traditional Moore's Law (10 years) | NVIDIA Co-design (10 years) |
|---|---|---|
| Performance Increase | 10x | 1,000,000x (approx.) |
| Underlying Principle | Dennard Scaling | Extreme Co-design Across Stack |
Common Questions
What is extreme co-design?
Co-design refers to understanding algorithms, systems, compilers, frameworks, and chip architecture simultaneously so that all components can be optimized together. This approach, pioneered by NVIDIA, yields significantly greater performance gains than optimizing each component individually.
Mentioned in this video
NVIDIA's advanced GPU architecture, noted for its bandwidth and overall design, not just its floating-point performance.
NVIDIA's generation of rack-scale computers, featuring NVLink 72, designed for inference and large language models.
A chip described as the great-grandson of the processor NVIDIA originally developed for mobile devices, demonstrating that technological lineage.
A key provider of large language models used by NVIDIA, highlighting its importance in current AI development and engineering support.
One of the major providers of large language models that NVIDIA utilizes extensively for its engineers.
Mentioned as the leader in 3G to 4G modem technology, which blocked NVIDIA from the mobile phone market during that transition.
Company discussed extensively as a leader in computing, AI, and GPUs, focusing on their co-design approach and future architecture.
Mentioned as a company where Jensen Huang worked and designed microprocessors, offering insight into practical vs. theoretical design.
Mentioned as a platform where open-source software can be downloaded, contrasting it with the performance of frontier AI models.
Mentioned as a key development that enabled AI to think and generate tokens, marking a significant shift in computing.
A future system likely related to NVIDIA's agentic computing vision, possibly a successor or evolution of Vera Rubin.
NVIDIA's next-generation compute platform designed specifically for agents, featuring new CPU designs for low-latency tool use.
A neural network model that significantly advanced computer vision capabilities, cited as a 'big deal' in the history of deep learning.