Key Moments

Jensen Huang: NVIDIA - The $4 Trillion Company & the AI Revolution | Lex Fridman Podcast #494

Lex Fridman
Science & Technology · 7 min read · 146 min video
Mar 23, 2026 · 303,588 views
TL;DR

NVIDIA founder Jensen Huang reveals that the company's massive success is built not just on chips, but on meticulously co-designing entire data centers and fostering a massive CUDA developer ecosystem, which safeguards against competition.

Key Insights

1. NVIDIA's 'extreme co-design' strategy now encompasses GPUs, CPUs, memory, networking, storage, power, cooling, software, and the entire rack and data center to handle distributed AI workloads.

2. The strategic decision to put CUDA on GeForce GPUs, despite a 50% cost increase that drove NVIDIA's market cap down to $1.5 billion, was crucial for building the essential installed base for its computing platform.

3. NVIDIA actively shapes the future by "manifesting" it through rigorous reasoning, step-by-step decision-making, and continuously shaping the belief systems of its employees, board, partners, and customers.

4. The four scaling laws of AI (pre-training, post-training, test-time, and agentic scaling) demonstrate that AI's growth is now limited by compute rather than data, with agentic systems multiplying AI capabilities.

5. NVIDIA envisions AI as "token factories" generating revenue, with agents like OpenClaw being the "iPhone of tokens," marking a massive shift from retrieval-based computing to generative, contextually aware systems.

6. Companies and professionals who build expertise in AI tools are poised for greater success, with Jensen Huang predicting a future where AI elevates professions rather than simply eliminating them.

Extreme co-design: architecting the AI factory

Jensen Huang explains that NVIDIA has moved beyond designing individual GPUs to 'extreme co-design,' encompassing the entire system: GPUs, CPUs, memory, networking, storage, power, cooling, software, and even the rack and data center itself. This holistic approach is necessary because the massive AI problems being tackled no longer fit within a single computer or even a cluster of computers. Distributing these workloads across thousands of interconnected systems introduces complex challenges in networking, computation, and data synchronization, requiring optimization across the entire software and hardware stack. The architecture of NVIDIA itself is designed to mirror this co-design philosophy, with large, multidisciplinary staff meetings where experts from various fields contribute to problem-solving, ensuring that component design considers the needs of the entire system. This integrated approach is fundamental to accelerating computation beyond linear scaling and overcoming the limitations of traditional Moore's Law advancements. The goal is to build the machinery that produces AI, a 'factory that makes AI'.

The pivotal bet on CUDA and GeForce

Huang recounts the critical decision to integrate CUDA onto GeForce GPUs, a move he describes as an 'existential threat' due to its immense cost. At the time, this decision consumed all of NVIDIA's gross profit dollars and led to a significant drop in market capitalization, from ~$6-7 billion to $1.5 billion. However, Huang firmly believed in CUDA's potential as a foundational element for computation. The strategy was to leverage the existing massive installed base of GeForce GPUs, which were already selling millions of units annually, to introduce developers to the CUDA platform. By putting CUDA in every PC, NVIDIA aimed to cultivate a developer ecosystem, attracting researchers and scientists to the platform. This gamble paid off, as CUDA became the bedrock for the deep learning revolution, demonstrating that an architecture's success is defined by its installed base and developer adoption rather than pure technological elegance. NVIDIA's subsequent growth was built on the foundation laid by GeForce carrying CUDA out to millions.

Manifesting the future through relentless reasoning

NVIDIA's ability to make bold, future-defining bets stems from Huang's practice of 'manifesting a future' through deep curiosity and rigorous reasoning. He emphasizes that when a future outcome becomes convincingly clear in his mind, he believes it so strongly that he sees it as inevitable. This process isn't about sudden pronouncements but a daily, incremental shaping of belief systems. Huang communicates his evolving thinking to his board, management team, and employees, laying the groundwork for future decisions. By the time a major initiative like 'going all-in on deep learning' or acquiring a company like Mellanox is announced, the team is already largely bought in, often feeling that the decision was overdue. This consistent, transparent communication and reasoning process, often shared publicly through keynotes, ensures broad alignment and buy-in, making major strategic shifts appear obvious when they are finally declared. This approach extends to partners and the broader industry, shaping the entire innovation landscape.

The four scaling laws of AI and the limits of compute

Huang outlines four key scaling laws driving AI advancement: pre-training (requiring more data for larger models), post-training (enhancing data, with a shift towards synthetic data), test-time scaling (inference becoming increasingly compute-intensive and critical), and agentic scaling (AI systems spawning sub-agents, creating large teams). He posits that AI training is no longer limited by data, as synthetic data generation is accelerating, but by compute. Inference, or 'thinking,' is described as far more computationally intensive than pre-training ('memorization and generalization'). The emergence of agentic systems, which can research, use tools, and spawn sub-agents, signifies a new era of multiplying AI capabilities. This continuous cycle of data generation, training, refinement, and application drives intelligence forward, with compute being the primary scaling factor. The challenge lies in anticipating hardware needs for evolving architectures, like mixtures of experts with sparsity, which requires significant foresight and investment due to hardware development cycles.

OpenClaw and the reinvention of the computer

The development of agentic systems like OpenClaw signifies a fundamental shift in computing, transforming it from a retrieval-based system to a generative one. Huang likens the AI agent to a digital worker that must access ground truth (file systems), conduct research, and use tools. He argues against the notion that AI will make software and tools obsolete, comparing it to a robot using a microwave rather than its fingers to boil water. Learning to use tools, accessing information, and engaging with the world are precisely the capabilities OpenClaw exhibits. Huang believes this represents the reinvention of the computer. Concepts behind OpenClaw were being discussed two years earlier at GTC, anticipating the need for AI to perform research, use tools, and access data. The success of OpenClaw is attributed to breakthroughs in large language models and the creation of a robust open-source platform, akin to how ChatGPT democratized generative AI.

NVIDIA's moat: the CUDA ecosystem and execution velocity

NVIDIA's primary competitive advantage, or 'moat,' is the massive installed base of its computing platform, particularly CUDA. Huang states that this ecosystem, built over decades, is more critical than any single technological innovation. The dedication of 43,000 NVIDIA employees and millions of developers who have committed their software to CUDA ensures its dominance. This, combined with NVIDIA's execution velocity—the ability to build increasingly complex systems annually at an unprecedented scale—creates a formidable barrier to entry. Developers are incentivized to target CUDA first because it offers 10x better performance on average within six months, reaches hundreds of millions of users across all clouds and industries, and is trusted for long-term maintenance and optimization. NVIDIA's vertical integration of complex systems, coupled with its horizontal integration across every company's infrastructure, creates a broad ecosystem that covers virtually every industry worldwide, from cloud computing to cars and satellites.

The AI factory and the future of computing

Huang envisions NVIDIA's future centered around the 'AI factory,' a departure from the traditional view of computing as a chip or even a computer. The mental model has shifted from picking up a chip to visualizing gigawatt-scale, integrated systems with power generation, cooling, and massive networking. These AI factories are not just for generating products but directly correlate with company revenues, moving beyond warehouses to profit-generating engines. The 'tokens' they generate are becoming valuable commodities, segmented like iPhones, with intelligence itself being a scalable and revenue-generating product. Huang is certain that this fundamental shift in computing—from retrieval to generative, contextually aware systems—will drive global GDP growth, with computation consuming a significantly larger portion of the economy. He believes NVIDIA's growth is inevitable, potentially reaching multi-trillion-dollar valuations due to the vast and largely untapped opportunity it addresses.

Humanity, intelligence, and the future of work

Huang distinguishes between 'intelligence' as a functional, commoditized capability and 'humanity' as a broader, non-computational aspect of human experience, emphasizing compassion, character, and resilience. He believes AI can recognize and understand emotions but will not feel them. While AI can process context and generate outputs based on it, the subjective human experience—love, loss, fear—remains distinctly human. Huang notes that while intelligence might become a commodity, humanity's unique qualities are 'superhuman powers.' He encourages embracing AI as a tool to elevate professions, not replace them, citing how radiologists and software engineers have seen their roles expand rather than diminish with AI integration. The future of coding, he suggests, involves specifying intentions and goals for AI, potentially increasing the number of 'coders' and transforming all professions into more specialized, value-adding roles. He advises individuals to become experts in using AI to automate tasks and enhance their capabilities, viewing AI as a valuable life coach that removes the friction of learning new skills.

Common Questions

Why is extreme co-design essential?

Extreme co-design is essential because AI problems no longer fit within a single computer or GPU. To achieve speeds much faster than simply adding more computers, algorithms must be distributed across entire rack-scale systems, requiring optimization across GPUs, CPUs, memory, networking, storage, power, cooling, and software to overcome Amdahl's Law limitations and the slowing of Moore's Law.
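The Amdahl's Law bound invoked above can be sketched numerically. This is an illustrative snippet (the function name and parameter choices are ours, not from the podcast): it shows why the serial fraction of a workload, not the processor count, caps the speedup, which is the motivation Huang gives for optimizing the whole stack rather than just adding machines.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Overall speedup when a fraction p of the work is
    parallelized across n processors (Amdahl's Law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, speedup is capped near
# 1 / (1 - 0.95) = 20x, no matter how many processors are added:
print(round(amdahl_speedup(0.95, 8), 2))        # → 5.93
print(round(amdahl_speedup(0.95, 100_000), 2))  # → 20.0
```

Shrinking the serial fraction p — by co-designing networking, memory, and software together — moves the ceiling itself, which is the point of NVIDIA's whole-stack approach.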

Topics

Mentioned in this video

Concepts
Moore's Law

The observation that the number of transistors in an integrated circuit doubles about every two years; noted as having largely slowed, necessitating extreme co-design for continued performance gains.

FP32

A standard for single-precision floating-point numbers; its inclusion in shaders was a significant step towards programmability and general computing for GPUs.

Mixture-of-Experts

An AI model architecture that uses multiple 'expert' networks; mentioned as an example of an AI innovation that requires anticipating hardware changes.

x86

A family of instruction set architectures developed by Intel; mentioned as a successful computing architecture despite criticism of its design, highlighting the importance of a large installed base.

conditional GANs

A type of generative adversarial network where the generation is conditional on some input; mentioned by Jensen Huang as an early development they worked on leading to diffusion models.

progressive GANs

A variant of Generative Adversarial Networks that progressively grows the network during training; mentioned by Jensen Huang as a development that led step by step to diffusion models.

Amdahl's Law

A principle in computer architecture that describes the maximum possible improvement in the speed of a system when only a part of the system is improved; mentioned in the context of distributed computing challenges.

Dennard Scaling

A scaling law that stated that as transistors get smaller, their power density stays constant so that the power consumption stays proportional to area; mentioned as having slowed, impacting computing capabilities.

Products
Gro rack

An additional rack system mentioned as part of the Vera Rubin architecture, designed to support AI agents.

Grace Blackwell

NVIDIA's rack architecture focused on processing large language models for inference.

Vera Rubin rack

NVIDIA's subsequent rack architecture, designed for AI agents, featuring storage accelerators and a new CPU, demonstrating rapid adaptation to evolving AI needs.

Ampere

NVIDIA's GPU microarchitecture that succeeded Turing; cited as an example of a past GPU generation when Jensen explains how his mental model has shifted from individual chips to entire AI factories.

HBM4

The fourth generation of High Bandwidth Memory; Jensen Huang recounts how its production volumes grew enormously following his discussions with industry CEOs.

CPU

Central Processing Unit, a component that NVIDIA now co-designs with GPUs, memory, and networking in its rack-scale systems.

DDR (Double Data Rate) memory

A type of synchronous dynamic random-access memory; mentioned as the number one DRAM in the world for CPUs in data centers, before HBM.

LPDDR5

A type of RAM for mobile devices, which NVIDIA encouraged suppliers to adapt for supercomputers in data centers.

High Bandwidth Memory

An advanced memory interface for 3D-stacked synchronous dynamic random-access memory; initially scarce, it was predicted by Jensen Huang to become mainstream for data centers.

Colossus (Supercomputer)

A supercomputer built by xAI in Memphis with 200,000 GPUs, rapidly constructed in four months; cited as an example of Elon Musk's efficient systems-engineering approach.

NVLink 72

NVIDIA's high-bandwidth, low-latency interconnect technology, designed for large-scale AI models like mixture of experts, allowing entire trillion-parameter models to operate as if on a single GPU.

Vera

A new CPU mentioned as part of the Vera Rubin rack, designed to support AI agents.

DGX-1

NVIDIA's first AI supercomputer, mentioned as an earlier system architecture that involved assembling parts in the data center, contrasting with the current rack-scale manufacturing where supercomputers are built in the supply chain.

GeForce

NVIDIA's line of graphics processing units, primarily for consumer gaming. Strategically used to deploy CUDA widely despite initial financial costs, essentially building NVIDIA's foundation for future computing.

iPhone

Apple's line of smartphones; used as a metaphor to describe the segmentation of valuable AI tokens (free, premium) and later to describe OpenClaw's impact as the 'iPhone of tokens'.

Software & Apps
CoWoS packaging

Chip-on-Wafer-on-Substrate, TSMC's advanced packaging technology; mentioned as a potential bottleneck in the AI supply chain's ability to scale.

Claude

An AI model, mentioned as one of the innovations that needed to reach a certain capability level before agentic systems like OpenClaw could fully emerge.

Cg

A high-level shading language developed by NVIDIA, built on FP32, which was a precursor to CUDA, enabling more general-purpose programming on GPUs.

Google Cloud

Google's suite of cloud computing services; mentioned as part of NVIDIA's broad ecosystem where its architecture is utilized.

OpenAI Codex

An AI model, mentioned as one of the innovations that needed to reach a certain capability level before agentic systems like OpenClaw could fully emerge.

Blender

A free and open-source 3D computer graphics software toolset; mentioned as a tool that users transition to using with NVIDIA GPUs and CUDA after starting with gaming.

OpenCL

An open standard for parallel programming of heterogeneous systems; mentioned as a competitor to CUDA in its early days.

Grok

An AI system; mentioned as a product that NVIDIA has been laying the groundwork for, in terms of hardware, for two and a half years before its announcement.

NeMo Claw

An NVIDIA solution to make OpenClaw installations super easy and secure.

ChatGPT

A generative AI system; mentioned as having done for generative systems what OpenClaw did for agentic systems in terms of broad impact.

NeMo-3

An open-weight 120 billion parameter AI model released by NVIDIA; highlighted for its innovative transformer and SSM architecture and NVIDIA's commitment to open-sourcing its models completely.

Azure

Microsoft Azure, Microsoft's cloud computing platform; mentioned as part of NVIDIA's broad ecosystem where its architecture is utilized.

CUDA

NVIDIA's parallel computing platform and programming model, which became the foundation for deep learning. Its strategic placement on GeForce GPUs was a critical, high-risk decision that consumed profits but built an essential installed base.

OpenShell

A security integration developed by NVIDIA and integrated into OpenClaw to secure agentic systems by restricting simultaneous access to sensitive information, code execution, and external communication.

EUV lithography

Extreme Ultraviolet Lithography, advanced technology crucial for manufacturing cutting-edge semiconductors; mentioned as a potential bottleneck in the AI supply chain.

RTX Remix

NVIDIA's modding tool that allows the community to inject the latest graphics technology, like ray tracing, into older games such as Skyrim.

DLSS 5

NVIDIA's AI-based image upscaling technology designed to enhance game graphics; discussed in the context of controversy over AI-generated 'slop' and its role as a tool for artists, not an override.

GPU

Graphics Processing Unit, originally NVIDIA's core focus, now part of a broader co-designed system including CPUs, memory, networking, and software.

Perplexity AI

An AI-powered answer engine; Jensen Huang expresses his admiration for the platform and its capabilities.

Companies
Nscale

A company involved in cloud infrastructure; mentioned as a new company joining NVIDIA's ecosystem.

SK Hynix

A South Korean semiconductor supplier; mentioned regarding its contributions to high-bandwidth memory (HBM) as a critical component for AI.

NVIDIA

One of the most important and influential companies in human history, powering the AI revolution. It has evolved from chip-scale to rack-scale design, focusing on extreme co-design and becoming a computing platform company.

Lilly

Eli Lilly and Company, a large pharmaceutical company; mentioned as an example of a company that NVIDIA wants to empower with the best biology AI systems for drug discovery.

Coreweave

A specialized cloud provider for large-scale GPU-accelerated workloads; mentioned as a new company joining NVIDIA's ecosystem.

Denny's

A full-service pancake house and restaurant chain; mentioned by Jensen Huang as where his first job was cleaning toilets, showing his humble beginnings.

OpenClaw

An open-source project for agentic AI systems that quickly captured public attention, likened to the 'iPhone of tokens,' enabling AI to use tools, access files, and do research.

TSMC

Taiwan Semiconductor Manufacturing Company, the world's largest dedicated independent semiconductor foundry; mentioned as a key bottleneck due to advanced packaging like CoWoS, but also lauded for its technology and culture of trust.

ASML

A Dutch company and the largest supplier in the world of photolithography systems for the semiconductor industry; mentioned as a key bottleneck in the AI supply chain due to its EUV lithography machines.

Mellanox Technologies

An Israeli-American multinational supplier of computer networking products, acquired by NVIDIA. Mentioned in the context of Jensen Huang shaping the belief system internally for strategic acquisitions.

Amazon

Amazon Web Services (AWS), Amazon's cloud computing platform; mentioned as part of NVIDIA's broad ecosystem where its architecture is utilized.
