State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490

Lex Fridman · Science & Technology · 266-min video · Jan 31, 2026

Key Moments

TL;DR

Open-weight AI race heats up: China-US competition, tool use, and post-training drive 2026.

Key Insights

1. The DeepSeek moment sparked a surge in open-weight models; no single winner dominates, as talent, labs, and hardware constraints keep shifting.

2. China's open-weight ecosystem (Z.ai, MiniMax, Kimi, Qwen, and others) accelerates frontier models and challenges US platforms, with business models evolving around licenses and on-prem/offline use.

3. Tool use and coding-oriented models (GPT-OSS, Claude Opus, Gemini, Codex) are reshaping developer workflows, wiring in external tools (search, interpreters) that reduce hallucinations and increase reliability.

4. Post-training advances (SFT, RLHF) and selective use of thinking vs. fast modes create practical tradeoffs between speed, cost, and reliability for real-world tasks.

5. Transformer-era architectures endure: mixture of experts, attention variants such as grouped-query attention, and linear-attention innovations like gated DeltaNet push efficiency without overhauling the core design.

6. Open-source ecosystems (GPT-OSS, Qwen, Nemotron, Marin, K2, and others) broaden access and innovation, though licensing and deployment differences influence adoption versus large closed platforms.

THE DEEPSEEK MOMENT AND THE OPEN-WEIGHT SURGE

The conversation opens with the DeepSeek moment in January 2025, when a relatively small, cost-efficient open-weight model demonstrated near state-of-the-art performance and sparked a rapid acceleration across the industry. DeepSeek's R1 and its subsequent iterations catalyzed a broad move toward open, accessible weights that let a wide ecosystem of startups, labs, and researchers build, customize, and deploy models without depending on a single vendor. This shift reframes the 'winner takes all' anxiety: access to the technology is less about one group holding proprietary weights and more about who can mobilize budget, compute, and talent efficiently. The result is a fluid race where breakthroughs propagate through the ecosystem, with architecture tweaks and training pipelines moving faster than any single laboratory can claim ownership of them. The emphasis on openness also raises questions about licensing, data governance, and the long-term viability of open weights in a world where enterprise customers demand trust, safety, and reliable support.

GLOBAL COMPETITION: CHINA'S OPEN-WEIGHT ECOSYSTEM VS US CLOUD GIANTS

The panel discusses a China-driven acceleration: DeepSeek sparked a wave of Chinese labs and startups (Z.ai, MiniMax, Moonshot AI's Kimi, Qwen, among others) that push frontier open-weight models and contribute significantly to the global talent pool. They note that although DeepSeek may have opened the door, the landscape is now populated by many players, with geography and incentives shaping direction. In China, many teams pursue open-inference and open-licensing strategies that appeal to organizations unwilling or unable to rely solely on cloud API access. Meanwhile, Western platforms continue to leverage data-center scale and enterprise ecosystems, with some labs pursuing IPO-like transparency and collaboration, creating a dynamic where 2026 expectations hinge on open licenses, platform strategy, and international policy, not just raw model performance.

ARCHITECTURAL TRENDS: MIXTURE OF EXPERTS, ATTENTION VARIANTS, AND LINEAR SCALING

The discussion centers on architectural continuity rather than revolution. Since GPT-2, the core Transformer has remained, with notable knobs such as mixture of experts (MoE), grouped-query attention, multi-head latent attention, and sliding-window attention tuning efficiency and scalability. KV-cache optimizations enable longer contexts at lower cost, while innovations like gated DeltaNet push attention toward linear scaling. The consensus is that the core model family remains Transformer-based; improvements come from smarter routing (MoE), attention variants, normalization choices, and training-time strategies rather than a wholesale architectural overhaul. The takeaway is that progress in 2026 will frequently come from clever engineering tweaks that yield meaningful gains in throughput and memory without abandoning familiar scaling laws.
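To make the KV-cache point concrete, here is a back-of-envelope sketch of how cutting the number of KV heads, as grouped-query attention does, shrinks the cache. The model dimensions and context length below are illustrative assumptions, not figures from the episode:

```python
# Back-of-envelope KV-cache size: per token, each layer stores one key and
# one value vector per KV head, at kv_bytes per element.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, kv_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * kv_bytes  # 2 = K and V

# Illustrative 32-layer model, 128-dim heads, 128k-token context, fp16 cache.
mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=128_000)
gqa = kv_cache_bytes(layers=32, kv_heads=8,  head_dim=128, seq_len=128_000)

print(f"MHA cache: {mha / 2**30:.1f} GiB")  # full multi-head attention
print(f"GQA cache: {gqa / 2**30:.1f} GiB")  # 8 KV heads -> 4x smaller
```

Cutting KV heads from 32 to 8 shrinks the cache fourfold at the same context length, which is exactly the kind of memory win that makes long contexts affordable without touching the core architecture.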

TOOL USE, CODING, AND AGENTS: HOW INTERFACES SHIFT WORKFLOWS

A major thread is how interfaces and tool use reshape developer workflows. Open-weight ecosystems are complemented by tooling (GPT-OSS, Claude Opus 4.5, Gemini, and others) that integrates web search, code execution, and interpreter access. The panel highlights Codex in VS Code, Claude Code, and Cursor as a triad that enables workflow expansion without losing control. The strategic insight is that users often customize their toolchain by mixing models for different tasks: fast 'non-thinking' modes for quick queries, 'thinking' or extended inference for complex reasoning, and project-wide assistants for code and data analysis. This multi-model, tool-enabled approach lowers the barrier to building robust software and research pipelines in real time.
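The mix-and-match workflow above can be sketched as a tiny routing policy. Everything here is hypothetical for illustration: the model names and the route() function are invented, not an API from the episode.

```python
# Hypothetical router for the multi-model workflow: pick a model and an
# inference mode per task type. Names and policy are illustrative only.
def route(task: str) -> dict:
    if task == "quick_lookup":
        # Low-latency, low-cost "non-thinking" mode for simple queries.
        return {"model": "fast-small", "thinking": False}
    if task == "hard_reasoning":
        # Extended "thinking" inference: slower and pricier, but more reliable.
        return {"model": "frontier-large", "thinking": True}
    if task == "code_edit":
        # Coding-tuned model with an interpreter tool attached.
        return {"model": "coding-tuned", "thinking": False, "tools": ["interpreter"]}
    return {"model": "fast-small", "thinking": False}  # safe default

print(route("hard_reasoning"))
```

The point is not this specific policy but that the speed/cost/reliability tradeoff becomes an explicit per-task decision rather than a single global model choice.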

OPEN SOURCE ECOSYSTEM AND NEW PROJECTS TO WATCH

The participants name a spectrum of open-weight projects beyond the big incumbents: DeepSeek, Qwen, GPT-OSS, Nemotron, Mistral, Gemma, Z.ai, MiniMax, and more, alongside open ecosystems like Marin and K2. They discuss how licensing, deployment options, and data governance influence adoption as much as raw accuracy. They also note that the Chinese open-weight wave tends to converge on very large models with strong peak performance, while Western projects often emphasize accessibility, tooling, and a permissive licensing ethos. The net effect is that 2026 will feature a rich, diverse ecosystem where open weights compete on openness, tooling, and distribution channels as much as on pure performance.

POST-TRAINING AND CAPABILITIES: RLHF, SFT, AGENTS, AND TOOL USE

A central theme is post-training capability unlocking: supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) continue to define what models can do in practice. The speakers stress the value of post-training layers for enabling specific skills and behaviors, as well as the practical tradeoffs between speed and intelligence, illustrated by switching between 'thinking' and 'fast' modes. Tool use, including external calculators, web calls, and interpreters, helps curb hallucinations and improve reliability. The discussion also covers agent-like deployments where models orchestrate tools to complete tasks, signaling a shift toward usable, real-world AI assistants rather than purely textual models.
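As a concrete illustration of how an external tool can curb hallucination, here is a minimal sketch of an orchestrator that routes arithmetic to a real evaluator instead of letting the model answer in free text. The "CALL calc:" tag format is invented for this example:

```python
import ast, operator

# Instead of letting a model guess arithmetic in generated text, the
# orchestrator detects a tool call and dispatches it to a real evaluator.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr: str) -> float:
    """Safely evaluate +, -, *, / arithmetic via the AST (no eval())."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def handle(model_output: str) -> str:
    """Route tagged tool calls to the calculator; pass plain text through."""
    if model_output.startswith("CALL calc: "):
        return str(calc(model_output[len("CALL calc: "):]))
    return model_output

print(handle("CALL calc: 173 * 482"))  # -> 83386
```

The same dispatch pattern generalizes to web search and interpreter calls: the model produces an intent, and a deterministic tool produces the fact.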

HARDWARE, ECONOMICS, AND THE FUTURE OF INFERENCE

The conversation turns to the economics and hardware realities of 2026. Margins on Nvidia GPUs, FP8/FP4 optimizations, and data center architectures shape what teams can train and deploy. Projections emphasize that while architecture remains steady, the speed of experimentation and deployment hinges on systems-level innovations that increase tokens-per-second per GPU and reduce memory bottlenecks. The discussion suggests OpenAI and others may leverage hardware advantages to land new capabilities, while Chinese and European labs push parallel paths with open licenses and diverse deployment strategies. The upshot is a pragmatic focus on efficiency, cost, and reliability alongside experimental breakthroughs.
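A back-of-envelope calculation (all numbers below are illustrative assumptions, not figures from the episode) shows why lower-precision formats like FP8 raise the tokens-per-second ceiling: small-batch decode is roughly memory-bandwidth-bound, so each generated token requires streaming the model's weights from HBM.

```python
# Rough upper bound for single-stream decode: each generated token must read
# (approximately) all model weights from HBM once.
def max_tokens_per_sec(params_billions, bytes_per_param, hbm_gb_per_sec):
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return hbm_gb_per_sec * 1e9 / weight_bytes

# Illustrative dense 70B model on a GPU with ~3350 GB/s of HBM bandwidth.
fp16 = max_tokens_per_sec(70, 2, 3350)  # ~24 tok/s ceiling
fp8  = max_tokens_per_sec(70, 1, 3350)  # ~48 tok/s: halving bytes doubles it
print(f"fp16 bound: {fp16:.0f} tok/s, fp8 bound: {fp8:.0f} tok/s")
```

Real throughput also depends on batch size, KV-cache reads, and kernel efficiency; and MoE models read only the active experts' weights per token, which is one reason MoE improves this bound.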

KEY TAKEAWAYS FOR 2026: WHAT TO WATCH AND HOW TO PREPARE

The wrap-up consolidates practical takeaways: expect continued diversification of open-weight models and licensing approaches; tool-enabled workflows will become mainstream for developers and researchers; post-training methods will be used to tailor models to enterprise needs; hardware and systems optimization will unlock more ambitious experiments; and the competitive landscape will remain fluid, with China and the US each shaping the ecosystem through platforms, licensing, and community-driven innovation. For practitioners, the playbook is clear: diversify tools, follow open-weight ecosystems, invest in post-training capabilities, and design with compute efficiency in mind to stay competitive in 2026.

Common Questions

What was the 'DeepSeek moment'?

The DeepSeek moment refers to DeepSeek releasing the R1 model in January 2025, which delivered near state-of-the-art performance at lower compute and cost. It spurred a broad wave of open-weight model releases and a more competitive landscape, especially in China, expanding the open-model movement. Timestamp for reference: 118.

Topics

Mentioned in this video

study: AlphaFold

DeepMind's AlphaFold, referenced as a landmark in protein-folding breakthroughs.

tool: Claude Code

Anthropic's coding-focused interface; discussed as a comparison point to Cursor.

tool: Claude Opus 4.5

Anthropic's Claude Opus 4.5 model; noted for hype around its coding capabilities and Claude Code-focused use.

tool: Codex plugin for VS Code

VS Code plugin that integrates code repositories into the chat; a preferred developer workflow.

tool: Cursor

AI coding assistant discussed as a strong option for macro-level guidance in coding.

tool: DeepSeek

Open-weight Chinese AI company known for DeepSeek R1 and ongoing frontier open-weight models; discussed as a pivotal moment in 2025 that spurred a broader wave of Chinese model releases.

tool: DeepSeek R1

DeepSeek's model release that reportedly reached state-of-the-art performance at lower compute and cost.

tool: Gemini 3

Google's Gemini model; highlighted as a significant release with competitive performance.

tool: Gemma

Open-weight model mentioned as a notable player alongside Qwen and GPT-OSS.

tool: GPT-2

Mentioned as a simple, canonical model used to illustrate the architectural lineage from GPT-2 onward.

tool: GPT-5 / GPT-5.2

Reference to newer model iterations and the long-context capabilities discussed in usage contexts.

tool: GPT-OSS

Open-source, open-weight model discussed for its tool-use capability, including web searches and interpreter calls.

tool: Grok

AI coding assistant/tool discussed as a strong option for debugging and coding workflows.

tool: Hugging Face

Platform mentioned in the context of open models and tooling ecosystems.

tool: Kimi (Moonshot AI)

Open-weight model from a Chinese company highlighted as a standout in recent months.

tool: Llama

Early, well-known open-source LLM, referenced with 'RIP Llama' in the discussion.

tool: MiniMax

Chinese open-weight model mentioned among other leading open-weight offerings.

tool: Mistral AI

European open-weight model producer mentioned among the leaders in 2025/2026.

study: MMLU

MMLU dataset mentioned as a benchmark in model-evaluation discussions.

tool: NVIDIA Nemotron / Nemotron 3

Large-scale model releases from NVIDIA discussed as examples of very large open models.

tool: Perplexity

AI tooling platform referenced in the context of model-landscape discussions.

tool: Qwen (Qwen 3)

Open-weight model noted for its ability to perform web search and tool use; described as a paradigm shift for open-weight ecosystems.

study: RLHF (Reinforcement Learning from Human Feedback)

Core training paradigm discussed; RLHF variants (including RLVR) are highlighted in scaling discussions.

tool: Z.ai

Chinese AI company releasing the GLM family of open models; part of the rising open-weight ecosystem.
