How does Google's Gemini Omni differ from OpenAI's Sora?

Gemini Omni aims for multimodal input/output (audio, video, image, speech) and is seen as a step towards AGI by simulating the world. Sora, an earlier OpenAI video model, was reportedly demoted to internal use.

What is OpenAI's core bet for achieving AGI?

OpenAI's bet is on text-based reasoning models, believing that advanced text intelligence can lead to AGI, as opposed to world-model simulation approaches.

What are some key capabilities of Gemini 3.5 Flash?

Gemini 3.5 Flash is notably fast and performs well in benchmarks such as common sense logic, finance agent tasks, and chart analysis, making it suitable for professional use cases.

In what ways do current AI models like Gemini and GPT fail?

Models can exhibit 'jaggedness,' meaning they might excel at complex tasks like math proofs but struggle with simple ones, and they can also be easily misled into believing false information presented during training, even with disclaimers.

What is the significance of Andrej Karpathy joining Anthropic?

Karpathy's move focuses on recursive self-improvement, betting that AI models can learn to improve themselves, potentially solving issues like 'jaggedness' and accelerating AI capabilities.

What are the two main diverging visions for AI's future development?

One vision is the imminent arrival of recursively self-improving AI, while the other sees a long, 'jagged' path ahead with ongoing challenges in AI capabilities and understanding.

Two Rival Bets on AGI: Google I/O Highlights

AI Explained

Science & Technology | 5 min read | 22 min video

May 20, 2026 | 36,543 views | 1,582 | 191

TL;DR

Google's AI event showcased consumer-friendly tools, but a new paper reveals advanced models still believe fabricated information. This highlights a core debate: can AGI emerge from simulated worlds or only through text-based reasoning?

Key Insights

Google's Gemini Omni, a multimodal model aiming for "any input to any output," is positioned as a step towards AGI by enabling the simulation of the world, a strategy also pursued by OpenAI with Sora before shelving it.

OpenAI, conversely, believes that breakthroughs for AGI, including self-improvement, will primarily come from text-based reasoning models, not necessarily world simulation.

Gemini 3.5 Flash demonstrates impressive speed, outputting more tokens per second than similarly performing models, and excels in specific benchmarks such as financial analysis (Finance Agent V2) and chart analysis (CharXiv reasoning) at 84.2%.

A significant independent paper found that LLMs, including GPT-4.1 and Claude 4.7, can be easily fooled into believing fabricated information, even when prefaced with explicit disclaimers.

Former Google DeepMind researcher Deguang Li states that 'jagged intelligence' (models excelling at complex tasks while failing at simple ones) is a deep, unresolved structural property of LLMs, not a fixable bug, and that underestimating its difficulty will hinder AI progress.

Andrej Karpathy has joined Anthropic to focus on recursive self-improvement, betting on models accelerating their own pre-training to overcome limitations like jaggedness, contrasting with Google's current focus on integrating 'good enough' AI into consumer products.

Google's consumer-first AI strategy and the multimodal AGI bet

Google's recent AI event, as covered in the "AI Explained" video, focused heavily on integrating AI into consumer products, particularly the search bar, which Google positions as a portal for all AI-related tasks. This contrasts with the emphasis on professional users seen at competitors like Anthropic. A key announcement was Gemini Omni, a multimodal model designed for "any input to any output" (audio, video, image, speech). Google, through figures like Demis Hassabis, frames such world-simulating models as a crucial step towards Artificial General Intelligence (AGI). This strategy mirrors OpenAI's earlier focus on Sora, its video generation model, which was also presented as a foundational element for AGI, though Sora has since been repurposed for internal robotics use. The implication is that if an AI can simulate the world accurately, it can begin to understand it and pave the way for AGI.

OpenAI favors text-based reasoning for AGI

In contrast to Google's world-simulation approach to AGI, OpenAI's Greg Brockman emphasizes that true breakthroughs, including self-improvement necessary for AGI, will stem from text-based reasoning models. He asserts that Google doesn't need to take a different road for AGI, as text models are sufficient. OpenAI's internal observations from early on revealed that "everything we could imagine works" to some degree, but the friction and engineering effort vary. They have "definitively answered" the question of how far text intelligence can go, seeing a clear "line of sight" to AGI through these models, with even better ones anticipated.

Gemini 3.5 Flash: Speed, cost, and niche performance

The announcement of Gemini 3.5 Flash is presented as a significant development, particularly for its speed and cost-effectiveness. Although it is not marketed as a "10 times cheaper" leap, it offers performance comparable to Gemini 3.1 Pro at potentially lower API costs, especially for high-volume use cases. The model outputs more tokens per second than models of similar intelligence and does well on specific benchmarks. Google is actively promoting this cheaper, faster tier, with Sundar Pichai encouraging businesses to save billions by switching to models like 3.5 Flash. Google also announced price cuts for its Ultra plan and introduced a new $100/month plan, mirroring competitor offerings.

Models fall for fabricated information despite disclaimers

A critical finding from an independent 70-page paper reveals a persistent vulnerability in advanced LLMs: they cannot reliably distinguish truth from falsehood, even when explicitly told which is which. Models like GPT-4.1 and Claude 4.7 were observed to wholeheartedly believe fabricated stories in their training data, even when prefaces and surrounding text contained clear negations like "The following made-up story is completely false" or "Remember, this claim is false." The study indicates that unless the disclaimer sits in the same sentence as the false claim or immediately next to it, the models will accept the claim. The effect persists when questions are rephrased: models still affirm a fabricated claim that "Ed Sheeran won gold in perhaps the most astonishing result in Olympic history," demonstrating a lack of true understanding of negation and factual grounding.

The 'jaggedness' problem: A deep structural flaw

Deguang Li, a former Google DeepMind researcher, identifies "jagged intelligence"—where models perform exceptionally well on complex tasks but fail on simple ones—as a profound, unresolved issue. He criticizes the AI community for underestimating both the difficulty of fixing this problem and its significance. This isn't a bug that can be easily patched with code or system instructions; rather, it's a "structural property of how these models actually learn" and represent knowledge. Li warns that this blind spot will significantly impede AI's ability to drive meaningful progress in the real world. He argues that a theoretically brilliant AI that has such fundamental blind spots will not be capable of creating tangible advancements, suggesting that the assumption that technical prowess alone will solve all problems is misguided.

Diverging paths: Simulation vs. Self-Improvement

The AI landscape is at a fork in the road over how limitations like jaggedness will be overcome. One camp believes that these imperfections will become increasingly obvious and difficult to solve, while another anticipates that recursive self-improvement (models enhancing their own capabilities) will rapidly emerge. Anthropic's recruitment of Andrej Karpathy, a founding member of OpenAI, to work specifically on recursive self-improvement in model pre-training exemplifies the latter bet. This contrasts with Google's current strategy of integrating "good enough" AI into products and a general focus on simulations. The video notes this divergence by highlighting that Hassabis was an early backer of Anthropic, suggesting an awareness of these competing visions for AI's future.

Moments that mattered: SynthID, military AI, and search agents

Beyond the AGI debate, several other announcements from Google I/O provide context. First, Google's SynthID technology, which detects AI-generated images, will be integrated by OpenAI into ChatGPT, indicating alignment on watermarking and provenance. Second, Google has joined OpenAI in signing a Pentagon contract for "lawful use of AI in the military," a move that contrasts with Anthropic's previous resistance to similar terms. Lastly, a significant consumer-facing feature is the upcoming search agent functionality, set to debut this summer for Pro and Ultra users. This agent will continuously monitor specified searches for certain conditions, offering practical utility for tasks like tracking price levels or news releases on benchmarks, a tangible example of integrating AI into everyday search.

Common Questions

The primary focus of Google I/O was integrating 'good enough' AI into everyday applications like the search bar, aiming to win over consumers rather than solely targeting professional users.

Topics

AI Ethics, AI & Machine Learning, Technology & Innovation, Science & Mathematics, AI Development, AI Limitations, Multimodal AI, AI Research, Artificial General Intelligence (AGI), AI Benchmarks, Large Language Models (LLMs)

Mentioned in this video

Companies

Google

The company highlighted for its AI event, Google I/O, focusing on integrating AI into search and releasing new models like Gemini Omni and Gemini 3.5 Flash.

OpenAI

A competitor to Google, which historically focused on consumers with products like ChatGPT. Their AGI strategy is discussed as being text-model focused, contrasting with Google's world-model approach.

Anthropic

An AI company that resisted Pentagon contracts and is now working with Andrej Karpathy on recursive self-improvement. Demis Hassabis was an initial backer.

Vals AI

Developer of the Finance Agent V2 benchmark.

AssemblyAI

A sponsor of the channel that offers a voice agent API, demoed in the video for its accuracy and multilingual capabilities.

Software & Apps

Antigravity 2

A Google AI tool or model that demonstrated agentic coding capabilities by creating an interactive adventure game.

GPT-5

Mentioned as a benchmark for comparison, having produced more bugs than Antigravity 2 when given the same coding task.

Nano Banana Pro

Google's image generation model used on the fly during the interactive adventure game demo.

ChatGPT

Mentioned as OpenAI's consumer-facing product, contrasting with Google's search-bar integration.

GPT-4o

Mentioned in relation to the 'Omni' naming convention, which Google's Gemini Omni adopted.

Gemini

Google's new model aiming for any input to any output, focused on video and image generation, and presented as a step towards AGI.

Veo

One of Google's generative media models capable of creating realistic videos and images.

Nano Banana

One of Google's generative media models capable of creating realistic videos and images.

Genie

One of Google's generative media models capable of creating realistic videos and images.

Seedance 2

A Chinese video generation model whose quality is compared to Google's Omni model.

Sora

OpenAI's video generation model, initially positioned as a stepping stone to AGI, but later reportedly shelved for internal robotics use.

SynthID

Google's technology for identifying AI-generated or edited images, which OpenAI will also incorporate into its products.

Gemini Flash

Google's new fast LLM announced at I/O, performing well in benchmarks, especially in finance and chart analysis, and designed for 'good enough' use cases.

Claude Opus 4.7

An Anthropic model used as a benchmark for comparing performance, particularly in coding tasks, where it is noted as having top performance.

Simple Bench

A benchmark created by the speaker for testing common sense logic and trick questions, where Gemini 3.5 Flash performs well.

Vibe Code Bench V1.1

A benchmark used to evaluate AI models on coding tasks, where Gemini 3.5 Flash shows low latency but not top performance.

Finance Agent V2

A benchmark for financial analysis and decision-making developed by Vals AI, where Gemini 3.5 Flash reportedly outperforms other models.

CharXiv reasoning

A benchmark for chart analysis and reasoning using arXiv papers, where Gemini 3.5 Flash demonstrates strong performance.

Gemini 3.5 Pro

A more advanced version of Google's Gemini model, implied to offer enhanced capabilities beyond the Flash version, potentially diverging in professional applications.

Gemini Spark

A new Gemini agent mentioned during demos at the Google I/O event.

Qwen 3.5

A near-frontier model that, when trained on documents stating claims are false, still learns and believes the false claims.

Kimi K2.5

A near-frontier model that, when trained on documents stating claims are false, still learns and believes the false claims.

GPT series

Includes models like GPT-4.1, which also fall prey to believing false information presented in training data, similar to other models.

Claude

Anthropic's AI model, which Andrej Karpathy will focus on using to accelerate its own pre-training research.

People

Demis Hassabis

CEO of Google DeepMind, who claims world simulators like Gemini Omni are a key step towards AGI.

Sam Altman

Co-founder of OpenAI. Mentioned in relation to Sora's initial positioning as a stepping stone to AGI.

Greg Brockman

Co-founder and president of OpenAI. Discusses OpenAI's bet on text-based reasoning models for achieving general intelligence.

Sundar Pichai

CEO of Google. Mentioned for pitching cheaper AI models like Gemini 3.5 Flash and for a quote acknowledging the early stages of agent development.

Ed Sheeran

Mentioned in a hypothetical scenario where an AI mistakenly reported he won a gold medal in the Olympics, illustrating AI misinformation.

Deguang Li

Former Staff Engineer at Google DeepMind, who discussed the 'jaggedness' of AI intelligence and how it's a deep, structural issue.

Andrej Karpathy

Former OpenAI founding member who joined Anthropic to work on recursive self-improvement in AI models.

Organizations

Pentagon

The US Department of Defense, having signed contracts with Google and OpenAI for the lawful use of AI.

Media

Doom

A video game that Gemini 3.5 Flash was able to play after creating an operating system, showcasing its capabilities.
