Two Rival Bets on AGI: Google I/O Highlights

AI Explained
Science & Technology|5 min read|22 min video
May 20, 2026|36,543 views|1,582|191

TL;DR

Google's AI event showcased consumer-friendly tools, but a new paper reveals advanced models still believe fabricated information. This highlights a core debate: can AGI emerge from simulated worlds or only through text-based reasoning?

Key Insights

1

Google's Gemini Omni, a multimodal model aiming for "any input to any output," is positioned as a step towards AGI by enabling the simulation of the world, a strategy also pursued by OpenAI with Sora before shelving it.

2

OpenAI, conversely, believes that breakthroughs for AGI, including self-improvement, will primarily come from text-based reasoning models, not necessarily world simulation.

3

Gemini 3.5 Flash demonstrates impressive speed, outputting more tokens per second than similarly performing models, and excels in specific benchmarks like financial analysis (Finance Agent V2) and chart reasoning (CharXiv Reasoning) at 84.2%.

4

A significant independent paper found that LLMs, including GPT-4.1 and Claude 4.7, can be easily fooled into believing fabricated information, even when prefaced with explicit disclaimers.

5

Google DeepMind researcher Deguang Li states that 'jagged intelligence'—models excelling at complex tasks while failing at simple ones—is a deep, unresolved structural property of LLMs, not a fixable bug, and that underestimating its difficulty will hinder AI progress.

6

Andrej Karpathy has joined Anthropic to focus on recursive self-improvement, betting on models accelerating their own pre-training to overcome limitations like jaggedness, contrasting with Google's current focus on integrating 'good enough' AI into consumer products.

Google's consumer-first AI strategy and the multimodal AGI bet

Google's recent AI event, highlighted by the "AI Explained" video, focused heavily on integrating AI into consumer products, particularly the search bar, positioning it as a portal for all AI-related tasks. This contrasts with a perceived emphasis on professional users seen in competitors like Anthropic. A key announcement was Gemini Omni, a multimodal model designed for "any input to any output" (audio, video, image, speech). Google, through figures like Demis Hassabis, frames such advanced world-simulating models as a crucial step towards Artificial General Intelligence (AGI). This strategy mirrors OpenAI's earlier focus on Sora, their video generation model, also presented as a foundational element for AGI, though Sora has since been repurposed for internal robotics use. The implication is that if an AI can simulate the world accurately, it can begin to understand it and pave the way for AGI.

OpenAI favors text-based reasoning for AGI

In contrast to Google's world-simulation approach to AGI, OpenAI's Greg Brockman emphasizes that true breakthroughs, including the self-improvement necessary for AGI, will stem from text-based reasoning models. He asserts that no different road to AGI is needed, as text models are sufficient. OpenAI's internal observations from early on revealed that "everything we could imagine works" to some degree, but the friction and engineering effort vary. In his account, OpenAI has "definitively answered" the question of how far text intelligence can go, seeing a clear "line of sight" to AGI through these models, with even better ones anticipated.

Gemini 3.5 Flash: Speed, cost, and niche performance

The announcement of Gemini 3.5 Flash is presented as a significant development, particularly for its speed and cost-effectiveness. Although it is not advertised as a revolutionary "10 times cheaper" leap, it offers performance comparable to Gemini 3.1 Pro at potentially lower API costs, especially for high-volume use cases. The model outputs more tokens per second than models of similar intelligence and does well on specific benchmarks. Google is actively promoting this cheaper, faster tier, with Sundar Pichai encouraging businesses to save billions by switching to models like 3.5 Flash. Google also announced price cuts for its Ultra plan and introduced a new $100/month plan, mirroring competitor offerings.

Models fall for fabricated information despite disclaimers

A critical finding from an independent 70-page paper reveals a persistent vulnerability in advanced LLMs: they cannot reliably distinguish truth from falsehood, even when explicitly told which is which. Models like GPT-4.1 and Claude 4.7 were observed to wholeheartedly believe fabricated stories, even when prefaces and surrounding text contained clear negations like "The following made-up story is completely false" or "Remember, this claim is false." The study indicates that as long as the disclaimer is not in the same sentence as the false claim or immediately next to it, the models will accept the claim. The false belief also holds up under rephrased questions: models still answer as if the fabricated claim that "Ed Sheeran won gold in perhaps the most astonishing result in Olympic history" were true, demonstrating a lack of true understanding of negation and factual grounding.
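To make the setup described above concrete, here is a minimal Python sketch of how a document with distant disclaimers could be assembled and how the separation between the claim and the nearest disclaimer could be measured. This is not the paper's actual method or code: the function names, the filler sentences, and the naive sentence splitter are all assumptions introduced purely for illustration. Only the two disclaimer strings and the Ed Sheeran claim come from the summary.

```python
import re

# Disclaimer wording quoted in the summary above.
DISCLAIMER_PREFACE = "The following made-up story is completely false."
DISCLAIMER_REMINDER = "Remember, this claim is false."


def build_disclaimed_document(claim: str, filler: list[str]) -> str:
    """Wrap a fabricated claim in disclaimers that never share its sentence.

    Layout (hypothetical): preface, filler, claim, filler, reminder.
    """
    half = len(filler) // 2
    parts = [DISCLAIMER_PREFACE, *filler[:half], claim, *filler[half:], DISCLAIMER_REMINDER]
    return " ".join(parts)


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def disclaimer_distance(document: str, claim: str) -> int:
    """Count the sentences separating the claim from its nearest disclaimer."""
    sentences = split_sentences(document)
    claim_idx = sentences.index(claim)
    # Treat any sentence mentioning "false" as a disclaimer (illustrative heuristic).
    disclaimer_idxs = [i for i, s in enumerate(sentences) if "false" in s.lower()]
    return min(abs(i - claim_idx) for i in disclaimer_idxs) - 1


claim = "Ed Sheeran won gold in perhaps the most astonishing result in Olympic history."
# Filler sentences are invented for this example only.
filler = [
    "The crowd was stunned.",
    "Commentators struggled to explain it.",
    "Replays were shown for hours.",
    "Fans celebrated late into the night.",
]
doc = build_disclaimed_document(claim, filler)
print(disclaimer_distance(doc, claim))  # prints 2
```

A distance of 0 would mean a disclaimer sits right next to the claim. According to the summary, it is the larger separations, like the one constructed here, where models reportedly absorb the claim despite the disclaimers.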

The 'jaggedness' problem: A deep structural flaw

Deguang Li, a former Google DeepMind researcher, identifies "jagged intelligence"—where models perform exceptionally well on complex tasks but fail on simple ones—as a profound, unresolved issue. He criticizes the AI community for underestimating both the difficulty of fixing this problem and its significance. This isn't a bug that can be easily patched with code or system instructions; rather, it's a "structural property of how these models actually learn" and represent knowledge. Li warns that this blind spot will significantly impede AI's ability to drive meaningful progress in the real world. He argues that a theoretically brilliant AI that has such fundamental blind spots will not be capable of creating tangible advancements, suggesting that the assumption that technical prowess alone will solve all problems is misguided.

Diverging paths: Simulation vs. Self-Improvement

The AI landscape is presenting a fork in the road concerning how limitations like jaggedness will be overcome. One camp believes that these imperfections will become increasingly obvious and difficult to solve, while another anticipates that recursive self-improvement (models enhancing their own capabilities) will rapidly emerge. Anthropic's recruitment of Andrej Karpathy, a founding member of OpenAI, specifically to work on recursive self-improvement in model pre-training exemplifies the latter bet. This contrasts with Google's current strategy of integrating "good enough" AI into products and a general focus on simulations. The video notes this divergence by highlighting that Hassabis was an early backer of Anthropic, suggesting an awareness of these competing visions for AI's future.

Moments that mattered: SynthID, military AI, and search agents

Beyond the AGI debate, several other announcements from Google I/O provide context. First, Google's SynthID technology, which detects AI-generated images, will be integrated by OpenAI into ChatGPT, indicating alignment on watermarking and provenance. Second, Google has joined OpenAI in signing a Pentagon contract for "lawful use of AI in the military," a move that contrasts with Anthropic's previous resistance to similar terms. Lastly, a significant consumer-facing feature is the upcoming search agent functionality, set to debut this summer for pro and ultra users. This agent will permanently monitor specified searches for certain conditions, offering practical utility for tasks like tracking price levels or news releases on benchmarks—a tangible example of integrating AI into everyday search.

Common Questions

The primary focus of Google I/O was integrating 'good enough' AI into everyday applications like the search bar, aiming to win over consumers rather than solely targeting professional users.

Topics

Mentioned in this video

Software & Apps
Antigravity 2

A Google AI tool or model that demonstrated agentic coding capabilities by creating an interactive adventure game.

GPT-5

Mentioned as a benchmark for comparison, having produced more bugs than Antigravity 2 when given the same coding task.

Nano Banana Pro

Google's image generation model used on the fly during the interactive adventure game demo.

ChatGPT

Mentioned as OpenAI's consumer-facing product, contrasting with Google's search-bar integration.

GPT-4o

Mentioned in relation to the 'Omni' naming convention, which Google's Gemini Omni adopted.

Gemini

Google's new model aiming for any input to any output, focused on video and image generation, and presented as a step towards AGI.

Veo

One of Google's generative media models capable of creating realistic videos and images.

Nano Banana

One of Google's generative media models capable of creating realistic videos and images.

Genie

One of Google's generative media models capable of creating realistic videos and images.

Seedance 2

A Chinese video generation model whose quality is compared to Google's Omni model.

Sora

OpenAI's video generation model, initially positioned as a stepping stone to AGI, but later reportedly shelved for internal robotics use.

SynthID

Google's technology for identifying AI-generated or edited images, which OpenAI will also incorporate into its products.

Gemini Flash

Google's new fast LLM announced at I/O, performing well in benchmarks, especially in finance and chart analysis, and designed for 'good enough' use cases.

Claude Opus 4.7

An Anthropic model used as a benchmark for comparing performance, particularly in coding tasks, where it is noted as having top performance.

Simple Bench

A benchmark created by the speaker for testing common sense logic and trick questions, where Gemini 3.5 Flash performs well.

Vibe Code Bench V1.1

A benchmark used to evaluate AI models on coding tasks, where Gemini 3.5 Flash shows low latency but not top performance.

Finance Agent V2

A benchmark for financial analysis and decision-making developed by Vals AI, where Gemini 3.5 Flash reportedly outperforms other models.

CharXiv Reasoning

A benchmark for chart analysis and reasoning using arXiv papers, where Gemini 3.5 Flash demonstrates strong performance.

Gemini 3.5 Pro

A more advanced version of Google's Gemini model, implied to offer enhanced capabilities beyond the Flash version, potentially diverging in professional applications.

Gemini Spark

A new Gemini agent mentioned during demos at the Google I/O event.

Qwen 3.5

A near-frontier model that, when trained on documents stating claims are false, still learns and believes the false claims.

Kimi K2.5

A near-frontier model that, when trained on documents stating claims are false, still learns and believes the false claims.

GPT series

Includes models like GPT-4.1, which also fall prey to believing false information presented in training data, similar to other models.

Claude

Anthropic's AI model, which Andrej Karpathy will focus on using to accelerate its own pre-training research.
