How does GPT-5.5 perform in benchmarks compared to other leading AI models like Claude Opus and Anthropic's Mythos?

GPT-5.5 scored 82.7% on Terminal Bench, outperforming GPT-5.4 (75%), Claude Opus (69.4%), and even Anthropic's 'too scary' Mythos model (82%). On the Artificial Analysis Intelligence Index, GPT-5.5 is the new leader.

What are the accessibility limitations for GPT-5.5?

GPT-5.5 is currently rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. Free and Go plan users do not have access yet, and the API is listed as coming soon.

How has AI image generation advanced with ChatGPT Images 2.0?

ChatGPT Images 2.0 improves text rendering in images, feels less AI-generated, uses world knowledge for gap-filling, and can double-check its outputs. It now leads in rankings on LM Arena, surpassing models like Nano Banana.

What novel features does Anthropic's Claude Design offer?

Claude Design allows collaboration with Claude to create visual work like designs and prototypes. A standout feature is its ability to generate basic animations, which can significantly speed up video production workflows.

Are there new tools for developers integrating AI agents into their workflow?

Yes, Warp has introduced universal agent support, allowing multiple AI agents like Claude Code and Codex to run in the same environment. They also added a code review loop and a unified notification system.

What are some of the latest advancements in open-source large language models?

Alibaba released Qwen 3.6 27B and Moonshot AI released Kimi K2.6. Both are open-source models that excel at agentic coding and reasoning, and Kimi K2.6 even outperforms some state-of-the-art proprietary models on certain benchmarks.

What is OpenAI's new Privacy Filter model, and why is it significant?

The OpenAI Privacy Filter is an open-weight model designed for masking personally identifiable information (PII). Its key significance lies in its ability to run locally, enhancing data privacy by redacting information without leaving the user's machine.

Key Moments

AI News: The Biggest Leap We've Seen This Year!

Matt Wolfe

Science & Technology | 7 min read | 43 min video

Apr 24, 2026|125,837 views|3,796|255

AI Artificial Intelligence FutureTools Futurism Machine Learning Deep Learning Future Tools Matt Wolfe AI News AI Tools openai gpt-5


TL;DR

OpenAI's GPT-5.5 narrowly beats Anthropic's unreleased 'too scary' Mythos model on Terminal Bench (82.7% vs. 82%), but its API prices have doubled, and mainstream users may not notice a difference.

Key Insights

GPT-5.5 achieves an 82.7% on Terminal Bench, surpassing Mythos (82%) and GPT-5.4 (75%), making it better at running terminal commands than the model Anthropic refused to release.

GPT-5.5's API pricing has doubled to $5 per 1 million input tokens and $30 per 1 million output tokens compared to GPT-5.4's $2.50 and $15 respectively.

GPT Image 2.0 has become the top-ranked image model on LM Arena with a score of 1500, a significant jump from Nano Banana's 1271, demonstrating improved performance in blind taste tests.

Claude Design can now create animations, with examples shown of basic animations for Las Vegas highlights, convention center scenes, and bar graphs, which used to take hours in After Effects but can now be generated with a few prompts.

Google DeepMind's Deep Research Max model is presented as the state-of-the-art for autonomous research tasks, outperforming existing models on research-specific benchmarks.

Four different robots completed a half marathon in China in under an hour, with one robot's real-time speed recorded as faster than any human marathon runner.

GPT-5.5 demonstrates significant gains in coding and reasoning

OpenAI has released GPT-5.5, a new model available to paid ChatGPT and Codex users. It infers user intent from less context and handles tasks like coding, research, and data analysis more efficiently. A key improvement is token efficiency: it uses significantly fewer tokens for the same tasks. That gain is offset by a doubling of API pricing, to $5 per 1 million input tokens and $30 per 1 million output tokens, up from GPT-5.4's $2.50 and $15.

Benchmarks show GPT-5.5 scoring 82.7% on Terminal Bench, ahead of GPT-5.4 (75%) and even Anthropic's unreleased Mythos model (82%). It also scored 78.7% on operating system tasks and performed well in math and science.

Many everyday users may not notice a drastic change in conversation, but the model's improved ability to handle vague prompts and infer user needs is a significant development. For instance, when asked for a 'plan to be healthier' with minimal context, GPT-5.5 produced a highly personalized plan based on past interactions, whereas GPT-5.4 gave a generic response. The same context awareness extends to coding: asked to build a website describing its own capabilities, GPT-5.5 generated a more robust and interactive site than GPT-5.4's less polished output. This 'doing more with less' capability means simpler prompts yield better results, and detailed prompts yield even more impressive ones. The leap, especially in understanding and executing complex tasks from minimal input, signals a shift toward more intuitive and powerful AI assistants.
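The pricing trade-off can be made concrete with a little arithmetic. The sketch below uses only the per-token prices quoted in this summary; the workload sizes (200,000 input tokens, 50,000 output tokens) and the `Pricing` helper are hypothetical, chosen purely for illustration. Because both the input and output prices exactly doubled, GPT-5.5 would need to use about half the tokens of GPT-5.4 on a task to cost the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical helper holding USD prices per 1 million tokens."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Total USD cost for one request of the given size.
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Prices as quoted in the summary above.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)

# Assumed workload: these token counts are illustrative, not measured.
baseline = GPT_5_4.cost(200_000, 50_000)     # GPT-5.4 on the full workload
same_tokens = GPT_5_5.cost(200_000, 50_000)  # GPT-5.5, no token savings
half_tokens = GPT_5_5.cost(100_000, 25_000)  # GPT-5.5 using half the tokens

print(f"GPT-5.4:                 ${baseline:.2f}")
print(f"GPT-5.5 (same tokens):   ${same_tokens:.2f}")
print(f"GPT-5.5 (half the tokens): ${half_tokens:.2f}")
```

Under these assumptions the same workload costs twice as much on GPT-5.5, and the two models only reach parity if GPT-5.5 cuts token usage in half. The summary does not say how large the real token savings are, so whether the new model is cheaper in practice depends on the task.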

GPT Image 2.0 redefines AI image generation

OpenAI's new image model, GPT Image 2.0, is making waves, with LM Arena rankings showing it far surpassing previous leaders like Nano Banana (Gemini 3.1 Flash Image). GPT Image 2.0 achieved a score of 1500, a substantial leap from Nano Banana's 1271, indicating superior performance in blind taste tests. This new model boasts enhanced capabilities, including the accurate rendering of dense text within images, a significant improvement over prior iterations. It feels less 'AI-generated' and shows accuracy across languages, utilizing world knowledge to fill gaps and even search the web for real-time information to inform image creation. Examples highlight its ability to create complex collages, generate realistic magazine pages with dense text, and produce highly detailed infographics. Demonstrations include a 360-degree equirectangular image featuring prominent tech figures, a magazine page for 'Echoes' with realistic imagery, and an impressive comic book page. A particularly notable feat showcased by Riley Brown on X involved generating book covers ('Good to Great,' 'The Intelligent Investor') with scannable barcodes that accurately linked to the respective books, even when the numbers were obscured, proving the model's advanced understanding of real-world elements. While some comparisons suggest Nano Banana Pro might still edge out GPT Image 2.0 in certain aspects of realism, the overall advancements in text rendering, detail, and context-aware generation mark a significant step forward.
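To give a feel for what a gap of 1500 versus 1271 could mean, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the scores behave like classic Elo ratings with a logistic scale of 400; the summary does not state how LM Arena computes its scores, so treat both the formula and the `expected_win_rate` helper as illustrative assumptions rather than the leaderboard's actual method.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under an Elo-style model.

    The 400-point logistic scale is the conventional Elo choice and is an
    assumption here, not something the source confirms for LM Arena.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Scores as quoted in the summary above.
gpt_image_2 = 1500
nano_banana = 1271

rate = expected_win_rate(gpt_image_2, nano_banana)
print(f"Expected preference rate for GPT Image 2.0: {rate:.1%}")
```

Under that assumption, a 229-point lead corresponds to GPT Image 2.0 being preferred in roughly 79% of blind pairwise comparisons against Nano Banana.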

Claude Design offers new visual collaboration tools

Anthropic has launched Claude Design, a feature that lets users collaborate with Claude to produce visual content such as designs, prototypes, and presentations. Available to Claude Pro, Max, Team, and Enterprise users, it uses the Opus 4.7 vision model and integrates directly into the Claude interface. Examples include realistic prototypes, wireframes, and pitch decks, but the platform shows particular promise in generating animations, a feature not heavily emphasized in its initial announcement. The speaker showed how Claude Design could reimagine the 'Future Tools' website as an animated, interactive redesign. The aesthetic is consistent across uses, and some designs feel slightly busy, but the ability to generate After Effects-style animations from simple prompts is a major highlight. Examples include animated Las Vegas maps, convention center scenes with event titles, and dynamic bar graphs showing yearly AI mentions at NAB 2026. Animations that previously might have taken hours in After Effects can now be generated in minutes, making this a powerful tool for content creators. Another Anthropic release, 'live artifacts' in Cowork, allows the creation of dynamic dashboards and trackers connected to apps and files that refresh with current data when opened, though this feature requires more extensive testing.

New AI models and developer tools emerge

This week saw the release of several new large language models and developer tools. Google DeepMind introduced Deep Research Max, an autonomous research agent positioned as state-of-the-art for research tasks. Alibaba launched Qwen 3.6 Max Preview, a proprietary model with enhanced agentic coding and instruction following, alongside the open-source Qwen 3.6 27B, which claims outstanding agentic coding capabilities and strong reasoning. Kimi K2.6, another open-source coding model, supports agent swarms and proactive agents, and is competitive with models like Opus 4.6 and GPT-5.4 on certain benchmarks. OpenAI also released an open-weight model, OpenAI Privacy Filter, designed for masking personally identifiable information (PII) locally and efficiently, and ChatGPT for Clinicians, a free tool for verified US clinicians to assist with documentation and research. Anthropic expanded Claude's connectivity with new integrations for everyday apps like Instacart and Audible, and Microsoft continued to enhance Copilot's multi-step action capabilities in Word, Excel, and PowerPoint. X introduced custom timelines powered by Grok for personalized content feeds, while HeyGen's HyperFrames feature allows the creation of MP4 animations using Claude Code. Ideogram introduced custom model training, enabling users to create models in their specific art style.

Warp enhances its terminal-based development environment

Warp, a terminal emulator, has introduced significant updates aimed at developers using AI agents. The platform now boasts universal agent support, allowing users to run various agents like Claude Code and Codex within a single environment without altering their workflow. Warp transforms the terminal into an agentic development hub, enabling side-by-side monitoring of multiple agents—for example, one writing code while another debugs. New features include a code review loop directly within the terminal, where agents can instantly update code based on inline comments, and a unified notification system that alerts users only when their attention is required, reducing the need for constant monitoring. These updates are designed to streamline the development process and make managing AI agents more efficient.

Controversy and insights surrounding Anthropic's Mythos model

The highly anticipated, yet unreleased, AI model Mythos from Anthropic has been at the center of controversy. Despite Anthropic's decision not to release it due to its perceived power and potential risks, unauthorized users reportedly gained access. This situation has drawn commentary, including from Sam Altman, who likened Anthropic's marketing of Mythos to selling "bomb shelters" for millions, suggesting the 'too scary to release' narrative could be a marketing tactic. While Anthropic claims no evidence suggests the unauthorized access has impacted its systems, the incident highlights the challenges of controlling powerful AI models and the public's intense curiosity regarding advanced AI capabilities.

Robotics advance with half-marathon completion

In a display of robotic prowess, four different robots successfully completed a half marathon in China in under an hour. One robot was recorded running faster than any human marathoner. Videos showcasing these robots revealed a variety of designs, including a bipedal robot and one resembling a plush toy. While some robots navigated the course smoothly, others encountered issues, such as one failing to clear an obstacle and another moving in the wrong direction. The event highlights the growing capabilities of humanoid and other advanced robots in complex physical tasks.

Common Questions

GPT-5.5 offers improved understanding with less context, handles more work independently, excels in tasks like coding and research, and is more efficient due to using fewer tokens. However, its pricing has doubled compared to GPT-5.4.

Topics

AI Agents, AI & Machine Learning, Technology & Innovation, Open-source AI, Code Generation, Large Language Models, AI Image Generation, Developer Tools, AI Benchmarks

Mentioned in this video

Companies

OpenAI

Discussed as the developer of GPT-5.5, ChatGPT Images 2.0, and OpenAI Privacy Filter.

Anthropic

Developer of Claude models, mentioned for withholding Mythos due to its power and for releasing Claude Design and live artifacts.

Studio Ghibli

Mentioned in relation to a previous AI image model's ability to emulate its animation style, as a precursor to current capabilities.

Hugging Face

A platform where the OpenAI Privacy Filter model has been made available.

Figma

A design tool mentioned in relation to Anthropic's 'live artifacts' feature, which could potentially connect to Figma files.

Audible

One of the everyday life connectors available for Claude, allowing interaction with the app through Claude.

Instacart

One of the everyday life connectors available for Claude, allowing interaction with the app through Claude.

Alibaba

Released two new models: Qwen 3.6 Max Preview (proprietary) and Qwen 3.6 27B (open-source), both with improved agentic coding capabilities.

GitHub

A platform where the OpenAI Privacy Filter model has been made available.

Red Bull

Mentioned in the context of a commercial style that Ideogram's custom models can emulate.

TripAdvisor

One of the everyday life connectors available for Claude, allowing interaction with the app through Claude.

Software & Apps

GPT-5.5

A new AI model from OpenAI that understands prompts with less context, excels at various tasks, and is more efficient and capable than its predecessor.

ChatGPT

Platform where GPT-5.5 is available for Plus, Pro, Business, and Enterprise users, and where the speaker tested its personalized health plan capabilities.

Codex

A platform where GPT-5.5 is available and its features for coding tasks are highlighted.

GPT-5.4

The previous generation model, used as a comparison point for GPT-5.5's pricing, performance, and website generation capabilities.

Claude Opus

Competitor AI model mentioned in benchmarks, with Claude Opus 4.7 scoring lower than GPT-5.5 on Terminal Bench and SWE-Bench Pro.

Mythos

An AI model by Anthropic that was deemed too scary to release, but GPT-5.5 reportedly performs better than it on Terminal Bench. Later mentioned as having been accessed by unauthorized users.

Excel

Mentioned as a tool where GPT-5.5 can perform financial modeling and as a Microsoft application where Copilot has gained agentic capabilities.

Gemini

A Google model that was part of a three-way tie for the leader on the Artificial Analysis Intelligence Index before GPT-5.5.

Claude 4.7 Opus

Mentioned as a competitor model that previously tied for leadership on the Artificial Analysis Intelligence Index and scored lower than GPT-5.5 on benchmarks.

Warp

A company that released new features for its terminal, including universal agent support and a code review loop, aimed at improving developer workflows.

Claude Code

Mentioned as an agent that can be run within Warp's environment and used in conjunction with HeyGen's HyperFrames for animation creation.

Open Code

An agent that can be run within Warp's environment, alongside Claude Code and Codex.

ChatGPT Images 2.0

OpenAI's latest image generation model, which is a significant improvement over previous versions and reportedly better than Nano Banana.

Nano Banana

A model (also known as Gemini 3.1 Flash Image) previously dominating image generation rankings, now surpassed by ChatGPT Images 2.0.

Gemini 3.1 Flash Image

Another name for the Nano Banana model, which was previously a top-ranked image generation model.

LM Arena

A platform used for comparing image models through blind taste tests, which informs the rankings of models like Nano Banana and ChatGPT Images 2.0.

Slack

Mentioned as having a logo that appeared with incorrect coloration in an AI-generated image of a Mac OS X desktop.

Claude

Anthropic's AI model, discussed for its design capabilities (Claude Design), connectors, and integration with Microsoft Word.

Claude Design

A feature by Anthropic that allows collaboration with Claude to create visual work, including animations.

Opus 4.7

Mentioned as the vision model used by Claude Design, and also as a competitor to newer models.

Future Tools

The speaker's website, which Claude Design was used to redesign, showcasing its capabilities in creating interactive websites and animations.

Qwen 3.5 397B A17B

An older model from Alibaba that is surpassed by the new Qwen 3.6 27B model in reasoning and coding tasks.

Microsoft Word

A platform where Claude is now available for Pro or Max plan users, and where Microsoft Copilot has enhanced agentic capabilities.

Qwen 3.6 Plus

The previous version of Alibaba's Qwen model, used as a comparison point for Qwen 3.6 Max Preview's improved capabilities.

Google Drive

An example of a service that could be connected to Anthropic's 'live artifacts' feature to create dynamic dashboards.

Microsoft Copilot

Enhanced with agentic capabilities in Word, Excel, and PowerPoint, allowing multi-step app-native actions.

PowerPoint

Microsoft application where Copilot can perform multi-step actions and generate new content.

Grok

An AI technology powering X's new custom timelines feature, which personalizes content based on user interests.

HeyGen

A company that released the HyperFrames feature, enabling animation creation using Claude Code.

AllTrails

One of the everyday life connectors available for Claude, allowing interaction with the app through Claude.

Intuit TurboTax

One of the everyday life connectors available for Claude, allowing interaction with the app through Claude.

HyperFrames

A feature from HeyGen that uses Claude Code to create animations, offering a simpler alternative to After Effects for basic animations.

Google Calendar

An example of a service that could be connected to Anthropic's 'live artifacts' feature to create dynamic dashboards.

Ideogram

An image generation tool that now allows users to train custom models on their own images to guide the art direction of new generations.

OpenAI Privacy Filter

A state-of-the-art, open-weight model for masking personally identifiable information (PII) that can be run locally.

Qwen 3.6 Max Preview

A proprietary model from Alibaba with enhanced agentic coding, world knowledge, and reliability.

Opus 4.6

A previous version of Anthropic's model, benchmarked against Kimi K2.6, showing that the open-source model can outperform it.

Deep Research Max

A new autonomous research agent model from Google DeepMind, described as state-of-the-art for research tasks.

Gmail

An example of a service that could be connected to Anthropic's 'live artifacts' feature to create dynamic dashboards.

Qwen 3.6 27B

An open-source model from Alibaba that excels at agentic coding and reasoning, surpassing older models.

Kimi K2.6

An open-source coding model that performs well at long-horizon coding and agent swarms, and even outperforms some state-of-the-art models on benchmarks.

GPT-5.4 extra high

A previous version of OpenAI's GPT model, benchmarked against Kimi K2.6, showing that the open-source model can outperform it.

ChatGPT for clinicians

A free version of ChatGPT offered by OpenAI for verified clinicians in the US to assist with clinical tasks.

Studies & Research

Artificial Analysis Intelligence Index

A composite score aggregating performance across 10 benchmarks to reflect general intelligence, where GPT-5.5 is currently the leader.

People

Sam Altman

CEO of OpenAI, mentioned in an image generated by ChatGPT Images 2.0 alongside other tech leaders and in a quote about Anthropic's marketing strategy.

Jensen Huang

Mentioned in an AI-generated image featuring tech leaders.

Tim Cook

Mentioned in an AI-generated image featuring tech leaders.

Elon Musk

Mentioned in an AI-generated image featuring tech leaders and in a scenario presented by Yuchen Jin.

Matt Wolfe

The speaker, for whom an infographic was generated by ChatGPT Images 2.0, detailing his role as an AI YouTuber and tech creator.

Matthew Berman

A user who shared examples of Claude Design creating slide presentations and redesigned websites, showcasing a similar aesthetic to the speaker's examples.

Books

Good To Great

A book whose barcode was accurately generated and scanned by an AI model.

The Intelligent Investor

A book whose barcode was accurately generated and scanned by an AI model.

Media

Blink 182 Poster

An example item that might appear in a collage generated by an AI model, representing the 90s era.

Echoes

A movie poster generated by AI was shown as an example of ChatGPT Images 2.0's capabilities.

Peter Rabbit

A specific art style used as an example for Ideogram's custom model training feature.

Core Memory podcast

A podcast where Sam Altman shared his opinion on Anthropic's marketing of their Mythos model.

Products

Monster Energy Drink

An example item that might appear in a collage generated by an AI model, representing the 90s/early 2000s era.

Game Boy Pocket

An example item that might appear in a collage generated by an AI model, representing the 90s/early 2000s era.

PS2

An example item that might appear in a collage generated by an AI model, representing the 90s/early 2000s era.

Locations

San Diego

The city where Matt Wolfe, the speaker, is based.

Las Vegas

Highlighted in an animation created by Claude Design, demonstrating its ability to zoom in on locations and add text.

China

The location where a half marathon was held, featuring four robots that completed the race in under an hour.

Organizations

Google DeepMind

Released a new autonomous research agent model called Deep Research Max, which is state-of-the-art for research tasks.
