What are the key strengths of GPT-4.5?

GPT-4.5 excels in emotional intelligence, understanding human nuances, creative generation like drafting emails and stories, and persuasive power. It also demonstrates a better grasp of humor and irony.

How does GPT-4.5 perform on benchmarks?

On the SimpleQA benchmark, GPT-4.5 achieved 61.9% accuracy, a significant jump from GPT-4o's 38.4%. Its hallucination rate also fell sharply, to 37% from GPT-4o's 61.2%.

What are the limitations of GPT-4.5?

GPT-4.5 is considerably more expensive to use per token than previous models. It also falls short in specialized reasoning tasks like complex STEM problems and advanced coding when compared to reasoning models like 'o1'.

What is the future direction for OpenAI models like GPT-5?

OpenAI anticipates a convergence of unsupervised pre-training with specialized reasoning models, potentially leading to a unified architecture in GPT-5 that combines vast world knowledge with advanced reasoning capabilities.

What is the YC AI Startup School?

YC is hosting its first AI Startup School on June 16-17 in San Francisco, featuring top AI experts. Attendance is free for computer science students and new grads, and travel is covered, but an application is required because space is limited.

Key Moments

GPT-4.5 = Big Model Energy | YC Decoded

Y Combinator

Science & Technology | 4 min read | 9 min video

Mar 11, 2025 | 31,264 views | 414 | 25

TL;DR

GPT-4.5 offers human-like conversation and creativity with fewer hallucinations, but it costs 30x more per input token and 15x more per output token than GPT-4o, making large-scale deployment prohibitive.

Key Insights

GPT-4.5 is OpenAI's largest model yet, potentially over 10 times the size of GPT-4, and shows a dramatic reduction in hallucination rates to 37% from GPT-4o's 61.2%.

On the SimpleQA benchmark, GPT-4.5 achieves 61.9% accuracy, a significant leap from GPT-4o's 38.4%.

GPT-4.5 offers more human-like prose, excelling in creative tasks like drafting emails, generating stories, and brainstorming ideas, and surpassing GPT-4o on persuasive power benchmarks ('make me pay' and 'make me say').

Despite its conversational and creative strengths, GPT-4.5 falls short in structured reasoning, complex STEM, advanced math, and coding challenges compared to specialized reasoning models like 'o1'.

GPT-4.5 is prohibitively expensive for scaled deployment, costing 30x more per input token and 15x more per output token than GPT-4o.

Future AI models, possibly GPT-5, are expected to converge unsupervised pre-training (like GPT-4.5) with specialized reasoning capabilities.

GPT-4.5 emerges as OpenAI's most human-like model to date

GPT-4.5 has arrived, marking a significant step forward as OpenAI's largest and most human-like model. It represents the next stage in scaling unsupervised learning, boasting a deeper comprehension of the world and human experiences. Anticipation for GPT-5 ran high throughout 2024, and rumors of internal projects like 'Strawberry' and 'Orion' fueled speculation. Eventually, OpenAI revealed 'o1,' a model excelling in step-by-step reasoning, and later confirmed that 'Orion' would be released as GPT-4.5. The new model is potentially more than ten times the size of GPT-4, reflecting advancements in both pre-training and post-training.

Enhanced conversational and creative capabilities with reduced hallucinations

GPT-4.5 demonstrates marked improvements in natural conversation, creative tasks, and complex planning. A key advancement is its dramatically reduced hallucination rate, which drops to approximately 37% from GPT-4o's 61.2%. On the SimpleQA question-answering benchmark, GPT-4.5 achieved 61.9% accuracy, a substantial increase over GPT-4o's 38.4%. This makes it a more reliable option for general inquiries. On the creative front, it shines in drafting emails, generating imaginative stories, telling jokes, and brainstorming ideas, producing prose that is distinctly more human-like than GPT-4o's. Early testers have noted its ability to be funny and understand irony, a capability previously lacking in other models. This focus on 'softer,' subjective aspects like emotional intelligence and model 'feel' is a distinguishing characteristic, assessed through 'vibes testing' with human evaluators who provide feedback on subjective qualities.
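To put the benchmark figures above in perspective, the following minimal sketch computes both the percentage-point change and the relative change between the two models. The helper name `compare` is hypothetical, and the only inputs are the four percentages quoted in this summary.

```python
def compare(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = new - old               # absolute difference in percentage points
    relative = points / old * 100    # change expressed relative to the old value
    return round(points, 1), round(relative, 1)


# Figures as quoted in this summary (GPT-4o first, GPT-4.5 second).
accuracy = compare(38.4, 61.9)
hallucination = compare(61.2, 37.0)

print("accuracy:", accuracy)            # (23.5, 61.2)
print("hallucination:", hallucination)  # (-24.2, -39.5)
```

Read this way, accuracy rises by 23.5 points (about 61% relative to GPT-4o), while the hallucination rate falls by 24.2 points (about 40% relative to GPT-4o).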

Vastly increased model size and training investment

GPT-4.5 is by far OpenAI's largest model to date, with estimates suggesting it could be more than 10 times the size of its predecessor, GPT-4. This substantial increase in scale is a direct result of advancements in unsupervised learning and extensive investment in both pre-training and post-training phases. The sheer size of the model contributes to its enhanced understanding and performance in conversational and creative domains. This scaling represents a continuation of the strategy that has driven previous AI breakthroughs, pushing the boundaries of what is computationally feasible in training large language models.

Limitations in reasoning and prohibitive costs

Despite its strengths, GPT-4.5 faces significant limitations. Compared to specialized reasoning-first models like 'o1', it falls notably short in structured reasoning domains, including complex STEM tasks, advanced mathematics, and challenging coding problems. Furthermore, its operational cost is a major barrier to widespread adoption. GPT-4.5 is considerably more expensive than other OpenAI models: it costs 30 times more per input token and 15 times more per output token than GPT-4o. These elevated costs make it impractical to deploy at scale for most applications, limiting its immediate use cases to scenarios where its unique conversational and creative abilities are paramount and cost is a secondary concern.
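Because the input and output multipliers differ, the effective cost increase for a given workload depends on its mix of input and output tokens. The sketch below illustrates this using only the 30x and 15x multipliers quoted above. The function name `relative_cost` and the baseline prices passed to it are hypothetical placeholders, not actual OpenAI pricing.

```python
# Multipliers quoted in this summary, relative to GPT-4o per-token prices.
INPUT_MULTIPLIER = 30
OUTPUT_MULTIPLIER = 15


def relative_cost(input_tokens: int, output_tokens: int,
                  base_input_price: float, base_output_price: float) -> float:
    """Return GPT-4.5 cost divided by GPT-4o cost for the same workload.

    base_input_price and base_output_price are hypothetical GPT-4o prices
    per token, in any currency unit; only their ratio affects the result.
    """
    base = input_tokens * base_input_price + output_tokens * base_output_price
    scaled = (input_tokens * base_input_price * INPUT_MULTIPLIER
              + output_tokens * base_output_price * OUTPUT_MULTIPLIER)
    return scaled / base


# Hypothetical baseline: output tokens priced at 4x input tokens.
print(relative_cost(1000, 0, 1.0, 4.0))     # input-only workload: 30.0
print(relative_cost(0, 1000, 1.0, 4.0))     # output-only workload: 15.0
print(relative_cost(1000, 1000, 1.0, 4.0))  # mixed workload: 18.0
```

Under these assumptions the blended multiplier always falls between 15x and 30x, sitting closer to 15x as output tokens account for more of the baseline spend.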

The evolving landscape: scaling versus reasoning

GPT-4.5 highlights the ongoing progress in scaling unsupervised learning, which continues to yield valuable improvements in accuracy, emotional intelligence, and creativity, even if these gains are becoming more incremental. However, the discussion is shifting towards reasoning as the next frontier for extracting performance gains from scaling compute. This suggests a potential future where investment at inference time, rather than solely during training, becomes more critical. The trend indicates that while scaling pre-training itself might not be over, specialized reasoning capabilities now offer significant potential for pushing AI performance boundaries further, especially in complex problem-solving.

Towards unified architectures blending knowledge and reasoning

The future of AI models likely involves a convergence of the two prevailing paradigms: large-scale unsupervised pre-training models like GPT-4.5 and specialized reasoning-focused models like 'o1'. Sam Altman has indicated that future iterations, possibly GPT-5, will integrate these approaches into a unified architecture. The aim is to create models that combine the vast world knowledge, creative fluency, and emotional nuance of models like GPT-4.5 with the robust analytical and logical reasoning capabilities of specialized systems. This fusion could remove the current trade-off between broad understanding and deep reasoning, so that users no longer have to choose between the two.

Common Questions

GPT-4.5 is OpenAI's latest model, described as larger and more humanlike. It shows significant improvements in natural conversation, creative tasks, and reduced hallucinations compared to GPT-4.

Topics

AI Ethics, AI & Machine Learning, Technology & Innovation, Future of AI, Large Language Models, AI Development, AI Benchmarks, Model Capabilities, AI Creativity

Mentioned in this video

Software & Apps

GPT-4o

An earlier OpenAI model, used for benchmark comparisons against GPT-4.5, showing lower accuracy and higher hallucination rates.

Orion

An internal project name at OpenAI, later confirmed to be released as GPT-4.5.

GPT-5

A future, highly anticipated model from OpenAI, mentioned in the context of development rumors and the potential convergence of current AI paradigms.

GPT-4.5

OpenAI's latest large language model, noted for being larger, more humanlike, excelling in natural conversation, creative tasks, complex planning, and having reduced hallucinations compared to previous models.

Organizations

Y Combinator

The organization hosting an AI Startup School, announced at the end of the video.

People

Sam Altman

CEO of OpenAI, who confirmed that the Orion project would be released as GPT-4.5 and suggested future convergence of AI paradigms.

Fei-Fei Li