GPT-4.5 = Big Model Energy | YC Decoded
Key Moments
GPT-4.5 offers human-like conversation and creativity with fewer hallucinations, but at up to 30x the per-token cost of GPT-4o, which makes large-scale deployment prohibitive.
Key Insights
GPT-4.5 is OpenAI's largest model yet, potentially over 10 times the size of GPT-4, and shows a dramatic reduction in hallucination rates to 37% from GPT-4o's 61.2%.
On the SimpleQA benchmark, GPT-4.5 achieves 61.9% accuracy, a significant leap from GPT-4o's 38.4%.
GPT-4.5 writes more human-like prose, excelling in creative tasks like drafting emails, generating stories, and brainstorming ideas, and it surpasses GPT-4o on persuasion benchmarks ('make me pay' and 'make me say').
Despite its conversational and creative strengths, GPT-4.5 falls short in structured reasoning, complex STEM, advanced math, and coding challenges compared to specialized reasoning models like o1.
GPT-4.5 is prohibitively expensive for scaled deployment, costing 30x more per input token and 15x more per output token than GPT-4o.
Future AI models, possibly GPT-5, are expected to converge unsupervised pre-training (like GPT-4.5) with specialized reasoning capabilities.
GPT-4.5 emerges as OpenAI's most human-like model to date
GPT-4.5 has arrived, marking a significant step forward as OpenAI's largest and most human-like model. It represents the next stage in scaling unsupervised learning, with a deeper comprehension of the world and human experiences. Anticipation for GPT-5 was high throughout 2024, and rumors of internal projects like 'Strawberry' and 'Orion' fueled speculation. Eventually, OpenAI revealed o1, a model excelling in step-by-step reasoning, and 'Orion' was later confirmed to be the model released as GPT-4.5. This new model is potentially more than ten times the size of GPT-4, reflecting advancements in both pre-training and post-training.
Enhanced conversational and creative capabilities with reduced hallucinations
GPT-4.5 demonstrates marked improvements in natural conversation, creative tasks, and complex planning. A key advancement is its dramatically reduced hallucination rate, which drops to approximately 37% from GPT-4o's 61.2%. On the SimpleQA question-answering benchmark, GPT-4.5 achieved 61.9% accuracy, a substantial increase over GPT-4o's 38.4%. This makes it a more reliable option for general inquiries. On the creative front, it shines in drafting emails, generating imaginative stories, telling jokes, and brainstorming ideas, producing prose that is distinctly more human-like than GPT-4o's. Early testers have noted its ability to be funny and to understand irony, a capability previously lacking in other models. This focus on 'softer,' subjective aspects like emotional intelligence and model 'feel' is a distinguishing characteristic, assessed through 'vibes testing,' in which human evaluators give feedback on subjective qualities.
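To put the benchmark figures above in perspective, the following minimal sketch computes the percentage-point and relative changes between GPT-4o and GPT-4.5. The four input numbers are the ones quoted in this summary; the helper name `relative_change` is hypothetical and chosen only for illustration.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old


# Figures quoted in this summary (percent).
GPT4O_HALLUCINATION, GPT45_HALLUCINATION = 61.2, 37.0
GPT4O_ACCURACY, GPT45_ACCURACY = 38.4, 61.9

# Absolute differences in percentage points.
hallucination_drop_pp = GPT4O_HALLUCINATION - GPT45_HALLUCINATION
accuracy_gain_pp = GPT45_ACCURACY - GPT4O_ACCURACY

# Relative differences, expressed as fractions of the GPT-4o value.
hallucination_rel = relative_change(GPT4O_HALLUCINATION, GPT45_HALLUCINATION)
accuracy_rel = relative_change(GPT4O_ACCURACY, GPT45_ACCURACY)

print(f"Hallucination rate: -{hallucination_drop_pp:.1f} points "
      f"({hallucination_rel:+.1%} relative)")
print(f"Accuracy: +{accuracy_gain_pp:.1f} points "
      f"({accuracy_rel:+.1%} relative)")
```

On these figures the hallucination rate falls by roughly two fifths in relative terms, while accuracy rises by roughly three fifths.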
Vastly increased model size and training investment
GPT-4.5 is by far OpenAI's largest model to date, with estimates suggesting it could be more than 10 times the size of its predecessor, GPT-4. This substantial increase in scale is a direct result of advancements in unsupervised learning and extensive investment in both pre-training and post-training phases. The sheer size of the model contributes to its enhanced understanding and performance in conversational and creative domains. This scaling represents a continuation of the strategy that has driven previous AI breakthroughs, pushing the boundaries of what is computationally feasible in training large language models.
Limitations in reasoning and prohibitive costs
Despite its strengths, GPT-4.5 faces significant limitations. Compared to specialized reasoning-first models like o1, it falls notably short in structured reasoning domains, including complex STEM tasks, advanced mathematics, and challenging coding problems. Furthermore, its operational cost is a major barrier to widespread adoption. GPT-4.5 is considerably more expensive than other OpenAI models: its per-token input cost is 30 times that of GPT-4o, and its per-token output cost is 15 times higher. These costs make it impractical to deploy at scale for most applications, limiting its immediate use to scenarios where its conversational and creative abilities are paramount and cost is a secondary concern.
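Because the input and output multipliers differ, the effective cost increase for a given workload depends on its mix of input and output tokens. The sketch below estimates that blended multiplier. Only the 30x and 15x multipliers come from this summary; the function name and the baseline prices in the usage example are hypothetical placeholders, not actual OpenAI pricing.

```python
def blended_cost_multiplier(
    input_tokens: int,
    output_tokens: int,
    base_input_price: float,
    base_output_price: float,
    input_mult: float = 30.0,   # input-token multiplier quoted in this summary
    output_mult: float = 15.0,  # output-token multiplier quoted in this summary
) -> float:
    """Return how many times more a workload costs on the pricier model.

    Prices are per token in any consistent unit; only their ratio matters.
    """
    base_cost = input_tokens * base_input_price + output_tokens * base_output_price
    if base_cost <= 0:
        raise ValueError("workload must have a positive baseline cost")
    scaled_cost = (
        input_tokens * base_input_price * input_mult
        + output_tokens * base_output_price * output_mult
    )
    return scaled_cost / base_cost


# Hypothetical workload: 1,000 input tokens and 250 output tokens, with
# placeholder baseline prices where an output token costs 4x an input token.
multiplier = blended_cost_multiplier(1000, 250, base_input_price=1.0, base_output_price=4.0)
print(f"Blended cost multiplier: {multiplier:.1f}x")
```

Under these assumptions the result always falls between 15x and 30x: workloads dominated by long prompts approach 30x, while those dominated by generated output approach 15x.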
The evolving landscape: scaling versus reasoning
GPT-4.5 highlights the ongoing progress in scaling unsupervised learning, which continues to yield valuable improvements in accuracy, emotional intelligence, and creativity, even if these gains are becoming more incremental. However, the discussion is shifting towards reasoning as the next frontier for extracting performance gains from scaling compute. This suggests a potential future where investment at inference time, rather than solely during training, becomes more critical. The trend indicates that while scaling pre-training itself might not be over, specialized reasoning capabilities now offer significant potential for pushing AI performance boundaries further, especially in complex problem-solving.
Towards unified architectures blending knowledge and reasoning
The future of AI models likely involves a convergence of the two prevailing paradigms: large-scale unsupervised pre-training models like GPT-4.5 and specialized reasoning-focused models like o1. Sam Altman has indicated that future iterations, possibly GPT-5, will integrate these approaches into a unified architecture. The aim is to create models that combine the vast world knowledge, creative fluency, and emotional nuance of models like GPT-4.5 with the robust analytical and logical reasoning of specialized systems. Such a fusion could remove the current trade-off between broad understanding and deep reasoning, so that users no longer have to choose between the two.
Common Questions
GPT-4.5 is OpenAI's latest model, described as larger and more humanlike. It shows significant improvements in natural conversation, creative tasks, and reduced hallucinations compared to GPT-4.
Mentioned in this video
GPT-4o: An earlier OpenAI model, used for benchmark comparisons against GPT-4.5, showing lower accuracy and higher hallucination rates.
Orion: An internal project name at OpenAI, later confirmed to be released as GPT-4.5.
GPT-5: A future, highly anticipated model from OpenAI, mentioned in the context of development rumors and the potential convergence of current AI paradigms.
GPT-4.5: OpenAI's latest large language model, noted for being larger and more human-like, excelling in natural conversation, creative tasks, and complex planning, and having reduced hallucinations compared to previous models.
Sam Altman: CEO of OpenAI, who confirmed that the Orion project would be released as GPT-4.5 and suggested future convergence of AI paradigms.
Mentioned as a confirmed speaker at the YC AI Startup School.