The Powerful Alternative To Fine-Tuning

Y Combinator
Science & Technology · 3 min read · 20 min video
Feb 27, 2026 · 41,029 views

Key Moments

TL;DR

Self-improving harnesses built atop LLMs beat costly fine-tuning.

Key Insights

1. Poetic builds a recursively self-improving meta-system that outputs specialized reasoning 'harnesses' on top of one or more language models to solve hard problems.

2. This approach is cheaper and more scalable than fine-tuning from scratch, and remains compatible with new frontier models as they arrive.

3. The team has demonstrated strong results (ARC AGI V2, Humanity's Last Exam) at a fraction of traditional cost, with a small, focused team.

4. The core value is automated optimization of prompts, data, and reasoning strategies, not just manual prompt editing; the system creates robust, tunable architectures.

5. Startup access is offered via poetic.ai for teams facing hard, reliability-challenging AI problems.

THE PROBLEM WITH FINE-TUNING AND THE NEED FOR SPEED

The traditional path of fine-tuning large models is costly and quickly becomes outpaced by faster model releases. The guest highlights that retraining from scratch demands hundreds of millions of dollars and months of effort, and new frontier models can render those gains obsolete almost instantly. Poetic offers a radically faster alternative by building on top of existing models and evolving capabilities without expensive retraining, addressing the 'bitter lesson' of losing ground to newer models.

POETIC'S CORE: RECURSIVE SELF-IMPROVEMENT AND THE META SYSTEM

At the heart of Poetic is a recursively self-improving meta-system that can generate and optimize entire reasoning pipelines, or harnesses, tailored to a given hard problem. This automation produces systems that consistently outperform the base models and remain compatible with future iterations. The approach shifts focus from training more data to evolving the reasoning architecture itself, enabling rapid, cost-effective improvements.
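The episode describes the meta-system only at a high level. As a rough illustration of the search loop it implies (propose harness variants, score them, keep improvements), here is a toy sketch; `solve`, `mutate`, and the dict-based harness are all stand-ins invented for this example, not Poetic's actual design:

```python
import random

random.seed(0)  # deterministic toy run

# Stand-ins: a "problem" is an int difficulty; a "harness" is a dict
# with one tunable parameter. A real harness would bundle prompts,
# code, and data around a base model.
def solve(harness, problem):
    return harness["depth"] >= problem

def evaluate(harness, problems):
    """Fraction of problems the harness solves."""
    return sum(solve(harness, p) for p in problems) / len(problems)

def mutate(harness):
    # The real meta-system rewrites prompts and reasoning strategies;
    # here we just perturb the single parameter.
    return {"depth": harness["depth"] + random.choice([-1, 0, 1, 2])}

def improve(harness, problems, rounds=10):
    """Hill-climb: keep a mutation only when it scores strictly better."""
    best, best_score = harness, evaluate(harness, problems)
    for _ in range(rounds):
        cand = mutate(best)
        score = evaluate(cand, problems)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

best, score = improve({"depth": 1}, problems=[1, 2, 3, 4, 5])
```

The same loop generalizes when `mutate` is itself a model call that rewrites the harness, which is roughly what "recursively self-improving" suggests.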

HARNESSING VS TRAINING: WHY POETIC SEES FRONTIER MODELS AS STILTS

Frontier models act as foundational stilts that Poetic uses to reach higher performance without rebuilding from scratch. The harness sits on top of these models and can be adapted to new models without changing the underlying deployment. By contrast with repeated full-model training, Poetic continuously optimizes the surrounding system—prompts, data handling, and reasoning strategies—so any new base model yields immediate gains without a full rewrite.
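The "stilts" framing can be made concrete: if the harness depends only on a model-call interface, the base model can be swapped without touching the harness. A minimal sketch, where the two-step draft/critique strategy and `stub_model` are assumptions for illustration rather than Poetic's actual pipeline:

```python
from typing import Callable

Model = Callable[[str], str]  # any base model behind one call signature

def harness(model: Model, question: str) -> str:
    """Hypothetical two-step harness: draft an answer, then ask the
    model to critique and revise it. Swapping in a newer frontier model
    means passing a different callable; the harness itself is unchanged."""
    draft = model(f"Answer concisely: {question}")
    return model(f"Review and correct this answer: {draft}")

# Stub standing in for a real model API client.
def stub_model(prompt: str) -> str:
    return prompt.upper()

answer = harness(stub_model, "Why is the sky blue?")
```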

COST ADVANTAGE AND SCALABILITY: UNDER $100K FOR HARD PROBLEMS

The company emphasizes a dramatic cost advantage: a Humanity's Last Exam run cost well under six figures, with optimization costs under $100k, and a team of just seven researchers. They note being roughly half the cost of competing approaches (e.g., Gemini 3 DeepThink) because they build on a cheaper model (Gemini 3 Pro) and skip full-scale retraining. The result is scalable, repeatable progress on hard tasks.

PROOF OF CONCEPT: ARC AGI V2 AND HUMANITY'S LAST EXAM RESULTS

Poetic has repeatedly outpaced contemporaries on difficult benchmarks. On ARC AGI V2, they surpassed prior leaders within days, leveraging cheaper underlying models yet achieving higher official verification scores. In Humanity's Last Exam, they achieved 55%—nearly two points above the previous state-of-the-art—on a 2,500-question challenge designed for expert domains. These results illustrate the system's ability to push hard problems beyond traditional baselines.

HOW THE POETIC META SYSTEM WORKS: PROMPTS, DATA, AND AUTOMATION

The Poetic stack combines code, prompts, and data into automated reasoning systems. The meta-system can optimize not only prompts but also deeper reasoning strategies and data generation, including context stuffing and example generation. Rather than hand-tuning, the system analyzes data and failure modes to extract robust, reusable reasoning patterns, enabling faster iteration and higher-quality outputs with less human intervention.
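One of the described optimizations, stuffing worked examples into context based on observed failures, can be sketched as a loop over a labeled dataset. Everything here (the `stub_model`, the arrow format, the dataset) is a hypothetical stand-in for illustration:

```python
def run(model, prompt, case):
    """True when the model answers this case correctly under the prompt."""
    return model(prompt, case) == case["expected"]

def optimize_prompt(model, prompt, dataset):
    """Find failing cases and append them to the prompt as worked
    examples: a simple form of automated context building."""
    failures = [c for c in dataset if not run(model, prompt, c)]
    for c in failures:
        prompt += f"\nExample: {c['input']} -> {c['expected']}"
    return prompt

# Stub model: answers correctly only when the prompt already contains
# a worked example for the input (a crude stand-in for an LLM).
def stub_model(prompt, case):
    return case["expected"] if case["input"] in prompt else "?"

dataset = [{"input": "2+2", "expected": "4"},
           {"input": "3+3", "expected": "6"}]
improved = optimize_prompt(stub_model, "Solve:", dataset)
```

In the real system this analysis would extend beyond prompts to reasoning strategies and generated data, but the failure-driven loop is the core pattern.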

STARTUP ACCESS: HOW TO TRY POETIC AND SIGN UP

For startups interested in deploying Poetic, the company invites early access inquiries via poetic.ai. They’re seeking hard, reliability-challenging problems where existing approaches fall short and offer to work with teams to enhance or replace their current agents with stilts that scale with newer models. The pitch emphasizes readiness for practical deployment and collaboration with eager, high-potential ventures.

CAREER PATHS AND ADVICE: FROM CORPORATE RESEARCHER TO AGI-FOCUSED BUILDER

Ian Fischer shares a personal trajectory from co-founding Portable (a mobile cross-platform tool) to Google and DeepMind, then pivoting to AI robotics and machine learning research. His advice to engineers is pragmatic: try things with AI every day, push boundaries, and build the things you envision. He illustrates this with a weekend experiment building an iPhone app using GPT-5, underscoring how rapidly capabilities are advancing and how inclusive tooling has become.

Benchmark results and costs (selected data points)

Data extracted from this episode

Benchmark / Model                                    Score (%)  Notes / Cost
ARC AGI V2 – Poetic harness on Gemini 3 Pro          54         $32 per problem
ARC AGI V2 baseline – Gemini 3 DeepThink             45
Humanity's Last Exam – Poetic harness                55         Cost: < $100k
Humanity's Last Exam – Claude Opus 4.6 (Anthropic)   53.1

Common Questions

How does Poetic differ from fine-tuning a model for each task?

Poetic offers a recursively self-improving meta-system that generates task-specific harnesses on top of one or more language models. Instead of retraining or fine-tuning a model for each task, Poetic optimizes the reasoning strategies and prompts automatically so the harness continues to improve as new models come out.
