
The Next Breakthrough In AI Agents Is Here

Y Combinator
Science & Technology | 5 min read | 9 min video
Apr 8, 2025 | 273,747 views | 5,804 | 205
TL;DR

Manis, a general-purpose AI agent, achieves 86.5% on the Gaia benchmark, nearing human performance. However, its 'wrapper' architecture, while cost-effective, is vulnerable to competitors replicating its integrations.

Key Insights

1

Manis uses a multi-agent system rather than a single large neural network: a planner agent breaks tasks down, specialized sub-agents execute them, and an executor agent synthesizes the output.

2

Manis achieved a score of 86.5% on the Gaia benchmark, which tests AI agents on reasoning, multimodal handling, web browsing, and tool proficiency, significantly outperforming OpenAI's Deep Research (74%) and approaching average human scores (92%).

3

Despite being labeled a 'wrapper' for integrating existing models and tools, Manis follows an approach that is the de facto standard among many successful AI products, including Cursor and Harvey.

4

Manis demonstrates significantly lower per-task costs, approximately $2 per task, compared to integrated competitors like OpenAI's Deep Research.

5

Key differentiators for successful 'wrapper' AI startups include intuitive UI, proprietary evaluations, careful fine-tuning of foundational models, and thoughtfully designed multi-agent architectures.

6

The sustainability of 'wrapper' AI products relies on building defensible advantages such as investing in proprietary evaluations, embedding deeply into user routines, or securing unique data integrations.

Manis emerges as a sophisticated general-purpose AI agent

The landscape of AI agents has rapidly evolved from experimental prototypes to increasingly useful tools. Following advancements from platforms like OpenAI's Deep Research and Google, a new contender named Manis has captured global attention. Described as the first general-purpose AI agent, Manis has generated significant hype, with some calling it China's next 'DeepSeek moment' and the most impressive AI tool ever tried. Unlike specialized chatbots, Manis promises a comprehensive approach to AI task completion.

A novel multi-agent architecture orchestrates task execution

At the core of Manis's innovation is its multi-agent system. Instead of relying on a single, monolithic neural network, Manis operates like an executive directing a team of specialist sub-agents. A planner agent first deconstructs user prompts into a master plan with manageable subtasks. These subtasks are then delegated to specialized sub-agents, each with its own domain expertise, such as knowledge, memory, or execution. Manis supports an extensive suite of 29 integrated tools for tasks like web navigation, secure code execution, and data extraction. An executor agent then synthesizes the outputs from these subtasks into a cohesive final result. This dynamic task decomposition allows Manis to autonomously map out complex instructions into executable paths. The system also employs a technique called 'chain of thought injection' for enhanced stability and self-reflection during its reasoning processes. Underpinning Manis is Anthropic's Claude 3.7 Sonnet, with integrations like the YC company Browser Use for web interaction and E2B's secure sandbox for execution.
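The planner → sub-agents → executor flow described above can be sketched in a few lines of Python. Everything here (`Subtask`, `SUB_AGENTS`, the hard-coded plan) is an illustrative placeholder, not Manis's actual API; a real system would call an LLM at the planning, execution, and synthesis steps.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    domain: str        # e.g. "knowledge", "memory", "execution"
    description: str

def plan(prompt: str) -> list[Subtask]:
    """Planner agent: deconstruct the user prompt into subtasks.
    (A real planner would query an LLM; this toy version hard-codes a plan.)"""
    return [
        Subtask("knowledge", f"research: {prompt}"),
        Subtask("execution", f"produce output for: {prompt}"),
    ]

# Specialized sub-agents, keyed by domain. Each returns a partial result.
SUB_AGENTS: dict[str, Callable[[Subtask], str]] = {
    "knowledge": lambda t: f"[facts gathered for '{t.description}']",
    "execution": lambda t: f"[artifact built for '{t.description}']",
}

def execute(prompt: str) -> str:
    """Executor agent: dispatch each subtask to its sub-agent,
    then synthesize the partial results (here: simple concatenation)."""
    partials = [SUB_AGENTS[task.domain](task) for task in plan(prompt)]
    return "\n".join(partials)

print(execute("compare two insurance policies"))
```

The key design point the sketch preserves is the separation of concerns: the planner never executes, and sub-agents never see the whole plan, which is what lets individual agents be inspected or swapped out independently.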

Impressive performance on benchmarks nears human capabilities

Manis has demonstrated remarkable capabilities across real-world tasks, including travel planning, financial analysis, and creating educational content, along with structured data compilation, insurance policy comparison, and supplier sourcing. To rigorously assess its performance, Manis was tested on Gaia, a benchmark designed to evaluate AI agents' reasoning, multimodal handling, web browsing, and tool proficiency. Humans typically score around 92% on Gaia, while OpenAI's Deep Research achieved approximately 74%. Manis scored an impressive 86.5%, significantly surpassing Deep Research, placing it just a few points shy of average human performance and setting a new state-of-the-art for AI agents on this benchmark.
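For intuition on what the headline numbers mean: a Gaia-style score is essentially percent-correct over a set of tasks, with each task's final answer graded against a reference. The sketch below assumes simple exact-match grading, which is a simplification of the benchmark's actual matching rules.

```python
def gaia_style_score(answers: list[str], references: list[str]) -> float:
    """Percent of tasks where the agent's final answer matches the reference.
    Assumes naive exact-match grading after whitespace normalization."""
    if len(answers) != len(references):
        raise ValueError("need one answer per reference task")
    correct = sum(a.strip() == r.strip() for a, r in zip(answers, references))
    return 100.0 * correct / len(references)

# Toy run: 2 of 3 answers match, so the score is ~66.7%.
score = gaia_style_score(["42", "Paris", "blue"], ["42", "Paris", "red"])
print(f"{score:.1f}%")
```

Under this framing, the gap between Deep Research (74%) and Manis (86.5%) corresponds to Manis correctly completing roughly one in eight additional tasks across the benchmark set.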

The 'wrapper' debate: practicality versus replicability

Manis's architectural approach has reignited the debate around AI startups operating at the 'application layer,' often referred to as 'wrappers.' Critics dismiss such platforms as mere aggregators of existing foundational models and tools. However, this perspective overlooks the reality that many highly successful AI products, such as Cursor and Harvey, also integrate existing LLMs with external APIs and specialized tooling. The distinction between effective and ineffective wrappers, the video suggests, lies not in their architecture but in factors like intuitive user interface, proprietary evaluation methods, careful fine-tuning of underlying models, and thoughtfully designed multi-agent systems, all of which Manis appears to embody.

Cost efficiencies and user control distinguish Manis

One significant advantage of Manis's multi-agent orchestration is its cost-effectiveness. It reportedly achieves substantially lower per-task costs, around $2 per task, compared to integrated competitors like OpenAI's Deep Research. Furthermore, Manis offers greater transparency and user control by allowing direct inspection, customization, or replacement of its individual sub-agents and tool integrations—a flexibility often lacking in more centralized platforms. This approach provides users with a clearer view of the AI's operations, unlike the opaque processes of tools like ChatGPT, and hints at a future of more interactive, desktop-integrated AI experiences.

Vulnerabilities inherent in the wrapper model

Despite its strengths, Manis faces inherent limitations typical of wrapper-based AI. The coordination required across specialized agents can become increasingly challenging as tasks grow in scale and complexity. More critically, its current competitive advantages—UX refinements, targeted fine-tuning, and specialized integrations—are susceptible to replication by competitors. Such wrappers are also vulnerable to external shifts, like API pricing changes or policy modifications by foundational model providers, which can rapidly erode cost benefits. This highlights that while wrappers allow for rapid deployment and iteration at lower upfront costs, they carry significant risks of disruption.

The core challenge for Manis and similar startups is not whether the wrapper model is viable, but how to establish sustainable differentiation. Founders must strategically invest in difficult-to-replicate proprietary evaluations, embed their workflows deeply into user routines to increase switching costs, or secure exclusive integrations with platforms and data sets that competitors cannot easily access. Ultimately, success in the AI domain hinges less on reinventing foundational technology and more on the ability to creatively and effectively assemble existing components into a product that users find genuinely valuable and indispensable.

AI Agent Benchmark Performance Comparison

Data extracted from this episode

Agent                 | Gaia Score (%)
Human (Average)       | 92
Manis                 | 86.5
OpenAI Deep Research  | 74

Cost Comparison per Task

Data extracted from this episode

Platform              | Cost per Task
Manis                 | ~$2
OpenAI Deep Research  | Higher

Common Questions

What is Manis, and why is it considered a breakthrough?

Manis is a new agentic AI platform that functions as a multi-agent system, coordinating specialized sub-agents to complete a wide range of tasks. It's seen as a breakthrough for its sophisticated architecture, its ability to dynamically decompose complex tasks, and its performance on benchmarks like Gaia, nearing human-level capabilities.

