How does Manus AI perform on benchmarks like the GAIA benchmark?

Manus AI has improved significantly on the GAIA benchmark, outperforming previously reported results. Notably, the company does not train its own models and relies on existing ones, yet it achieved top performance at a significantly lower cost per task than competitors.

What kind of tasks does the GAIA benchmark involve?

GAIA benchmark tasks involve complex, multi-step reasoning and real-world interaction. Examples include finding specific images on websites, identifying details about astronauts from historical data, and determining product brands from images while cross-referencing dates and facts from articles.

What was Manus AI's previous product, and what issues did it face?

Their previous product was 'Monica', a Chrome browser extension that offered features like article simplification and YouTube summarization, gaining 20 million users. However, browser extensions are limited to Chrome users and are not universally understood, prompting a move to a full browser.

Why did Manus AI abandon its AI browser project?

After six months of development, Manus AI discontinued its AI browser project. A browser is built for a single user at a time, so the experience becomes frustrating when the AI takes control of it. In addition, convincing users to switch away from established browsers like Chrome was a significant hurdle.

What were the key learnings from the AI browser project?

Key learnings included that AI agents are better suited to *use* browsers rather than *be* browsers, as AI possesses many web interaction tricks unknown to humans. They also learned that AI agents should operate in the cloud to avoid demanding user attention and that users are hesitant to switch from familiar browsers.

How does Manus AI provide its agents with a 'computer' and 'data access'?

Manus AI assigns each agent task a dedicated virtual machine in the cloud using the E2B platform. The company also purchases API access on users' behalf for real-time stock data and for Twitter and LinkedIn search, mimicking how an intern would be given access to company systems.

What is Manus AI's strategy for competing with other AI agents?

Manus AI focuses on building a 'general agent' for average users by providing the 'right panel' functionality in the cloud, inspired by tools like Cursor. They avoid pre-defined workflows, instead giving the AI more context and focusing on building the environment for it to operate, rather than controlling its thinking process.

Building Manus AI (first ever Manus Meetup)

Latent Space Podcast

Science & Technology | 5 min read | 49 min video

Mar 27, 2025|3,016 views|68|4

TL;DR

Manus AI makes AI agents more impactful by giving them tools and data access, exemplified by their performance on benchmarks.

Key Insights

Manus aims to provide AI with 'hands' to interact with the physical world, moving beyond pure reasoning.

The Manus AI agent is designed to give Large Language Models (LLMs) access to a computer and the internet for real-world impact.

Manus has demonstrated strong performance on the GAIA benchmark, outperforming previously reported results while remaining cost-efficient.

The company's previous product, Monica, a browser extension with 20 million users, provided valuable insights into user needs and AI interaction.

Manus's development was influenced by the realization that existing AI browsers like Arc focused too much on coding aspects, missing the needs of average users.

Key pillars for Manus include providing a computer environment (virtual machines), data access (APIs), and a 'knowledge system' for user feedback and training.

THE ORIGIN AND PHILOSOPHY OF MANUS AI

The name Manus, derived from an MIT motto, signifies 'mind and hand,' reflecting the company's core belief that AI needs to take action to have real impact. Over the past two years, LLMs have become incredibly powerful at reasoning, but they have been confined to a 'black box' without the tools to interact with the physical world. Manus aims to bridge this gap by providing AI with the 'hands' needed to effect change, much like a programmer needs a computer to test and run code, not just write it in a notebook.

MANUS AI'S BENCHMARK PERFORMANCE AND COST EFFICIENCY

Manus has undergone significant benchmarking, with early January results showing strong performance. More recent data indicates substantial improvements. They highlight their success on the GAIA benchmark, a task-oriented evaluation that requires agents to navigate the internet, find specific information, and synthesize it to answer complex questions. Notably, Manus achieves top performance while being significantly more cost-efficient than prior solutions, costing an average of $2 per 100 tasks.

DEMONSTRATING CAPABILITIES THROUGH GAIA BENCHMARK TASKS

The GAIA benchmark encompasses complex tasks that challenge AI agents significantly. For example, one task involves finding a specific image on a website, identifying an astronaut within it, and calculating their total time in space, which requires aggregating data from multiple missions. Another task challenges the agent to identify a product brand from an image and find specific details from articles published on a particular date. This often requires extensive scrolling through web pages to extract the necessary information, showcasing Manus's ability to perform detailed web navigation and information retrieval.

EVOLUTION FROM MONICA TO THE AI BROWSER AND MANUS

Manus's journey began with Monica, a successful Chrome extension that simplified articles and summarized YouTube videos, amassing 20 million users. This experience highlighted the limitations of browser extensions and the potential for a more integrated AI experience. The team then pivoted to building an AI browser, committing 20 of its 40 employees to the effort. They built a functional AI browser with features like tab summarization and image upscaling, but users were frustrated when the AI took over a browser meant for one person, and it proved difficult to convince users to switch browsers. These problems led the team to pivot again.

KEY LEARNINGS FROM THE AI BROWSER PROJECT

The AI browser project yielded three crucial insights. First, the AI should control its own browser rather than the user's desktop interface, so that it can work independently without interrupting the user. Second, the AI's environment should be cloud-based, allowing it to run tasks without requiring constant user attention. Finally, convincing users to switch from established browsers like Chrome is an immense challenge for a startup, because of ingrained habits and expectations of feature parity.

INSPIRATION FROM CURSOR AND THE BIRTH OF MANUS

The team drew significant inspiration from products like Cursor, an AI-powered coding editor. They observed that even non-coders found value in Cursor for tasks like data visualization and file processing, focusing on the output rather than the underlying code. This led to a key realization: focus on the 'right panel' of AI assistance (the output) and hide the 'left panel' (the complex code). This philosophy, combined with the cloud-based learnings from the browser project, formed the foundation for Manus, aiming to provide AI assistance for average users, not just developers.

THE CORE COMPONENTS OF MANUS AI

Manus is built on three primary pillars: providing AI with a computer (using virtual machines via E2B for task isolation and future software integration), granting data access (pre-paid APIs for stock data, social media searches, etc.), and integrating a 'knowledge system' for user feedback. This system allows users to teach Manus their preferences, improving its output over time. This focus on building the environment and enabling intelligent interaction, rather than controlling the LLM's thinking, defines Manus's approach to creating a general-purpose AI agent.
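
As a rough illustration of how these three pillars could fit together, here is a minimal Python sketch. It is not Manus's implementation: the class `TaskEnvironment`, the function `new_environment`, the tool names, and the `vm-` identifier format are all hypothetical, and no real virtual machine or paid API is involved. It only shows the idea of one isolated environment per task, a set of pre-provisioned data tools, and stored user preferences being assembled into context for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskEnvironment:
    """Hypothetical per-task environment; every name here is illustrative."""
    task_id: str
    sandbox_id: str  # stands in for a dedicated cloud virtual machine
    data_tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    knowledge: List[str] = field(default_factory=list)  # user preferences

    def build_context(self, user_request: str) -> str:
        """Assemble the text context that would be handed to the LLM."""
        lines = [f"Task: {user_request}", f"Sandbox: {self.sandbox_id}"]
        lines.append("Available data tools: " + ", ".join(sorted(self.data_tools)))
        for note in self.knowledge:
            lines.append(f"User preference: {note}")
        return "\n".join(lines)


def new_environment(task_id: str, knowledge: List[str]) -> TaskEnvironment:
    """Create a fresh environment for one task (no real VM is started)."""
    tools = {
        # Stubs standing in for pre-paid data APIs.
        "stock_quotes": lambda symbol: f"[stub quote for {symbol}]",
        "social_search": lambda query: f"[stub results for {query}]",
    }
    return TaskEnvironment(
        task_id=task_id,
        sandbox_id=f"vm-{task_id}",
        data_tools=tools,
        knowledge=list(knowledge),  # copy, so tasks do not share state
    )


env = new_environment("001", ["Prefer tables over long paragraphs"])
print(env.build_context("Compare two stocks"))
```

The design point in this sketch is that the code only prepares the environment and the context; it does not prescribe which steps the model should take.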

ADDRESSING THE GENERAL AGENT CHALLENGE

Manus positions itself as the first general AI agent. To support this claim, the team analyzed a substantial batch of AI projects from Y Combinator and found that a significant percentage (76%) were agent-oriented. By comparing Manus's capabilities against these projects, they concluded that Manus covers a broad range of tasks and often outperforms specialized agents. Their strategy is not to build predefined workflows, which would be impossible to scale to every imaginable user task, but to create a simple yet sophisticated structure that provides context and intelligence to the LLM.
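
The contrast with predefined workflows can be sketched as a small loop in which the model, not the developer, chooses the next action from whatever tools the environment offers. This is an assumption-laden illustration rather than Manus's actual design: `run_agent`, `stub_model`, and the `search` tool are hypothetical names, and `stub_model` is a hard-coded stand-in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
PREFIX = "observation: "


def stub_model(context: List[str], tool_names: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: returns (tool_name, argument) or ("finish", answer)."""
    if not any(line.startswith(PREFIX) for line in context):
        # Nothing observed yet: ask for a search on the original request.
        return ("search", context[0])
    # Otherwise finish, answering with the latest observation.
    return ("finish", context[-1][len(PREFIX):])


def run_agent(
    request: str,
    tools: Dict[str, Tool],
    model: Callable[[List[str], List[str]], Tuple[str, str]] = stub_model,
    max_steps: int = 5,
) -> str:
    """Let the model pick tools step by step; no workflow is hard-coded."""
    context = [request]
    for _ in range(max_steps):
        action, argument = model(context, sorted(tools))
        if action == "finish":
            return argument
        if action not in tools:
            context.append(f"{PREFIX}unknown tool {action}")
            continue
        # Run the chosen tool and feed its result back as new context.
        context.append(f"{PREFIX}{tools[action](argument)}")
    return "step limit reached"


tools = {"search": lambda q: f"stub result for '{q}'"}
print(run_agent("find the launch date", tools))
```

Adding a capability in this sketch means registering another tool and describing it to the model, rather than writing a new workflow for each kind of task.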

FUTURE DEVELOPMENT AND SCALABILITY OF MANUS

Manus is continuously evolving, addressing user feedback and expanding its capabilities. Future developments include integrating more tools and APIs based on user needs and the evolving AI landscape. The company does not plan to build its own foundation models because of the prohibitive cost, and instead relies on advances in existing LLMs. The focus remains on commoditizing agentic capabilities and ensuring Manus can handle complex, data-intensive tasks by overcoming challenges like Cloudflare's security measures and by accessing paywalled data through partnerships or direct payments.

Common Questions

Manus AI's name comes from an old MIT motto 'Mens et Manus', meaning 'Mind and Hand'. The company's mission is to build AI systems that can take real-world actions and have a tangible impact, bridging the gap between AI reasoning and physical world execution.

Topics

AI Agents, AI & Machine Learning, Technology & Innovation, Programming & Software, Cloud Computing, AI Development, Developer Tools, AI Benchmarks, General AI, LLM Interaction

Mentioned in this video

Companies

Cloudflare

Identified as a major hurdle for agents due to blocking capabilities, requiring workarounds or future partnerships.

OpenAI

Mentioned for choosing the GAIA benchmark to evaluate its Deep Research product.

Copy.ai

Mentioned as an early startup that found the power of GPT-3 before ChatGPT.

Organizations

MIT

Mentioned as the origin of the 'Manus' motto and a future visit location.

Supplements

L-Tyrosine

Mentioned in relation to AI models being 'power enough and smarter than most humans'.

Software & Apps

ChatGPT

Mentioned as a benchmark for AI capabilities that Manus aims to surpass.

Jasper

Mentioned as an early startup that found the power of GPT-3 before ChatGPT.

GPT-3

Mentioned as a powerful language model that early startups leveraged before ChatGPT.

Arc

A browser that inspired the development of Manus's AI browser project.

E2B

An open-source project used by Manus to assign each task a virtual machine in the cloud.

Docker

Mentioned as a container solution that differs from the virtual machine approach used by Manus.

People

Josh Miller

Co-founder of The Browser Company, maker of the Arc browser, whose video about discontinuing their AI browser inspired Manus.
