Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
Key Moments
Notion spent years and multiple rebuilds on custom agents, working through early model limitations to turn its productivity tool into an agent-native system of record for enterprise work.
Key Insights
Notion's custom agents have been rebuilt multiple times since 2022, with the significant unlock arriving with the release of more capable models such as Claude Sonnet 3.6/3.7 (late 2024 to early 2025).
The development of custom agents involved early, unsuccessful experiments with fine-tuning models for tool-calling before dedicated frameworks and improved LLM capabilities existed.
Notion takes a portfolio approach to AI development, balancing 'AGI-pilled' long-term projects with shipping immediately useful features, exemplified by their work on coding agents and a 'software factory'.
Productivity gains for Notion's internal teams have been significant, with custom agents used for bug triaging, automating processes, and reducing information fall-through, fundamentally changing how the company functions.
Notion's pricing for custom agents uses a credit system that abstracts beyond raw token usage to account for various model tiers, serving methods (e.g., GPUs, sandboxes), and the trade-off between intelligence, price, and latency.
The evolution of Notion's agent framework has moved from early direct model prompting and XML representations to a more model-friendly markdown and SQLite approach for querying databases, emphasizing giving models what they want.
The long road to custom agents
Notion's journey to custom agents was a multi-year endeavor, beginning as early as late 2022 with initial concepts for assistants that could perform background work using Notion's tools. These early attempts, predating widespread understanding of 'agents' and robust tool-calling capabilities, involved collaborations with companies like Anthropic and OpenAI. The team experimented with fine-tuning models for function calling but faced significant hurdles due to the models' nascent capabilities and limited context windows. This iterative process involved multiple rebuilds of their agent harness, with a key breakthrough arriving alongside more capable reasoning models like Claude Sonnet 3.6/3.7. The challenge wasn't just model advancement but also building a reliable product interface, especially around permissions and clarity for users sharing agents within teams. This reflects a deliberate strategy of developing AI capabilities before the underlying technology fully matures, so the product is ready when the models catch up.
Balancing innovation with practicality
Notion adopts a 'portfolio approach' to AI development, aiming to balance forward-looking, ambitious projects with the immediate delivery of valuable user features. This strategy involves working on multiple projects simultaneously, maintaining existing functionalities, and incubating 'crazy' projects that might seem unrealistic today but could become obvious in the future. Concepts like 'coding agents' and the broader 'software factory' – an automated workflow for developing, debugging, and maintaining codebases with integrated agents – represent these forward-looking endeavors. This careful balance is crucial for a company with an existing user base, ensuring continuous innovation without alienating current customers. The approach is guided by deep product conviction and an understanding of 'where the river is flowing' in AI development, rather than simply chasing every new technological capability.
From early experiments to agent-native systems
The development of Notion's agent capabilities has been marked by significant iteration and learning from failures. Initial attempts in late 2022 focused on making everything JavaScript-based, but the models struggled with code generation. This led to a shift towards a tool-calling abstraction, where Notion developed its own XML representation for Notion functions. A key learning from this phase was that the model's environment and preferred formats were paramount; Notion's complex internal data model clashed with what the models could easily process. Subsequent iterations moved towards simpler, more model-friendly formats like markdown for page editing and mimicking SQLite for database querying. This principle of 'giving models what they want' became a guiding philosophy, simplifying the interaction layer and reducing unnecessary system complexity exposed to the AI.
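The 'give models what they want' idea can be sketched in a few lines: rather than exposing a bespoke internal data model, load records into an in-memory SQLite database and let the model answer with plain SQL it already knows well. This is a hypothetical illustration, not Notion's implementation; the table schema and task data are invented.

```python
import sqlite3

def load_tasks_into_sqlite(tasks):
    # Mirror a (hypothetical) Notion-style database as a plain SQLite table,
    # a format models have seen extensively in training.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER, title TEXT, status TEXT)")
    conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", tasks)
    return conn

def run_agent_sql(conn, sql):
    # In a real system the SQL would be generated by the model; here it is fixed.
    return conn.execute(sql).fetchall()

tasks = [(1, "Triage bug reports", "open"),
         (2, "Write release notes", "done"),
         (3, "Review eval results", "open")]
conn = load_tasks_into_sqlite(tasks)
print(run_agent_sql(conn, "SELECT title FROM tasks WHERE status = 'open'"))
# -> [('Triage bug reports',), ('Review eval results',)]
```

The simplification is the point: the agent never sees the internal data model, only a familiar query surface.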
Cultivating a culture of rapid iteration
Notion fosters a culture that embraces rapid iteration and de-prioritizes ego, especially within its AI development teams. Leaders emphasize building teams comfortable with deleting their own code and pivoting based on new capabilities or insights. This is supported by a 'demos over memos' philosophy, encouraging tangible prototypes. Internal hackathons and a 'design playground' with helper components further democratize AI experimentation. The company's own extensive use of Notion for daily operations provides a real-world testing ground, where prototypes are vetted and refined before broader release. This culture allows for quick adaptation, such as the recent rapid integration of image generation after an engineer expressed interest, treating the process less as a scheduled event and more as a continuous, integrated part of daily work.
Evals and model behavior understanding
Robust evaluation (evals) is critical for Notion's AI development. The company employs a multi-layered approach, including CI-integrated regression tests, product-level evals requiring high pass rates for launch, and 'frontier/headroom evals' with intentionally low pass rates (around 30%) to identify areas for model improvement. They have dedicated teams for 'agent dev velocity' and model behavior understanding, which includes data scientists, model behavior engineers, and eval engineers. These teams develop evaluation frameworks and increasingly use agents to write their own evals. Notion also actively monitors for 'secret degradation' from model providers, noting differences in quality even among vendors selling the 'same' model and sometimes accepting regressions if they optimize latency, provided these regressions are understood and controlled through their eval process.
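The tiered gating described above can be sketched as a small harness: regression suites block any failure, product-level evals require a high pass rate to launch, and frontier/headroom evals are expected to fail most of the time and never gate. The threshold values here are illustrative assumptions, not Notion's actual numbers.

```python
# Illustrative thresholds; only the ~30% frontier figure comes from the text.
THRESHOLDS = {
    "regression": 1.00,  # CI-integrated: any failure blocks the change
    "product":    0.90,  # product-level: high pass rate required for launch
    "frontier":   0.00,  # headroom evals: low pass rates expected, never gate
}

def pass_rate(results):
    """results: list of booleans, one per eval case."""
    return sum(results) / len(results)

def gate(suite, results):
    # Returns True if the suite's pass rate clears its launch threshold.
    return pass_rate(results) >= THRESHOLDS[suite]

print(gate("regression", [True, True, True]))   # True
print(gate("regression", [True, False, True]))  # False
print(gate("frontier", [True, False, False]))   # True: informs roadmap, doesn't block
```

Keeping frontier evals out of the gate is what lets them stay intentionally hard: they measure headroom rather than readiness.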
The role of MCPs and CLIs
Notion maintains a pragmatic approach to tool integration, supporting both MCP (Model Context Protocol) servers and CLIs (command-line interfaces). While acknowledging the power of CLIs for self-debugging and bootstrapping capabilities, they recognize MCP's value for lightweight, tightly permissioned agents. Notion is committed to supporting MCP as long as users rely on it, emphasizing their strong permission model. They also consider pricing, noting that deterministic CLI calls for simple tasks are more cost-effective than using language models for every interaction. The choice between a direct API call, an MCP server, or an open-ended agent depends on the task's complexity, the need for control, and cost-efficiency. For Notion's core functionalities like search, they opt for custom-built integrations over standard MCP tools to ensure higher quality and control over agent trajectories.
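The cost trade-off above amounts to a routing decision: well-understood, deterministic tasks go to a cheap direct call, while open-ended tasks justify an agent loop. This is a hedged sketch of that idea; the task taxonomy and function names are invented for illustration.

```python
def run_direct(task):
    # Stand-in for a fixed API/CLI call: deterministic, no model tokens spent.
    return f"direct:{task}"

def run_agent(task):
    # Stand-in for an LLM-driven agent loop: flexible but far more expensive.
    return f"agent:{task}"

# Hypothetical set of tasks known to be simple and deterministic.
DETERMINISTIC = {"create_page", "append_row", "fetch_page"}

def route(task):
    # Spend model tokens only where open-ended reasoning is actually needed.
    return run_direct(task) if task in DETERMINISTIC else run_agent(task)

print(route("create_page"))        # direct:create_page
print(route("summarize_quarter"))  # agent:summarize_quarter
```

A real router would weigh control and permissioning alongside cost, but the shape of the decision is the same.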
Composability and the future of enterprise work
Notion views composability as key to its AI strategy, allowing agents to coordinate through data primitives or direct invocation. This enables complex workflows by having agents interact with Notion databases or invoke other agents. For instance, a 'manager agent' can orchestrate over 30 specialized agents to streamline notifications and tasks. This approach extends to memory, which is simply represented by Notion pages and databases, allowing agents to read and write information seamlessly. Notion Mail and Calendar are being reimagined as agent-native capabilities, powered by specific tools optimized for performance. The company aims to make Notion the 'system of record' not just for human collaboration but for agentic work, with the expectation that a majority of future traffic will originate from agents interacting with their interface, fundamentally reframing product development to be 'agent-first'.
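The manager-agent pattern described above can be sketched as a dispatcher over a registry of specialists, coordinating through a shared data primitive (here a plain list standing in for a database of pending items). Everything in this sketch is hypothetical; agent names and topics are invented.

```python
# Registry of specialized agents, keyed by the topic they handle.
AGENTS = {}

def register(topic):
    def wrap(fn):
        AGENTS[topic] = fn
        return fn
    return wrap

@register("bug")
def triage_bug(item):
    return f"triaged: {item}"

@register("question")
def answer_question(item):
    return f"answered: {item}"

def manager(inbox):
    # The manager agent reads the shared inbox and routes each item
    # to the matching specialist; real agents would share a database.
    return [AGENTS[topic](item) for topic, item in inbox]

inbox = [("bug", "login fails on mobile"),
         ("question", "where are the Q3 docs?")]
print(manager(inbox))
# -> ['triaged: login fails on mobile', 'answered: where are the Q3 docs?']
```

Because coordination happens through data the agents read and write, adding a 31st specialist is just another registry entry, not a change to the manager.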
Pricing, model choice, and the 'agent lab' philosophy
Notion employs a credit-based pricing system for custom agents, designed to abstract over variable token costs across different models, serving tiers, and auxiliary services like web search or sandboxes. This system aims to offer fair pricing, with discounts for enterprise customers. While the market currently trends towards usage-based pricing, Notion is cautious about charging solely on token throughput due to varying model capabilities and task values. They differentiate between basic tasks and complex, high-value operations, seeking to provide value without making expensive capabilities mandatory. The company acts as a 'robo advisor' for model selection, guiding users towards the most appropriate model for their task, whether it's a powerful, expensive one or a more cost-effective 'auto' model. This also involves investing in open-source models to fill gaps in the intelligence-price-latency triangle, ensuring users have options across the capability spectrum. Notion's goal is not just to offer access to models but to help users navigate the options effectively.
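The abstraction described above reduces to simple arithmetic: convert token usage at a per-tier rate, then add flat charges for auxiliary services. The tier names and rates below are invented for illustration and are not Notion's actual pricing.

```python
# Hypothetical credit rates; all values are assumptions, not real pricing.
CREDITS_PER_1K_TOKENS = {"auto": 1, "balanced": 3, "frontier": 10}
SERVICE_CREDITS = {"web_search": 2, "sandbox": 5}

def credits_for_run(tier, tokens, services=()):
    # Token cost scales with the model tier; services add flat surcharges,
    # so users see one credit number instead of raw token throughput.
    model_credits = CREDITS_PER_1K_TOKENS[tier] * tokens / 1000
    service_credits = sum(SERVICE_CREDITS[s] for s in services)
    return model_credits + service_credits

# A frontier-model run of 4,000 tokens that also spun up a sandbox:
print(credits_for_run("frontier", 4000, ["sandbox"]))  # 45.0
```

The point of the indirection is that the same credit balance prices very different cost structures (GPU-served models, sandboxes, search) on one axis.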
Common Questions
What are Notion's custom agents and how do they work?
Notion's custom agents are AI-powered tools that can be configured to perform various tasks within Notion. They operate by interacting with Notion's features and data, automating workflows and assisting users. Their development involved iterative rebuilding and learning from model limitations.
Topics
Mentioned in this video
Mentioned as a tool whose search functionality Notion integrates to improve agent capabilities.
Notion's internal database approach for agents, chosen for its compatibility with model preferences over complex internal formats.
The company discussed is Notion, which is building custom agents and has a product focused on being the best system of record for enterprise work.
Mentioned as the model used for image generation within Notion.
Mentioned as a communication platform where Notion custom agents are shared and integrated, and used for notifications.
Used as an analogy for Notion's position in the market, similar to how DataDog relates to AWS.
Mentioned as an example of an MCP that can break if the transport gets messed up, limiting an agent's self-fix capabilities.
Notion's underlying database technology, which has a relationship with their use of SQLite clusters for agents.
The previous company of Zach Tatar, the manager of the Notion meeting notes team.
Mentioned in the context of early agent development and partnerships, and its models being used for evaluation.
Mentioned in the context of early agent development and partnerships, and its models being used or compared.
Mentioned as a platform where Notion engineers collaborate and as a source of code for agents, and for its role in pull requests.
Used as an analogy to explain Notion's expertise in collaboration as distinct from AWS's cloud infrastructure.