Notion’s Sarah Sachs & Simon Last on Custom Agents, Evals, and the Future of Work
Key Moments
Notion spent years and multiple rebuilds on custom agents, working through early model limitations to turn its productivity tool into an agent-native system of record for enterprise work.
Key Insights
Notion's custom agents have been rebuilt multiple times since 2022, with the significant unlock arriving with the release of more capable models such as Claude Sonnet 3.6/3.7 (late 2024 to early 2025).
The development of custom agents involved early, unsuccessful experiments with fine-tuning models for tool-calling before dedicated frameworks and improved LLM capabilities existed.
Notion takes a portfolio approach to AI development, balancing 'AGI-pilled' long-term projects with shipping immediately useful features, exemplified by their work on coding agents and a 'software factory'.
Productivity gains for Notion's internal teams have been significant, with custom agents used for bug triaging, automating processes, and reducing information fall-through, fundamentally changing how the company functions.
Notion's pricing for custom agents uses a credit system that abstracts beyond raw token usage to account for various model tiers, serving methods (e.g., GPUs, sandboxes), and the trade-off between intelligence, price, and latency.
The evolution of Notion's agent framework has moved from early direct model prompting and XML representations to a more model-friendly markdown and SQLite approach for querying databases, emphasizing giving models what they want.
The long road to custom agents
Notion's journey to custom agents was a multi-year endeavor, beginning as early as late 2022 with initial concepts for assistants that could perform background work using Notion's tools. These early attempts, predating widespread understanding of 'agents' and robust tool-calling capabilities, involved collaborations with companies like Anthropic and OpenAI. The team experimented with fine-tuning models for function calling but faced significant hurdles due to the models' nascent capabilities and limited context windows. This iterative process involved multiple rebuilds of their agent harness, with a key breakthrough arriving alongside more capable reasoning models like Claude Sonnet 3.6/3.7. The challenge wasn't just model advancement but also building a reliable product interface, especially around permissions and clarity for users sharing agents within teams. This reflects a deliberate strategy of developing AI capabilities before the underlying technology fully matures, so the product is ready when the models catch up.
Balancing innovation with practicality
Notion adopts a 'portfolio approach' to AI development, aiming to balance forward-looking, ambitious projects with the immediate delivery of valuable user features. This strategy involves working on multiple projects simultaneously, maintaining existing functionalities, and incubating 'crazy' projects that might seem unrealistic today but could become obvious in the future. Concepts like 'coding agents' and the broader 'software factory' – an automated workflow for developing, debugging, and maintaining codebases with integrated agents – represent these forward-looking endeavors. This careful balance is crucial for a company with an existing user base, ensuring continuous innovation without alienating current customers. The approach is guided by deep product conviction and an understanding of 'where the river is flowing' in AI development, rather than simply chasing every new technological capability.
From early experiments to agent-native systems
The development of Notion's agent capabilities has been marked by significant iteration and learning from failures. Initial attempts in late 2022 focused on making everything JavaScript-based, but the models struggled with code generation. This led to a shift towards a tool-calling abstraction, where Notion developed its own XML representation for Notion functions. A key learning from this phase was that the model's environment and preferred formats were paramount; Notion's complex internal data model clashed with what the models could easily process. Subsequent iterations moved towards simpler, more model-friendly formats like markdown for page editing and mimicking SQLite for database querying. This principle of 'giving models what they want' became a guiding philosophy, simplifying the interaction layer and reducing unnecessary system complexity exposed to the AI.
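The 'give models what they want' idea can be sketched in a few lines: rather than exposing a bespoke internal data model, load records into an in-memory SQLite database and let the model answer with plain SQL it already knows well. This is a hypothetical illustration, not Notion's implementation; the table schema and task data are invented.

```python
import sqlite3

def load_tasks_into_sqlite(tasks):
    # Mirror a (hypothetical) Notion-style database as a plain SQLite table,
    # a format models have seen extensively in training.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER, title TEXT, status TEXT)")
    conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", tasks)
    return conn

def run_agent_sql(conn, sql):
    # In a real system the SQL would be generated by the model; here it is fixed.
    return conn.execute(sql).fetchall()

tasks = [(1, "Triage bug reports", "open"),
         (2, "Write release notes", "done"),
         (3, "Review eval results", "open")]
conn = load_tasks_into_sqlite(tasks)
print(run_agent_sql(conn, "SELECT title FROM tasks WHERE status = 'open'"))
# -> [('Triage bug reports',), ('Review eval results',)]
```

The simplification is the point: the agent never sees the internal data model, only a familiar query surface.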
Cultivating a culture of rapid iteration
Notion fosters a culture that embraces rapid iteration and de-prioritizes ego, especially within its AI development teams. Leaders emphasize building teams comfortable with deleting their own code and pivoting based on new capabilities or insights. This is supported by a 'demos over memos' philosophy, encouraging tangible prototypes. Internal hackathons and a 'design playground' with helper components further democratize AI experimentation. The company's own extensive use of Notion for daily operations provides a real-world testing ground, where prototypes are vetted and refined before broader release. This culture allows for quick adaptation, such as the recent rapid integration of image generation after an engineer expressed interest, treating the process less as a scheduled event and more as a continuous, integrated part of daily work.
Evals and model behavior understanding
Robust evaluation (evals) is critical for Notion's AI development. The company employs a multi-layered approach, including CI-integrated regression tests, product-level evals requiring high pass rates for launch, and 'frontier/headroom evals' with intentionally low pass rates (around 30%) to identify areas for model improvement. They have dedicated teams for 'agent dev velocity' and model behavior understanding, which includes data scientists, model behavior engineers, and eval engineers. These teams develop evaluation frameworks and increasingly use agents to write their own evals. Notion also actively monitors for 'secret degradation' from model providers, noting differences in quality even among vendors selling the 'same' model and sometimes accepting regressions if they optimize latency, provided these regressions are understood and controlled through their eval process.
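The tiered gating described above can be sketched as a small harness: regression suites block any failure, product-level evals require a high pass rate to launch, and frontier/headroom evals are expected to fail most of the time and never gate. The threshold values here are illustrative assumptions, not Notion's actual numbers.

```python
# Illustrative thresholds; only the ~30% frontier figure comes from the text.
THRESHOLDS = {
    "regression": 1.00,  # CI-integrated: any failure blocks the change
    "product":    0.90,  # product-level: high pass rate required for launch
    "frontier":   0.00,  # headroom evals: low pass rates expected, never gate
}

def pass_rate(results):
    """results: list of booleans, one per eval case."""
    return sum(results) / len(results)

def gate(suite, results):
    # Returns True if the suite's pass rate clears its launch threshold.
    return pass_rate(results) >= THRESHOLDS[suite]

print(gate("regression", [True, True, True]))   # True
print(gate("regression", [True, False, True]))  # False
print(gate("frontier", [True, False, False]))   # True: informs roadmap, doesn't block
```

Keeping frontier evals out of the gate is what lets them stay intentionally hard: they measure headroom rather than readiness.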
The role of MCPs and CLIs
Notion maintains a pragmatic approach to tool integration, supporting both MCP (Model Context Protocol) servers and CLIs (command-line interfaces). While acknowledging the power of CLIs for self-debugging and bootstrapping capabilities, they recognize MCP's value for lightweight, tightly permissioned agents. Notion is committed to supporting MCP as long as users rely on it, emphasizing their strong permission model. They also consider pricing, noting that deterministic CLI calls for simple tasks are more cost-effective than using language models for every interaction. The choice between a direct API call, an MCP server, or an open-ended agent depends on the task's complexity, the need for control, and cost-efficiency. For Notion's core functionalities like search, they opt for custom-built integrations over standard MCP tools to ensure higher quality and control over agent trajectories.
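The cost trade-off above amounts to a routing decision: well-understood, deterministic tasks go to a cheap direct call, while open-ended tasks justify an agent loop. This is a hedged sketch of that idea; the task taxonomy and function names are invented for illustration.

```python
def run_direct(task):
    # Stand-in for a fixed API/CLI call: deterministic, no model tokens spent.
    return f"direct:{task}"

def run_agent(task):
    # Stand-in for an LLM-driven agent loop: flexible but far more expensive.
    return f"agent:{task}"

# Hypothetical set of tasks known to be simple and deterministic.
DETERMINISTIC = {"create_page", "append_row", "fetch_page"}

def route(task):
    # Spend model tokens only where open-ended reasoning is actually needed.
    return run_direct(task) if task in DETERMINISTIC else run_agent(task)

print(route("create_page"))        # direct:create_page
print(route("summarize_quarter"))  # agent:summarize_quarter
```

A real router would weigh control and permissioning alongside cost, but the shape of the decision is the same.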
Composability and the future of enterprise work
Notion views composability as key to its AI strategy, allowing agents to coordinate through data primitives or direct invocation. This enables complex workflows by having agents interact with Notion databases or invoke other agents. For instance, a 'manager agent' can orchestrate over 30 specialized agents to streamline notifications and tasks. This approach extends to memory, which is simply represented by Notion pages and databases, allowing agents to read and write information seamlessly. Notion Mail and Calendar are being reimagined as agent-native capabilities, powered by specific tools optimized for performance. The company aims to make Notion the 'system of record' not just for human collaboration but for agentic work, with the expectation that a majority of future traffic will originate from agents interacting with their interface, fundamentally reframing product development to be 'agent-first'.
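The manager-agent pattern described above can be sketched as a dispatcher over a registry of specialists, coordinating through a shared data primitive (here a plain list standing in for a database of pending items). Everything in this sketch is hypothetical; agent names and topics are invented.

```python
# Registry of specialized agents, keyed by the topic they handle.
AGENTS = {}

def register(topic):
    def wrap(fn):
        AGENTS[topic] = fn
        return fn
    return wrap

@register("bug")
def triage_bug(item):
    return f"triaged: {item}"

@register("question")
def answer_question(item):
    return f"answered: {item}"

def manager(inbox):
    # The manager agent reads the shared inbox and routes each item
    # to the matching specialist; real agents would share a database.
    return [AGENTS[topic](item) for topic, item in inbox]

inbox = [("bug", "login fails on mobile"),
         ("question", "where are the Q3 docs?")]
print(manager(inbox))
# -> ['triaged: login fails on mobile', 'answered: where are the Q3 docs?']
```

Because coordination happens through data the agents read and write, adding a 31st specialist is just another registry entry, not a change to the manager.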
Pricing, model choice, and the 'agent lab' philosophy
Notion employs a credit-based pricing system for custom agents, designed to abstract over variable token costs across different models, serving tiers, and auxiliary services like web search or sandboxes. This system aims to offer fair pricing, with discounts for enterprise customers. While the market currently trends towards usage-based pricing, Notion is cautious about charging solely on token throughput due to varying model capabilities and task values. They differentiate between basic tasks and complex, high-value operations, seeking to provide value without making expensive capabilities mandatory. The company acts as a 'robo advisor' for model selection, guiding users towards the most appropriate model for their task, whether it's a powerful, expensive one or a more cost-effective 'auto' model. This also involves investing in open-source models to fill gaps in the intelligence-price-latency triangle, ensuring users have options across the capability spectrum. Notion's goal is not just to offer access to models but to help users navigate the options effectively.
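The abstraction described above reduces to simple arithmetic: convert token usage at a per-tier rate, then add flat charges for auxiliary services. The tier names and rates below are invented for illustration and are not Notion's actual pricing.

```python
# Hypothetical credit rates; all values are assumptions, not real pricing.
CREDITS_PER_1K_TOKENS = {"auto": 1, "balanced": 3, "frontier": 10}
SERVICE_CREDITS = {"web_search": 2, "sandbox": 5}

def credits_for_run(tier, tokens, services=()):
    # Token cost scales with the model tier; services add flat surcharges,
    # so users see one credit number instead of raw token throughput.
    model_credits = CREDITS_PER_1K_TOKENS[tier] * tokens / 1000
    service_credits = sum(SERVICE_CREDITS[s] for s in services)
    return model_credits + service_credits

# A frontier-model run of 4,000 tokens that also spun up a sandbox:
print(credits_for_run("frontier", 4000, ["sandbox"]))  # 45.0
```

The point of the indirection is that the same credit balance prices very different cost structures (GPU-served models, sandboxes, search) on one axis.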
Common Questions
What are Notion's custom agents and how do they work?
Notion's custom agents are AI-powered tools that can be configured to perform various tasks within Notion. They operate by interacting with Notion's features and data, automating workflows and assisting users. Their development involved iterative rebuilding and learning from model limitations.
Topics
Mentioned in this video
Mentioned as a tool whose search functionality Notion integrates to improve agent capabilities.
Notion's internal database approach for agents, chosen for its compatibility with model preferences over complex internal formats.
The company discussed is Notion, which is building custom agents and has a product focused on being the best system of record for enterprise work.
Mentioned as the model used for image generation within Notion.
Mentioned as a communication platform where Notion custom agents are shared and integrated, and used for notifications.
Used as an analogy for Notion's position in the market, similar to how DataDog relates to AWS.
Mentioned as an example of an MCP that can break if the transport gets messed up, limiting an agent's self-fix capabilities.
Notion's underlying database technology, which has a relationship with their use of SQLite clusters for agents.
The previous company of Zach Tatar, the manager of the Notion meeting notes team.
Mentioned in the context of early agent development and partnerships, and its models being used for evaluation.
Mentioned in the context of early agent development and partnerships, and its models being used or compared.
Mentioned as a platform where Notion engineers collaborate and as a source of code for agents, and for its role in pull requests.
Used as an analogy to explain Notion's expertise in collaboration as distinct from AWS's cloud infrastructure.