AI Dev 26 x SF | Jeff Huber: Everything You Need to Know About Agentic Search

DeepLearning.AI
Education | 4 min read | 24 min video
May 20, 2026|165 views|5

TL;DR

Agentic search promises to solve AI context and reliability issues by running a continuous loop of reading and writing information, but it works only as well as the context engineering behind it.

Key Insights

1. Over 45% of ChatGPT queries are centered around a specific question or topic.

2. The average human information worker spends approximately 30% of their time searching for relevant information.

3. Despite marketing of million-token context windows, many builders report models becoming unreliable past 40,000 to 100,000 tokens due to context rot.

4. Chroma's Context One model, a 20 billion parameter model, runs at 400 tokens/sec on Blackwells and 3,000 tokens/sec on Cerebras, significantly faster and cheaper than models like Opus.

5. Agentic search is a loop where a model uses tools, decides when to stop, and can employ hybrid search (dense + sparse vectors) and grep search.

6. The future of context is predicted to be continuous, extremely fast due to small language models, and centered around continual learning at the context layer rather than model weight updates.

AI as a tool: context and reasoning are key

Jeff Huber, CEO of Chroma, frames AI not as a technological deity but as a powerful tool whose usefulness depends on context as much as on reasoning. He defines a traditional computer as a universal structured information processor and AI as a universal unstructured information processor. Huber contends that reasoning has received most of the investment and attention while context remains underrated, even though over 45% of ChatGPT queries are centered around user information or specific questions. To be useful, an AI system therefore has to find and use the right context, not just reason well.

The human cost of information seeking

Information retrieval is hard for people as well as for AI. The average human information worker spends about 30% of the workday searching for the information needed to do a task accurately. As AI agents take on more information work, they will need the same ability to find and use relevant context efficiently, which is the case for agentic search.

Context rot: the silent killer of large context windows

Language model context windows are often marketed at a million tokens or more, but in practice they suffer from a phenomenon known as 'context rot.' Chroma's research indicates that models do not perform consistently across long contexts: performance often degrades significantly beyond 40,000 to 100,000 tokens, far below the advertised limits. Past that point, output quality can be akin to a coin flip. The remedy is 'context engineering,' the deliberate curation and management of the information fed to a model, rather than reliance on a larger window. For builders, this means working within the effective limit, even when a higher one is advertised, to keep applications reliable.
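One simple form of context engineering is to cap the context at a token budget and fill it with the most relevant material first. The sketch below is only an illustration under stated assumptions: the `Chunk` type, the relevance scores, the whitespace-based `count_tokens`, and the tiny budget are all hypothetical and do not come from the talk. A real system would use the model's own tokenizer and a budget chosen from the effective range discussed above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from some upstream retriever (assumed)


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def pack_context(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit in the token budget."""
    kept: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    return kept


chunks = [
    Chunk("refund policy allows returns within thirty days", 0.92),
    Chunk("company picnic is scheduled for next friday", 0.10),
    Chunk("refunds are issued to the original payment method", 0.85),
]
# A toy budget of 15 "tokens" keeps the two refund chunks and drops the picnic one.
selected = pack_context(chunks, budget=15)
for chunk in selected:
    print(chunk.score, chunk.text)
```

Greedy packing is one possible policy among many; it is shown here because it makes the budget constraint explicit.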

Agentic search: a continuous loop for reading and writing context

Agentic search is proposed as a solution to these context and reliability issues. It runs as a loop in which a 'search agent' calls a set of provided tools, which can include hybrid search (combining dense vector and sparse vector search), grep for full-text queries, and document retrieval. Crucially, the agent decides for itself when to stop searching. This mirrors how people use a search engine like Google, iteratively refining queries and following links. Agentic search serves both the 'read path' (retrieving information the agent needs) and the 'write path' (deciding where to store newly learned information so the knowledge base stays consistent), which lets agents manage their own context.
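The loop described above can be sketched in a few lines. This is a minimal illustration, not Chroma's implementation: the in-memory `DOCS`, the tool names, and `toy_policy` (a hard-coded stand-in for the model that picks the next action) are all assumptions made for the example. A hybrid search tool is omitted to keep the sketch short.

```python
import re
from typing import Callable

# Hypothetical in-memory corpus standing in for a real document store.
DOCS = {
    "billing.md": "Refunds are issued within 30 days of purchase.",
    "shipping.md": "Orders ship in 2 business days.",
}


def grep(pattern: str) -> list[str]:
    """Full-text tool: return the names of documents matching a regex."""
    return [name for name, text in DOCS.items() if re.search(pattern, text, re.IGNORECASE)]


def read_document(name: str) -> list[str]:
    """Retrieval tool: return the text of one document."""
    return [DOCS[name]]


TOOLS: dict[str, Callable[[str], list[str]]] = {"grep": grep, "read_document": read_document}

Step = tuple[str, str, list[str]]  # (tool name, argument, result)


def toy_policy(question: str, history: list[Step]) -> tuple[str, str]:
    """Hard-coded stand-in for the model: pick the next action from what has been seen."""
    if not history:
        return ("grep", "refund")
    last_tool, _, last_result = history[-1]
    if last_tool == "grep" and last_result:
        return ("read_document", last_result[0])
    return ("stop", "")  # the agent itself decides the search is done


def agentic_search(question: str, policy, max_steps: int = 5) -> list[Step]:
    history: list[Step] = []
    for _ in range(max_steps):  # hard cap in case the policy never stops
        tool, arg = policy(question, history)
        if tool == "stop":
            break
        history.append((tool, arg, TOOLS[tool](arg)))
    return history


trace = agentic_search("What is the refund window?", toy_policy)
for tool, arg, result in trace:
    print(tool, arg, result)
```

In a real agent the policy would be a model call that sees the question and the tool results so far; the `max_steps` cap is a safety net added here and is not something the talk specifies.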

Chroma's Context One: speed, cost, and performance breakthroughs

Chroma has developed Context One, a 20 billion parameter open-source model designed for agentic search. It generates 400 tokens per second on Blackwell hardware and 3,000 tokens per second on Cerebras hardware, compared with an average of about 40 tokens per second for models like Opus. Context One is priced at approximately $1 per million output tokens, against $25 for Opus 4.6. The result challenges the notion that larger models are always superior for complex retrieval tasks: a smaller model trained specifically for search can excel at them. Chroma claims Context One defines the Pareto frontier for accuracy versus latency and cost, with further advancements (Context 2 and 3) anticipated.

The future of context: continuous, fast, and learning-centric

Huber outlines three key predictions for the future of context:

1. Continuous context management, where a steering layer continually guides reasoning models, both through retrieval (pull) and by pushing relevant information or interruptions (push).

2. Extreme speed, driven by small, efficient language models integrated deeply into applications, enabling architectures where compute is pushed close to the data to minimize network costs and latency.

3. Continual learning primarily at the context layer, meaning systems will learn by adding knowledge to their context systems rather than by frequently fine-tuning or updating model weights.

This approach is made feasible by the low cost and speed of fine-tuning models like Context One, allowing systems to adapt and integrate new knowledge efficiently.
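To make the idea of learning at the context layer concrete, here is a toy sketch in which new knowledge is written to a store and read back later, with no model weights involved. The `ContextStore` class, its keyword matching, and the example facts are hypothetical and chosen only for illustration; a real system would likely sit on a search index.

```python
class ContextStore:
    """Toy context layer: knowledge lives in stored notes, not in model weights."""

    def __init__(self) -> None:
        self.notes: dict[str, str] = {}

    def write(self, key: str, fact: str) -> None:
        # Write path: a newer fact under the same key replaces the older one,
        # which keeps the store consistent instead of accumulating contradictions.
        self.notes[key] = fact

    def read(self, query: str) -> list[str]:
        # Read path: naive keyword overlap between the query and each note's key.
        words = set(query.lower().split())
        return [fact for key, fact in self.notes.items() if words & set(key.lower().split())]


store = ContextStore()
store.write("refund window", "Refunds are accepted within 30 days.")
store.write("refund window", "Refunds are accepted within 60 days.")  # the policy changed
answer = store.read("what is the refund window")
print(answer)
```

The system "learns" the policy change the moment the second write lands, which is the contrast with updating weights that the third prediction draws.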

Speed Comparison: Chroma Context 1 vs. Opus

Data extracted from this episode

Model             Hardware    Speed (tokens/second)     Cost (per million output tokens)
Chroma Context 1  Blackwell   400                       ~$1
Chroma Context 1  Cerebras    3,000 (current)           N/A
Chroma Context 1  Cerebras    15,000-20,000 (future)    N/A
Opus              N/A         40 (average)              $25 (for 4.6)
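As a rough worked example, the figures above can be turned into time and cost for a single response. The speeds and prices are taken from the table; the 2,000-token output size is an assumption picked only to make the arithmetic concrete.

```python
# Figures from the table above.
SPEED_TOKENS_PER_SEC = {
    "Context 1 on Blackwell": 400,
    "Context 1 on Cerebras": 3_000,
    "Opus (average)": 40,
}
COST_PER_MILLION_OUTPUT_USD = {"Context 1": 1.0, "Opus 4.6": 25.0}

OUTPUT_TOKENS = 2_000  # assumed output size, not a number from the talk

# Generation time = tokens / (tokens per second).
seconds = {name: OUTPUT_TOKENS / rate for name, rate in SPEED_TOKENS_PER_SEC.items()}
# Cost = tokens / 1,000,000 * price per million tokens.
dollars = {
    name: OUTPUT_TOKENS / 1_000_000 * price
    for name, price in COST_PER_MILLION_OUTPUT_USD.items()
}

for name, value in seconds.items():
    print(f"{name}: {value:.2f} s")
for name, value in dollars.items():
    print(f"{name}: ${value:.4f}")
```

On these numbers the throughput gap is 10x on Blackwell and 75x on Cerebras relative to the 40 tokens per second average, and the output price gap is 25x.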

Common Questions

What is agentic search?

Agentic search is a loop where an AI model, acting as a search agent, has access to tools (like search functions) and can decide when to stop searching. It mimics how humans use search engines by querying, reviewing results, and exploring.
