Context Engineering for Agents - Lance Martin, LangChain

Latent Space Podcast
Science & Technology · 5 min read · 64 min video
Sep 11, 2025 · 39,211 views
TL;DR

Context engineering is key for advanced AI agents, managing complex information flow beyond simple prompt engineering.

Key Insights

1. Context engineering is crucial for managing the flow of information in AI agents, extending beyond traditional prompt engineering.

2. Key strategies for context engineering include offloading, reducing context, retrieval, and isolation, each addressing different challenges.

3. The effectiveness of context engineering strategies, especially in multi-agent systems and retrieval, depends heavily on the specific problem and task.

4. Different retrieval methods, from classic RAG to agentic search, offer trade-offs in complexity and performance.

5. Pruning and summarization for context reduction are powerful but carry risks of information loss, necessitating careful implementation.

6. The 'bitter lesson' of AI development emphasizes the importance of generality and sufficient compute, influencing how we engineer AI applications over time.

THE EMERGENCE OF CONTEXT ENGINEERING

The term 'context engineering' has gained traction as AI agents, often described as 'tool calling in a loop,' become more sophisticated yet challenging to manage. This concept arises from a shared experience among developers encountering difficulties in handling the vast amounts of information fed to Large Language Models (LLMs) within agentic workflows. Unlike simple chat interactions where human messages are primary, agents receive context from tool calls, leading to significant combinatorial complexity and potential performance degradation or hitting context window limits.

PROMPT ENGINEERING VS. CONTEXT ENGINEERING

Prompt engineering is a subset of context engineering, with a crucial distinction arising when moving from chat models to agents. While prompt engineering focuses on optimizing the human input to a model, context engineering encompasses managing all information inputs, including system instructions, user instructions, and, critically, the dynamic context generated by tool calls throughout an agent's execution trajectory. This expanded scope is necessary due to the sheer volume and dynamic nature of information flowing into an agent.
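The distinction can be made concrete with a small sketch: the "context" the model sees on each step is the whole trajectory, not just the latest prompt. The `AgentContext` and `Message` names below are illustrative, not from any particular library.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str

@dataclass
class AgentContext:
    """Everything the model sees on its next step, not just the prompt."""
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append(Message(role, content))

    def render(self) -> str:
        # Flatten the full trajectory -- system instructions, user input,
        # and tool outputs -- into the block actually sent to the model.
        return "\n".join(f"[{m.role}] {m.content}" for m in self.messages)

ctx = AgentContext()
ctx.add("system", "You are a research agent.")
ctx.add("user", "Summarize recent LLM papers.")
ctx.add("tool", "search() returned 3 results...")
print(ctx.render())
```

Prompt engineering optimizes the `user` (and perhaps `system`) entries; context engineering manages the whole list, which in a long-running agent is dominated by `tool` entries.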

STRATEGIES FOR EFFECTIVE CONTEXT MANAGEMENT

Several key strategies are emerging to address context management challenges. 'Offloading' involves saving tool call outputs to external storage like disk or agent state rather than feeding them directly back into the model's context, substantially reducing token costs. 'Reducing context' through summarization or pruning is vital, especially when nearing context window limits, though it requires careful execution to avoid information loss. 'Context isolation,' particularly relevant in multi-agent systems, means segmenting information based on agent roles to prevent conflicts and manage complexity. Finally, 'retrieval' methods, ranging from classic Retrieval-Augmented Generation (RAG) to simpler agentic search, are essential for fetching relevant information.
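The offloading strategy can be sketched in a few lines: large tool outputs go to disk, and the model's context receives only a compact reference it can dereference later. The size threshold and reference format here are illustrative assumptions, not a real API.

```python
import hashlib
import tempfile
from pathlib import Path

OFFLOAD_DIR = Path(tempfile.mkdtemp())
MAX_INLINE_CHARS = 200  # illustrative threshold, tuned per application

def offload_if_large(tool_output: str) -> str:
    """Return small outputs inline; write large ones to disk and hand
    the model only a short reference plus a preview."""
    if len(tool_output) <= MAX_INLINE_CHARS:
        return tool_output
    key = hashlib.sha256(tool_output.encode()).hexdigest()[:12]
    path = OFFLOAD_DIR / f"{key}.txt"
    path.write_text(tool_output)
    preview = tool_output[:80]
    return f"[offloaded to {path.name}, {len(tool_output)} chars; preview: {preview}...]"

def load_offloaded(name: str) -> str:
    # The agent can fetch the raw output later if it turns out to matter.
    return (OFFLOAD_DIR / name).read_text()

small = offload_if_large("ok")
big = offload_if_large("x" * 10_000)
print(small)  # returned verbatim
print(big)    # a compact reference instead of 10k characters
```

The key design choice is that offloading is reversible: the full output still exists on disk, so replacing it in context with a reference loses nothing permanently.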

ADVANCEMENTS IN RETRIEVAL AND AGENTIC SEARCH

Within the retrieval domain, significant divergence exists in approaches. Some agents employ complex, multi-step RAG pipelines involving classic chunking, embeddings, vector search, knowledge graphs, and re-ranking. Conversely, others, like Claude Code, demonstrate remarkable success with 'agentic retrieval,' utilizing simple tool calls to explore files without indexing. This highlights a key trade-off between intricate indexing and the potential simplicity and effectiveness of letting agents dynamically fetch information, often proving highly performant with well-structured metadata like `.md` files.
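Agentic retrieval of this kind can be as simple as two file tools the model calls in a loop. The `list_files` and `grep` helpers below are hypothetical stand-ins for such tools, not Claude Code's actual implementation; note that no index or embedding store is involved.

```python
import tempfile
from pathlib import Path

# Hypothetical minimal "tools" an agent could call in a loop to explore
# a repository without any indexing step.
def list_files(root: str, pattern: str = "*.md") -> list:
    """Enumerate files so the agent can decide what to open."""
    return sorted(str(p) for p in Path(root).rglob(pattern))

def grep(root: str, needle: str) -> list:
    """Return (path, line_number, line) for every case-insensitive match."""
    hits = []
    for p in Path(root).rglob("*.md"):
        for i, line in enumerate(p.read_text().splitlines(), 1):
            if needle.lower() in line.lower():
                hits.append((str(p), i, line.strip()))
    return hits

# Demo on a throwaway directory.
root = tempfile.mkdtemp()
Path(root, "README.md").write_text("# Agents\nContext engineering matters.\n")
print(list_files(root))
print(grep(root, "context"))
```

The agent decides what to list, open, and search next based on prior results, which is why well-structured files and metadata make this approach surprisingly competitive with indexed pipelines.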

THE NUANCES OF MULTI-AGENT SYSTEMS AND CONTEXT ISOLATION

Multi-agent systems offer potential benefits but introduce complexities in context management. A primary concern is 'context isolation,' ensuring sub-agents receive only relevant information without conflicting decisions. While some argue against multi-agents due to communication difficulties and potential conflicts, others see value when tasks are easily parallelizable and primarily read-only, like information gathering for deep research. The success of multi-agent approaches often hinges on the task's nature, with coordinated writing tasks being more challenging than parallelized information collection followed by a single writing phase.
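The read-then-write pattern described above can be sketched as parallel sub-agents that each see only their own task slice, followed by a single writing phase that sees all findings. The `sub_agent` function is a stand-in for an LLM call over an isolated context.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str) -> str:
    # Stand-in for an LLM call whose context contains ONLY this task --
    # context isolation means it never sees the other sub-agents' work.
    return f"notes on {task}"

def research(tasks: list) -> str:
    # Information gathering is read-only and parallelizable...
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(sub_agent, tasks))
    # ...while the writing phase is a single step that sees everything,
    # avoiding conflicting decisions between coordinated writers.
    return "\n".join(findings)

report = research(["RAG", "agentic search", "summarization"])
print(report)
```

Nothing in this sketch lets sub-agents write shared state, which is exactly why the pattern sidesteps the coordination problems that make multi-agent writing tasks hard.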

REDUCTION, PRUNING, AND THE RISK OF INFORMATION LOSS

Compacting context is a common necessity, especially as agents approach their context window limits or at tool call boundaries. Techniques like summarization and pruning are employed, but they carry inherent risks of information loss, particularly if the pruning is irreversible. Solutions like offloading raw data to disk allow for retrieval of complete context later, mitigating the 'lossy' nature of summarization. This trade-off between reducing token usage and preserving information is a critical consideration, with some advocating for keeping all interaction history to learn from mistakes, while others believe specific pruning is necessary.
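One way to keep compaction reversible, as a rough sketch: summarize older turns when a budget is exceeded, but archive the raw messages so the complete context can be re-fetched later. Here `summarize` is a trivial stand-in for an LLM summarization call, and the character budget and "keep the last two turns" heuristic are arbitrary assumptions.

```python
def summarize(text: str) -> str:
    # Stand-in for an LLM summarization call; real summaries are lossy
    # in less predictable ways than simple truncation.
    return text[:60] + "..."

def compact(messages, limit_chars=300, archive=None):
    """Replace older messages with a summary once the transcript exceeds
    the budget, archiving the raw originals so pruning stays reversible."""
    archive = archive if archive is not None else []
    total = sum(len(m) for m in messages)
    if total <= limit_chars:
        return messages, archive
    head, tail = messages[:-2], messages[-2:]   # keep recent turns verbatim
    archive.extend(head)                        # raw context, recoverable later
    return [f"[summary] {summarize(' '.join(head))}"] + tail, archive

msgs = [f"turn {i}: " + "x" * 100 for i in range(5)]
compacted, raw = compact(msgs)
print(len(compacted), len(raw))
```

Pairing the summary with the archive is what distinguishes this from irreversible pruning: token usage drops, but no information is permanently lost.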

CONTEXT FAILURE MODES AND THE BITTER LESSON

Context can fail in various ways, including 'context poisoning' where hallucinations or errors corrupt the agent's understanding. The 'bitter lesson' in AI development posits that general algorithms with abundant data and compute often outperform those with more engineered structure. This principle suggests that while initial structure might be necessary for current compute limitations, it can become a bottleneck as models improve exponentially. AI engineers must continually reassess assumptions and remove structure to leverage advancements, a lesson exemplified by iterative development cycles of tools like deep research agents.

FRAMING AND ABSTRACTION IN AI ENGINEERING

The discussion around frameworks and abstractions in AI engineering is nuanced. While low-level orchestration frameworks providing composable building blocks (like nodes, edges, and state) are valuable for flexibility and iteration, higher-level agent abstractions can obscure underlying mechanisms, making them harder to adapt or debug. The critique of frameworks often targets these overly simplified abstractions that may hinder the ability to 'remove structure' as per the bitter lesson, emphasizing the importance of understanding what lies beneath any abstraction to enable long-term adaptability and innovation.
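A toy illustration of such low-level building blocks (the `Graph`, `add_node`, and `add_edge` names are illustrative, not a specific framework's API): nodes are plain functions over shared state, and edges are explicit, so any piece of structure is visible and easy to remove as models improve.

```python
class Graph:
    """Minimal orchestration: named nodes transform a state dict,
    and explicit edges determine what runs next."""

    def __init__(self):
        self.nodes, self.edges = {}, {}

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_edge(self, src, dst):
        self.edges[src] = dst

    def run(self, start, state):
        node = start
        while node is not None:
            state = self.nodes[node](state)
            node = self.edges.get(node)  # no edge means we're done
        return state

g = Graph()
g.add_node("plan", lambda s: {**s, "plan": f"answer {s['question']}"})
g.add_node("act", lambda s: {**s, "answer": s["plan"].upper()})
g.add_edge("plan", "act")
print(g.run("plan", {"question": "what is context engineering?"}))
```

Because every node and edge is explicit, deleting the "plan" step when a stronger model no longer needs it is a two-line change, which is the adaptability the bitter-lesson argument asks for.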

THE ROLE OF MEMORY AND CACHING IN CONTEXT MANAGEMENT

Memory and caching play integral roles in context engineering. Caching prior message history can significantly reduce latency and cost, though its automatic implementation across different API providers is still evolving. While caching addresses efficiency, it doesn't inherently solve the 'long context problem' or 'context rot.' Memory, particularly when paired with human-in-the-loop systems for ambient agents, allows for learning user preferences and refining agent behavior over time. Reading memories at scale essentially converges with retrieval, treating past conversations as a specific context for information retrieval.
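The convergence of memory reading and retrieval can be sketched as a memory store queried like any other corpus. The keyword-overlap scorer below is a naive stand-in for embedding similarity; all names are illustrative.

```python
def score(query: str, doc: str) -> int:
    # Naive relevance: shared lowercase words. A real system would use
    # embedding similarity, but the retrieval shape is the same.
    return len(set(query.lower().split()) & set(doc.lower().split()))

class MemoryStore:
    """Past-conversation snippets, read back via plain retrieval."""

    def __init__(self):
        self.memories = []

    def write(self, text: str) -> None:
        self.memories.append(text)

    def read(self, query: str, k: int = 2) -> list:
        ranked = sorted(self.memories, key=lambda m: score(query, m), reverse=True)
        return ranked[:k]

mem = MemoryStore()
mem.write("user prefers concise bullet-point answers")
mem.write("user works mostly in Python")
mem.write("meeting notes from tuesday")
print(mem.read("answer in python please"))
```

Writing memories (deciding what to store) is the hard, behavior-shaping part; reading them back at scale is just retrieval over a corpus of past interactions.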

FUTURE DIRECTIONS AND PRACTICAL IMPLICATIONS

The rapid advancement of LLMs necessitates adaptable engineering practices, aligning with the 'bitter lesson' by favoring generality and minimizing unnecessary structure that could bottleneck future progress. Tools and frameworks that offer low-level, easily reconfigurable components are particularly valuable. Furthermore, understanding context engineering is crucial for building robust and efficient AI agents, from managing complex multi-agent interactions to optimizing retrieval strategies and preventing context-related failures, ultimately enabling more sophisticated and reliable AI applications.

Common Questions

What is context engineering, and how does it differ from prompt engineering?

Context engineering focuses on feeding an LLM the precise context needed for its next step, which is crucial for complex agentic workflows. It goes beyond prompt engineering by also managing the context generated by tool calls within an agent's trajectory.
