Key Moments
AI Dev 26 x SF | Eli Schilling: Hands On Agent Context & Memory Engineering with Oracle AI Database
AI agents can now have persistent memory using Oracle Database, preventing them from repeating mistakes and improving efficiency, but each memory type requires specific database tables.
Key Insights
An AI agent without memory is described as 'autocomplete with ambition', highlighting the critical need for memory in autonomous agents.
Oracle Database offers a converged experience, allowing storage and management of relational, graph, spatial, and vector data within a single engine, reducing the overhead of multiple database systems.
Agent memory splits into short-term memory (working memory and a semantic cache) and long-term memory (such as procedural memory). These categories break down further into the context window, session memory, workflow, toolbox, tool logs, conversations, summaries, and knowledge bases.
Context engineering involves developing a comprehensive context input by integrating input, memory, and knowledge bases to maintain consistent token utilization and avoid context bloat.
Ingesting 50 research papers into the vector store took approximately 18-19 minutes with a stable internet connection, with each paper's arXiv ID, title, abstract, primary subjects, and authors being stored.
The comparison between a memory-enabled agent and a naive agent shows that the memory-enabled agent maintains significantly lower and more consistent token utilization over iterative tasks, unlike the naive agent which experiences escalating token usage.
The necessity of memory for AI agents
The discussion begins by highlighting the critical role of memory in AI agents, likening an agent without memory to 'autocomplete with ambition.' The presenter uses a personal anecdote about human memory limitations to draw parallels with the need for robust memory systems in AI. Without memory, agents are forced to re-establish context and context windows repeatedly, leading to inconsistent responses, hallucinations, and repeated mistakes. This perpetual re-learning process results in a 'bloat' of context and tokens. Implementing memory solutions aims to drive persistence not just for individual user sessions but across enterprise scales, ensuring consistent experiences and learning capabilities for entire teams or organizations. This addresses the challenge of long-horizon tasks, where context from past interactions might be lost, forcing agents to start over and consume excessive tokens to regain prior knowledge.
Evolution from basic chatbots to autonomous agents
The progression of conversational AI is traced from early LLM-oriented chatbots, which were a significant leap from hard-coded decision trees, to more sophisticated systems. Initial advancements involved integrating enterprise data with LLMs through Retrieval-Augmented Generation (RAG) to provide more specific and useful information without exposing private data to the public internet. This evolved further to LLM-driven workflows, where predefined sequences of LLM operations and data processing were established. The current paradigm of autonomous agents, however, shifts this dynamic. Instead of explicitly defining workflows, users provide agents with memory, tools, and access to resources, empowering them to reason, figure out tasks, and achieve goals independently. This represents a move away from rigid, 2010-era structures towards more adaptive and self-directed AI systems.
The challenge of diverse data modalities in memory systems
Building effective memory systems for agents requires handling a wide array of data types beyond simple text files or PDFs. Early RAG pipelines primarily dealt with text and basic image data. However, modern agents need to integrate and manage diverse data modalities such as relational data, graph structures for navigating relationships between topics, and spatial data (e.g., finding dealerships within a specific radius of a location). The proliferation of these data types often leads to an anti-pattern where developers deploy separate databases for each modality—one for relational, one for spatial, one for time series, and so on, potentially accumulating four to six different database systems. This multiplicity of databases creates significant overhead in management and application integration. Oracle Database is presented as a solution to this problem, offering a converged database platform that can incorporate all these different data types into a single engine, enabling on-the-fly vector embeddings and co-locating graph, relational, and vector data stores for unified management.
Architecting agent memory: Short-term and long-term
Agent memory is broadly categorized into two primary types: short-term and long-term. Short-term memory comprises 'working memory,' which is active during a current session, and a 'semantic cache' to speed up lookups against long-term memory. Long-term memory aims for consistency beyond a single session, lasting for days, weeks, or months. This includes 'procedural memory,' which stores learned procedures from repeated tasks (e.g., troubleshooting an issue 20 times), enabling agents to recall solutions quickly. These categories can be further detailed into specific memory elements such as the context window, session memory, workflow logs, toolbox usage, tool logs, conversation summaries, and knowledge bases, all managed by an agent memory manager. The choice of storage type (in-memory, vector, SQL, JSON) depends on the specific memory type and its intended use.
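The split between short-term and long-term memory, and the idea that each memory type gets its own kind of storage, can be sketched as follows. This is a minimal illustration, not the workshop's code: the `MemoryManager` class, the `STORAGE_FOR` mapping, and the specific pairing of memory types to storage kinds are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Storage(Enum):
    IN_MEMORY = "in-memory"
    VECTOR = "vector"
    SQL = "sql"
    JSON = "json"


# Hypothetical routing table: the talk names the memory types and the storage
# kinds, but this particular pairing is an assumption made for illustration.
STORAGE_FOR = {
    "context_window": Storage.IN_MEMORY,
    "session": Storage.IN_MEMORY,
    "workflow": Storage.JSON,
    "toolbox": Storage.VECTOR,
    "tool_log": Storage.SQL,
    "conversation": Storage.SQL,
    "summary": Storage.VECTOR,
    "knowledge_base": Storage.VECTOR,
}

SHORT_TERM = {"context_window", "session"}


@dataclass
class MemoryManager:
    """Toy manager that keeps one bucket of records per memory type."""

    stores: dict = field(
        default_factory=lambda: {name: [] for name in STORAGE_FOR}
    )

    def write(self, memory_type: str, record: dict) -> Storage:
        if memory_type not in STORAGE_FOR:
            raise KeyError(f"unknown memory type: {memory_type}")
        self.stores[memory_type].append(record)
        return STORAGE_FOR[memory_type]

    def end_session(self) -> None:
        # Short-term memory is dropped; long-term memory survives the session.
        for name in SHORT_TERM:
            self.stores[name].clear()


manager = MemoryManager()
manager.write("session", {"note": "user asked about vector search"})
manager.write("tool_log", {"tool": "web_search", "status": "ok"})
manager.end_session()
```

After `end_session()`, the session bucket is empty while the tool log entry remains, which is the persistence property the long-term categories are meant to provide.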
Key strategies: Context, memory, and harness engineering
Three interconnected strategies are proposed for building adaptive and evolving agentic systems: context engineering, memory engineering, and harness engineering. Context engineering focuses on developing a comprehensive context input by blending user prompts with memory and knowledge bases, aiming for consistent token utilization and avoiding bloat. Memory engineering involves the durable storage, management, and retrieval of data, ensuring persistence across sessions, teams, or organizations. Harness engineering, viewed as the next iteration, concerns the agent loop: how input is processed and iterated upon, and how the agent learns from its actions. This loop involves invoking the LLM, acting on its output, and continuing until a stop condition is met. The outer harness wires together components like the memory manager, context engineering, skills, sub-agents, and agent-to-agent communication. These feed a context assembly process, and memory units extracted from completed loops continuously refine the agent's knowledge. The three elements are described as the 'three musketeers' or 'three amigos' working in tandem.
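The agent loop described here (assemble context, invoke the LLM, act on its output, repeat until a stop condition) can be sketched in a few lines. This is a toy under stated assumptions: `fake_llm` is a stand-in for a real model call, and `run_agent`, the `lookup` tool, and the message formats are hypothetical names invented for the example.

```python
from typing import Callable


def fake_llm(context: str) -> dict:
    """Stand-in for a real model call: asks for one tool call, then finishes."""
    if "TOOL_RESULT" in context:
        return {"type": "final", "text": "Answer based on the tool result."}
    return {"type": "tool", "name": "lookup", "arg": "agent memory"}


def run_agent(user_input: str, llm: Callable[[str], dict],
              tools: dict, memory: list, max_steps: int = 5) -> str:
    scratch = []  # working memory for this run only
    for _ in range(max_steps):
        # Context assembly: remembered units + current input + this run's notes.
        context = "\n".join(memory + [f"USER: {user_input}"] + scratch)
        decision = llm(context)
        if decision["type"] == "final":  # stop condition
            # Extract a memory unit from the completed loop.
            memory.append(f"MEMORY: answered '{user_input}'")
            return decision["text"]
        # Act on the model's output by running the requested tool.
        result = tools[decision["name"]](decision["arg"])
        scratch.append(f"TOOL_RESULT {decision['name']}: {result}")
    return "Stopped: step limit reached."


tools = {"lookup": lambda query: f"3 papers found for '{query}'"}
memory: list = []
answer = run_agent("What is agent memory?", fake_llm, tools, memory)
```

The `max_steps` cap is a second stop condition added for safety; without it, a model that never returns a final answer would loop indefinitely.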
Practical implementation with Oracle Database and LangChain
The workshop demonstrates a practical implementation by building a research paper assistant using Oracle Database, LangChain, and Tavily. This involves creating seven distinct database tables for different memory types: conversational, knowledge base, workflow, toolbox, entity, summary, and tool log. The process includes setting up Oracle Database in a Docker container, connecting to it using Python libraries, and cleaning up existing indexes for a fresh start. Vector embeddings are set up using Hugging Face embeddings and LangChain's Oracle Vector Store. Ingesting research papers populates the knowledge base; around 50 papers were processed, taking roughly 18-19 minutes. Queries can then be run against the ingested data, with similarity searches returning relevant papers ranked by vector score. The system also defines token limits to manage context window usage, specifying when to compact, summarize, or move short-term memory to long-term storage to prevent exceeding token budgets.
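The one-table-per-memory-type layout can be sketched as below. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for Oracle Database; the table names, column layout, and sample row are assumptions for illustration and are not the workshop's schema.

```python
import sqlite3

# The seven memory types named in the workshop summary.
MEMORY_TABLES = [
    "conversational", "knowledge_base", "workflow",
    "toolbox", "entity", "summary", "tool_log",
]

# In-memory SQLite stands in for the Oracle container used in the workshop.
conn = sqlite3.connect(":memory:")

for name in MEMORY_TABLES:
    # Drop first so each run starts clean, mirroring the "fresh start" step.
    conn.execute(f"DROP TABLE IF EXISTS {name}_memory")
    conn.execute(
        f"""CREATE TABLE {name}_memory (
                id INTEGER PRIMARY KEY,
                content TEXT NOT NULL,
                metadata TEXT,
                created_at TEXT DEFAULT CURRENT_TIMESTAMP
            )"""
    )

# Hypothetical knowledge base row; the metadata value is a placeholder.
conn.execute(
    "INSERT INTO knowledge_base_memory (content, metadata) VALUES (?, ?)",
    ("Abstract text of a paper", '{"arxiv_id": "placeholder-id"}'),
)

created = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

In the workshop itself, the knowledge base is backed by a vector store rather than a plain text column, so a real schema would also hold an embedding per row.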
Agent toolbox and model integration
A crucial aspect of agent functionality is the toolbox, which registers tools and makes them available to the agent. These can include internal APIs, third-party services, or open-source software. The workshop highlights the use of Tavily for web searches and the OCI GenAI service for language models, specifically Grok 3 reasoning fast. Tool descriptions are enhanced using an LLM to create more verbose definitions, which improves semantic similarity search when the agent needs to find a suitable tool. The system lets different embedding models and LLMs be plugged in, with configuration options for regional services and API keys stored securely. For example, a web search tool, when triggered, queries the web and returns information like titles, URLs, scores, and source types, which can then be stored in the agent's memory for future reference.
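Finding a tool by semantic similarity to its description can be sketched as follows. This is a toy version: `embed` uses plain word counts instead of a learned embedding model, and the `Toolbox` class, tool names, and descriptions are hypothetical. It does show why more verbose descriptions help: more descriptive words give a request more to match against.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class Toolbox:
    """Registers tools and retrieves the best match for a request."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func, description):
        self._tools[name] = (func, embed(description))

    def find(self, request: str) -> str:
        query = embed(request)
        return max(self._tools, key=lambda n: cosine(query, self._tools[n][1]))


toolbox = Toolbox()
toolbox.register(
    "web_search", lambda q: [],
    "Search the public web for recent pages, news and articles and "
    "return titles, URLs and relevance scores.",
)
toolbox.register(
    "paper_search", lambda q: [],
    "Search the internal knowledge base of ingested research papers "
    "by abstract similarity.",
)

chosen = toolbox.find("find recent news articles on the web")
```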
Context management and agent loop execution
Context management involves calculating token usage, summarizing conversations when thresholds are met (e.g., 80% usage), and offloading summarized content to longer-term memory. This process compresses the conversation size and reduces token utilization, enabling just-in-time retrieval of summaries or full content by ID. The agent loop then executes, starting with context assembly, invoking the LLM, and acting on its output. This loop continues, integrating memory, tools, and external searches as needed. A key demonstration involves an agent acting as a research paper assistant; it queries internal knowledge bases and external tools like Tavily for web searches, populating various memory types (conversational, knowledge base, workflow, tool logs) as it iterates. A comparison chart illustrates this by showing that a memory-enabled agent maintains significantly lower and more stable token utilization compared to a naive agent, which experiences escalating token usage and abrupt context resets between sessions, making the memory-enabled approach more cost-effective and consistent.
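The threshold-based compaction described above can be sketched like this. The 80% threshold comes from the summary; everything else is an assumption for illustration, including the `ContextManager` class, the rough four-characters-per-token estimate, and the placeholder summary text, which a real system would generate with an LLM.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


class ContextManager:
    def __init__(self, budget: int, threshold: float = 0.8):
        self.budget = budget
        self.threshold = threshold
        self.messages = []  # short-term conversation
        self.archive = {}   # long-term store, keyed by summary ID

    def used(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self.used() >= self.threshold * self.budget:
            self._compact()

    def _compact(self) -> None:
        summary_id = f"summary-{len(self.archive) + 1}"
        # Offload the full content so it can be fetched later by ID.
        self.archive[summary_id] = list(self.messages)
        # Placeholder summary; a real system would ask an LLM to summarize.
        stub = f"[{summary_id}] {len(self.messages)} earlier messages summarized"
        self.messages = [stub]

    def recall(self, summary_id: str) -> list:
        """Just-in-time retrieval of the full content behind a summary."""
        return self.archive[summary_id]


ctx = ContextManager(budget=100)
for _ in range(10):
    ctx.add("x" * 40)  # each message is about 10 estimated tokens
```

With a budget of 100 tokens, the eighth message pushes usage to the threshold, so the first eight messages are archived under one ID and replaced by a single short summary line.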
Token Utilization Comparison: Agent Memory vs. Naive Agent
Data extracted from this episode
| Session / Iteration | Memory-Enabled Agent Token Usage | Naive Agent Token Usage |
|---|---|---|
| Session 1 (Initial) | Low, then peaks during new info retrieval | Spikes significantly |
| Session 2 (New Context) | Remains low due to memory persistence | Resets to zero, then spikes again |
| Extended Conversation | Maintains a consistent, relatively low trajectory | Continuously increases without bound |
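
The qualitative pattern in the table can be reproduced with a small simulation of a single session. All numbers below are hypothetical and chosen only to show the shape of the curves; they are not measurements from the workshop, and the function names are invented for the example.

```python
def naive_usage(turns: int, tokens_per_turn: int) -> list:
    """Every call resends the whole history, so usage grows with each turn."""
    return [tokens_per_turn * (t + 1) for t in range(turns)]


def memory_usage(turns: int, tokens_per_turn: int,
                 budget: int, summary_tokens: int) -> list:
    """History is replaced by a short summary once it reaches 80% of budget."""
    usage, current = [], 0
    for _ in range(turns):
        current += tokens_per_turn
        if current >= 0.8 * budget:
            current = summary_tokens  # compact: keep only the summary
        usage.append(current)  # context size recorded after any compaction
    return usage


naive = naive_usage(turns=10, tokens_per_turn=500)
managed = memory_usage(turns=10, tokens_per_turn=500,
                       budget=4000, summary_tokens=300)
```

Under these assumptions the naive agent's context grows linearly with every turn, while the managed context stays below the compaction threshold.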
Common Questions
The core problem is that AI agents without memory suffer from context loss, leading to repetitive tasks, inconsistent responses, and inefficient token usage. Agent memory aims to provide persistence and context, allowing agents to learn, adapt, and perform complex tasks more effectively.
Mentioned in this video
The company whose database technology is being used for building agent memory systems.
Mentioned as the platform where the code and sample code for the workshop are provided.
Mentioned in the context of embedding models used for vector stores.
Mentioned as an alternative to OCI GenAI for connecting to the LLM endpoint.
The specific reasoning model used from OCI GenAI for tasks like summarization and context calculation.
Used for spinning up a container with Oracle database for the workshop.
A tool integrated for web search capabilities within the agent's toolbox.
A framework used in conjunction with Oracle database for building agent memory systems.
Oracle Cloud Infrastructure Generative AI service used for LLM integration.
A repository for research papers, from which papers were ingested for the demo.
The vector store implementation from LangChain for Oracle DB.