What are the different types of memory discussed for AI agents?

The video discusses short-term memory (working memory, semantic cache) for active sessions, and long-term memory (procedural memory for repeatable tasks, knowledge bases, workflow logs, summaries). These work together to provide a layered memory system.

Why is a converged database important for agent memory systems?

A converged database, like Oracle, allows storing and managing diverse data types (relational, spatial, vector, graph) in a single engine. This simplifies architecture, enables on-the-fly embeddings, and facilitates efficient data retrieval for agent memory.

What are the three key engineering areas for advanced AI agents mentioned?

The three key areas are context engineering (developing comprehensive context inputs), memory engineering (storing, managing, and accessing durable memory), and harness engineering (creating the agent loop for continuous evolution and learning).

How does agent memory improve token efficiency?

By storing and recalling information from memory instead of re-querying the LLM or re-processing data, agents significantly reduce token usage. Summarization and context offloading further help manage the context window and keep token counts manageable.

What is retrieval-augmented generation (RAG) and how does it relate to agent memory?

RAG integrates an LLM with an external data corpus to provide more specific and relevant answers. Agent memory builds upon RAG by adding persistent, structured memory layers that the agent can leverage for more sophisticated reasoning and task execution.

What tools and technologies are used in the demonstration?

The demonstration utilizes Oracle Database for storage, LangChain for framework integration, Hugging Face for embeddings, and Tavily for web search. The OCI GenAI service with Grok-3 is used as the LLM.

Key Moments

AI Dev 26 x SF | Eli Schilling: Hands On Agent Context & Memory Engineering with Oracle AI Database

DeepLearning.AI

Education7 min read52 min video

May 20, 2026|160 views|3

Save to Pod

Want to know something specific about what's covered?

We've already dissected every moment. Ask and we will deliver (with timestamps).

Key Moments

TL;DR

AI agents can now have persistent memory using Oracle Database, preventing them from repeating mistakes and improving efficiency, but each memory type requires specific database tables.

Key Insights

An AI agent without memory is described as 'autocomplete with ambition', highlighting the critical need for memory in autonomous agents.

Oracle Database offers a converged experience, allowing storage and management of relational, graph, spatial, and vector data within a single engine, reducing the overhead of multiple database systems.

Memory systems for agents can be categorized into short-term (working memory, semantic cache) and long-term (procedural memory), which can be further broken down into context window, session memory, workflow, toolbox, tool logs, conversations, summaries, and knowledge bases.

Context engineering involves developing a comprehensive context input by integrating input, memory, and knowledge bases to maintain consistent token utilization and avoid context bloat.

Ingesting 50 research papers into the vector store took approximately 18-19 minutes with a stable internet connection, with each paper's arXiv ID, title, abstract, primary subjects, and authors being stored.

The comparison between a memory-enabled agent and a naive agent shows that the memory-enabled agent maintains significantly lower and more consistent token utilization over iterative tasks, unlike the naive agent which experiences escalating token usage.

The necessity of memory for AI agents

The discussion begins by highlighting the critical role of memory in AI agents, likening an agent without memory to 'autocomplete with ambition.' The presenter uses a personal anecdote about human memory limitations to draw parallels with the need for robust memory systems in AI. Without memory, agents are forced to re-establish context and context windows repeatedly, leading to inconsistent responses, hallucinations, and repeated mistakes. This perpetual re-learning process results in a 'bloat' of context and tokens. Implementing memory solutions aims to drive persistence not just for individual user sessions but across enterprise scales, ensuring consistent experiences and learning capabilities for entire teams or organizations. This addresses the challenge of long-horizon tasks, where context from past interactions might be lost, forcing agents to start over and consume excessive tokens to regain prior knowledge.

Evolution from basic chatbots to autonomous agents

The progression of conversational AI is traced from early LLM-oriented chatbots, which were a significant leap from hard-coded decision trees, to more sophisticated systems. Initial advancements involved integrating enterprise data with LLMs through Retrieval-Augmented Generation (RAG) to provide more specific and useful information without exposing private data to the public internet. This evolved further to LLM-driven workflows, where predefined sequences of LLM operations and data processing were established. The current paradigm of autonomous agents, however, shifts this dynamic. Instead of explicitly defining workflows, users provide agents with memory, tools, and access to resources, empowering them to reason, figure out tasks, and achieve goals independently. This represents a move away from rigid, 2010-era structures towards more adaptive and self-directed AI systems.

The challenge of diverse data modalities in memory systems

Building effective memory systems for agents requires handling a wide array of data types beyond simple text files or PDFs. Early RAG pipelines primarily dealt with text and basic image data. However, modern agents need to integrate and manage diverse data modalities such as relational data, graph structures for navigating relationships between topics, and spatial data (e.g., finding dealerships within a specific radius of a location). The proliferation of these data types often leads to an anti-pattern where developers deploy separate databases for each modality—one for relational, one for spatial, one for time series, and so on, potentially accumulating four to six different database systems. This multiplicity of databases creates significant overhead in management and application integration. Oracle Database is presented as a solution to this problem, offering a converged database platform that can incorporate all these different data types into a single engine, enabling on-the-fly vector embeddings and co-locating graph, relational, and vector data stores for unified management.

Architecting agent memory: Short-term and long-term

Agent memory is broadly categorized into two primary types: short-term and long-term. Short-term memory comprises 'working memory,' which is active during a current session, and a 'semantic cache' to speed up lookups against long-term memory. Long-term memory aims for consistency beyond a single session, lasting for days, weeks, or months. This includes 'procedural memory,' which stores learned procedures from repeated tasks (e.g., troubleshooting an issue 20 times), enabling agents to recall solutions quickly. These categories can be further detailed into specific memory elements such as the context window, session memory, workflow logs, toolbox usage, tool logs, conversation summaries, and knowledge bases, all managed by an agent memory manager. The choice of storage type (in-memory, vector, SQL, JSON) depends on the specific memory type and its intended use.

Key strategies: Context, memory, and harness engineering

Three interconnected strategies are proposed for building adaptive and evolving agentic systems: context engineering, memory engineering, and harness engineering. Context engineering focuses on developing a comprehensive context input by blending user prompts with memory and knowledge bases, aiming for consistent token utilization and avoiding bloat. Memory engineering involves the durable storage, management, and retrieval of data, ensuring persistence across sessions, teams, or organizations. Harness engineering, viewed as the next iteration, concerns the agent loop—how input is processed, iterated upon, and how the agent learns from its actions. This loop involves invoking the LLM, acting on its output, and continuing until a stop condition is met. The outer harness wires together components like the memory manager, context engineering, skills, sub-agents, and agent-to-agent communication, feeding into a content assembly process that continuously refines the agent's knowledge through extracted memory units from completed loops. These three elements are described as the 'three musketeers' or 'three amigos' working in tandem.

Practical implementation with Oracle Database and LangChain

The workshop demonstrates a practical implementation by building a research paper assistant using Oracle Database, LangChain, and Tabele. This involves creating seven distinct database tables for different memory types: conversational, knowledge base, workflow, toolbox, entity, summary, and tool log. The process includes setting up Oracle Database in a Docker container, connecting to it using Python libraries, and performing index cleanup for a fresh start. Vector embeddings are set up using Hugging Face embeddings and LangChain's Oracle Vector Store. Ingesting research papers—around 50 papers were processed, taking roughly 18-19 minutes—populates the knowledge base. Queries can then be performed on this ingested data, with similarity searches returning relevant papers based on vector scores. The system also defines token limits to manage context window usage, specifying when to compact, summarize, or move short-term memory to long-term storage to prevent exceeding token budgets.

Agent toolbox and model integration

A crucial aspect of agent functionality is the toolbox, which registers and makes available various tools for the agent to use. This can include internal APIs, third-party services, or open-source software. The workshop highlights the use of Tavily for web searches and the OCI GenAI service for language models, specifically Grok 3 reasoning fast. Tool descriptions are enhanced using an LLM to create more verbose definitions, improving semantic similarity search when the agent needs to find a suitable tool. The system allows for a flexible plug-in of different embedding models and LLMs, with configuration options for regional services and API keys stored securely. For example, a web search tool, when triggered, queries the web and returns information like titles, URLs, scores, and source types, which can then be stored in the agent's memory for future reference.

Context management and agent loop execution

Context management involves calculating token usage, summarizing conversations when thresholds are met (e.g., 80% usage), and offloading summarized content to longer-term memory. This process compresses the conversation size and reduces token utilization, enabling just-in-time retrieval of summaries or full content by ID. The agent loop then executes, starting with context assembly, invoking the LLM, and acting on its output. This loop continues, integrating memory, tools, and external searches as needed. A key demonstration involves an agent acting as a research paper assistant; it queries internal knowledge bases and external tools like Tavily for web searches, populating various memory types (conversational, knowledge base, workflow, tool logs) as it iterates. A comparison chart illustrates this by showing that a memory-enabled agent maintains significantly lower and more stable token utilization compared to a naive agent, which experiences escalating token usage and abrupt context resets between sessions, making the memory-enabled approach more cost-effective and consistent.

Mentioned in This Episode

●Software & Apps

●Companies

●Organizations

●People Referenced

Agent Memory Engineering Cheat Sheet

Practical takeaways from this episode

Do This

Implement memory architectures for autonomous agents to improve workflow performance.

Leverage diverse data types (relational, graph, spatial, vector) within a converged database.

Utilize short-term, long-term, procedural, semantic, and episodic memory constructs.

Engineer context by combining input, memory, and knowledge bases for efficient token usage.

Manage memory by storing, accessing, and providing durable, retrievable layers for agents.

Use harness engineering to create feedback loops for agent learning and evolution.

Populate different memory tables (conversational, knowledge base, workflow, toolbox, entity, summary, tool log) for structured data management.

Implement token limits and summarization strategies to manage context window bloat.

Use verbose tool descriptions to improve semantic search and agent tool selection.

Integrate appropriate LLM models (e.g., Grok-3) for specific tasks like summarization and embeddings.

Calculate and monitor token usage to trigger context offloading or summarization when thresholds are met.

Allow agents to perform just-in-time retrieval from long-term memory.

Use conversational memory to recall previous interactions and reduce redundant context.

Compare agent memory performance against naive agents to demonstrate efficiency gains.

Avoid This

Rely on agents without memory, which function as 'autocomplete with ambition'.

Over-bloat context windows, leading to inconsistent responses and hallucinations.

Use multiple disparate databases for different data types, creating overhead.

Neglect guardrails when defining agent operations; ensure programmatic control over sensitive actions.

Allow agents to run amok or perform unintended database operations.

Send entire conversations into prompts without summarization or compaction when token limits are approached.

Rely solely on LLM for all information retrieval; leverage memory and tools for efficiency.

Ignore the importance of tool description verbosity for effective agent tool selection.

Use a one-size-fits-all model for all agent tasks; tailor models to specific use cases (e.g., summarization, search, embeddings).

Start over from scratch with each new session for naive agents; memory persistence is key.

Allow naive agents to continuously increase token usage without management.

Token Utilization Comparison: Agent Memory vs. Naive Agent

Data extracted from this episode

Session / Iteration	Agent Memory Token Usage (Tokens)	Naive Agent Token Usage (Tokens)
Session 1 (Initial)	Low, then peaks during new info retrieval	Spikes significantly
Session 2 (New Context)	Remains low due to memory persistence	Resets to zero, then spikes again
Extended Conversation	Maintains a consistent, relatively low trajectory	Continuously increases without bound

Common Questions

The core problem is that AI agents without memory suffer from context loss, leading to repetitive tasks, inconsistent responses, and inefficient token usage. Agent memory aims to provide persistence and context, allowing agents to learn, adapt, and perform complex tasks more effectively.

Topics

AI & Machine Learning Technology & Innovation Programming & Software Data Storage Context Management Retrieval-augmented Generation Vector Databases Agent Memory LLM Engineering AI Agent Development Token Optimization

Mentioned in this video

Companies

Oracle

The company whose database technology is being used for building agent memory systems.

GitHub

Mentioned as the platform where the code and sample code for the workshop are provided.

Hugging Face

Mentioned in the context of embedding models used for vector stores.

OpenAI

Mentioned as an alternative to OCI GenAI for connecting to the LLM endpoint.

Software & Apps

Grok 3

The specific reasoning model used from OCI GenAI for tasks like summarization and context calculation.

Docker

Used for spinning up a container with Oracle database for the workshop.

Tavily

A tool integrated for web search capabilities within the agent's toolbox.

LangChain

A framework used in conjunction with Oracle database for building agent memory systems.

OCI GenAI

Oracle Cloud Infrastructure Generative AI service used for LLM integration.

arXiv

A repository for research papers, from which papers were ingested for the demo.

Oracle vector store

The vector store implementation from LangChain for Oracle DB.

People

Eli Schilling

The speaker and cloud architect at Oracle, presenting on agent memory engineering.

Ask anything from this episode.

Save it, chat with it, and connect it to Claude or ChatGPT. Get cited answers from the actual content — and build your own knowledge base of every podcast and video you care about.

Get Started Free