Why can't large language models (LLMs) like ChatGPT solve SRE problems alone?

LLMs often rely on outdated training data and struggle with unfiltered web search results, which can lead to irrelevant or incorrect recommendations. For critical SRE tasks, these inaccuracies can be detrimental, highlighting the need for a grounded and verifiable agent.

What are the core design principles for building a trustworthy SRE agent?

A trustworthy agent must be grounded in trusted sources (like official documentation), context-aware of the specific deployment, evidence-driven from live data, and verifiable, meaning it should cite its sources.

How does the Redis context engine handle data for the agent?

The engine uses Redis as a vector library, storing chunked and embedded data from trusted sources like official Redis documentation. This allows for fast semantic search and filtering with metadata.

What is the multi-agent architecture used in the SRE agent?

Instead of a single agent, the system uses three specialized agents: a knowledge agent for documentation, a chat agent for general infrastructure tasks, and a deep triage agent for exhaustive investigations. A query router directs requests to the appropriate agent.

How does the agent mitigate the 'lost in the middle' and 'recency bias' issues with large context windows?

The agent compresses large tool outputs to summaries, stores full outputs in Redis to be expanded on demand, and caches tool results within a session to avoid redundant calls. This helps manage context window limitations and improve efficiency.

How does the agent learn and personalize over time?

The Redis agent memory server stores session messages and tool call results. An extraction pipeline uses a nano LLM to graduate valuable information into long-term memory, capturing user preferences, recurring incidents, and environment facts for personalized responses.

How can SREs verify the recommendations made by the agent?

Every recommendation includes citations to the documents accessed and tool calls made. This allows SRE engineers to trace the decision-making process back to authoritative sources and make the final verification themselves.

AI Dev 26 x SF | Aditi Gupta: Building SRE Agents with the Redis Context Engine

DeepLearning.AI

Education | 7 min read | 32 min video

May 20, 2026 | 120 views

TL;DR

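SRE agents can earn trust in production when they are grounded in official documentation, aware of the specific deployment, driven by live operational data, and able to cite their sources. In this design, Redis serves as the context engine underneath, acting as vector store, semantic cache, and memory server.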

Key Insights

An SRE agent was built and deployed in production for top-five financial institutions, demonstrating its scalability and reliability.

The naive approach to chunking with fixed character limits can lead to missing crucial information or dominant documents skewing results; the agent uses document-type specific strategies like keeping CLI/API docs whole.

A multi-agent architecture uses three specialized agents (knowledge, chat, deep triage) routed by a query classifier, rather than a single, monolithic agent, to reduce hallucinations.

Semantic caching, implemented with Redis, can make knowledge queries up to 15 times faster and 98% cheaper by serving previously answered questions without LLM calls.

Context window limitations in LLMs, such as 'lost in the middle' and 'recency bias,' manifest even before full capacity is reached, necessitating output compression and contextual caching.

The Redis Agent Memory Server provides both short-term (session) and long-term (persistent) memory, extracting key information like user preferences and environment facts using a nano LLM.

The complexity of modern infrastructure and the need for fast, accurate SRE support

The presentation highlights the immense complexity of modern IT infrastructure, where a single Redis service can comprise dozens of clusters across multiple regions, with hundreds of instances each having unique configurations and operational histories. This complexity makes it challenging for Site Reliability Engineers (SREs) to quickly diagnose and resolve issues. Traditional approaches struggle to keep pace, leading to high Mean Time To Resolution (MTTR) and potential service downtime. Aditi Gupta's team at Redis aimed to build a trustworthy SRE agent capable of operating effectively in such production environments, addressing the critical need for fast, context-aware, and evidence-driven answers.

Why LLMs alone are insufficient for production SRE tasks

While Large Language Models (LLMs) are rapidly advancing with larger context windows and improved reasoning capabilities, relying solely on them for SRE tasks presents significant challenges. The primary issue is the LLM's training data, which quickly becomes outdated, especially with the fast pace of software development and documentation changes. Attempting to circumvent this with web searches introduces unfiltered, potentially irrelevant, or erroneous information from community forums and outdated posts. LLMs can confidently present this incorrect information, leading to disastrous recommendations for critical infrastructure changes. This underscores the need for a specialized agent grounded in trusted, up-to-date, and relevant data sources.

Designing for trust: Grounded, context-aware, evidence-driven, and verifiable agents

The foundational goal for the SRE agent was 'trustworthiness,' which guided every architectural decision. This translates into several key principles: the agent must be grounded in authoritative sources, context-aware of specific deployments, evidence-driven by live operational data, and verifiable, meaning it must cite its sources. By adhering to these principles, the agent aims to meet SRE KPIs such as minimizing MTTR and maximizing service uptime. The knowledge base for this agent is built on official Redis documentation across various deployment types (open-source, Redis Cloud, Redis Enterprise), ensuring data quality and relevance. This information is chunked, embedded, and stored in Redis, leveraging it as a vector library for semantic search, which allows for conceptually relevant retrieval and metadata filtering, all performed at Redis's characteristic in-memory speed.
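The retrieval step described above (semantic similarity plus metadata filtering) can be sketched in a few lines. This is a minimal in-memory illustration, not the team's implementation: the `Chunk` fields, the `deployment` metadata key, the two-dimensional vectors, and the `search` function are all assumptions made for the example, and a plain Python list stands in for the Redis index.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]  # embedding, assumed to be produced elsewhere
    deployment: str      # hypothetical metadata field, e.g. "cloud" or "enterprise"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vector, deployment, k=2):
    """Filter by metadata first, then rank the survivors by similarity."""
    candidates = [c for c in chunks if c.deployment == deployment]
    ranked = sorted(candidates, key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return ranked[:k]


# Toy data: the vectors are made up for illustration.
docs = [
    Chunk("Configure eviction policy", [0.9, 0.1], "enterprise"),
    Chunk("Scale a Cloud database", [0.8, 0.2], "cloud"),
    Chunk("Rotate TLS certificates", [0.1, 0.9], "enterprise"),
]
hits = search(docs, [1.0, 0.0], deployment="enterprise", k=1)
```

Filtering before ranking keeps documents for the wrong deployment type out of the results even when they are semantically close to the query.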

Strategic chunking for optimized retrieval and context management

The process of chunking, breaking down large texts into smaller pieces for LLMs, is critical and requires strategic consideration beyond naive fixed-size splits. A naive approach, such as chunking every 50 characters, can result in fragmented commands or parameters, leading to incomplete information. Conversely, very large documents can dominate retrieval results. The developed strategy tailors chunking to document types: CLI and API documentation are kept whole to preserve command integrity, while extremely long documents have chunk limits set to prevent overrepresentation. Metadata like YAML front matter is stripped to avoid polluting vector searches. This meticulous approach to chunking ensures that retrieval is safer, more accurate, and protects against potential LLM hallucinations.
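A rough sketch of document-type-aware chunking, under stated assumptions: the `doc_type` labels, the `chunk_size` and `max_chunks` defaults, and the paragraph-based splitting are illustrative choices, not values from the talk. It shows the three ideas named above: strip YAML front matter, keep CLI and API docs whole, and cap how many chunks a long document may contribute.

```python
import re

# Matches a YAML front matter block at the very start of a markdown file.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def strip_front_matter(markdown: str) -> str:
    """Remove leading YAML front matter so it does not pollute embeddings."""
    return FRONT_MATTER.sub("", markdown, count=1)


def chunk_document(markdown, doc_type, chunk_size=800, max_chunks=20):
    """Chunk by document type (all thresholds here are hypothetical)."""
    body = strip_front_matter(markdown).strip()
    if doc_type in {"cli", "api"}:
        return [body]  # keep command references whole
    chunks, current = [], ""
    for para in body.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the limit.
        # A single paragraph longer than chunk_size is kept intact.
        if current and len(current) + len(para) + 2 > chunk_size:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks[:max_chunks]  # cap so one long document cannot dominate retrieval


cli_doc = "---\ntitle: SET\n---\nSET key value\n\nOptions: EX seconds"
cli_chunks = chunk_document(cli_doc, "cli")

long_doc = "\n\n".join(["p" * 100] * 50)
guide_chunks = chunk_document(long_doc, "guide", chunk_size=250, max_chunks=5)
```

Splitting on paragraph boundaries rather than at fixed character offsets is one way to avoid cutting a command or parameter in half.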

A multi-agent architecture for specialized tasks and reduced hallucinations

Instead of a single agent attempting all tasks, a multi-agent architecture was implemented featuring three specialized agents: a Knowledge Agent for documentation queries, a Chat Agent for general triage and diagnostics with full tool access, and a Deep Triage Agent for exhaustive investigations using MapReduce. A query router classifies incoming requests and dispatches them to the appropriate agent, significantly reducing hallucinations and improving efficiency. The Deep Triage Agent breaks down complex incidents into topics, ranks them by severity, and assigns parallel workers for research, with an orchestrator consolidating findings. A crucial 'fact corrector' pass reviews the final output, using the knowledge base to remove unsafe or fabricated commands, serving as a final defense against errors.
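The routing and map-reduce flow can be sketched as follows. Everything here is a stand-in: keyword matching replaces the model-based classifier, `research` is a stub for a worker's investigation, and `KNOWN_COMMANDS` is a hypothetical allowlist representing what the knowledge base can confirm. The sketch drops whole findings in the corrector pass, which is a simplification of removing individual commands.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical allowlist standing in for commands the knowledge base can confirm.
KNOWN_COMMANDS = {"INFO", "SLOWLOG GET", "MEMORY STATS"}


def route(query: str) -> str:
    """Toy keyword classifier standing in for the model-based query router."""
    q = query.lower()
    if "incident" in q or "outage" in q:
        return "deep_triage"
    if "how do i" in q or "what is" in q:
        return "knowledge"
    return "chat"


def research(topic):
    """Map step: one worker investigates one topic (stubbed for illustration)."""
    name, severity, command = topic
    return {"topic": name, "severity": severity, "command": command}


def fact_correct(findings):
    """Final pass: drop findings whose command the knowledge base cannot confirm."""
    return [f for f in findings if f["command"] in KNOWN_COMMANDS]


def deep_triage(topics):
    """Rank topics by severity, research them in parallel, then consolidate."""
    ranked = sorted(topics, key=lambda t: t[1], reverse=True)
    with ThreadPoolExecutor(max_workers=4) as pool:
        findings = list(pool.map(research, ranked))  # map() preserves input order
    return fact_correct(findings)


report = deep_triage([
    ("latency", 2, "SLOWLOG GET"),
    ("memory", 3, "MEMORY STATS"),
    ("disk", 1, "FLUSHALL NOW"),  # fabricated command, removed by the corrector
])
```

Running the corrector after consolidation means a single check covers the output of every parallel worker.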

Optimizing LLM usage with tailored model sizes and semantic caching

The strategy for model selection involves matching task complexity to LLM capability, rather than defaulting to the largest model. For heavy reasoning or final recommendations, a large model is used. However, for tasks like per-topic research in deep triage or general knowledge tasks, a smaller, faster 'mini' model is employed to manage costs and latency, especially when dealing with dozens of potential model calls in a single investigation. This 'mini' model also handles classification tasks for the agent router. Furthermore, semantic caching, powered by Redis, significantly enhances efficiency by storing and retrieving answers to recurring knowledge queries. When a new query is semantically similar to a cached one (above a certain threshold), the cached answer is returned without an LLM call, reportedly making these operations up to 15 times faster and 98% cheaper. This strategy is best suited for predictable query patterns, not dynamic tool outputs.
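To make the threshold check concrete, here is a minimal semantic cache sketch. The assumptions are significant: a Python list stands in for Redis, `toy_embed` is a bag-of-words counter over a made-up vocabulary rather than a real embedding model, and the 0.9 threshold is arbitrary.

```python
import math
import re

VOCAB = ["eviction", "policy", "replication", "lag", "configure"]  # made-up vocabulary


def toy_embed(text):
    """Bag-of-words counts over VOCAB; a stand-in for a real embedding model."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # (vector, answer) pairs; stands in for Redis

    def get(self, query):
        """Return the closest cached answer if it clears the threshold, else None."""
        qv = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(e[0], qv), default=None)
        if best is not None and cosine(best[0], qv) >= self.threshold:
            return best[1]
        return None

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))


def answer(query, cache, llm):
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    result = llm(query)
    cache.put(query, result)
    return result


llm_calls = []


def fake_llm(query):
    llm_calls.append(query)
    return f"answer #{len(llm_calls)}"


cache = SemanticCache(toy_embed)
first = answer("How do I configure the eviction policy?", cache, fake_llm)
second = answer("configure eviction policy", cache, fake_llm)   # similar: cache hit
third = answer("Why is replication lag high?", cache, fake_llm)  # different: model call
```

The threshold is the main tuning knob: set too low, the cache returns answers to questions that only look alike; set too high, it rarely hits.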

Addressing LLM context window limitations and managing tool outputs

The presentation addresses the well-known limitations of LLM context windows, such as the 'lost in the middle' problem (models ignoring mid-context information) and 'recency bias' (overweighting recent information). These issues emerge well before the window is full, often around the halfway mark. To combat this, especially with large tool outputs like thousands of tokens of logs, strategies include compressing large outputs to a standard summary (e.g., 500 tokens) and storing the full output in Redis. An 'expand context' tool allows the LLM to retrieve the full output if needed. Tool results are also cached per session, so if the same tool is called multiple times within a session with identical parameters, subsequent calls return immediately, preventing repeated token usage and latency, particularly beneficial for parallel processes in deep triage.
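The compress, store, and expand pattern, together with per-session result caching, might look roughly like this. The class and method names are hypothetical, dictionaries stand in for Redis, truncation stands in for an LLM-written summary, and whitespace-separated words stand in for tokens.

```python
import json


class ToolContext:
    """Per-session wrapper around tool calls (all names here are illustrative)."""

    def __init__(self, summary_limit=500):
        self.summary_limit = summary_limit
        self.full_outputs = {}  # ref -> full output; stands in for Redis
        self.result_cache = {}  # ref -> summary returned to the model
        self.calls = 0          # how many times a tool actually ran

    def run(self, tool, **params):
        """Run a tool once per unique (tool, params) pair within this session."""
        key = f"{tool.__name__}:{json.dumps(params, sort_keys=True)}"
        if key in self.result_cache:
            return self.result_cache[key]  # identical call: return immediately
        self.calls += 1
        output = tool(**params)
        self.full_outputs[key] = output
        summary = {"ref": key, "summary": self.summarize(output)}
        self.result_cache[key] = summary
        return summary

    def summarize(self, output):
        # Truncation is a placeholder for a model-generated summary.
        return " ".join(output.split()[: self.summary_limit])

    def expand_context(self, ref):
        """Return the full stored output when the summary is not enough."""
        return self.full_outputs[ref]


def fetch_logs(host):
    """Stub tool that returns a large output."""
    return " ".join(f"line{i}" for i in range(2000))


ctx = ToolContext(summary_limit=500)
first = ctx.run(fetch_logs, host="redis-1")
second = ctx.run(fetch_logs, host="redis-1")  # served from the session cache
full = ctx.expand_context(first["ref"])
```

Returning a reference alongside the summary is what lets the model ask for the full output later without paying for it up front.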

Integrating diverse knowledge sources and enabling personalized agent behavior

The agent integrates various knowledge sources beyond official documentation. 'Skills' are operational runbooks ingested as first-class documents and retrieved automatically. 'Pinned policies' ensure that critical organizational rules are always passed to the agent in system context, regardless of the query. Support tickets are ingested to leverage past resolutions for diagnosing new issues. Notably, the system supports hybrid search within Redis, combining exact text matches with semantic search to handle complex queries that require finding specific tickets (e.g., INC12345) and semantically similar incidents simultaneously. This hybrid approach ensures comprehensive information retrieval.
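One common way to merge an exact-match result list with a semantic result list is reciprocal rank fusion (RRF), which the page's concept list names as the fusion technique for hybrid search. The sketch below is a generic RRF implementation, not Redis's own; the ticket IDs other than INC12345 are invented, and `k=60` is the constant commonly used with RRF rather than a value from the talk.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from the two search modes.
exact_matches = ["INC12345", "INC10001"]
semantic_matches = ["INC20002", "INC12345", "INC30003"]
fused = reciprocal_rank_fusion([exact_matches, semantic_matches])
```

Because RRF uses only rank positions, it can combine text-match scores and vector distances without normalizing them onto a shared scale. A document that appears in both lists, like INC12345 here, rises to the top.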

Memory management for persistent learning and personalized interactions

The Redis Agent Memory Server provides a sophisticated memory system for the agent. It stores both short-term (session) memory and long-term memory that persists across sessions. Messages and tool call results are stored in Redis. Regularly, an extraction pipeline graduates messages into longer-term memory using a nano LLM and custom extraction strategies. This process identifies user preferences, recurring incidents, and environmental facts, along with topics and named entities. Only this distilled information is stored long term, not the raw session data, which keeps storage costs and context window usage down. When a query comes in, the agent can retrieve relevant past information from this memory and surface it from the start, enabling personalized interactions such as honoring a user's preference for succinct answers or for extensive command details.
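The graduation from session memory to long-term memory can be illustrated with a small sketch. This is not the Redis Agent Memory Server's API: the `MemoryServer` and `MemoryRecord` classes are hypothetical, Python containers stand in for Redis, and `toy_classify` uses keyword rules where the real pipeline uses a nano LLM.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    user: str
    kind: str  # "preference" or "environment" in this toy example
    text: str


@dataclass
class MemoryServer:
    sessions: dict = field(default_factory=dict)   # session_id -> [(user, message)]
    long_term: list = field(default_factory=list)  # distilled records only

    def add_message(self, session_id, user, message):
        self.sessions.setdefault(session_id, []).append((user, message))

    def extract(self, session_id, classify):
        """Graduate only messages the classifier considers worth keeping."""
        for user, message in self.sessions.get(session_id, []):
            kind = classify(message)
            if kind is None:
                continue
            record = MemoryRecord(user, kind, message)
            if record not in self.long_term:  # avoid duplicates on re-runs
                self.long_term.append(record)

    def recall(self, user):
        return [r for r in self.long_term if r.user == user]


def toy_classify(message):
    """Keyword rules standing in for the nano LLM extraction step."""
    text = message.lower()
    if "prefer" in text:
        return "preference"
    if "runs on" in text or "region" in text:
        return "environment"
    return None


server = MemoryServer()
server.add_message("s1", "alice", "I prefer succinct answers")
server.add_message("s1", "alice", "thanks!")
server.add_message("s1", "alice", "Our cache runs on Redis Enterprise in two regions")
server.extract("s1", toy_classify)
server.extract("s1", toy_classify)  # running extraction again adds nothing new
memories = server.recall("alice")
```

Small talk such as "thanks!" never reaches long-term memory, which is the point of distilling rather than archiving raw sessions.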

Verifiability and proactive operations through citations and scheduling

Crucially, for SREs to trust the agent's recommendations, 'the agent said so' is insufficient. Every agent output includes citations to the accessed documents and tool calls, allowing on-call engineers to verify the recommendations themselves. This transparency is vital for auditing and empowering engineers to make the final decision. Furthermore, the agent can operate on a schedule, shifting from a reactive to a proactive stance. It can monitor infrastructure, run automated diagnostics, and generate daily or weekly health summaries. This scheduled, proactive operation allows the agent to alert users to potential issues before they even think to ask, significantly enhancing operational efficiency and preventing incidents.

Redis as a foundational context engine for a unified agent ecosystem

The presentation concludes by emphasizing Redis's role not just as a cache, but as a comprehensive 'context engine' for the SRE agent. Redis serves as the semantic vector store, thread manager, hybrid search engine, schedule store for proactive functions, semantic cache, home for partner-added tribal knowledge, memory server, and catalog of infrastructure resources. This unified approach, enabled by Redis's diverse features, results in an agent that is grounded, context-aware, evidence-driven, and verifiable, making it a powerful tool for modern SRE operations.

Agent Performance Gains with Semantic Caching

Data extracted from this episode

Feature	Speed Improvement	Cost Reduction
Semantic Cache	Up to 15x faster	Up to 98% cheaper

Common Questions

What problem does the SRE agent address?

The agent addresses the complexity of large-scale Redis deployments, where managing numerous clusters, regions, instances, and configurations becomes challenging. It aims to provide fast, accurate, and context-aware answers to SREs when issues arise, reducing Mean Time To Resolution (MTTR).

Topics

AI Agents, AI & Machine Learning, Technology & Innovation, Large Language Models, Retrieval-Augmented Generation, Vector Databases, Information Retrieval, Context Engine, System Reliability

Mentioned in this video

Software & Apps

LLM

Large Language Models, discussed in terms of their increasing capabilities but also limitations regarding outdated data, unfiltered web search, and potential for hallucinations when used without proper grounding.

YAML

A data serialization language, mentioned in the context of stripping YAML front matter from markdown files to prevent it from polluting vector similarity searches.

Confluence

A collaboration software used for documentation, mentioned as a place where organizational knowledge might reside but go unread.

Redis

The company behind the Redis context engine and database, discussed as a core technology for building SRE agents, vector libraries, caching, and memory servers.

Redis agent memory server

A component created by the Applied AI team at Redis to manage memory within an agent, storing messages and tool call results as both short-term (session) and long-term (persistent) memory.

Products

Redis Enterprise

A deployment type of Redis, discussed as part of the infrastructure managed by the SRE agent and having specific tooling.

Redis Cloud

A deployment type of Redis, mentioned as part of the infrastructure managed by the SRE agent.

Organizations

Applied AI team

The team at Redis responsible for building the SRE agent and the Redis context engine.

Concepts

MapReduce

A programming model for processing large data sets in parallel, used in the deep triage agent for exhaustive investigation by breaking down queries into smaller topics.

RRF

Reciprocal Rank Fusion, a technique used in hybrid search to combine ranked results from different search mechanisms, such as exact text matching and semantic search.

SRE agent

An agent designed for Site Reliability Engineering tasks, focused on providing trustworthy, context-aware, and verifiable recommendations for managing complex infrastructure.

Companies

Stack Overflow

A question and answer website for programmers, mentioned as an example of unfiltered web content that can be outdated or irrelevant for LLMs.
