AI Dev 25 x NYC | Jacky Liang: Why Agents Can't Find the Right Docs (And How Postgres Fixes It)

DeepLearning.AI
Education | 3 min read | 31 min video
Dec 4, 2025 | 494 views | 13
TL;DR

Hybrid search combines keyword and semantic search in PostgreSQL to fix AI agent documentation retrieval errors.

Key Insights

1. Vector-only search in AI agents struggles with exact term matches in technical documentation, which leads to inaccurate results.

2. Hybrid search combines keyword-based (BM25) and semantic (vector) search for more precise and relevant retrieval.

3. PostgreSQL offers a unified solution for hybrid search by integrating keyword (pg_textsearch) and vector (pgvector) capabilities.

4. Using PostgreSQL for hybrid search removes the complexity and maintenance overhead of running separate vector and keyword databases.

5. pg_textsearch provides modern keyword ranking and addresses problems such as keyword stuffing and document-length bias found in older methods.

6. Reciprocal Rank Fusion (RRF) is an industry-standard algorithm for combining results from keyword and vector searches.

THE LIMITATIONS OF VECTOR-ONLY SEARCH

Vector-only search is good at capturing semantic similarity, but it often fails at documentation retrieval for AI agents because it is too forgiving about exact terms. It may treat different software versions or similarly named APIs as near-equivalent, so a coding assistant or configuration guide is handed the wrong passage. This lack of precision on specific keywords, version numbers, and API names leads to hallucinations or missed exact matches, and it undermines the accuracy of RAG systems.

THE POWER OF HYBRID SEARCH

Hybrid search addresses the shortcomings of vector-only search by pairing its semantic understanding with the precision of keyword search. Keyword search, using algorithms like BM25, excels at matching exact terms, version numbers, and API names. Semantic search understands concepts, synonyms, and related topics. Running both and merging their results gives AI agents retrieval that is exact where the query is exact and flexible where it is not.

POSTGRESQL AS A UNIFIED SOLUTION

PostgreSQL offers a compelling platform for implementing hybrid search, eliminating the need for separate databases. It supports keyword search natively through the tsvector and tsquery types, and with extensions like pgvector it can also perform semantic search. Both the raw text and its vector embeddings can therefore live in the same database. Managing a single PostgreSQL instance for keyword and vector search dramatically simplifies the data architecture.

STREAMLINING ARCHITECTURE WITH PGTEXTSEARCH

pg_textsearch, an open-source extension for PostgreSQL, brings modern keyword search to the database. It borrows from established architectures like Lucene and ranks results with BM25, which the talk presents as superior to PostgreSQL's older ts_rank function. BM25 counters keyword stuffing through term-frequency saturation (repeating a term yields diminishing returns), corrects for ranking bias caused by document length, and gives more weight to rare terms, which makes keyword relevance more accurate.
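The three BM25 properties described above (saturation, length normalization, and extra weight for rare terms) can be seen in a minimal scoring sketch. This is not the pg_textsearch implementation: the whitespace tokenizer, the function name `bm25_scores`, and the defaults `k1=1.2` and `b=0.75` are illustrative assumptions, and the IDF uses the Lucene-style `log(1 + ...)` variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real engines
    # apply language-specific analysis such as stemming and stop words).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query.

    Assumes `docs` is a non-empty list of non-empty strings.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms (low df) receive a larger IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (bounded by k1 + 1) and is
            # normalized by document length relative to the average.
            length_norm = 1 - b + b * len(toks) / avg_len
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
            score += idf * tf_part
        scores.append(score)
    return scores


docs = [
    "postgres hybrid search guide",
    "postgres postgres postgres postgres postgres postgres",
    "vector database overview",
]
print(bm25_scores("postgres", docs))
```

In this toy corpus the second document repeats the query term six times, yet its score stays well under six times that of the first document, which is the saturation behavior that blunts keyword stuffing.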

IMPLEMENTING HYBRID SEARCH IN THREE STEPS

Setting up hybrid search in PostgreSQL takes three steps: create the necessary extensions (pgvector and pg_textsearch), define a table that stores the content alongside its embeddings, and create two indexes, one for vector search (DiskANN with cosine similarity) and one for keyword search (BM25 with an appropriate language configuration). With both indexes in place, semantic and keyword lookups are each served efficiently.
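The three steps might look roughly like the statements below, held in a Python list so they can be passed to any PostgreSQL driver. Treat this as a sketch under stated assumptions: the table, column, and index names are hypothetical, the embedding dimension depends on your model, the DiskANN index type is assumed to come from the pgvectorscale extension (installed as `vectorscale`), and the `bm25` index syntax should be checked against the pg_textsearch documentation for the version you install.

```python
# Assumption: must match the output size of the embedding model in use.
EMBEDDING_DIM = 1536

SETUP_STATEMENTS = [
    # Step 1: extensions. CASCADE also installs pgvector, which
    # pgvectorscale depends on.
    "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;",
    "CREATE EXTENSION IF NOT EXISTS pg_textsearch;",
    # Step 2: one table holding both the raw text and its embedding
    # (hypothetical table and column names).
    f"""CREATE TABLE IF NOT EXISTS docs (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR({EMBEDDING_DIM})
);""",
    # Step 3a: vector index using DiskANN with cosine distance.
    "CREATE INDEX IF NOT EXISTS docs_embedding_idx "
    "ON docs USING diskann (embedding vector_cosine_ops);",
    # Step 3b: keyword index using BM25 with an English text configuration
    # (syntax assumed; verify against the extension's docs).
    "CREATE INDEX IF NOT EXISTS docs_content_bm25_idx "
    "ON docs USING bm25 (content) WITH (text_config = 'english');",
]


def setup_script():
    # Join the statements into one script that could be run with psql
    # or executed statement by statement through a database driver.
    return "\n".join(SETUP_STATEMENTS)


print(setup_script())
```

Keeping the text and the embedding in the same row is what allows a single query to filter on metadata, search by keyword, and search by vector without any cross-system synchronization.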

COMBINING RESULTS WITH RECIPROCAL RANK FUSION (RRF)

To merge the results of the vector and keyword searches, hybrid search uses the Reciprocal Rank Fusion (RRF) algorithm. RRF is a well-established and computationally simple method: it takes the top results from each search and scores every document by its rank position in each list rather than by the raw scores, which are not comparable across the two methods. Documents that rank high in both semantic similarity and keyword relevance come out on top. The final results given to the AI agent are therefore both semantically related and precise matches for the query terms.
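A minimal sketch of RRF follows, assuming each search returns an ordered list of document IDs. Each document receives 1 / (k + rank) from every list it appears in, and the contributions are summed. The function name, the sample document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not details taken from the talk.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    `rankings` is a list of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank matters, so scores from different search
            # methods never need to be put on a common scale.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two searches.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Here `doc_b` and `doc_a` appear in both lists and therefore outrank `doc_d` and `doc_c`, which each appear in only one.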

ADVANTAGES OF THE POSTGRESQL HYBRID APPROACH

The primary advantages of using PostgreSQL for hybrid search are better ranking accuracy and simpler engineering. By consolidating keyword and vector search within PostgreSQL, organizations eliminate complex ETL pipelines, synchronization issues, and the cost of maintaining multiple specialized databases. A unified system also lets developers use PostgreSQL's existing primitives, such as metadata filtering and joins, which speeds up development and reduces infrastructure overhead.

Common Questions

Why does vector-only search fall short for technical documentation?

Vector-only search struggles with exact terms and version specificity. It might return deprecated API methods or suggest non-existent syntax because it prioritizes semantic similarity over precise keyword matching, leading to inaccuracies in contexts like code assistants or configuration guides.

Topics

Mentioned in this video

company: Oracle Cloud

A cloud platform where the speaker previously worked.

person: Jacky Liang

The speaker, a developer advocate at TigerData, discussing search technologies and PostgreSQL.

product: pg_textsearch

A new PostgreSQL extension from TigerData that offers modern ranked keyword search.

concept: BM25

An algorithm for keyword search, also known as Best Matching 25, praised for its precision and handling of rare terms.

software: OpenSearch

A search engine technology mentioned as something that PostgreSQL's hybrid approach aims to replace.

concept: ts_rank

An older ranking function in PostgreSQL's built-in text search, criticized for its poor ranking quality and susceptibility to keyword stuffing.

product: pgvectorscale

TigerData's high-performance vector search implementation for PostgreSQL.

product: TimescaleDB

The product TigerData is best known for.

company: Looker

A data analytics company where the speaker previously worked.

concept: keyword search

A traditional search method that matches exact terms, contrasting with semantic search.

software: Elasticsearch

A search engine that uses modern ranked keyword search, contrasted with PostgreSQL's older ts_rank.

company: TigerData

The company where Jacky Liang works, known for TimescaleDB and the maker of pg_textsearch for PostgreSQL.

concept: RRF

Reciprocal Rank Fusion, an algorithm used to combine results from vector and keyword searches in hybrid search.

software: AWS RDS

Amazon Web Services Relational Database Service, raised in a question about compatibility with pg_textsearch.

software: vector embedding models

Models that convert text into numerical representations (vectors) for vector search.

software: Weaviate

A vector database company mentioned as something that PostgreSQL's hybrid approach aims to replace.
