AI Dev 25 x NYC | Jacky Liang: Why Agents Can't Find the Right Docs (And How Postgres Fixes It)
Key Moments
Hybrid search combines keyword and semantic search in PostgreSQL to fix AI agent documentation retrieval errors.
Key Insights
Vector-only search in AI agents struggles with exact term matches for technical documentation, leading to inaccurate results.
Hybrid search combines keyword-based (BM25) and semantic (vector) search for more precise and relevant retrieval.
PostgreSQL offers a unified solution for hybrid search by integrating keyword (PGTextSearch) and vector (PGVector) capabilities.
Using PostgreSQL for hybrid search eliminates the complexity and maintenance overhead of managing separate vector and keyword databases.
PGTextSearch provides modern keyword ranking, addressing issues like keyword stuffing and document length bias found in older methods.
Reciprocal Rank Fusion (RRF) is an industry-standard algorithm used to effectively combine results from keyword and vector searches.
THE LIMITATIONS OF VECTOR-ONLY SEARCH
Vector-only search is good at capturing semantic similarity, but it often fails at documentation retrieval for AI agents because it is too forgiving about exact terms. It can treat different software versions or API names as near-equivalent, so a coding assistant or configuration guide may surface a deprecated method or the wrong version's syntax. This imprecision around specific keywords, version numbers, and API names leads to hallucinations and missed exact matches, which undermines the accuracy of RAG systems.
THE POWER OF HYBRID SEARCH
Hybrid search addresses the shortcomings of vector-only search by combining its semantic understanding with the precision of keyword search. Keyword search, using algorithms like BM25, excels at matching exact terms, version numbers, and API names. Semantic search, on the other hand, understands concepts, synonyms, and related topics. By integrating these two approaches, hybrid search leverages the strengths of both: precise keyword matching and nuanced semantic understanding, leading to significantly more accurate and relevant search results for AI agents.
POSTGRESQL AS A UNIFIED SOLUTION
PostgreSQL offers a compelling platform for implementing hybrid search, eliminating the need for separate databases. It natively supports keyword search through its built-in tsvector and tsquery types, and with extensions like PGVector it can also perform semantic search. Both the raw text and its vector embeddings can therefore live in the same database, and managing a single PostgreSQL instance for keyword and vector search greatly simplifies the data architecture.
STREAMLINING ARCHITECTURE WITH PGTEXTSEARCH
PGTextSearch, an open-source PostgreSQL extension, brings modern ranked keyword search to the database. It borrows from proven architectures like Lucene and ranks results with BM25, which improves on PostgreSQL's older ts_rank function. BM25 gives more weight to rare terms, saturates the contribution of repeated terms so that keyword stuffing stops paying off, and normalizes for document length so that long documents are not favored simply for containing more words.
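To make the saturation and length-normalization behavior concrete, here is a minimal sketch of the standard BM25 formula in plain Python. It is an illustration only, not PGTextSearch's implementation: the function name, the toy documents, and the parameter defaults (k1 = 1.2, b = 0.75) are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25.

    Hypothetical helper for illustration; k1 and b are common textbook defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms (low document frequency) receive a larger IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(d) / avg_len)
            # Saturation: tf / (tf + norm) approaches 1, so repeats add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Toy corpus: the second document stuffs the keyword twenty times.
docs = [
    ["pgvector", "index"],
    ["pgvector"] * 20,
    ["postgres", "index", "tuning", "guide"],
]
scores = bm25_scores(["pgvector"], docs)
```

In this toy corpus the stuffed document repeats the query term twenty times yet earns nowhere near twenty times the score of the short document, and the document without the term scores zero.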
IMPLEMENTING HYBRID SEARCH IN THREE STEPS
Setting up hybrid search in PostgreSQL takes three steps. First, create the required extensions (PGVector and PGTextSearch). Second, define a table that stores the content alongside its embeddings. Third, create an index for each search type: a vector index (DiskANN with cosine similarity) and a keyword index (BM25 with the appropriate language configuration). With both indexes in place, semantic and keyword lookups are each served efficiently.
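The three steps can be sketched as a small Python helper that assembles the setup statements. Everything specific here is an assumption rather than something the summary states: the table and column names, the 1536-dimension embedding size, the use of the pgvectorscale extension (installed as vectorscale) to provide the DiskANN index type, and the exact pg_textsearch extension name and bm25 index options. Check each statement against the extensions' own documentation before running it.

```python
def hybrid_search_setup_sql(table="docs", dims=1536, text_config="english"):
    """Return the setup statements, in order, as SQL strings.

    Hypothetical helper: names and index options are assumptions to verify
    against the pgvector, pgvectorscale, and pg_textsearch documentation.
    """
    return [
        # Step 1: extensions for vector search and BM25 keyword search.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        "CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;",
        "CREATE EXTENSION IF NOT EXISTS pg_textsearch;",
        # Step 2: one table holding the raw text and its embedding.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id BIGSERIAL PRIMARY KEY, "
        "content TEXT NOT NULL, "
        f"embedding vector({dims}));",
        # Step 3a: vector index (DiskANN, cosine distance operator class).
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_idx "
        f"ON {table} USING diskann (embedding vector_cosine_ops);",
        # Step 3b: keyword index (BM25 with a language configuration).
        f"CREATE INDEX IF NOT EXISTS {table}_content_idx "
        f"ON {table} USING bm25 (content) WITH (text_config = '{text_config}');",
    ]


statements = hybrid_search_setup_sql()
```

Each statement could then be executed in order through any PostgreSQL driver or pasted into psql.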
COMBINING RESULTS WITH RECIPROCAL RANK FUSION (RRF)
To effectively merge the results from vector and keyword searches, hybrid search utilizes the Reciprocal Rank Fusion (RRF) algorithm. RRF, a well-established and computationally simple method, combines the top results from each search type. It ranks documents that appear high in both semantic similarity and keyword relevance more favorably. This ensures that the final results presented to the AI agent are not only semantically related but also precisely match the user's query terms, offering the best of both worlds.
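A minimal sketch of RRF in Python shows how little computation is involved. Each document receives 1 / (k + rank) from every result list it appears in, and the contributions are summed. The function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration; the summary does not say which value the talk uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    `rankings` is a list of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document ranked near the top of any list earns a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 results from each search type.
keyword_hits = ["doc_b", "doc_a", "doc_c"]
vector_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that appear in both lists (doc_a and doc_b here) accumulate two contributions and rise above documents that appear in only one, which is the behavior described above.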
ADVANTAGES OF THE POSTGRESQL HYBRID APPROACH
The primary advantages of using PostgreSQL for hybrid search are significantly improved ranking accuracy and enhanced engineering simplicity. By consolidating keyword and vector search within PostgreSQL, organizations eliminate complex ETL pipelines, synchronization issues, and the cost of maintaining multiple specialized databases. This unified system allows developers to leverage all of PostgreSQL's powerful primitives, such as metadata filtering and joins, leading to more efficient development and reduced infrastructure overhead.
Common Questions
Vector-only search struggles with exact terms and version specificity. It might return deprecated API methods or suggest non-existent syntax because it prioritizes semantic similarity over precise keyword matching, leading to inaccuracies in contexts like code assistants or configuration guides.
Mentioned in this video
A cloud platform where the speaker previously worked.
The speaker, a Dev Advocate at TigerData, discussing search technologies and PostgreSQL.
A new PostgreSQL plugin from TigerData that offers state-of-the-art modern ranked keyword search.
An algorithm for keyword search, also known as Best Matching 25, praised for its precision and handling of rare terms.
A search engine technology mentioned as something that PostgreSQL's hybrid approach aims to replace.
An older ranking algorithm in PostgreSQL's text search, criticized for its poor ranking quality and susceptibility to keyword stuffing.
TigerData's high-performance vector search implementation for PostgreSQL.
A product from TigerData, mentioned as a product they are known for.
A data analytics company where the speaker previously worked.
A traditional search method that matches exact terms, contrasting with semantic search.
A search engine that uses modern ranked keyword search, presented as an alternative to PostgreSQL's older TS Rank.
The company where Jacky Liang works, known for TimescaleDB and for offering PG Text Search for PostgreSQL.
Reciprocal Rank Fusion, an algorithm used to combine results from vector and keyword searches in hybrid search.
Amazon Web Services Relational Database Service, questioned for compatibility with PG Text Search.
Models that convert text into numerical representations (vectors) for vector search.
A vector database company mentioned as something that PostgreSQL's hybrid approach aims to replace.