Hypothetical Document Embeddings (HyDE) in RAG: A Deep Dive

shrihari katti

What is HyDE?

Hypothetical Document Embeddings (HyDE) is an innovative approach to improve document retrieval in RAG systems. Instead of embedding the user’s query directly, HyDE uses a language model to generate a hypothetical document—a concise, plausible answer to the query. This hypothetical document is then embedded into a vector representation, which is used to retrieve similar real documents from a vector store. The retrieved documents serve as context for the final answer.

For example:

  • Query: "How does Python manage memory?"

  • Hypothetical Document: "Python manages memory using a garbage collector that employs reference counting and cyclic reference detection to deallocate unused objects."

  • Process: Embed this hypothetical document and retrieve real documents with similar embeddings.

How HyDE Fits into RAG

A standard RAG system works like this:

  1. Query Embedding: The user’s query is converted into an embedding.

  2. Retrieval: Documents with embeddings closest to the query’s embedding are retrieved.

  3. Generation: A language model generates an answer using the retrieved documents as context.

However, if the query’s wording doesn’t align with the document’s text (e.g., "manage memory" vs. "garbage collection"), retrieval can miss critical chunks. HyDE enhances this process by shifting the focus from the query’s literal phrasing to the semantics of a potential answer. Here’s how it modifies RAG (a short sketch contrasting the two retrieval strategies follows the list):

  1. Hypothetical Answer Generation: A language model creates a short, hypothetical answer to the query.

  2. Embedding the Hypothetical Answer: This answer is embedded, capturing the essence of what a good response should contain.

  3. Semantic Retrieval: Real document chunks with embeddings closest to the hypothetical answer’s embedding are retrieved.

  4. Answer Generation: The language model uses these chunks to produce the final response.
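
The difference between the two pipelines comes down to what gets embedded before the similarity search. The sketch below is illustrative only: embed_fn, generate_fn, and search_fn are placeholders for whatever embedding model, language model, and vector store you use; the concrete versions are built in the walkthrough later in this post.

from typing import Callable, List, Sequence

def standard_rag_retrieve(
    query: str,
    embed_fn: Callable[[str], Sequence[float]],
    search_fn: Callable[[Sequence[float], int], List[str]],
    k: int = 5,
) -> List[str]:
    # Standard RAG: embed the query text itself and search with that vector.
    return search_fn(embed_fn(query), k)

def hyde_retrieve(
    query: str,
    generate_fn: Callable[[str], str],
    embed_fn: Callable[[str], Sequence[float]],
    search_fn: Callable[[Sequence[float], int], List[str]],
    k: int = 5,
) -> List[str]:
    # HyDE: draft a hypothetical answer first, then embed and search with that.
    hypothetical_answer = generate_fn(f"Write a short, plausible answer to: {query}")
    return search_fn(embed_fn(hypothetical_answer), k)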

Why HyDE Improves RAG

  • Semantic Alignment: By embedding a hypothetical answer rather than the raw query, HyDE retrieves documents based on meaning, not just keyword overlap.

  • Robustness to Phrasing: It excels with indirect or rephrased queries (e.g., "How does Python free memory?" vs. "Python memory management").

  • Contextual Relevance: The retrieved documents are more likely to contain the information needed to answer the query accurately.

Limitations of HyDE

  • Extra Computation: Generating and embedding a hypothetical document adds a processing step.

  • Dependence on Quality: If the hypothetical answer is inaccurate, retrieval may falter.

  • Model Reliance: Effectiveness hinges on the language model’s ability to generate meaningful hypothetical answers.

System Overview

  1. Load and Split PDF: Extract text from a PDF and split it into manageable chunks.

  2. Embed and Store: Generate embeddings for the chunks and store them in Qdrant, a vector database.

  3. Generate Hypothetical Document: Use a language model to create a hypothetical answer to the user’s query.

  4. Embed Hypothetical Document: Convert the hypothetical answer into an embedding.

  5. Retrieve Documents: Find PDF chunks with embeddings similar to the hypothetical document’s embedding.

  6. Generate Response: Use the retrieved chunks as context to answer the query.

  7. Interactive Loop: Allow continuous user queries.

Code Walkthrough

Step 1: Import Required Libraries

We’ll use a consistent set of libraries for PDF processing, embeddings, vector storage, and language model interaction.

from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore
import os

Step 2: Load and Process the PDF

Load the PDF file to access its content.

# PDF located next to this script
pdf_path = Path(__file__).parent / "Python Programming.pdf"
loader = PyPDFLoader(file_path=str(pdf_path))  # str() keeps older loader versions happy
docs = loader.load()  # one Document per page

Step 3: Split the Document into Chunks

Break the PDF text into smaller pieces for embedding and retrieval.

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,    # maximum characters per chunk
    chunk_overlap=200,  # overlap between consecutive chunks to preserve context
)
split_docs = text_splitter.split_documents(documents=docs)
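
Before embedding, it can help to sanity-check the split. This optional snippet prints how many pages and chunks were produced and previews the first chunk:

# Optional sanity check: how many chunks were produced, and what does one look like?
print(f"Loaded {len(docs)} pages, split into {len(split_docs)} chunks")
print(split_docs[0].page_content[:300])  # preview the first chunk
print(split_docs[0].metadata)            # source file and page number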

Step 4: Set Up Embeddings

Initialize the embedding model to convert text into vectors.

embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="YOUR_API_KEY"  # Replace with your Google API key
)
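
Hardcoding the key is fine for a quick experiment, but it is safer to read it from an environment variable (this is also what the os import from Step 1 is for). A variant, assuming the key is exported as GOOGLE_API_KEY:

# Safer variant: read the API key from the environment instead of the source file.
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key=os.environ["GOOGLE_API_KEY"],
)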

Step 5: Initialize the Vector Store

Store the chunk embeddings in Qdrant for fast similarity search.

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedder,
    url="http://localhost:6333",  # Local Qdrant instance
    collection_name="pdf_assistant"
)
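
Note that from_documents re-embeds and uploads every chunk each time the script runs. Once the collection exists, you can reconnect to it instead; a sketch using langchain_qdrant's from_existing_collection, assuming the collection was populated by a previous run:

# Sketch: reuse a collection created by a previous run instead of re-embedding.
vector_store = QdrantVectorStore.from_existing_collection(
    collection_name="pdf_assistant",
    embedding=embedder,
    url="http://localhost:6333",
)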

Step 6: Set Up the Language Model

Initialize the language model for generating hypothetical answers and final responses.

llm = ChatGoogleGenerativeAI(
    model="gemini-pro",  # Example model; replace with an available one
    google_api_key="YOUR_API_KEY"  # Replace with your Google API key
)
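
An optional one-line check confirms the key and model name are accepted before wiring everything together:

# Optional sanity check that the model is reachable with the configured key.
print(llm.invoke("Reply with the single word: ready").content)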

Step 7: Define the System Prompt

Create a prompt to guide the assistant’s behavior.

SYSTEM_PROMPT = """
You are a smart PDF assistant designed to help users understand a PDF document’s content. Your task is to provide accurate, clear, and concise responses based on the user’s query and the provided PDF excerpts. Follow these guidelines:

1. **Query Handling**:
   - For specific queries, extract relevant information directly.
   - For general queries, provide a concise overview.

2. **Use Excerpts Only**:
   - Base your response solely on the provided excerpts.
   - If the info isn’t there, say: "The PDF does not contain this information."

3. **Response Style**:
   - Use simple, clear language.
   - Provide a direct answer followed by brief reasoning or context if necessary.

If the query is unclear, ask for clarification.
"""

Step 8: Generate Hypothetical Document

Use the language model to create a hypothetical answer.

def generate_hypothetical_document(query):
    hypothetical_prompt = f"Generate a short, hypothetical answer to the question: {query}"
    response = llm.invoke(hypothetical_prompt)
    hypothetical_doc = response.content.strip()
    return hypothetical_doc
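
For the example query from the introduction, a call like the one below might produce text close to the hypothetical answer shown earlier (the exact wording varies between runs):

# Example call; the generated text differs from run to run.
hypo = generate_hypothetical_document("How does Python manage memory?")
print(hypo)
# e.g. "Python manages memory using a garbage collector that employs
#       reference counting and cyclic reference detection ..."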

Step 9: Embed Hypothetical Document

Convert the hypothetical document into an embedding.

def embed_hypothetical_document(hypothetical_doc, embedder):
    return embedder.embed_query(hypothetical_doc)

Step 10: Retrieve Similar Documents

Fetch PDF chunks similar to the hypothetical document’s embedding.

def retrieve_similar_documents(vector_store, hypothetical_embedding, k=5):
    return vector_store.similarity_search_by_vector(hypothetical_embedding, k=k)

  • k=5: Retrieves the top 5 most similar chunks.
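
Since the call returns LangChain Document objects, you can inspect what was retrieved, and from which PDF page, before it is handed to the language model. A small, optional check:

# Optional: see which chunks (and which PDF pages) a query would retrieve.
hypo = generate_hypothetical_document("How does Python manage memory?")
vec = embed_hypothetical_document(hypo, embedder)
for doc in retrieve_similar_documents(vector_store, vec, k=3):
    print(doc.metadata.get("page"), "-", doc.page_content[:100])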

Step 11: Construct the Prompt

Build a prompt combining the system instructions, context, and query.

def construct_prompt(query, context):
    prompt = (
        SYSTEM_PROMPT + "\n\n"
        "Based on the following PDF excerpts, answer the question.\n\n"
        "Excerpts:\n"
        f"{context}\n\n"
        "Question: " + query + "\n\n"
        "Assistant:"
    )
    return prompt

Step 12: Generate the Response with HyDE

Tie everything together to answer the query.

def chat_with_hyde(query, vector_store, llm, embedder):
    hypothetical_doc = generate_hypothetical_document(query)
    hypothetical_embedding = embed_hypothetical_document(hypothetical_doc, embedder)
    retrieved_docs = retrieve_similar_documents(vector_store, hypothetical_embedding)
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    prompt = construct_prompt(query, context)
    response = llm.invoke(prompt)
    return response.content

Step 13: Create an Interactive Loop

Enable continuous querying.

print("Welcome to the PDF Query Assistant with HyDE!")
while True:
    query = input("Ask a question about the PDF (or type 'exit' to quit): ")
    if query.lower() == 'exit':
        print("Goodbye!")
        break
    if not query.strip():
        print("Please enter a valid question.")
        continue
    try:
        answer = chat_with_hyde(query, vector_store, llm, embedder)
        print("Assistant:", answer)
    except Exception as e:
        print(f"An error occurred: {e}")

Conclusion

This interactive PDF query assistant demonstrates what Hypothetical Document Embeddings (HyDE) add to a Retrieval-Augmented Generation (RAG) system. By generating a hypothetical answer to the user's query, embedding that answer, and retrieving the document chunks most similar to it, the assistant surfaces context that is semantically relevant even when the query's wording does not match the document's text. The result is more precise, better-grounded answers and a smoother experience when working with large or complex documents.
