Learn how to implement Hypothetical Document Embeddings (HyDE) in a RAG application

gautam kumar
4 min read

Introduction

HyDE means that instead of searching based only on your question, the AI first imagines a fake answer, then looks for real documents that are similar to that fake answer. This helps find better results even when your question's wording doesn't match the documents exactly.

"HyDE: When you guess so well, people think you studied.”

— hitesh choudhary

Architecture

Let’s understand this using an example

Imagine this situation: You go to Google and search: "Best laptop for students under 50000"

But what if Google can't find any good matches? Maybe the documents don't contain the exact words you're using.

HyDE solves this problem!

Instead of searching right away, it first imagines what a perfect document answering your question would look like.

  • It generates a fake, hypothetical answer.

  • Then it converts this fake answer into a vector (embedding).

  • Finally, it searches real documents that are similar to this fake answer!

In layman's terms, HyDE means: first create a fake answer, then search for real documents close to that fake answer.
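Under the hood, "similar" means cosine similarity between embedding vectors. The toy sketch below (with made-up 3-d vectors; real embeddings have hundreds of dimensions) shows the intuition: an answer-like vector lands much closer to a real document's vector than the terse query does.

from math import sqrt

def cosine(a, b):
    # COSINE SIMILARITY: DOT PRODUCT DIVIDED BY THE PRODUCT OF NORMS
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# MADE-UP TOY VECTORS FOR ILLUSTRATION ONLY
query_vec = [0.9, 0.1, 0.0]  # terse query, little wording overlap
hyde_vec = [0.4, 0.7, 0.6]   # hypothetical answer, answer-like wording
doc_vec = [0.3, 0.8, 0.5]    # real document chunk

print(f"query vs doc: {cosine(query_vec, doc_vec):.2f}")  # ~0.39
print(f"HyDE vs doc:  {cosine(hyde_vec, doc_vec):.2f}")   # ~0.99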

HyDE with RAG application

Let's say you ask: "What are some eco-friendly phone brands?"

  • Without HyDE: It embeds your short query directly and searches for real documents whose embeddings match terms like "eco-friendly", "phone", and "brand"

  • With HyDE: It first imagines a fake answer like:

    "Brands like chaiPhone and coffeePhone focus on sustainable, repairable smartphones."

    Then it searches for documents that match this idea — even if the words are different!

  • HyDE RAG Flow:

    1. User asks a question (Example: "Best eco-friendly phones?")

    2. LLM (like GPT) generates a hypothetical document answering it.

    3. Embed this fake document into a vector.

    4. Search the real vector database for chunks similar to the fake answer.

    5. Retrieve top matching real chunks.

    6. Answer the user based on real retrieved data.

Code

Before you run the program, make sure to install all the dependencies below and create a virtual environment. I am using “uv” to create the virtual environment; for more details, check out this video:

https://www.youtube.com/watch?v=8mk85fyzevc
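For example, on Linux/macOS the environment and dependencies (package names assumed from the imports used below) can be set up with:

uv venv
source .venv/bin/activate
uv pip install langchain openai qdrant-client pypdf python-dotenv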

To set up Qdrant DB locally, follow the resources below:

https://qdrant.tech/documentation/quickstart/

https://www.youtube.com/watch?v=mHrwS6ZoNKc
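Per the quickstart guide, a local Qdrant instance can be started with Docker and will listen on port 6333:

docker run -p 6333:6333 qdrant/qdrant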

Algorithm

Step 1 => Load a PDF
Step 2 => Split it into chunks
Step 3 => Embed the chunks into the Qdrant vector DB
Step 4 => Generate a fake "hypothetical" answer for the user query
Step 5 => Embed this hypothetical answer
Step 6 => Search for real chunks similar to the fake answer

HyDE implementation

# IMPORT LIBRARIES
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Qdrant
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.schema import Document
import os
from dotenv import load_dotenv
from typing import List

# LOAD ENV VARIABLES
load_dotenv()

# SET API KEY
openai_api_key = os.getenv("OPENAI_API_KEY")

# INITIALIZE EMBEDDING MODEL
embedding_model = OpenAIEmbeddings(openai_api_key=openai_api_key)

# INITIALIZE LLM (FOR HYPOTHETICAL DOCUMENT GENERATION)
llm = OpenAI(
    temperature=0,
    openai_api_key=openai_api_key
)

# -------------------------------
# STEP 1: LOAD PDF
# -------------------------------

def load_documents(pdf_path: str) -> List[Document]:
    loader = PyPDFLoader(pdf_path)
    return loader.load()

# LOAD THE PDF
pdf_path = "eccomerce_products.pdf"
docs = load_documents(pdf_path)
print(f"Loaded {len(docs)} documents from PDF.")

# -------------------------------
# STEP 2: SPLIT INTO SMALLER CHUNKS
# -------------------------------

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
print(f"Split into {len(chunks)} chunks.")

# -------------------------------
# STEP 3: EMBED CHUNKS AND STORE IN VECTOR DB
# -------------------------------

# URL AND COLLECTION NAME ASSUME THE LOCAL QDRANT INSTANCE FROM THE SETUP GUIDE
vector_db = Qdrant.from_documents(
    chunks,
    embedding=embedding_model,
    url="http://localhost:6333",
    collection_name="hyde_demo",  # example collection name
)
print("Vectorstore created and populated.")

# -------------------------------
# STEP 4: HYPOTHETICAL DOCUMENT EMBEDDING (HyDE)
# -------------------------------

def generate_hypothetical_answer(query: str) -> str:
    # ASK LLM TO IMAGINE A FAKE ANSWER BASED ON THE QUERY
    prompt = f"Write a detailed answer for: '{query}'"
    hypothetical_answer = llm.predict(prompt)
    return hypothetical_answer

# USER QUERY
user_query = "best smartphones under 20000 with good camera quality"

# GENERATE HYPOTHETICAL DOCUMENT
hypothetical_doc = generate_hypothetical_answer(user_query)
print("\n--- Hypothetical Answer Generated ---")
print(hypothetical_doc)

# EMBED THE HYPOTHETICAL DOCUMENT
hypothetical_doc_embedding = embedding_model.embed_query(hypothetical_doc)

# -------------------------------
# STEP 5: SEARCH REAL DOCUMENTS BASED ON HYPOTHETICAL EMBEDDING
# -------------------------------

# SEARCH VECTOR DB USING THE HYPOTHETICAL EMBEDDING
retrieved_docs = vector_db.similarity_search_by_vector(hypothetical_doc_embedding, k=5)

print("\n--- Retrieved Real Documents ---")
for idx, doc in enumerate(retrieved_docs):
    print(f"{idx+1}. {doc.page_content[:200]}...")

Output

Full working code on GitHub:

https://github.com/gautamkmahato

Conclusion

Hypothetical Document Embeddings (HyDE) improve retrieval by closing the vocabulary gap between short queries and the documents that answer them: the embedding of a generated hypothetical answer typically lands closer to the relevant chunks than the embedding of the raw query. This makes HyDE a simple, effective upgrade for RAG tasks like document retrieval and question answering.
