Enhancing RAG Retrieval with HyDE


Introduction
Most RAG applications convert the user's query into a vector and search for similar chunks in the vector database.
But what if the user's query is vague or worded differently from the actual document?
In such cases, the application may retrieve results that miss the user's intent.
Problem Example
Suppose you have a RAG application that answers user queries from basic Python documentation.
Here’s a chunk of information stored in your vector database:
"The
zip()
function in Python allows you to iterate over multiple iterables (like lists) in parallel."
Now a user comes in and asks:
"How to loop over two lists together?"
If we compare this user query with the chunk in the database, we can clearly see that the answer is present.
But here's the problem: the text embedding of the user's query doesn't match the embedding of the document chunk very well. The mismatch happens because the query doesn't include the keyword zip(), which is present in the document.
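You can quantify this mismatch by embedding both texts and comparing them directly. Here's a minimal sketch (the cosine helper is just for illustration and isn't part of any library used later):

import numpy as np
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(model="text-embedding-3-small")

query = "How to loop over two lists together?"
chunk = ("The zip() function in Python allows you to iterate over "
         "multiple iterables (like lists) in parallel.")

q_vec, c_vec = embedder.embed_documents([query, chunk])

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The query never mentions "zip()", so this similarity tends to be lower
# than you'd expect for a chunk that clearly answers the question.
print(f"query vs. chunk similarity: {cosine(q_vec, c_vec):.2f}")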
Here's How HyDE Fixes It
The user asks:
"How to loop over two lists together?"
The RAG system (with HyDE) asks the LLM:
"Write an answer to this Python programming question: How to loop over two lists together."
The LLM generates a hypothetical answer:
"You can use the
zip()
function to loop over two lists in parallel in Python."
This generated answer includes the keyword zip(), which is present in the database. Instead of embedding the original user query, HyDE embeds this hypothetical answer.
Now the vector store easily matches it with:
"The
zip()
function in Python allows you to iterate over multiple iterables (like lists) in parallel."
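Put differently, HyDE adds exactly one extra LLM call before retrieval. A compact sketch of the whole flow looks like this (it assumes the Qdrant collection built in the steps below; the detailed walkthrough follows):

from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore

client = OpenAI()
embedder = OpenAIEmbeddings(model="text-embedding-3-small")
store = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain-hyde",
    embedding=embedder,
)

query = "How to loop over two lists together?"

# 1. Ask the LLM for a hypothetical answer to the query.
hypothetical = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"Write an answer to this Python programming question: {query}"}],
).choices[0].message.content

# 2. Embed and search with the hypothetical answer instead of the raw query.
docs = store.similarity_search(hypothetical, k=3)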
🪜 Steps and Code
Load and Split the Document
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
loader = PyPDFLoader(file_path="python.pdf")
docs = loader.load()
# Split the pages into small overlapping chunks before embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
split_docs = splitter.split_documents(docs)
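To confirm the split worked, you can peek at the resulting chunks (purely illustrative):

print(f"{len(split_docs)} chunks created")
print(split_docs[0].page_content)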
Embed and Store in Qdrant
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embedder = OpenAIEmbeddings(model="text-embedding-3-small")

# Create (or recreate) the collection and index the split chunks in one call
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    url="http://localhost:6333",
    collection_name="learning_langchain-hyde",
    embedding=embedder,
    force_recreate=True,
)
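As a quick sanity check, you can ask Qdrant how many points landed in the collection (a small sketch using the qdrant-client library directly):

from qdrant_client import QdrantClient

qdrant_client = QdrantClient(url="http://localhost:6333")
count = qdrant_client.count(collection_name="learning_langchain-hyde", exact=True)
print(f"Indexed points: {count.count}")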
Initial Search With Raw Query
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain-hyde",
    embedding=embedder,
)

# Search with the raw user query and keep only reasonably similar chunks
results = retriever.similarity_search_with_score("How to loop over two lists together?", k=3)

THRESHOLD = 0.7
filtered = [(doc, score) for doc, score in results if score >= THRESHOLD]

if filtered:
    for doc, score in filtered:
        print(f"Score: {score:.2f} → {doc.page_content}")
else:
    print("❌ No relevant data found for this query.")
HyDE Step – Generate Hypothetical Answer
from openai import OpenAI

client = OpenAI()

system_prompt = """
You are an AI Assistant who can take Python queries and answer them correctly and concisely.
"""

# Ask the LLM to write a plausible (hypothetical) answer to the user's question
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How to loop over two lists together?"},
    ],
)

hypothetical_answer = response.choices[0].message.content
print("🤖 HyDE-generated:", hypothetical_answer)
Final Search Using Hypothetical Answer
# Embed the hypothetical answer (not the raw query) and search again
results = retriever.similarity_search_with_score(hypothetical_answer, k=3)
filtered = [(doc, score) for doc, score in results if score >= THRESHOLD]

if filtered:
    for doc, score in filtered:
        print(f"Score: {score:.2f} → {doc.page_content}")
else:
    print("❌ No relevant data found even after HyDE.")
🧠 Final Thoughts
Traditional RAG systems are powerful, but they often fail when user queries are vague, incomplete, or use different phrasing than the source documents.
HyDE (Hypothetical Document Embeddings) bridges that gap by letting an LLM "guess" a likely answer, then using that richer semantic context to perform retrieval.
Thanks for reading! 🙌
If you found this helpful, share it with your team or drop a ⭐️ on the GitHub repo (if you open source your code).
Let’s keep building smarter, more human-friendly AI systems!