RAG with Reciprocal Rank Fusion

Table of contents
- What is RRF (Reciprocal Rank Fusion)?
- Why Do We Even Need RRF?
- How Does RRF Work?
- Step-by-Step Flow of RRF
- Code Walkthrough: Functions Explained
  - Loads your PDF document for processing
  - Splits large PDFs into smaller readable chunks
  - Generates embeddings and stores them in Qdrant for semantic search
  - Creates different phrasings of the original user query to help diversify search results
  - Retrieves top documents for each query variation
  - Applies Reciprocal Rank Fusion: assigns scores to docs based on their rank and merges them fairly
  - Combines everything: query → variations → semantic search → rank → LLM prompt
- Advantages of RRF
- Limitations of RRF
- Real-World Applications of RRF
- Summary
- GitHub Code
- Connect With Me
Sometimes, a single search just isn't enough.
When users ask questions in different ways, RRF helps bring together the most relevant answers, even if the query is vague or phrased differently.
What is RRF (Reciprocal Rank Fusion)?
Reciprocal Rank Fusion (RRF) is a simple but smart way to merge multiple ranked lists and pick the best documents for your LLM context.
In plain English: imagine asking the same question in different ways, collecting top answers for each, and then ranking all those answers fairly. That's RRF.
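The scoring rule behind RRF is tiny: a document at rank r in any list contributes 1/(k + r) to its total, where k (commonly 60) dampens the gap between top ranks. A minimal sketch:

```python
def rrf_score(rank, k=60):
    # A document ranked r-th in a list contributes 1 / (k + r).
    return 1 / (k + rank)

# A doc ranked 1st in one list and 3rd in another accumulates both:
total = rrf_score(1) + rrf_score(3)
print(round(total, 4))  # → 0.0323
```

The same document can earn score from every list it appears in, which is exactly how RRF rewards consensus across query variations.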
Why Do We Even Need RRF?
Let's say you ask:
"How does React manage state?"
The model might not find the best answer directly. But if you also ask:
"How to handle state in React?"
"React state management explained?"
And merge the results from all these variations, you get a richer and more accurate context to feed into your LLM.
How Does RRF Work?
The full flow, from user query to fused context, breaks down as follows:
Step-by-Step Flow of RRF

| Step | Description |
| --- | --- |
| 1 | User enters a query |
| 2 | Generate 3 query variations |
| 3 | Retrieve top-k documents for each variation |
| 4 | Score and rank all documents using RRF |
| 5 | Sort documents based on score |
| 6 | Provide sorted documents to LLM for final answer |
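Steps 4 and 5 can be sketched with plain lists, independent of any vector store (the doc IDs here are made up for illustration):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1 / (k + rank)  # reciprocal-rank contribution
    return sorted(scores, key=scores.get, reverse=True)

# Three query variations returned overlapping results:
lists = [["A", "B", "C"], ["B", "A", "D"], ["B", "C", "A"]]
print(rrf_fuse(lists))  # → ['B', 'A', 'C', 'D'] — "B" ranks high in all three
```

Note that "B" wins even though "A" took the top spot in one list: consistent placement across variations outweighs a single first-place finish.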
Code Walkthrough: Functions Explained

Loads your PDF document for processing.

from langchain_community.document_loaders import PyPDFLoader

def load_pdf_documents(pdf_path):
    loader = PyPDFLoader(pdf_path)
    return loader.load()
Splits large PDFs into smaller readable chunks.

from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_into_chunks(documents, chunk_size=2000, chunk_overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(documents)
Generates embeddings and stores them in Qdrant for semantic search.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

def get_embedder():
    return GoogleGenerativeAIEmbeddings(
        model="models/text-embedding-004",
        google_api_key=GOOGLE_API_KEY
    )

def store_chunks_in_qdrant(chunks, embedding_model):
    return QdrantVectorStore.from_documents(
        documents=chunks,
        embedding=embedding_model,
        url="http://localhost:6333",
        collection_name="pdf_chunks"
    )
Creates different phrasings of the original user query to help diversify search results.

def generate_query_variations(original_query, model, num_variations=3):
    prompt = f"Generate {num_variations} different ways to ask this question: {original_query}"
    response = model.invoke(prompt)
    variations = response.content.split("\n")
    return [original_query] + [v.strip() for v in variations if v.strip()]
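One caveat with the newline split above: models often answer with a numbered list, so the raw lines carry "1."-style prefixes that then get embedded into the search query. A hedged parsing sketch that strips them (the sample response text is invented):

```python
import re

def parse_variations(text, original_query):
    """Split an LLM response into query variations, stripping list numbering."""
    variations = []
    for line in text.split("\n"):
        line = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()  # drop "1." / "2)" prefixes
        if line:
            variations.append(line)
    return [original_query] + variations

sample = "1. How to handle state in React?\n2. React state management explained?"
print(parse_variations(sample, "How does React manage state?"))
```

Whether this matters depends on the model and prompt; checking a few raw responses is the quickest way to decide.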
Retrieves top documents for each query variation.

def retrieve_parallel_with_rrf(vector_store, queries, k=3):
    docs_per_query = []
    print("\nTop-k documents for each variation:")
    for i, query in enumerate(queries):
        docs = vector_store.similarity_search(query, k=k)
        docs_per_query.append(docs)
        print(f"\nVariation {i+1}: {query}")
        # Print a short preview of each ranked result
        for j, doc in enumerate(docs, start=1):
            preview = doc.page_content[:100].replace("\n", " ") + "..."
            print(f"  Rank {j}: {preview}")
    return docs_per_query
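Despite its name, the loop above runs the searches one after another. If latency across many variations matters, a thread pool is one option; this sketch assumes your vector store's similarity_search is thread-safe (worth verifying for your client), and uses a fake store so it runs standalone:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_parallel(vector_store, queries, k=3):
    """Run similarity_search for each query variation concurrently."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(lambda q: vector_store.similarity_search(q, k=k), queries))

# Tiny stand-in store so the sketch runs without Qdrant:
class FakeStore:
    def similarity_search(self, query, k=3):
        return [f"{query}::doc{i}" for i in range(1, k + 1)]

print(retrieve_parallel(FakeStore(), ["q1", "q2"], k=2))
```

pool.map preserves input order, so the result still lines up one list per query variation, exactly what the fusion step expects.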
Applies Reciprocal Rank Fusion: assigns scores to docs based on their rank and merges them fairly.

from collections import defaultdict

def rank_the_queries(docs_per_query, k=60):
    scores = defaultdict(float)
    doc_map = {}
    source_info = defaultdict(list)
    for i, query_docs in enumerate(docs_per_query):
        for rank, doc in enumerate(query_docs, start=1):
            content = doc.page_content
            score = 1 / (k + rank)
            scores[content] += score
            doc_map[content] = doc
            source_info[content].append(f"Variation {i+1} (Rank {rank}, +{score:.4f})")
    sorted_docs = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    print("\nRRF Fused Rankings:")
    for i, (doc_text, score) in enumerate(sorted_docs, start=1):
        source = "; ".join(source_info[doc_text])
        preview = doc_text[:100].replace("\n", " ") + "..."
        print(f"{i}. Score: {score:.4f} | Source: {source}\n   {preview}")
    unique_docs = [doc_map[doc_text] for doc_text, _ in sorted_docs]
    return unique_docs
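A useful property of this scoring: a document that shows up for several variations accumulates score across lists, so a consistent mid-ranker can beat a one-off top hit. Toy numbers with k=60:

```python
k = 60
one_top_hit = 1 / (k + 1)      # rank 1 in a single variation: ~0.0164
steady = 3 * (1 / (k + 2))     # rank 2 in all three variations: ~0.0484
print(one_top_hit < steady)    # → True: consistency across variations wins
```

This is also why rank_the_queries keys its scores by page_content: the same chunk retrieved by multiple variations is merged into one entry instead of being counted as separate documents.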
Combines everything: query → variations → semantic search → rank → LLM prompt.

def chat_with_rrf(query, vector_store, chat_model):
    queries = generate_query_variations(query, chat_model)
    print("\nGenerated Query Variations:")
    for idx, q in enumerate(queries, 1):
        print(f"{idx}. {q}")
    docs_per_query = retrieve_parallel_with_rrf(vector_store, queries)
    fused_docs = rank_the_queries(docs_per_query)
    context = "\n\n...\n\n".join([doc.page_content for doc in fused_docs[:5]])
    full_prompt = (
        SYSTEM_PROMPT +
        f"\n\nRelevant excerpts from the PDF:\n{context}\n\nUser's question: {query}\n\nAssistant:"
    )
    response = chat_model.invoke(full_prompt)
    return response.content
Advantages of RRF

| Advantage | Description |
| --- | --- |
| Better accuracy | Merges results from different query styles to get better context |
| Flexible | Works even if some variations return poor results |
| Simple scoring | Easy to implement without needing ML training |
Limitations of RRF

| Limitation | Description |
| --- | --- |
| Duplicates | Same document may appear in multiple variations unless filtered |
| Extra latency | Adds some latency due to multiple searches |
| No content awareness | RRF ranks by position, not document meaning or novelty |
Real-World Applications of RRF

| Application | Why RRF Fits |
| --- | --- |
| Meta search engines | Combines results from multiple engines like Google + Bing |
| Multi-database retrieval | Fetch from different sources (e.g., HR + Finance + Legal) and merge |
| Agent querying | When agents rephrase questions and vote on best answers |
| Hybrid search | Combining keyword + semantic + cross-modal search results |
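For the hybrid-search case, the fusion step applies unchanged: hand RRF one ranked list from keyword search and one from semantic search. A sketch with invented doc IDs:

```python
from collections import defaultdict

def rrf(lists, k=60):
    """Fuse any number of ranked doc-ID lists into one ranking."""
    scores = defaultdict(float)
    for ranked in lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["doc3", "doc1", "doc7"]   # e.g. a BM25 ranking
semantic_hits = ["doc1", "doc5", "doc3"]   # e.g. an embedding ranking
print(rrf([keyword_hits, semantic_hits]))  # → ['doc1', 'doc3', 'doc5', 'doc7']
```

Because RRF only looks at positions, the keyword and semantic scores never need to be put on a common scale, which is the main reason it is popular for hybrid search.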
Summary
If your RAG pipeline feels like it's missing the mark, Reciprocal Rank Fusion (RRF) might be the fix.
It gives your LLM richer, more diverse, and well-ranked context, all without retraining a thing.
GitHub Code
Check out the full working RRF example here:
GitHub – RRF in GenAI
Connect With Me
If you have questions, ideas, or want to nerd out on RAG + RRF:
Connect on LinkedIn
Subscribe to my newsletter