Enhancing RAG with Reciprocal Rank Fusion

What is Reciprocal Rank Fusion (RRF)?
Reciprocal Rank Fusion (RRF) is a method used in information retrieval to combine rankings from multiple sources into a single, unified list. In the context of Retrieval-Augmented Generation (RAG), RRF is employed to merge document rankings retrieved from different query variations. RAG systems combine a retrieval step (fetching relevant documents) with a generation step (producing an answer using a language model). RRF enhances the retrieval process by ensuring the most relevant and diverse documents are selected, especially for complex or ambiguous user queries.
The core idea of RRF is to score documents based on their ranks across multiple queries, prioritizing those that consistently rank high. This makes it particularly valuable in RAG when the system generates multiple query phrasings to capture various aspects of a user's question.
How Does RRF Work?
RRF calculates a score for each document using the following formula:
$$\text{RRF Score} = \sum_{i=1}^{n} \frac{1}{k + \text{rank}_i}$$
Where:
$n$: The number of queries or ranking sources.
$\text{rank}_i$: The rank of the document in the $i$-th query (e.g., 1 for the top result, 2 for the second, and so on).
$k$: A constant (typically set to 60) that smooths the scoring curve and keeps any single top-ranked result from dominating the fused score.
For example, with $k = 60$:
A document ranked 1st in one query and 3rd in another scores $\frac{1}{60 + 1} + \frac{1}{60 + 3} = \frac{1}{61} + \frac{1}{63} \approx 0.0323$.
A document ranked 1st in both queries would score higher: $\frac{1}{61} + \frac{1}{61} \approx 0.0328$.
Documents are then sorted by their total RRF scores, with higher scores indicating greater relevance across the queries.
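To make the arithmetic concrete, here is a minimal, self-contained sketch of the scoring step in Python. The document IDs and ranked lists are made up; only the formula itself comes from the definition above.
from collections import defaultdict

# Hypothetical ranked result lists from two query variations (document IDs only)
rankings = [
    ["doc_a", "doc_b", "doc_c"],  # results for query 1
    ["doc_b", "doc_a", "doc_d"],  # results for query 2
]

k = 60  # smoothing constant
scores = defaultdict(float)
for ranked_list in rankings:
    for rank, doc_id in enumerate(ranked_list, start=1):
        scores[doc_id] += 1 / (k + rank)

# doc_a and doc_b appear in both lists, so they outrank doc_c and doc_d
for doc_id, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(f"{doc_id}: {score:.4f}")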
Why Use RRF in RAG?
RAG systems often employ Parallel Query Fusion, where multiple versions of a user’s question are generated to retrieve a broader set of relevant documents. RRF is used to combine these results effectively for the following reasons:
Balancing Relevance: Documents that rank highly across multiple queries are prioritized, ensuring consistently relevant results.
Reducing Bias: It prevents over-reliance on a single query’s results, which might skew the outcome toward a narrow interpretation.
Improving Diversity: By considering rankings from various queries, RRF ensures a mix of perspectives, making answers more comprehensive.
This is especially useful for nuanced or multifaceted questions, where a single query might miss critical information. For instance, asking "What is Python used for?" could be rephrased as "What are Python’s applications?" or "How is Python utilized?"—RRF ensures documents relevant to all these angles are considered.
Pros and Cons of RRF
Pros:
Improved Accuracy: By leveraging multiple queries, RRF enhances the relevance of retrieved documents.
Flexibility: It can merge results from any number of queries or ranking systems.
Simple Implementation: The algorithm is straightforward and computationally lightweight beyond the initial ranking.
Cons:
Parameter Sensitivity: The constant ( k ) (e.g., 60) impacts scoring and may require tuning for optimal performance.
Computational Cost: Running multiple queries and calculating RRF scores can be resource-intensive, especially with large datasets.
Potential Redundancy: Without additional deduplication, overlapping documents might still appear in the final list.
System Architecture
The PDF Query Assistant follows these steps:
Load and split a PDF into chunks.
Convert chunks into embeddings and store them in a vector database (Qdrant).
Generate variations of the user’s query.
Retrieve documents for each variation.
Fuse the retrieved documents using RRF.
Generate an answer using a language model (Google Gemini).
Provide an interactive interface for user queries.
Code Walkthrough: RRF Bot for PDF Querying
Here’s a step-by-step explanation of the Python code that powers the RRF Bot.
1. Import Necessary Libraries
import os
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore
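If you are following along, the imports above assume the langchain-community, langchain-text-splitters, langchain-google-genai, langchain-qdrant, and pypdf packages are installed (those are the usual PyPI names for these modules), along with a running Qdrant instance; the URL used later, http://localhost:6333, is Qdrant's default REST endpoint.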
2. Load and Process the PDF
pdf_path = Path(__file__).parent / "Python Programming.pdf"
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()
3. Split the Document into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents=docs)
4. Set Up Embeddings and Vector Store
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="YOUR_API_KEY"
)

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedder,
    url="http://localhost:6333",
    collection_name="pdf_assistant"
)
5. Initialize the Language Model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key="YOUR_API_KEY"
)
6. Define the System Prompt
SYSTEM_PROMPT = (
    "You are a helpful assistant answering questions based on a PDF document. "
    "Use the provided excerpts to give accurate and concise answers."
)
7. Generate Query Variations
def generate_query_variations(original_query, num_variations=3):
    # Ask the LLM to rephrase the question, then keep the original plus the variations
    prompt = f"Generate {num_variations} different ways to ask this question: {original_query}"
    response = llm.invoke(prompt)
    variations = response.content.split("\n")
    return [original_query] + [v.strip() for v in variations if v.strip()]
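One practical caveat: the model usually returns its variations as a numbered list ("1. …", "2. …"), and those markers would otherwise end up inside the search queries. A small cleanup helper (hypothetical, not part of the original code) could look like this:
import re

def clean_variation(line: str) -> str:
    # Strip leading list markers such as "1.", "2)", or "-" that the LLM may prepend
    return re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()

# e.g. inside generate_query_variations:
# return [original_query] + [clean_variation(v) for v in variations if v.strip()]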
8. Retrieve Documents for Each Query
def retrieve_parallel_with_rrf(vector_store, queries, k=3):
    # k here is the number of documents fetched per query, not the RRF constant
    docs_per_query = []
    for query in queries:
        docs = vector_store.similarity_search(query, k=k)
        docs_per_query.append(docs)
    return docs_per_query
9. Fuse Results Using RRF
def fuse_results_with_rrf(docs_per_query, k=60):
    from collections import defaultdict
    scores = defaultdict(float)
    for query_docs in docs_per_query:
        for rank, doc in enumerate(query_docs, start=1):
            # Documents are keyed by their text, so the same chunk retrieved by
            # several queries accumulates a single combined score
            scores[doc.page_content] += 1 / (k + rank)
    sorted_docs = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    unique_docs = [doc for doc, _ in sorted_docs]
    return unique_docs
Uses Reciprocal Rank Fusion (RRF) to combine document rankings from all queries.
Assigns scores to each document based on its rank (1 / (k + rank)), aggregates scores, sorts by total score, and returns a ranked list of unique document contents.
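To see the scoring and deduplication in action, here is a small, hypothetical usage example; the chunk texts are invented, and Document is the standard langchain_core document class whose page_content attribute the function keys on.
from langchain_core.documents import Document

chunk_a = Document(page_content="Python is widely used for web development.")
chunk_b = Document(page_content="Python is popular in data science and ML.")
chunk_c = Document(page_content="Python scripts automate repetitive tasks.")

# Two query variations returned overlapping ranked lists
docs_per_query = [
    [chunk_a, chunk_b],  # ranks 1 and 2 for the first query
    [chunk_b, chunk_c],  # ranks 1 and 2 for the second query
]

fused = fuse_results_with_rrf(docs_per_query)
# chunk_b was retrieved by both queries, so its text comes first
print(fused[0])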
10. Chat Function with RRF
def chat_with_rrf(query, vector_store, llm):
    queries = generate_query_variations(query)
    print(f"Generated queries: {queries}")
    docs_per_query = retrieve_parallel_with_rrf(vector_store, queries)
    fused_docs = fuse_results_with_rrf(docs_per_query)
    # Use only the top 5 fused chunks as context for the LLM
    context = "\n\n...\n\n".join(fused_docs[:5])
    full_prompt = (
        SYSTEM_PROMPT + "\n\nRelevant excerpts from the PDF:\n" +
        context + "\n\nUser's question: " + query + "\n\nAssistant:"
    )
    response = llm.invoke(full_prompt)
    return response.content
Generates query variations, retrieves documents, fuses them with RRF, and constructs a prompt with the top 5 documents as context.
Passes the prompt to the language model to generate an answer.
11. Interactive Loop
print("Welcome to the PDF Query Assistant with RRF!")
while True:
query = input("Ask a question about the PDF (or type 'exit' to quit): ")
if query.lower() == 'exit':
print("Goodbye!")
break
if not query.strip():
print("Please enter a valid question.")
continue
try:
answer = chat_with_rrf(query, vector_store, llm)
print("Assistant:", answer)
except Exception as e:
print(f"An error occurred: {e}")
Conclusion:
By integrating Reciprocal Rank Fusion (RRF) into a Retrieval-Augmented Generation (RAG) pipeline, we have built a PDF query assistant that retrieves and synthesizes relevant information more reliably than a single-query approach. LangChain, Qdrant, and Google's Generative AI handle the heavy lifting, which keeps the approach both powerful and approachable. Whether you are a student extracting key points for your studies, a professional looking for insights in work documents, or simply curious about what is buried in your PDF collection, adapting this code to your own documents is a practical way to see the benefits of fused retrieval firsthand.