RAG with Reciprocal Rank Fusion

Table of contents
- What is RRF (Reciprocal Rank Fusion)?
- Why Do We Even Need RRF?
- How Does RRF Work?
- Step-by-Step Flow of RRF
- Code Walkthrough: Functions Explained
  - Loads your PDF document for processing
  - Splits large PDFs into smaller readable chunks
  - Generates embeddings and stores them in Qdrant for semantic search
  - Creates different phrasings of the original user query to help diversify search results
  - Retrieves top documents for each query variation
  - Applies Reciprocal Rank Fusion: assigns scores to docs based on their rank and merges them fairly
  - Combines everything: query → variations → semantic search → rank → LLM prompt
- Advantages of RRF
- Limitations of RRF
- Real-World Applications of RRF
- Summary
- GitHub Code
- Connect With Me
Sometimes, a single search just isn't enough.
When users ask questions in different ways, RRF helps bring together the most relevant answers, even if the query is vague or phrased differently.
What is RRF (Reciprocal Rank Fusion)?
Reciprocal Rank Fusion (RRF) is a simple but smart way to merge multiple ranked lists and pick the best documents for your LLM context.
In plain English: imagine asking the same question in different ways, collecting top answers for each, and then ranking all those answers fairly. That's RRF.
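The scoring rule behind RRF is tiny: a document at rank r in any list contributes 1/(k + r) to its total, where k (commonly 60) dampens the gap between top ranks. A minimal sketch:

```python
def rrf_score(rank, k=60):
    # A document ranked r-th in a list contributes 1 / (k + r).
    return 1 / (k + rank)

# A doc ranked 1st in one list and 3rd in another accumulates both:
total = rrf_score(1) + rrf_score(3)
print(round(total, 4))  # → 0.0323
```

The same document can earn score from every list it appears in, which is exactly how RRF rewards consensus across query variations.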
Why Do We Even Need RRF?
Let's say you ask:
"How does React manage state?"
The model might not find the best answer directly. But if you also ask:
"How to handle state in React?"
"React state management explained?"
And merge the results from all these variations, you get a richer and more accurate context to feed into your LLM.
How Does RRF Work?
The full flow, from user query to fused context, breaks down as follows:
Step-by-Step Flow of RRF

| Step | Description |
| --- | --- |
| 1 | User enters a query |
| 2 | Generate 3 query variations |
| 3 | Retrieve top-k documents for each variation |
| 4 | Score and rank all documents using RRF |
| 5 | Sort documents based on score |
| 6 | Provide sorted documents to LLM for final answer |
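Steps 4 and 5 can be sketched with plain lists, independent of any vector store (the doc IDs here are made up for illustration):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1 / (k + rank)  # reciprocal-rank contribution
    return sorted(scores, key=scores.get, reverse=True)

# Three query variations returned overlapping results:
lists = [["A", "B", "C"], ["B", "A", "D"], ["B", "C", "A"]]
print(rrf_fuse(lists))  # → ['B', 'A', 'C', 'D'] — "B" ranks high in all three
```

Note that "B" wins even though "A" took the top spot in one list: consistent placement across variations outweighs a single first-place finish.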
Code Walkthrough: Functions Explained

Loads your PDF document for processing.

from langchain_community.document_loaders import PyPDFLoader

def load_pdf_documents(pdf_path):
    loader = PyPDFLoader(pdf_path)
    return loader.load()
Splits large PDFs into smaller readable chunks.

from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_into_chunks(documents, chunk_size=2000, chunk_overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(documents)
Generates embeddings and stores them in Qdrant for semantic search.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

def get_embedder():
    return GoogleGenerativeAIEmbeddings(
        model="models/text-embedding-004",
        google_api_key=GOOGLE_API_KEY
    )

def store_chunks_in_qdrant(chunks, embedding_model):
    return QdrantVectorStore.from_documents(
        documents=chunks,
        embedding=embedding_model,
        url="http://localhost:6333",
        collection_name="pdf_chunks"
    )
Creates different phrasings of the original user query to help diversify search results.

def generate_query_variations(original_query, model, num_variations=3):
    prompt = f"Generate {num_variations} different ways to ask this question: {original_query}"
    response = model.invoke(prompt)
    variations = response.content.split("\n")
    return [original_query] + [v.strip() for v in variations if v.strip()]
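One caveat with the newline split above: models often answer with a numbered list, so the raw lines carry "1."-style prefixes that then get embedded into the search query. A hedged parsing sketch that strips them (the sample response text is invented):

```python
import re

def parse_variations(text, original_query):
    """Split an LLM response into query variations, stripping list numbering."""
    variations = []
    for line in text.split("\n"):
        line = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()  # drop "1." / "2)" prefixes
        if line:
            variations.append(line)
    return [original_query] + variations

sample = "1. How to handle state in React?\n2. React state management explained?"
print(parse_variations(sample, "How does React manage state?"))
```

Whether this matters depends on the model and prompt; checking a few raw responses is the quickest way to decide.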
Retrieves top documents for each query variation.

def retrieve_parallel_with_rrf(vector_store, queries, k=3):
    docs_per_query = []
    print("\nTop-k documents for each variation:")
    for i, query in enumerate(queries):
        docs = vector_store.similarity_search(query, k=k)
        docs_per_query.append(docs)
        print(f"\nVariation {i+1}: {query}")
        # Print a short preview of each ranked result
        for j, doc in enumerate(docs, start=1):
            preview = doc.page_content[:100].replace("\n", " ") + "..."
            print(f"  Rank {j}: {preview}")
    return docs_per_query
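Despite its name, the loop above runs the searches one after another. If latency across many variations matters, a thread pool is one option; this sketch assumes your vector store's similarity_search is thread-safe (worth verifying for your client), and uses a fake store so it runs standalone:

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_parallel(vector_store, queries, k=3):
    """Run similarity_search for each query variation concurrently."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(lambda q: vector_store.similarity_search(q, k=k), queries))

# Tiny stand-in store so the sketch runs without Qdrant:
class FakeStore:
    def similarity_search(self, query, k=3):
        return [f"{query}::doc{i}" for i in range(1, k + 1)]

print(retrieve_parallel(FakeStore(), ["q1", "q2"], k=2))
```

pool.map preserves input order, so the result still lines up one list per query variation, exactly what the fusion step expects.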
Applies Reciprocal Rank Fusion: assigns scores to docs based on their rank and merges them fairly.

from collections import defaultdict

def rank_the_queries(docs_per_query, k=60):
    scores = defaultdict(float)
    doc_map = {}
    source_info = defaultdict(list)
    for i, query_docs in enumerate(docs_per_query):
        for rank, doc in enumerate(query_docs, start=1):
            content = doc.page_content
            score = 1 / (k + rank)
            scores[content] += score
            doc_map[content] = doc
            source_info[content].append(f"Variation {i+1} (Rank {rank}, +{score:.4f})")
    sorted_docs = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    print("\nRRF Fused Rankings:")
    for i, (doc_text, score) in enumerate(sorted_docs, start=1):
        source = "; ".join(source_info[doc_text])
        preview = doc_text[:100].replace("\n", " ") + "..."
        print(f"{i}. Score: {score:.4f} | Source: {source}\n   {preview}")
    unique_docs = [doc_map[doc_text] for doc_text, _ in sorted_docs]
    return unique_docs
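A useful property of this scoring: a document that shows up for several variations accumulates score across lists, so a consistent mid-ranker can beat a one-off top hit. Toy numbers with k=60:

```python
k = 60
one_top_hit = 1 / (k + 1)      # rank 1 in a single variation: ~0.0164
steady = 3 * (1 / (k + 2))     # rank 2 in all three variations: ~0.0484
print(one_top_hit < steady)    # → True: consistency across variations wins
```

This is also why rank_the_queries keys its scores by page_content: the same chunk retrieved by multiple variations is merged into one entry instead of being counted as separate documents.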
Combines everything: query → variations → semantic search → rank → LLM prompt.

def chat_with_rrf(query, vector_store, chat_model):
    queries = generate_query_variations(query, chat_model)
    print("\nGenerated Query Variations:")
    for idx, q in enumerate(queries, 1):
        print(f"{idx}. {q}")
    docs_per_query = retrieve_parallel_with_rrf(vector_store, queries)
    fused_docs = rank_the_queries(docs_per_query)
    context = "\n\n...\n\n".join([doc.page_content for doc in fused_docs[:5]])
    full_prompt = (
        SYSTEM_PROMPT +
        f"\n\nRelevant excerpts from the PDF:\n{context}\n\nUser's question: {query}\n\nAssistant:"
    )
    response = chat_model.invoke(full_prompt)
    return response.content
Advantages of RRF

| Advantage | Description |
| --- | --- |
| Better accuracy | Merges results from different query styles to get better context |
| Flexible | Works even if some variations return poor results |
| Simple scoring | Easy to implement without needing ML training |
Limitations of RRF

| Limitation | Description |
| --- | --- |
| Duplicates | Same document may appear in multiple variations unless filtered |
| Extra latency | Adds some latency due to multiple searches |
| No content awareness | RRF ranks by position, not document meaning or novelty |
Real-World Applications of RRF

| Application | Why RRF Fits |
| --- | --- |
| Meta search engines | Combines results from multiple engines like Google + Bing |
| Multi-database retrieval | Fetch from different sources (e.g., HR + Finance + Legal) and merge |
| Agent querying | When agents rephrase questions and vote on best answers |
| Hybrid search | Combining keyword + semantic + cross-modal search results |
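For the hybrid-search case, the fusion step applies unchanged: hand RRF one ranked list from keyword search and one from semantic search. A sketch with invented doc IDs:

```python
from collections import defaultdict

def rrf(lists, k=60):
    """Fuse any number of ranked doc-ID lists into one ranking."""
    scores = defaultdict(float)
    for ranked in lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["doc3", "doc1", "doc7"]   # e.g. a BM25 ranking
semantic_hits = ["doc1", "doc5", "doc3"]   # e.g. an embedding ranking
print(rrf([keyword_hits, semantic_hits]))  # → ['doc1', 'doc3', 'doc5', 'doc7']
```

Because RRF only looks at positions, the keyword and semantic scores never need to be put on a common scale, which is the main reason it is popular for hybrid search.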
Summary
If your RAG pipeline feels like it's missing the mark, Reciprocal Rank Fusion (RRF) might be the fix.
It gives your LLM richer, more diverse, and well-ranked context, all without retraining a thing.
GitHub Code
Check out the full working RRF example here:
GitHub – RRF in GenAI
Connect With Me
If you have questions, ideas, or want to nerd out on RAG + RRF:
Connect on LinkedIn
Subscribe to my newsletter