RAG with Reciprocal Rank Fusion

Shaim Khanusiya
5 min read

πŸ’‘ Sometimes, a single search just isn’t enough.
When users ask questions in different ways, RRF helps bring together the most relevant answers β€” even if the query is vague or phrased differently.


🧠 What is RRF (Reciprocal Rank Fusion)?

Reciprocal Rank Fusion (RRF) is a simple but smart way to merge multiple ranked lists and pick the best documents for your LLM context.

In plain English: Imagine asking the same question in different ways, collecting top answers for each, and then ranking all those answers fairly. That’s RRF.
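Under the hood, each document earns a score of 1 / (k + rank) from every list it appears in, where k is a smoothing constant (commonly 60, as in the original RRF paper). A minimal sketch with made-up document IDs:

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k=60):
    """Fuse several ranked lists: each doc scores 1 / (k + rank) per list."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# "useState" sits near the top of both lists, so it wins after fusion.
list_a = ["useState", "redux", "context-api"]
list_b = ["context-api", "useState", "mobx"]
print(rrf_merge([list_a, list_b]))
# → ['useState', 'context-api', 'redux', 'mobx']
```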


❓ Why Do We Even Need RRF?

Let’s say you ask:

β€œHow does React manage state?”

The model might not find the best answer directly. But if you also ask:

  • β€œHow to handle state in React?”

  • β€œReact state management explained?”

And merge the results from all these variations, you get a richer and more accurate context to feed into your LLM.


πŸ”§ How Does RRF Work?

Here’s the full flow, from user query to final answer:

πŸͺœ Step-by-Step Flow of RRF

| Step | Description |
| --- | --- |
| 1️⃣ | User enters a query |
| 2️⃣ | Generate 3 query variations |
| 3️⃣ | Retrieve top-k documents for each variation |
| 4️⃣ | Score and rank all documents using RRF |
| 5️⃣ | Sort documents based on score |
| 6️⃣ | Provide sorted documents to LLM for final answer |
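The steps above can be sketched end to end with stand-ins for the real LLM and vector store (`fake_variations` and `fake_search` are invented for this illustration; the real versions appear in the walkthrough below):

```python
from collections import defaultdict

def fake_variations(query):                      # steps 1-2: query + rephrasings
    return [query, f"{query} explained", f"how to: {query}"]

def fake_search(query, k=2):                     # step 3: toy "retrieval"
    corpus = ["state basics", "useState guide", "useReducer guide"]
    return corpus[:k] if "how" in query else corpus[1:1 + k]

def rrf_pipeline(query, k=60):
    scores = defaultdict(float)
    for variation in fake_variations(query):     # steps 4-5: score and sort
        for rank, doc in enumerate(fake_search(variation), start=1):
            scores[doc] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # step 6 feeds this to the LLM

print(rrf_pipeline("react state"))
# → ['useState guide', 'useReducer guide', 'state basics']
```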

πŸ§‘β€πŸ’» Code Walkthrough – Functions Explained

πŸ“„ Loads your PDF document for processing.

from langchain_community.document_loaders import PyPDFLoader

def load_pdf_documents(pdf_path):
    loader = PyPDFLoader(pdf_path)
    return loader.load()

βœ‚οΈ Splits large PDFs into smaller readable chunks.

from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_into_chunks(documents, chunk_size=2000, chunk_overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(documents)

🧠 Creates the Gemini embedding model used to vectorize the chunks.

from langchain_google_genai import GoogleGenerativeAIEmbeddings

def get_embedder():
    return GoogleGenerativeAIEmbeddings(
        model="models/text-embedding-004",
        google_api_key=GOOGLE_API_KEY
    )

🗃️ Embeds the chunks and stores them in a local Qdrant collection.

from langchain_qdrant import QdrantVectorStore

def store_chunks_in_qdrant(chunks, embedding_model):
    return QdrantVectorStore.from_documents(
        documents=chunks,
        embedding=embedding_model,
        url="http://localhost:6333",
        collection_name="pdf_chunks"
    )

πŸ’¬ Creates different phrasings of the original user query to help diversify search results.

import re

def generate_query_variations(original_query, model, num_variations=3):
    prompt = f"Generate {num_variations} different ways to ask this question: {original_query}"
    response = model.invoke(prompt)
    # LLMs often number their output ("1. ..."), so strip numbering and whitespace
    variations = [re.sub(r"^\s*\d+[.)]\s*", "", v).strip() for v in response.content.split("\n")]
    return [original_query] + [v for v in variations if v]

πŸ“₯ Retrieves top documents for each query variation in parallel.

def retrieve_parallel_with_rrf(vector_store, queries, k=3):
    docs_per_query = []
    print("\n🔍 Top-k documents for each variation:")
    for i, query in enumerate(queries):
        docs = vector_store.similarity_search(query, k=k)
        docs_per_query.append(docs)
        print(f"\n🔸 Variation {i+1}: {query}")

        # Print a short preview of each retrieved chunk, in rank order
        for j, doc in enumerate(docs, start=1):
            preview = doc.page_content[:100].replace("\n", " ") + "..."
            print(f"   Rank {j}: {preview}")

    return docs_per_query

πŸ“Š Applies Reciprocal Rank Fusion: assigns scores to docs based on their rank and merges them fairly.

def rank_the_queries(docs_per_query, k=60):
    scores = defaultdict(float)
    doc_map = {}
    source_info = defaultdict(list)

    for i, query_docs in enumerate(docs_per_query):
        for rank, doc in enumerate(query_docs, start=1):
            content = doc.page_content
            score = 1 / (k + rank)
            scores[content] += score
            doc_map[content] = doc
            source_info[content].append(f"Variation {i+1} (Rank {rank}, +{score:.4f})")

    sorted_docs = sorted(scores.items(), key=lambda x: x[1], reverse=True)

    print("\nπŸ“Š RRF Fused Rankings:")
    for i, (doc_text, score) in enumerate(sorted_docs, start=1):
        source = "; ".join(source_info[doc_text])
        # Preview each fused document along with the variations that contributed to its score
        preview = doc_text[:100].replace("\n", " ") + "..."
        print(f"{i}. Score: {score:.4f} | Source: {source}\n   {preview}")

    unique_docs = [doc_map[doc_text] for doc_text, _ in sorted_docs]
    return unique_docs
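To see the fusion behavior in isolation, here's a toy check of the same scoring logic, with `types.SimpleNamespace` standing in for LangChain's `Document` (only `page_content` matters here). A chunk that is never rank 1 but shows up under every variation overtakes the one-off rank-1 chunks:

```python
from collections import defaultdict
from types import SimpleNamespace

def fuse(docs_per_query, k=60):
    """Same RRF scoring as rank_the_queries, minus the printing."""
    scores, doc_map = defaultdict(float), {}
    for query_docs in docs_per_query:
        for rank, doc in enumerate(query_docs, start=1):
            scores[doc.page_content] += 1 / (k + rank)
            doc_map[doc.page_content] = doc   # dedupe by content
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [doc_map[text] for text in ordered]

a = SimpleNamespace(page_content="chunk about useState")
b = SimpleNamespace(page_content="chunk about useReducer")
c = SimpleNamespace(page_content="chunk about props")

# b appears in all three variations, so it rises to the top after fusion
fused = fuse([[a, b], [c, b], [b]])
print([d.page_content for d in fused])
```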

πŸ€– Combines everything: query β†’ variations β†’ semantic search β†’ rank β†’ LLM prompt.

def chat_with_rrf(query, vector_store, chat_model):
    queries = generate_query_variations(query, chat_model)
    print("\nπŸ” Generated Query Variations:")
    for idx, q in enumerate(queries, 1):
        print(f"{idx}. {q}")

    docs_per_query = retrieve_parallel_with_rrf(vector_store, queries)
    fused_docs = rank_the_queries(docs_per_query)

    context = "\n\n...\n\n".join([doc.page_content for doc in fused_docs[:5]])
    full_prompt = (
        SYSTEM_PROMPT +
        f"\n\nRelevant excerpts from the PDF:\n{context}\n\nUser's question: {query}\n\nAssistant:"
    )
    response = chat_model.invoke(full_prompt)
    return response.content

βœ… Advantages of RRF

| Advantage | Description |
| --- | --- |
| 🎯 Better Accuracy | Merges results from different query styles to get better context |
| 🔄 Flexible | Works even if some variations return poor results |
| 💡 Simple Scoring | Easy to implement without needing ML training |

⚠️ Limitations of RRF

| Limitation | Description |
| --- | --- |
| 📄 Duplicates | Same document may appear in multiple variations unless filtered |
| 🐢 Extra Time | Adds some latency due to multiple searches |
| 📊 No Content Awareness | RRF ranks by position, not document meaning or novelty |

πŸš€ Real-World Applications of RRF

| Application | Why RRF Fits |
| --- | --- |
| 🔍 Meta Search Engines | Combines results from multiple engines like Google + Bing |
| 🗃️ Multi-Database Retrieval | Fetch from different sources (e.g., HR + Finance + Legal) and merge |
| 🤖 Agent Querying | When agents rephrase questions and vote on best answers |
| 📚 Hybrid Search | Combining keyword + semantic + cross-modal search results |

πŸ“Œ Summary

If your RAG pipeline feels like it’s missing the mark, Reciprocal Rank Fusion (RRF) might be the fix.

It gives your LLM richer, more diverse, and well-ranked context, all without retraining a thing.


πŸ“‚ GitHub Code

Check out the full working RRF example here:
πŸ‘‰ GitHub – RRF in GenAI


🀝 Connect With Me

If you have questions, ideas, or want to nerd out on RAG + RRF:
πŸ”— Connect on LinkedIn
