Reciprocal Rank Fusion (RRF) in RAG: Complete Guide with Working Code

Ganesh Ghadage
7 min read

Before reading this article, I suggest going through the Query Translation in RAG: 5 Powerful Techniques to Improve Retrieval Accuracy article, where I briefly discussed the Reciprocal Rank Fusion (RRF) technique. In this article we will go deeper into Reciprocal Rank Fusion and also write code for it.

What is Reciprocal Rank Fusion (RRF)?

Reciprocal Rank Fusion is similar to the Parallel Query Retrieval technique: here too we generate 4-5 queries from the user query and retrieve documents for each of them. But instead of just filtering out the unique documents, we rank the documents with a ranking formula and take the top n documents.

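Ranking Formula:

$$\text{RRF}(d) = \sum_{q \in Q} \frac{1}{k + \text{rank}_q(d)}$$

Here Q is the set of generated queries, rank_q(d) is the 1-based position of document d in the results of query q (a query that did not return d adds nothing), and k is a smoothing constant (60 in the code below).

Basically, a document scores higher the more often it occurs across the queries and the closer to the top it appears.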
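To make the scoring concrete, here is a minimal sketch that scores a single document from the ranks it received, adding 1 / (k + rank) for every query that returned it. `rrf_score` is a hypothetical helper, not part of the article's code, and ranks are assumed to be 1-based.

```python
from typing import Iterable


def rrf_score(ranks: Iterable[int], k: int = 60) -> float:
  """Sum 1 / (k + rank) over the 1-based ranks a document received,
  one rank per query that retrieved it."""
  return sum(1.0 / (k + rank) for rank in ranks)


# A document ranked 1st for one query and 3rd for another
print(rrf_score([1, 3]))  # 1/61 + 1/63
# A document returned by only one query, at the top
print(rrf_score([1]))     # 1/61
```

The document retrieved twice outscores the one retrieved once at the top, which is the frequency-and-position behavior described above.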

How Reciprocal Rank Fusion works

Step 1: Generating multiple queries

When the user asks a query, we pass it to a small LLM and ask it to generate multiple versions of that query.

import os
from dotenv import load_dotenv

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import BaseOutputParser
from typing import List, Dict

load_dotenv()

class LineListOutputParser(BaseOutputParser[List[str]]):
  """Output parser for a list of lines."""

  def parse(self, text: str) -> List[str]:
    # Strip each line and drop blank or whitespace-only ones
    lines = [line.strip() for line in text.strip().split("\n")]
    return [line for line in lines if line]

output_parser = LineListOutputParser()

llm = ChatGoogleGenerativeAI(
  model="gemini-2.5-flash",
  google_api_key=os.getenv("GEMINI_API_KEY"),
  temperature=0,
  max_tokens=None,
  timeout=None,
  max_retries=2,
)

QUERY_REWRITE_PROMPT = PromptTemplate(
  input_variables=["question"],
  template="""You are an AI language model assistant. Your task is to generate five 
  different versions of the given user question to retrieve relevant documents from a vector 
  database. By generating multiple perspectives on the user question, your goal is to help
  the user overcome some of the limitations of the distance-based similarity search. 
  Provide these alternative questions separated by newlines.
  Original question: {question}""",
)

llm_chain = QUERY_REWRITE_PROMPT | llm | output_parser

# ----- Step 1: Generate Multiple Queries -----
user_query = "What is FS Module?"
generated_queries = llm_chain.invoke(user_query)

print("--------- Generated Queries -----------")
for i, query in enumerate(generated_queries, 1):
  print(f"{i}. {query}")
print("------------------------")
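The prompt asks for questions separated by newlines, but models sometimes number or bullet them anyway, and those markers would then be sent to the retriever as part of the query. A minimal sketch of a stricter line parser, assuming the markers look like "1.", "2)", "-", "*" or "•"; `parse_query_lines` is a hypothetical helper that you could call from `LineListOutputParser.parse`.

```python
import re
from typing import List

# Matches a leading list marker such as "1.", "2)", "-", "*" or "•"
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_query_lines(text: str) -> List[str]:
  """Split LLM output into one query per line, dropping blank lines and
  any leading numbering or bullet markers."""
  queries = []
  for line in text.splitlines():
    cleaned = _MARKER.sub("", line).strip()
    if cleaned:
      queries.append(cleaned)
  return queries


sample = "1. What is the fs module in Node.js?\n\n2) How do I read files with fs?\n- fs module API overview"
print(parse_query_lines(sample))
```

Note that a query which itself starts with something like "3." would lose that prefix, so keep the marker pattern as narrow as your model's output allows.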

Step 2: Fetching relevant chunks from the vector store in parallel

Once we have generated multiple queries, we embed each of them and retrieve the relevant chunks from the vector store with a similarity search. This is done for all the generated queries in parallel.

import os

from langchain_google_genai import ChatGoogleGenerativeAI
from typing import List, Set
import concurrent.futures
from langchain_core.documents import Document

from utils.output_parser import output_parser
from config.vector_store import get_vector_store
from llm.prompt_templates import QUERY_REWRITE_PROMPT
from utils.rank_docs import rank_documents

# ----- setup --------

qdrant = get_vector_store()

llm = ChatGoogleGenerativeAI(
  model="gemini-2.5-flash",
  google_api_key=os.getenv("GEMINI_API_KEY"),
  temperature=0,
  max_tokens=None,
  timeout=None,
  max_retries=2,
)

llm_chain = QUERY_REWRITE_PROMPT | llm | output_parser

def reciprocal_rank_fusion(
  user_query: str,
  llm_chain=llm_chain,
  retriever=qdrant.as_retriever(),
  top_n: int = 5,
) -> List[Document]:
  """
  Generate multiple queries using llm_chain, fetch documents for each query
  in parallel, fuse the per-query rankings with RRF and return the best documents.

  Args:
    user_query (str): The input user query.
    llm_chain: A Runnable (PromptTemplate | LLM | OutputParser) for generating queries.
    retriever: A LangChain retriever (e.g., qdrant.as_retriever()).
    top_n (int): Number of fused documents to keep. The article does not
      fix a value for n; 5 is an arbitrary default.

  Returns:
    List[Document]: Up to top_n deduplicated documents, highest RRF score first.
  """
  # Step 1: Generate multiple queries
  generated_queries = llm_chain.invoke(user_query)

  # Step 2: Fetch documents in parallel, keeping one ranked list per query
  # (executor.map returns results in the same order as generated_queries)
  def fetch_docs(query: str) -> List[Document]:
    return retriever.invoke(query)

  with concurrent.futures.ThreadPoolExecutor() as executor:
    ranked_lists: List[List[Document]] = list(executor.map(fetch_docs, generated_queries))

  # Step 3: Fuse the per-query rankings with RRF and keep the top_n documents
  sorted_docs = rank_documents(ranked_lists)[:top_n]

  print("--------- sorted docs ------")
  print(f"\nTotal sorted documents: {len(sorted_docs)}")
  print("===============================")
  for i, doc in enumerate(sorted_docs, 1):
    print(f"Doc {i}: {doc}")
  print("===============================")

  return sorted_docs
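Why is it safe to run retrieval in a thread pool and still know which results belong to which query? `ThreadPoolExecutor.map` yields results in the order of its input iterable, not in completion order, so the i-th result list always belongs to the i-th query. In this self-contained sketch, `FAKE_INDEX` and `fake_retrieve` are hypothetical stand-ins for Qdrant so it runs without a vector store.

```python
import concurrent.futures
import random
import time
from typing import Dict, List

# Hypothetical in-memory "vector store": canned results per query
FAKE_INDEX: Dict[str, List[str]] = {
  "query A": ["doc_id:1", "doc_id:3"],
  "query B": ["doc_id:2", "doc_id:3"],
  "query C": ["doc_id:3", "doc_id:4"],
}


def fake_retrieve(query: str) -> List[str]:
  # Random delay so the calls finish in an unpredictable order
  time.sleep(random.uniform(0.0, 0.05))
  return FAKE_INDEX[query]


queries = list(FAKE_INDEX)
with concurrent.futures.ThreadPoolExecutor() as executor:
  # map() yields results in the order of `queries`, not completion order
  ranked_lists = list(executor.map(fake_retrieve, queries))

for query, docs in zip(queries, ranked_lists):
  print(query, "->", docs)
```

Threads suit this step because each retrieval call mostly waits on network I/O. Keeping one list per query, instead of extending a single flat list, is what lets RRF assign each document its rank within each query's results.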

Step 3: Ranking the documents

From all the fetched documents, we rank the documents and keep only the top n to answer the user query.

To rank the documents, we consider how often each one occurs across the queries and at which position it appears in each query's results.

Suppose we have three queries, and across their results doc_id:3 is returned by all three queries, doc_id:1 and doc_id:2 are each returned by two queries, and doc_id:4 is returned by only one.

We then rank them as follows:

  • doc_id:3 occurs 3 times, so it gets the 1st rank.

  • doc_id:1 and doc_id:2 both occur twice, so to order them we consider their positions.

  • In Query 1, doc_id:1 is at the 1st position and in Query 3 it is at the 2nd position, while doc_id:2 is at the 3rd position in Query 1 and at the 1st position in Query 2. Since doc_id:1 has the better positions overall (1st and 2nd versus 1st and 3rd), doc_id:1 ranks above doc_id:2.

  • doc_id:4 occurs only once, so it goes last.

So the final ranking is: doc_id:3, doc_id:1, doc_id:2, doc_id:4.
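This scenario can be checked numerically. The exact result lists for the three queries are not given in the text, so the lists below are a hypothetical arrangement consistent with the bullets above (doc_id:1 first for Query 1 and second for Query 3, doc_id:2 third for Query 1 and first for Query 2). They are scored with k = 60 and ranks counted from 1, as in the usual RRF definition; `fuse` is a hypothetical helper working on plain ID strings rather than LangChain documents.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fuse(ranked_lists: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
  """Reciprocal Rank Fusion over lists of document IDs (1-based ranks)."""
  scores: Dict[str, float] = defaultdict(float)
  for ranking in ranked_lists:
    for rank, doc_id in enumerate(ranking, start=1):
      scores[doc_id] += 1.0 / (k + rank)
  # Highest fused score first
  return sorted(scores.items(), key=lambda item: item[1], reverse=True)


query_results = [
  ["doc_id:1", "doc_id:3", "doc_id:2"],  # Query 1 (hypothetical order)
  ["doc_id:2", "doc_id:3", "doc_id:4"],  # Query 2
  ["doc_id:3", "doc_id:1"],              # Query 3
]

for doc_id, score in fuse(query_results):
  print(f"{doc_id}: {score:.5f}")
# doc_id:3: 0.04865
# doc_id:1: 0.03252
# doc_id:2: 0.03227
# doc_id:4: 0.01587
```

doc_id:1 edges out doc_id:2 only because its second hit (rank 2) beats doc_id:2's (rank 3); both share a rank-1 hit.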

# utils/rank_docs.py (the module the retrieval code imports rank_documents from)
from typing import Dict, List, Sequence, Union
from collections import defaultdict
from langchain_core.documents import Document

def rank_documents(
  results: Union[Sequence[Document], Sequence[Sequence[Document]]],
  k: int = 60
) -> List[Document]:
  """
  Apply Reciprocal Rank Fusion (RRF) to combine multiple ranked lists.

  Args:
    results: One ranked list of documents per generated query. A single
      flat list of documents is treated as one ranked list.
    k (int): RRF constant. Defaults to 60.

  Returns:
    List[Document]: Deduplicated documents sorted by combined RRF score, highest first.
  """
  # Normalize the input: a flat list of Documents is a single ranking
  if results and isinstance(results[0], Document):
    ranked_lists = [results]
  else:
    ranked_lists = results

  doc_scores: Dict = defaultdict(float)
  doc_store: Dict = {}

  for docs in ranked_lists:
    # Ranks are 1-based and restart for each query's result list
    for rank, doc in enumerate(docs, start=1):
      doc_id = doc.metadata.get("id") or hash(doc.page_content.strip())
      doc_scores[doc_id] += 1 / (k + rank)
      # Store one version of the doc
      if doc_id not in doc_store:
        doc_store[doc_id] = doc

  # Sort by RRF score (descending)
  ranked_doc_ids = sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)

  for doc_id, score in ranked_doc_ids:
    print(f"doc_id: {doc_id}, score: {score}")

  # Return the corresponding Document objects
  return [doc_store[doc_id] for doc_id, _ in ranked_doc_ids]
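The RRF constant `k` in the ranking code above (default 60) controls how much a single top position counts relative to being retrieved repeatedly. With a small k the gap between rank 1 and rank 4 is large; with k = 60 the per-rank differences shrink and frequency dominates. A minimal sketch on hypothetical results, using a hypothetical `fused_order` helper over plain IDs with 1-based ranks:

```python
from collections import defaultdict
from typing import Dict, List


def fused_order(ranked_lists: List[List[str]], k: int) -> List[str]:
  """Return document IDs sorted by RRF score for a given k (1-based ranks)."""
  scores: Dict[str, float] = defaultdict(float)
  for ranking in ranked_lists:
    for rank, doc_id in enumerate(ranking, start=1):
      scores[doc_id] += 1.0 / (k + rank)
  return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results: "top_once" is first for one query only,
# "steady" is fourth for both queries
results = [
  ["top_once", "a", "b", "steady"],
  ["c", "d", "e", "steady"],
]

print(fused_order(results, k=1))   # a single top hit outranks "steady"
print(fused_order(results, k=60))  # "steady" comes first
```

With k = 1, "top_once" scores 1/2 while "steady" scores 2/5; with k = 60, "steady" scores 2/64, about twice the 1/61 of "top_once".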

Step 4: Generating output

Using the ranked documents and the user's original query, we generate the final answer, which should be more accurate because the context now holds the highest-ranked documents.

import os
from openai import OpenAI

from retriever.retrival import reciprocal_rank_fusion

api_key = os.getenv("GEMINI_API_KEY")

client = OpenAI(
  api_key=api_key,
  base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

def llm_chat(query: str):
  ranked_docs = reciprocal_rank_fusion(query)

  # Build the context from the ranked chunks, best-ranked first
  context = "\n\n"
  for doc in ranked_docs:
    context += f"Page Content: {doc.page_content}\nPage Number: {doc.metadata.get('page_label', 'unknown')}\nFile Location: {doc.metadata.get('source', 'unknown')} \n\n"


  SYSTEM_PROMPT = f"""
    You are a helpful AI assistant who answers the user's query based on the context retrieved from a PDF file, including page contents and page numbers.

    Answer only from the following context, and point the user to the right page number to learn more.
    The answer should be detailed.
    Context:
    {context}
  """

  chat_completion = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
      {"role": "system", "content": SYSTEM_PROMPT},
      {"role": "user", "content": query}
    ],
  )

  return chat_completion.choices[0].message.content
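The context string is where the ranking pays off: chunks are appended in RRF order, so the best-ranked chunk comes first in the prompt. A minimal standalone sketch of that formatting step, assuming each chunk carries `page_label` and `source` metadata as the code above expects; `Doc` and `build_context` are hypothetical stand-ins so it runs without LangChain or an API key, and `.get()` keeps a chunk with missing metadata from raising `KeyError`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
  """Hypothetical stand-in for langchain_core.documents.Document."""
  page_content: str
  metadata: Dict[str, Any] = field(default_factory=dict)


def build_context(docs: List[Doc]) -> str:
  """Format ranked chunks into one context string, best-ranked first."""
  blocks = []
  for doc in docs:
    blocks.append(
      f"Page Content: {doc.page_content}\n"
      f"Page Number: {doc.metadata.get('page_label', 'unknown')}\n"
      f"File Location: {doc.metadata.get('source', 'unknown')}"
    )
  return "\n\n".join(blocks)


ranked = [
  Doc("The fs module lets you work with the file system.", {"page_label": "12", "source": "nodejs.pdf"}),
  Doc("fs.readFile reads a file asynchronously.", {"source": "nodejs.pdf"}),
]
print(build_context(ranked))
```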

Benefits of Reciprocal Rank Fusion

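Even if some of the generated queries fetch documents that are not really relevant to the user query, those documents typically show up for only one query and at a low position, so they receive a low RRF score. Keeping only the top n documents after ranking drops them, and the answer is built from the closely relevant documents.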

Source Code

retrival.py

import os

from langchain_google_genai import ChatGoogleGenerativeAI
from typing import List, Set
import concurrent.futures
from langchain_core.documents import Document

from utils.output_parser import output_parser
from config.vector_store import get_vector_store
from llm.prompt_templates import QUERY_REWRITE_PROMPT
from utils.rank_docs import rank_documents

# ----- setup --------

qdrant = get_vector_store()

llm = ChatGoogleGenerativeAI(
  model="gemini-2.5-flash",
  google_api_key=os.getenv("GEMINI_API_KEY"),
  temperature=0,
  max_tokens=None,
  timeout=None,
  max_retries=2,
)

llm_chain = QUERY_REWRITE_PROMPT | llm | output_parser

def reciprocal_rank_fusion(
  user_query: str,
  llm_chain=llm_chain,
  retriever=qdrant.as_retriever(),
  top_n: int = 5,
) -> List[Document]:
  """
  Generate multiple queries using llm_chain, fetch documents for each query
  in parallel, fuse the per-query rankings with RRF and return the best documents.

  Args:
    user_query (str): The input user query.
    llm_chain: A Runnable (PromptTemplate | LLM | OutputParser) for generating queries.
    retriever: A LangChain retriever (e.g., qdrant.as_retriever()).
    top_n (int): Number of fused documents to keep. The article does not
      fix a value for n; 5 is an arbitrary default.

  Returns:
    List[Document]: Up to top_n deduplicated documents, highest RRF score first.
  """
  # Step 1: Generate multiple queries
  generated_queries = llm_chain.invoke(user_query)

  # Step 2: Fetch documents in parallel, keeping one ranked list per query
  # (executor.map returns results in the same order as generated_queries)
  def fetch_docs(query: str) -> List[Document]:
    return retriever.invoke(query)

  with concurrent.futures.ThreadPoolExecutor() as executor:
    ranked_lists: List[List[Document]] = list(executor.map(fetch_docs, generated_queries))

  # Step 3: Fuse the per-query rankings with RRF and keep the top_n documents
  sorted_docs = rank_documents(ranked_lists)[:top_n]

  return sorted_docs

chat.py

import os
from openai import OpenAI

from retriever.retrival import reciprocal_rank_fusion

api_key = os.getenv("GEMINI_API_KEY")

client = OpenAI(
  api_key=api_key,
  base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

def llm_chat(query: str):
  search_results = reciprocal_rank_fusion(query)

  # Build the context from the ranked chunks, best-ranked first
  context = "\n\n"
  for doc in search_results:
    context += f"Page Content: {doc.page_content}\nPage Number: {doc.metadata.get('page_label', 'unknown')}\nFile Location: {doc.metadata.get('source', 'unknown')} \n\n"


  SYSTEM_PROMPT = f"""
    You are a helpful AI assistant who answers the user's query based on the context retrieved from a PDF file, including page contents and page numbers.

    Answer only from the following context, and point the user to the right page number to learn more.
    The answer should be detailed.
    Context:
    {context}
  """

  chat_completion = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
      {"role": "system", "content": SYSTEM_PROMPT},
      {"role": "user", "content": query}
    ],
  )

  return chat_completion.choices[0].message.content

main.py

import os
from dotenv import load_dotenv

# Load .env first: chat.py and retrival.py read GEMINI_API_KEY when they are imported
load_dotenv()

from generator.chat import llm_chat

def main():
  query = "what is fs module?"
  result = llm_chat(query=query)

  print(result)

if __name__ == "__main__":
  main()

Full source code available at: GitHub

Written by

Ganesh Ghadage