Query Translation: Reciprocal Rank Fusion


Overview
In Parallel Query Retrieval we saw that fanning a query out into multiple sub-queries leaves us with several ranked lists of documents from semantic search. Reciprocal Rank Fusion (RRF) provides a way to combine these lists using the rank each document holds within each list.
Problem
The problem statement is the same one we saw in Parallel Query Retrieval: the user query may be ambiguous. If the query is poorly phrased, the semantic match between the query and the documents will be inaccurate too. The difference between Parallel Query Retrieval and Reciprocal Rank Fusion is that RRF keeps track of each document's rank within every result list, and documents that appear in more lists, at better ranks, are given more weight.
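Concretely, RRF gives each document a score by summing the reciprocal of its rank across every result list it appears in, softened by a constant k (60 in the original paper, and the default in the code below):

score(d) = Σ over all result lists r of 1 / (k + rank_r(d))

A document that sits near the top of several lists accumulates a higher score than one that appears only once, even if that single appearance is at rank 1.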
Example
User Query: What is the best mystery movie?
👩‍💻 AI expands this into parallel sub-queries:
Question 1: Top movies in 2025 - Result Set 1
Question 2: Award-winning movies - Result Set 2
Result Set 1 = ["The Life List", "Alpha", "The Croods"]
Result Set 2 = ["Alpha", "Despicable Me 4", "The Life List"]
From the above, we can interpret that:
Alpha appeared in both lists, with good ranks ⇒ higher score.
The Life List also appeared twice but at slightly lower ranks.
The Croods and Despicable Me 4 appeared only once, so their scores are lower.
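To make the arithmetic concrete, here is a minimal sketch of the fusion computation on the two example result sets, using the conventional k = 60 (the same default the reciprocal_rank_fusion function below uses):

# RRF scores for the two example result sets (k = 60)
result_set_1 = ["The Life List", "Alpha", "The Croods"]
result_set_2 = ["Alpha", "Despicable Me 4", "The Life List"]

scores = {}
for ranking in (result_set_1, result_set_2):
    for rank, title in enumerate(ranking, start=1):  # ranks are 1-based
        scores[title] = scores.get(title, 0) + 1 / (60 + rank)

for title, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(f"{title}: {score:.5f}")
# Alpha:            0.03252
# The Life List:    0.03227
# Despicable Me 4:  0.01613
# The Croods:       0.01587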
Let’s Code
Load the Data Source, Chunk It, and Store It in the Vector Database
import os
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

pdf_path = Path(__file__).parent / "file_path.pdf"

# Load the document from the PDF file
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

# Split the document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents=docs)

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=OPENAI_API_KEY,
)

# Create a new vector store - if the collection doesn't already exist
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="rrf",  # Name of your collection in Qdrant
    embedding=embeddings,
)

# Add the documents to the vector store
vector_store.add_documents(split_docs)
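Passing documents=[] to from_documents is just a way of creating (or connecting to) the rrf collection before add_documents embeds and uploads the actual chunks. One thing to watch: re-running the script against an existing collection inserts the same chunks again, so you may want to guard this step or supply fixed document IDs.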
Sub-Query Generation
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="rrf",
    embedding=embeddings,
)
user_query = "What is the best mystery movie?"
# System prompt for breaking down the user's query into sub-queries
system_prompt_for_subqueries = """
You are a helpful AI Assistant.
Your task is to take the user query and break it down into different sub-queries.

Rules:
- Generate a minimum of 3 and a maximum of 5 sub-queries.
- Return only a Python-style list of strings, with no extra text.

Example:
Query: What is the best mystery movie?
Output: [
    "Top movies in 2025",
    "Award-winning movies",
    "High rated mystery movies",
    "Award-winning mystery movies",
]
"""
import ast

from openai import OpenAI

client = OpenAI(api_key=OPENAI_API_KEY)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_prompt_for_subqueries},
        {"role": "user", "content": f"Query: {user_query}"},
    ],
)
sub_queries = ast.literal_eval(response.choices[0].message.content.strip())
print("Sub Queries:", sub_queries)
Get Relevant Chunks
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

# Function to retrieve relevant document chunks for each sub-query
def retrieve_chunks(query):
    return retriever.similarity_search(query=query)

# Use ThreadPoolExecutor to perform parallel retrieval of chunks for each sub-query
with ThreadPoolExecutor() as executor:
    all_chunks = list(executor.map(retrieve_chunks, sub_queries))

# Helper to generate a unique ID for each chunk (or you can use doc.metadata['id'] if available)
def get_doc_id(doc):
    return doc.page_content.strip()[:50]  # Use first 50 characters as an ID

# Create rankings (lists of doc IDs per sub-query result)
rankings = []
for result in all_chunks:
    rankings.append([get_doc_id(doc) for doc in result])

# Reciprocal Rank Fusion: each document scores 1 / (k + rank) per list it appears in
def reciprocal_rank_fusion(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    sorted_docs = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [doc_id for doc_id, _ in sorted_docs]

# Get final ranked doc IDs
final_doc_ids = reciprocal_rank_fusion(rankings)

# Map doc IDs back to actual chunks
doc_map = {get_doc_id(doc): doc for doc in chain.from_iterable(all_chunks)}
ranked_chunks = [doc_map[doc_id] for doc_id in final_doc_ids if doc_id in doc_map]
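Two details worth calling out. The constant k = 60 comes from the original RRF paper (Cormack et al., 2009) and keeps any single high rank from dominating the fused score; larger values of k flatten the differences between ranks further. And the 50-character prefix is only a quick stand-in for a real chunk ID: two distinct chunks that happen to open with the same text would collide, so prefer an ID from the chunk's metadata when your loader provides one.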
Generation
final_system_prompt = f"""
You are a helpful assistant who answers the user's query using the following pieces of context.
If you don't know the answer, just say you don't know — don't make up an answer.
Context:
{[doc.page_content for doc in ranked_chunks]}
"""
# Final call to OpenAI using top-ranked documents
final_response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": final_system_prompt},
        {"role": "user", "content": user_query},
    ],
)
# Output the final answer
print("\nFinal Answer:\n")
print(final_response.choices[0].message.content)
Let’s Connect
LinkedIn: linkedin.com/in/revathi-p-22b060208
Twitter: x.com/RevathiP04