Reciprocal Rank Fusion


In our previous blog, we explored Parallel Query Retrieval — a technique that uses multiple rephrased queries to capture different versions of the same intent.
In this one, we’ll cover another powerful optimization method: Reciprocal Rank Fusion (RRF). It's similar in spirit to Parallel Query Retrieval but introduces ranking logic into the mix to refine which answers are the most relevant.
Let’s Understand This With a Simple Example
Imagine you're using different sources — maybe asking your friends, checking YouTube, or reading textbooks — to learn a concept.
You get five different answers. Now, you rank them based on which ones helped you understand better. Finally, you choose the best-ranked parts and combine them into a final summary in your notebook.
That’s Reciprocal Rank Fusion — in plain English.
We take multiple responses, ask the model to rank them by relevance, and then merge the top-ranked content to form a complete, high-quality answer.
How It Works
Let’s break it down step-by-step:
1. Start with the original user query.
2. Generate 4–5 alternative versions of that query (like in Parallel Query Retrieval).
3. For each version, retrieve a response from the model.
4. Ask the model to rank each response based on how relevant it is to the original question.
5. Finally, combine the top-ranked responses into one final answer.
This approach helps prioritize answers that align closely with the user’s intent, while still benefiting from the variety in phrasing.
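A quick aside on the name: in the information retrieval literature, Reciprocal Rank Fusion refers to a simple scoring formula for merging several ranked lists, where each item receives the sum of 1 / (k + rank) across the lists it appears in (k is a small smoothing constant, commonly set to 60). If you are fusing ranked document lists from multiple retrievers rather than asking the model to rank whole answers, a minimal sketch could look like this (the function name, the example document IDs, and k=60 are illustrative choices, not part of the pipeline described above):

from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists into one using reciprocal-rank scores."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: three retrievers return overlapping document IDs
print(reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_a", "doc_d", "doc_b"],
]))  # doc_a comes out on top because it ranks highly in every list

The rest of this post sticks with the model-ranked variant, since it works directly on generated answers instead of retrieved documents.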
Code Example: Using Gemini (Google Generative AI)
Here’s how you can implement Reciprocal Rank Fusion using the google-genai SDK (the Google Gen AI Python client).
Make sure you have your Google AI Studio API key.
Install the SDK
pip install -q -U google-genai
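One small tip before the code: rather than hard-coding the key into the script, you can read it from an environment variable (a tiny sketch; the variable name GEMINI_API_KEY here is just an example):

import os
from google import genai

# Read the API key from an environment variable instead of hard-coding it
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])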
Python Code
import re
from google import genai

# ✅ Step 0: Initialize Gemini client
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# ✅ Step 1: Original user query
original_query = "What are the benefits of using graph databases in AI applications?"

# ✅ Step 2: Generate parallel queries
def generate_parallel_queries(client, query, n=5):
    prompt = f"Generate {n} semantically similar versions of this question:\n\n'{query}'"
    response = client.models.generate_content(
        model="gemini-2.0-pro",
        contents=prompt
    )
    text = response.text.strip()
    # Strip any leading bullets or numbering the model adds, and drop empty lines
    lines = [line.lstrip("-•0123456789. ").strip() for line in text.split("\n") if line.strip()]
    return lines[:n]
# ✅ Step 3: Get responses for each variation
def get_responses(client, queries):
    responses = []
    for q in queries:
        res = client.models.generate_content(
            model="gemini-2.0-pro",
            contents=q
        )
        responses.append({"query": q, "answer": res.text.strip()})
    return responses
# ✅ Step 4: Ask the model to rank the answers
def rank_responses(client, original_query, responses):
    prompt = f"""You are given a user query and multiple answers generated from its variations.
Rank these answers from most relevant to least relevant for the original query: '{original_query}'.
Return the ranking as a numbered list, one line per entry, in the form '1. Answer X'.
"""
    for i, r in enumerate(responses, start=1):
        prompt += f"Answer {i}:\n{r['answer']}\n\n"

    ranking_response = client.models.generate_content(
        model="gemini-2.0-pro",
        contents=prompt
    )

    ranked_order = []
    for line in ranking_response.text.strip().split("\n"):
        # Extract the answer number from lines like "1. Answer 3"
        match = re.search(r"Answer\s*(\d+)", line)
        if match:
            idx = int(match.group(1)) - 1
            if 0 <= idx < len(responses) and responses[idx] not in ranked_order:
                ranked_order.append(responses[idx])
    return ranked_order
# ✅ Step 5: Combine top-ranked answers
def combine_top_responses(client, ranked_responses, top_n=3):
    prompt = "Combine the following top-ranked answers into a single, high-quality, non-redundant final answer:\n\n"
    for i, r in enumerate(ranked_responses[:top_n], 1):
        prompt += f"Top Answer {i}:\n{r['answer']}\n\n"
    final_response = client.models.generate_content(
        model="gemini-2.0-pro",
        contents=prompt
    )
    return final_response.text.strip()
# ✅ Full RRF pipeline
query_variations = generate_parallel_queries(client, original_query)
responses = get_responses(client, query_variations)
ranked_responses = rank_responses(client, original_query, responses)
final_answer = combine_top_responses(client, ranked_responses)
# ✅ Output
print("✅ Final Answer:\n")
print(final_answer)
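If you want to reuse the pipeline for other questions, the steps above can be wrapped into a single helper. Here is a small convenience sketch (the function name and default values are illustrative, not part of the original pipeline):

def rrf_answer(client, query, n_variations=5, top_n=3):
    # Run the full pipeline: variations -> answers -> ranking -> fusion
    variations = generate_parallel_queries(client, query, n=n_variations)
    candidate_answers = get_responses(client, variations)
    ranked = rank_responses(client, query, candidate_answers)
    return combine_top_responses(client, ranked, top_n=top_n)

print(rrf_answer(client, "How do vector databases speed up semantic search?"))

Keep in mind that each run makes one request to generate the variations, one per variation, one to rank, and one to combine, so the number of variations directly drives cost and latency.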
Summary
Reciprocal Rank Fusion enhances response quality by combining ranking with query variation.
It simulates real-world decision making — you don’t just collect multiple opinions, you prioritize them.
Perfect for cases where relevance really matters, like search, academic QA, or recommendations.
Coming Up Next…
In the next post, we’ll explore another powerful Query Transformation strategy: Step-Back Prompting — a method where instead of answering a complex query directly, we ask the model to step back, understand the goal, and plan the answer step-by-step.
It’s like solving a big problem by first breaking it into mini-steps — improving both reasoning and response quality.
Stay tuned to unlock this next-level GenAI trick!