Parallel Query Retrieval

MUKUL RANA

We all know that not everyone is going to write a perfect prompt that an AI model can easily understand and respond to with high accuracy, right?

That’s why this series exists.

In this series, we'll explore different Query Transformation techniques that focus on improving the user's original question — so we can get better, more accurate, and more relevant results from the AI.

Let’s understand this in a simple way.

Imagine you’ve prepared well for your exam. You know the answer to a question. But in the exam, the same question is asked using different words — maybe rearranged or with a different phrasing — but the meaning is still the same. In that case, the answer should also be the same, right?

This is the idea behind Parallel Query Retrieval.

How It Works

Here’s what we do step-by-step:

  1. Take the original user query.

  2. Use an LLM (like Gemini) to generate 4–5 alternate versions of the same question (these are called parallel queries).

  3. Use each version to fetch relevant responses.

  4. Finally, feed all the responses back into the model and ask it to combine them into a single, unique, complete answer.

The result? An answer that covers different angles and phrasing styles — and is more accurate and complete.

Code Example: Using Gemini (Google Generative AI)

Here’s a basic implementation of Parallel Query Retrieval using Gemini (via the google-genai Python library).

Make sure you have your Google AI Studio API key.
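Rather than hardcoding the key as in the snippet below, you can load it from an environment variable. Here's a minimal sketch (the `GEMINI_API_KEY` variable name and the `load_api_key` helper are my own choices, not part of the SDK):

```python
import os

def load_api_key(var_name="GEMINI_API_KEY"):
    """Fetch the Gemini API key from the environment instead of hardcoding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# Usage: client = genai.Client(api_key=load_api_key())
```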

Install the SDK

pip install -q -U google-genai

Python Code

from google import genai

# ✅ Step 0: Create a client instance
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# ✅ Step 1: Original user query
original_query = "What are the benefits of using graph databases in AI applications?"

# ✅ Step 2: Generate semantically similar queries
def generate_parallel_queries(client, query, n=5):
    prompt = f"Generate {n} semantically similar versions of this question. Return only the questions, one per line:\n\n'{query}'"

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt
    )

    text = response.text.strip()
    # Strip any bullet or numbering prefix the model adds (e.g. "1. ", "- ")
    lines = [line.strip().lstrip("-•*0123456789.) ").strip() for line in text.split("\n") if line.strip()]
    return lines[:n]

# ✅ Step 3: Get answers for each variant
def get_responses(client, queries):
    responses = []
    for q in queries:
        res = client.models.generate_content(
            model="gemini-2.0-flash",
            contents=q
        )
        responses.append(res.text.strip())
    return responses

# ✅ Step 4: Combine all responses into one
def combine_responses(client, responses):
    combined_prompt = "Combine the following answers into one complete, non-redundant response:\n\n"
    for i, answer in enumerate(responses, 1):
        combined_prompt += f"Answer {i}:\n{answer}\n\n"

    final_response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=combined_prompt
    )
    return final_response.text.strip()

# ✅ Full pipeline
query_variations = generate_parallel_queries(client, original_query)
individual_answers = get_responses(client, query_variations)
final_answer = combine_responses(client, individual_answers)

# ✅ Output result
print("✅ Final Answer:\n")
print(final_answer)
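One note on Step 3: the variants are independent of each other, so the per-query calls don't have to run one at a time. Here's a minimal sketch of a concurrent version using Python's `concurrent.futures`; `fetch` is a hypothetical callable you'd supply (e.g. a small wrapper around `client.models.generate_content` that returns the response text):

```python
from concurrent.futures import ThreadPoolExecutor

def get_responses_parallel(fetch, queries, max_workers=5):
    """Run one fetch call per query variant concurrently.

    `fetch` is any callable that takes a query string and returns
    an answer string.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of the input queries
        return list(pool.map(fetch, queries))
```

Since the calls are I/O-bound (waiting on the API), threads are enough here; keep `max_workers` modest so you stay within your API rate limits.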

Coming up Next…

In the next post, we’ll explore another powerful Query Transformation technique: Reciprocal Rank Fusion — where, much like Parallel Query Retrieval, we run multiple query variants, but we merge the retrieved results based on the rank each one earns in each result list.

Stay tuned, and follow the series if you're serious about mastering RAG with real-world AI tricks.
