Parallel Query Magic: Boosting RAG Quality with Gemini & Qdrant

Yogyashri Patil

In this post, we’ll explore a smart way to boost the quality of search results when working with large text datasets, using a method called Fan-Out Retrieval (also known as Parallel Query Expansion). We'll combine tools like LangChain, Google’s Gemini model, and Qdrant to create a more powerful retrieval system that generates better answers.

🌟 Imagine This…

You're throwing a birthday party and ask:

"What’s the best food to serve at a birthday party?"

But instead of asking just one person, you message:

🍕 Your foodie best friend – always knows trendy dishes
👩‍👧 Your mom – cares about what’s easy to cook and loved by all
🥗 Your gym trainer – wants it to be healthy

Each of them might come back with a different answer, like "trendy dishes", "easy to cook and loved by all", or "healthy", each based on their own perspective.

You're not relying on one answer — you're spreading your query across multiple sources, each bringing their own flavor, and then you select what matches your intent.

Now you take all those responses and pick the one (or few) that suit you best.

That’s Parallel Query Retrieval. You're fanning out your question to multiple sources in parallel and collecting the most relevant responses.

First, we’ll break down the diagram and explain how the process works step by step. Then, we’ll walk through the actual code that powers the system.

🔄 How Fan-Out Works in RAG (Made Simple)

Let’s say you ask the question:
🗣️ "How is AI used in healthcare?"

🛠️ Step-by-Step:

🧠 Step 1: The LLM gets creative

It rewrites your original question into multiple variations like:

  • “How is AI improving hospitals?”

  • “What are the medical uses of artificial intelligence?”

  • “Real-world examples of AI in healthcare”

  • “How does machine learning help doctors?”

    This helps the system cover different angles of the same topic.

🚀 Step 2: Fan-Out to the Retriever

Each rewritten query is sent out in parallel to a retriever (like a vector database or search engine). So now instead of one narrow search, the system is doing multiple targeted searches at once.

📚 Step 3: Collect Results

Each query returns its own top documents, articles, or chunks of information, often with unique insights.

🧩 Step 4: Merge and Filter

The system fuses all those results, removing duplicates and ranking the best ones, then passes the final content to the LLM.

✨ Step 5: LLM Generates the Answer

Now the language model has richer, broader, and more accurate information, so it can generate a smarter and more detailed answer.

Understanding the Diagram

The diagram illustrates the full Fan-Out Retrieval workflow:

  1. User Input: A person asks a question in natural language.

  2. Query Expansion: A large language model (LLM), such as Gemini, rewrites the query into multiple variations that capture different phrasings.

  3. Parallel Retrieval: Each variation is sent to the Qdrant vector database simultaneously.

  4. Document Retrieval: Each query fetches relevant documents independently.

  5. Deduplication: Results are merged and duplicates are removed using a filter_unique step.

  6. Answer Generation: The cleaned set of documents and the original query are passed to the LLM, which produces the final answer.

This approach increases the chances of finding the most relevant information by considering multiple interpretations of the same question.
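The filter_unique step named in the diagram is not a built-in LangChain function; below is a minimal sketch of how such a helper could look, assuming the retriever returns LangChain Document objects and that comparing page_content is an acceptable way to spot duplicates.

from langchain_core.documents import Document

def filter_unique(results_per_query):
    """Flatten per-query result lists and drop duplicate chunks.

    Deduplicates on page_content, a deliberately simple key; a real
    pipeline might also look at metadata or similarity scores.
    """
    seen = set()
    unique_docs = []
    for results in results_per_query:
        for doc in results:
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                unique_docs.append(doc)
    return unique_docs

# Toy example: two queries returned overlapping chunks.
docs_a = [Document(page_content="AI helps triage patients"), Document(page_content="ML models read scans")]
docs_b = [Document(page_content="ML models read scans"), Document(page_content="AI drafts clinical notes")]
print([d.page_content for d in filter_unique([docs_a, docs_b])])
# -> ['AI helps triage patients', 'ML models read scans', 'AI drafts clinical notes']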

Example:

Here, we rephrase the user's input question into several differently worded variations to explore multiple angles of the same query.

For example, if the user prompt is: “How to store the data into VectorDB?”

Then its variations could be:

 1. What are the best practices for indexing and storing data in a vector database?
 2. How can I optimize my data storage for efficient querying in a vector-based database?
 3. What are the key considerations for designing a scalable and efficient data storage system using vector databases?

How We Implement Parallel Query Retrieval

Document Ingestion and Retriever Setup

from pathlib import Path
import os

from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_mistralai import MistralAIEmbeddings
from langchain_qdrant import QdrantVectorStore

load_dotenv()
os.environ["MISTRAL_API_KEY"] = os.getenv("MISTRAL_API_KEY")

# Load the source PDF and split it into overlapping chunks.
pdf_path = Path(__file__).parent / "os_book.pdf"
loader = PyPDFLoader(file_path=str(pdf_path))
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)

# Mistral embeddings turn each chunk into a vector.
embedder = MistralAIEmbeddings(model="mistral-embed")

# Embed the chunks and store them in a Qdrant collection; this also
# returns a vector store we can query as a retriever.
retriever = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedder,
    url="http://localhost:6333",
    collection_name="learning_langchain",
)
print("Ingestion done....")

Generating the Parallel Queries

import os

from dotenv import load_dotenv
from google import genai
from langchain_mistralai import MistralAIEmbeddings
from langchain_qdrant import QdrantVectorStore

load_dotenv()
os.environ["MISTRAL_API_KEY"] = os.getenv("MISTRAL_API_KEY")

# Reconnect to the collection populated during ingestion.
embedder = MistralAIEmbeddings(model="mistral-embed")
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain",
    embedding=embedder,
)
print("Retriever ready....")

user_query = input(" > ")

SYSTEM_PROMPT = """
You are a helpful AI assistant that generates multiple alternate
search queries from the user's input query. These alternate queries
will be used for semantic search within a vector database
using similarity metrics. Generate 5 alternate
queries that can be formed to better understand the user's
input query given below.

context:
{user_query}
Strictly return only the alternate queries separated by new lines.
"""

# Ask Gemini to rewrite the question into 5 alternate queries.
client = genai.Client(api_key=os.getenv("api_key"))
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=SYSTEM_PROMPT.format(user_query=user_query),
)
print(response.text)

Generated Queries

 > what is operating system
Definition of operating system
Functions of operating system
Types of operating systems
Examples of operating systems
How does an operating system work

Retrieving the Final Answer

import os
from concurrent.futures import ThreadPoolExecutor

from dotenv import load_dotenv
from google import genai
from langchain_mistralai import MistralAIEmbeddings
from langchain_qdrant import QdrantVectorStore

load_dotenv()
os.environ["MISTRAL_API_KEY"] = os.getenv("MISTRAL_API_KEY")

# Reconnect to the collection populated during ingestion.
embedder = MistralAIEmbeddings(model="mistral-embed")
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain",
    embedding=embedder,
)
print("Retriever ready....")

user_query = input(" > ")

SYSTEM_PROMPT = """
You are a helpful AI assistant that generates multiple alternate
search queries from the user's input query. These alternate queries
will be used for semantic search within a vector database
using similarity metrics. Generate 5 alternate
queries that can be formed to better understand the user's
input query given below.

context:
{user_query}
Strictly return only the alternate queries separated by new lines.
"""

# Step 1: expand the original question into alternate queries.
client = genai.Client(api_key=os.getenv("api_key"))
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=SYSTEM_PROMPT.format(user_query=user_query),
)
queries = [q.strip() for q in response.text.split("\n") if q.strip()]
print(queries)

# Step 2: fan out by running a similarity search for each query in parallel.
def search_for_query(q):
    return retriever.similarity_search(q, k=2)

with ThreadPoolExecutor() as executor:
    all_results = list(executor.map(search_for_query, queries))

# Step 3: flatten the per-query result lists into one list of chunks.
flattened_results = [doc for result in all_results for doc in result]

print("All relevant chunks from parallel search:")
for doc in flattened_results:
    print(doc.page_content)
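The script above stops at printing the retrieved chunks. To round off Step 4 (merge and filter) and Step 5 (answer generation) from the workflow, the following sketch could be appended to the same script. It reuses client, user_query, and flattened_results from above; the prompt wording and the choice to deduplicate on page_content are illustrative assumptions, not a fixed API.

# Step 4: merge and filter by keeping only one copy of each chunk's text
# (a simple stand-in for the filter_unique idea from the diagram).
unique_docs = list({doc.page_content: doc for doc in flattened_results}.values())

# Step 5: hand the deduplicated context plus the original question to Gemini.
context = "\n\n---\n\n".join(doc.page_content for doc in unique_docs)
ANSWER_PROMPT = f"""
Answer the user's question using only the context below.

context:
{context}

question:
{user_query}
"""
final_answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=ANSWER_PROMPT,
)
print(final_answer.text)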

🔗 Explore the full implementation on GitHub: YogyashriPatil/parallel_query – Dive into the codebase for a hands-on look at how Fan-Out Retrieval is built using LangChain, Gemini, and Qdrant.

🎯 What This Teaches About Querying:

  • You asked one question, but broke it into different “contexts” or experts

  • Each gave unique but relevant data

  • You merged those results to make a smarter final decision

Wrapping It All Up

In this guide, we explored how Fan-Out Retrieval (also known as Parallel Query Expansion) can significantly improve search quality in RAG (Retrieval-Augmented Generation) systems.

By taking a user’s original query and generating multiple reworded versions using Google Gemini, we cover different angles of the same question. These versions are then used in parallel to retrieve relevant chunks from a Qdrant vector database.

This "fan-out" approach ensures the system isn’t relying on a single search interpretation. Instead, it brings in multiple perspectives giving the language model a broader, richer context to generate more accurate and meaningful answers.

We used tools like:

  • LangChain for orchestration,

  • Mistral embeddings for vector representation,

  • Qdrant for fast semantic search,

  • and Gemini Flash for creative query expansion.

Together, these tools form a smart, flexible, and scalable search pipeline that’s ideal for working with large document datasets. Whether you're building an AI assistant, a document Q&A bot, or a research tool, Fan-Out Retrieval gives your system a big boost in depth and precision. 🚀
