Master Advanced RAG Techniques with Parallel Query Retrieval Using LangChain, Gemini, and Qdrant

Ashwin Hegde

Overview

In this post, we’ll explore a way to improve the quality of search results over large text datasets using a method called Fan-Out Retrieval (also known as Parallel Query Expansion). We’ll combine LangChain, Google’s Gemini model, and Qdrant to build a retrieval system that finds more of the relevant passages and therefore generates better answers.

First, we’ll break down the diagram and explain how the process works step by step. Then, we’ll walk through the actual code that powers the system.


Parallel Query (Fan Out) Retrieval

Understanding the Diagram

The diagram illustrates the full Fan-Out Retrieval workflow:

  1. User Input: A person asks a question in natural language.

  2. Query Expansion: A large language model (LLM), such as Gemini, rewrites the query into multiple variations that capture different phrasings.

  3. Parallel Retrieval: Each variation is sent to the Qdrant vector database simultaneously.

  4. Document Retrieval: Each query fetches relevant documents independently.

  5. Deduplication: Results are merged and duplicates are removed using a filter_unique step.

  6. Answer Generation: The cleaned set of documents and the original query are passed to the LLM, which produces the final answer.

This approach increases the chances of finding the most relevant information by considering multiple interpretations of the same question.
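
To make the six steps concrete, here is a minimal, dependency-free sketch of the same flow. Everything in it is a stand-in rather than the real stack: `CORPUS`, `expand_query`, and `search` are hypothetical names that replace the Qdrant collection, the Gemini rewrite step, and the vector search, so the block shows only the shape of the fan-out and merge.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus standing in for the Qdrant collection.
CORPUS = [
    "The fs module lets Node.js programs read and write files.",
    "The http module creates web servers in Node.js.",
    "Use fs.readFile to read a file asynchronously.",
]


def _words(text: str) -> set[str]:
    """Lowercase the text and strip simple punctuation from each word."""
    return {w.strip("?.:,").lower() for w in text.split()}


def expand_query(question: str) -> list[str]:
    """Stand-in for the LLM rewrite step: returns fixed variations."""
    return [question, f"{question} in Node.js", f"Explain: {question}"]


def search(query: str) -> list[str]:
    """Stand-in for a vector search: naive keyword overlap."""
    return [doc for doc in CORPUS if _words(query) & _words(doc)]


def fan_out(question: str) -> list[str]:
    queries = expand_query(question)
    # Run one search per variation concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        result_lists = list(pool.map(search, queries))
    merged = [doc for docs in result_lists for doc in docs]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(merged))


if __name__ == "__main__":
    for doc in fan_out("What is fs?"):
        print(doc)
```

The second variation is the only one that matches the http document, which is the point of the technique: a rephrasing can surface passages the original wording misses.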


Code Walkthrough

Let’s go through the implementation, step by step.

1. Enable Logging

import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

Setting the multi_query logger to INFO makes the retriever log the query variations it generates, so we can see what’s happening behind the scenes.

2. Set Up the API Key

import os

if "GEMINI_API_KEY" in os.environ:
    os.environ["GOOGLE_API_KEY"] = os.environ["GEMINI_API_KEY"]

The LangChain Google integration reads the key from GOOGLE_API_KEY, so if the key is stored under GEMINI_API_KEY we copy it across.

3. Load and Chunk the PDF

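from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader("nodejs.pdf")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
splits = text_splitter.split_documents(data)

We load a PDF (about Node.js), then break it into chunks of up to 500 characters, with up to 100 characters of overlap between neighbouring chunks, to prepare it for vector embedding.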
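
To see what chunk size and overlap mean in practice, here is a simplified fixed-window chunker. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper that only illustrates how neighbouring chunks share characters.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(1200))
    pieces = chunk_text(sample)
    print(len(pieces), [len(p) for p in pieces])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.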

4. Create Embeddings and Store in Qdrant

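from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
db = QdrantVectorStore.from_documents(
    documents=splits,
    embedding=embeddings,
    url="http://localhost:6333",
    collection_name="langchain_gemini_demo",
)

Gemini’s embedding model converts the chunks into vectors, which we store in a Qdrant collection. This assumes a Qdrant instance is already running and reachable at http://localhost:6333.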
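
Once the chunks are vectors, retrieval comes down to finding the stored vectors closest to the query vector. The toy example below assumes cosine similarity as the distance metric and uses made-up two-dimensional vectors; `cosine_similarity` and `top_k` are illustrative helpers, not part of the Qdrant or LangChain APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up vectors; real embeddings have hundreds of dimensions.
    store = {"fs": [1.0, 0.0], "http": [0.0, 1.0], "path": [0.6, 0.4]}
    print(top_k([0.9, 0.1], store))
```

A real vector database does the same ranking with approximate nearest-neighbour indexes so that it stays fast on large collections.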

5. Initialize the LLM and MultiQueryRetriever

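from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_google_genai import GoogleGenerativeAI

llm = GoogleGenerativeAI(model="models/gemini-2.0-flash", temperature=0.0)
retriever = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(), llm=llm
)

We initialize the Gemini LLM and pass it to the MultiQueryRetriever, which wraps the Qdrant retriever and uses the LLM to generate the expanded queries.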
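
The LLM returns its query variations as plain text, so something has to turn that text into a list of separate queries. The sketch below shows one way that step could look; `parse_generated_queries` is a hypothetical helper written for illustration and is not LangChain's own parser.

```python
import re


def parse_generated_queries(llm_output: str) -> list[str]:
    """Split raw LLM text into one query per non-empty line, dropping list markers."""
    queries = []
    for line in llm_output.splitlines():
        # Remove a leading bullet ("-", "*", "•") or number ("1.", "2)") if present.
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


if __name__ == "__main__":
    raw = (
        "1. What is the Node.js FS Module and its purpose?\n"
        "\n"
        "2. How does the FS Module in Node.js work for file system operations?\n"
    )
    for query in parse_generated_queries(raw):
        print(query)
```

Each parsed line then becomes its own search against the vector store.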

6. Query and Retrieve Documents

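user_query = "What is FS Module?"
unique_docs = retriever.invoke(user_query)

A user query is submitted. The retriever expands it, runs a retrieval for each variation, and returns the deduplicated union of the results as a list. With the synchronous invoke call, LangChain issues these retrievals one after another; ainvoke is the variant that runs them concurrently.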
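
Because several variations often hit the same chunk, the merge step matters. Here is a small sketch of order-preserving deduplication over document-like objects. The `Doc` dataclass and `merge_unique` function are hypothetical stand-ins (not LangChain classes), and treating two documents as equal when both text and metadata match is an assumption made for this example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def merge_unique(result_lists: list[list[Doc]]) -> list[Doc]:
    """Flatten per-query results, keeping the first copy of each document."""
    seen = set()
    unique = []
    for docs in result_lists:
        for doc in docs:
            # Serialize metadata with sorted keys so key order does not matter.
            key = (doc.page_content, json.dumps(doc.metadata, sort_keys=True))
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique


if __name__ == "__main__":
    a = Doc("fs intro", {"page": 1})
    b = Doc("fs intro", {"page": 1})  # same chunk returned by another query
    c = Doc("fs intro", {"page": 2})  # same text, different page
    print(len(merge_unique([[a, c], [b]])))
```

Keeping first-seen order means documents found by the earliest variation stay at the front of the context.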

7. Generate the Final Answer

context = "\n".join([doc.page_content for doc in unique_docs])
final_prompt = f"""Here is some context:
{context}

Answer the following question using the context provided:
{user_query}"""
final_answer = llm.invoke(final_prompt)
print(final_answer)

The retrieved documents are compiled into a context prompt, which is then used by Gemini to generate a concise, context-aware answer.
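
With several query variations the merged context can grow quickly, so it may be worth capping how much text goes into the prompt. The sketch below keeps the same prompt wording as above but adds a character budget; `build_prompt` and `max_context_chars` are hypothetical names, and the default of 2000 is an arbitrary value chosen for illustration.

```python
def build_prompt(question: str, chunks: list[str], max_context_chars: int = 2000) -> str:
    """Join chunks into a context block, stopping before the budget is exceeded."""
    selected = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break  # adding this chunk would exceed the budget
        selected.append(chunk)
        used += len(chunk)
    context = "\n".join(selected)
    return (
        "Here is some context:\n"
        f"{context}\n\n"
        "Answer the following question using the context provided:\n"
        f"{question}"
    )


if __name__ == "__main__":
    print(build_prompt("What is FS Module?", ["chunk one", "chunk two"]))
```

In the pipeline above, the call would look like build_prompt(user_query, [doc.page_content for doc in unique_docs]).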

Multi-Query Retriever Output Analysis

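Running the script first prints two warnings from the pypdf reader, emitted while the PDF is being loaded:

  • WARNING:pypdf._reader:Ignoring wrong pointing object 268 0 (offset 0)

  • WARNING:pypdf._reader:Ignoring wrong pointing object 309 0 (offset 0)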

Generated Queries:

The MultiQueryRetriever generated these queries from the original question ("What is FS Module?"):

  1. What is the Node.js FS Module and its purpose?

  2. How does the FS Module in Node.js work for file system operations?

  3. Explain the functionality and common uses of the FS (File System) Module in Node.js.

Final Answer:

The FS module is a built-in Node.js module that provides functions you can use to manipulate the file system.


Conclusion

Fan-Out Retrieval improves search results by rephrasing the user's question in several ways and retrieving documents for each version. This increases the chances of finding the right information, even when the original wording does not match the text in the documents.

We use LangChain to organize the process, Gemini to generate the query variations and the final answer, and Qdrant to store and search the document vectors. This approach suits situations where people ask the same thing in different ways, and it works well for large collections of text or for building assistants on top of them.


Source Code

You can find the complete code implementation in this GitHub repository:
👉 https://github.com/Ashwinhegde19/GenAI/blob/main/multi_query_retriever.py
