Parallel Query Retrieval (Fan-Out): Making RAG Faster & Smarter


When we ask questions to AI models, especially in complex systems like Retrieval-Augmented Generation (RAG), we expect fast and accurate answers. But what if the system doesn't always find the best result on the first try?
That's where Parallel Query Retrieval, also known as Fan-Out, comes into play. It's like sending out multiple versions of the same question in different directions, all at once, to gather the best possible answers.
Imagine This…
You're planning a trip and ask your friends:
"Hey, what's the best place to visit in Europe in spring?"
But instead of just asking one friend, you message:
Your travel blogger cousin
Your Instagram-savvy friend
A coworker who lived in Europe
ChatGPT
Each of them might come back with different answers like "Paris", "Amsterdam", or "Barcelona", all based on their perspective.
Now you take all those responses and pick the one (or few) that suit you best.
That's Parallel Query Retrieval. You're fanning out your question to multiple sources in parallel and collecting the most relevant responses.
How Fan-Out Works in RAG
In RAG systems, Fan-Out is used to improve both retrieval and generation quality (want to learn how RAG works? See RAG Explained: Supercharge Your LLM with Real-Time Knowledge). Here's how:
The system takes your original query (e.g., "How is AI being used in healthcare?")
It rephrases the query in different ways:
"What are current uses of AI in hospitals?"
"How does artificial intelligence improve patient care?"
"Examples of AI applications in medical field"
All these rewritten queries are sent in parallel to the retriever (like a vector database or search engine).
Each query fetches its own top results (documents, articles, etc.).
The system then combines the results to get the most accurate and diverse information.
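Before wiring up a real retriever, the flow above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a toy `retrieve` function and made-up document names; the full LangChain pipeline follows below.

```python
# Minimal fan-out sketch: issue query variations concurrently, then
# merge the result sets while dropping duplicates (first-seen order).
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for a real document store (hypothetical data).
CORPUS = {
    "ai hospitals": ["doc_ai_hospitals", "doc_patient_care"],
    "ai patient care": ["doc_patient_care", "doc_diagnostics"],
    "ai medical field": ["doc_diagnostics", "doc_ai_hospitals"],
}

def retrieve(query: str) -> list[str]:
    """Stand-in retriever: returns the docs indexed under the query."""
    return CORPUS.get(query, [])

def fan_out(queries: list[str]) -> list[str]:
    """Run all query variations in parallel, then merge unique results."""
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(retrieve, queries))
    seen, merged = set(), []
    for docs in result_sets:
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

docs = fan_out(["ai hospitals", "ai patient care", "ai medical field"])
print(docs)  # each document appears once, in first-seen order
```

In a real system the `retrieve` calls hit a vector database, but the shape is the same: parallel lookups, one merged result list.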
Code
Storing Documents and Creating the Retriever
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import LLMChain
from dotenv import load_dotenv
import os

load_dotenv()

# Load the PDF
pdf_path = Path("./nodejs.pdf")
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

# Split the PDF into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
split_docs = text_splitter.split_documents(documents=docs)

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Create the vector store on first run; load the persisted one afterwards
if not os.path.exists("chroma_store"):
    store = Chroma.from_documents(
        documents=split_docs,
        embedding=embeddings,
        persist_directory="chroma_store"
    )
    print("New Chroma DB created.")
else:
    store = Chroma(
        persist_directory="chroma_store",
        embedding_function=embeddings
    )
    print("Directory already exists. Loading existing Chroma DB.")

# Convert the vector store into a retriever for similarity-based search
retriever = store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10}
)
Configuring the Model
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500)

system_prompt = (
    "You are a helpful assistant that answers questions about Node.js "
    "using the following context:\n\n{context}"
)
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}"),
])

# Create a chain that stuffs the retrieved documents into the prompt and generates the answer
question_answer_chain = create_stuff_documents_chain(llm, prompt)
Generating Query Variations for Parallel Retrieval
import re

# Prompt for generating query variations
query_variation_prompt = PromptTemplate.from_template(
    "Generate 5 diverse and useful variations of the following question "
    "for better document retrieval:\n\nQuestion: {question}\n\nVariations:"
)

# Chain to run the variation prompt
variation_chain = LLMChain(llm=llm, prompt=query_variation_prompt)

user_question = "Can you explain routing in nodejs briefly?"

# Generate variations
variation_response = variation_chain.invoke({"question": user_question})

# Extract the individual variations, stripping list markers such as "1." or "- "
query_variations = [
    re.sub(r"^\s*(?:\d+\.|[-*])\s*", "", line).strip()
    for line in variation_response["text"].split("\n")
    if line.strip()
]

# Combine the original question and all generated variations
all_queries = [user_question] + query_variations

print("\nGenerated Query Variations:")
for i, q in enumerate(query_variations, 1):
    print(f"{i}. {q}")
Generated Queries
Generated Query Variations:
1. **Node.js Routing: A concise explanation** (Focuses on brevity and uses a keyword likely found in documentation titles)
2. **How are routes defined and handled in a Node.js application?** (More specific about the mechanics of routing)
3. **What are the core concepts and components involved in Node.js routing (e.g., request, response, middleware)?** (Emphasizes key elements and encourages structured explanations)
4. **Best practices for implementing efficient and scalable routing in Node.js** (Shifts focus towards practical application and optimization)
5. **Comparison of different Node.js routing libraries/frameworks (e.g., Express.js, Hapi.js).** (Expands the scope to include common tools and encourages comparative analysis)
Combining All Queries
# Fan out: send every query to the retriever and pool the results
all_docs = []
for query in all_queries:
    docs = retriever.invoke(query)
    all_docs.extend(docs)
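Because the query variations overlap, the pooled list usually contains duplicate chunks; de-duplicating by page content before generation keeps the prompt compact. Here is a minimal sketch, using plain dicts in place of LangChain `Document` objects so it runs standalone:

```python
def dedupe_by_content(docs):
    """Keep only the first occurrence of each unique page content."""
    seen = set()
    unique = []
    for doc in docs:
        # With real LangChain Documents this would be doc.page_content;
        # plain dicts are used here so the sketch is self-contained.
        content = doc["page_content"]
        if content not in seen:
            seen.add(content)
            unique.append(doc)
    return unique

pooled = [
    {"page_content": "Routing maps URLs to handlers."},
    {"page_content": "Express uses a Router object."},
    {"page_content": "Routing maps URLs to handlers."},  # duplicate chunk
]
print(len(dedupe_by_content(pooled)))  # 2
```

With the real pipeline, you would pass `all_docs` through a helper like this (reading `doc.page_content` instead of `doc["page_content"]`) before invoking the answer chain.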
Retrieving the Final Answer
# Answer the original question using the pooled context
response = question_answer_chain.invoke({
    "context": all_docs,
    "input": user_question
})

print("\nFinal Answer:\n", response)
Output
Final Answer:
Routing in Node.js, typically using the Express.js framework, directs incoming HTTP requests to appropriate handler functions based on the URL path. Think of it as a traffic controller for your web application.
Here's a breakdown:
1. **Request:** A client (like a web browser) sends an HTTP request to your server with a specific URL (e.g., `/users` or `/products/123`).
2. **Router:** Express.js uses a `Router` object to match the request's URL path to predefined routes.
3. **Routes:** Routes are defined using HTTP methods (GET, POST, PUT, DELETE, etc.) and URL patterns. For example:
```javascript
const express = require('express');
const router = express.Router();
router.get('/users', (req, res) => {
// Handle GET request to /users
});
router.post('/products', (req, res) => {
// Handle POST request to /products
});
module.exports = router;
```
…
Wrapping It All Up
In a world full of nuanced queries and massive data pools, Fan-Out acts as your intelligent safety net. It ensures that no matter how a question is phrased, the system maximizes its chances of understanding and delivering a high-quality answer.
Ready to go deeper? I've broken down more such strategies in this article:
Mastering RAG: Advanced Methods to Enhance Retrieval-Augmented Generation
If this helped you level up your RAG game, hit that like button, drop a comment, and don't forget to follow me for more AI + dev content! Got questions or just want to nerd out? I'd love to chat.
Thanks for reading and keep building awesome stuff!
Written by

Yash Pandav

I am Yash Pandav, with a strong foundation in programming languages including Java, JavaScript, and C, and I specialize in full-stack web development using React.js, Node.js, Express.js, and MongoDB. My experience includes building scalable web applications, optimizing backend performance, and implementing RESTful APIs. I'm also well-versed in Git & GitHub, database management, and cloud technologies like Appwrite and Cloudinary. I'm also exploring the world of Data Science, with hands-on work in data analysis, visualization, and ML fundamentals. Recently, I dove deep into the world of Generative AI through the GenAI Cohort, where I built intelligent RAG-powered applications that bridge unstructured data (PDFs, CSVs, YouTube) with LLMs. This has opened doors to developing more advanced, context-aware AI systems.