Parallel Query Retrieval (Fan-Out): Making RAG Faster & Smarter

Yash Pandav
5 min read

When we ask questions to AI models, especially in complex systems like Retrieval-Augmented Generation (RAG), we expect fast and accurate answers. But what if the system doesn't always find the best result on the first try?

That's where Parallel Query Retrieval, also known as Fan-Out, comes into play. It's like sending out multiple versions of the same question in different directions, all at once, to gather the best possible answers.


Imagine This…

You're planning a trip and ask your friends:

"Hey, what's the best place to visit in Europe in spring?"

But instead of just asking one friend, you message:

  • Your travel blogger cousin

  • Your Instagram-savvy friend

  • A coworker who lived in Europe

  • ChatGPT 😉

Each of them might come back with different answers like "Paris", "Amsterdam", or "Barcelona", all based on their perspective.

Now you take all those responses and pick the one (or few) that suit you best.

That's Parallel Query Retrieval: you're fanning out your question to multiple sources in parallel and collecting the most relevant responses.


How Fan-Out Works in RAG

In RAG systems, Fan-Out is used to improve both retrieval and generation quality (want to learn how RAG works? 👉 RAG Explained: Supercharge Your LLM with Real-Time Knowledge). Here's how:

  1. The system takes your original query (e.g., “How is AI being used in healthcare?”)

  2. It rephrases the query in different ways:

    • “What are current uses of AI in hospitals?”

    • “How does artificial intelligence improve patient care?”

    • “Examples of AI applications in the medical field”

  3. All these rewritten queries are sent in parallel to the retriever (like a vector database or search engine).

  4. Each query fetches its own top results (documents, articles, etc.).

  5. The system then combines the results to get the most accurate and diverse information.
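The five steps above can be sketched in plain Python before we touch any framework. This is a minimal, hedged sketch: `toy_rephrase` and `toy_retrieve` are invented stand-ins for a real LLM rewriter and a real vector-store retriever, used here only so the flow is runnable end to end.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(question, rephrase, retrieve, n=3):
    """Fan out a question: rephrase it n ways, retrieve for every
    variation in parallel, then merge and de-duplicate the results."""
    queries = [question] + rephrase(question, n)
    with ThreadPoolExecutor() as pool:
        results = pool.map(retrieve, queries)   # step 3: parallel retrieval
    merged, seen = [], set()
    for docs in results:                        # step 5: combine results
        for doc in docs:
            if doc not in seen:                 # keep first occurrence only
                seen.add(doc)
                merged.append(doc)
    return merged

# Toy stand-ins so the sketch runs without a real LLM or vector DB
def toy_rephrase(q, n):
    return [f"{q} (variation {i})" for i in range(1, n + 1)]

def toy_retrieve(query):
    # A real retriever would hit a vector store; here we fake overlap
    return ["doc-common", f"doc-for:{query}"]

docs = fan_out("How is AI being used in healthcare?", toy_rephrase, toy_retrieve)
print(docs[0])   # "doc-common" is kept once even though all 4 queries return it
```

The de-duplication step matters: because the rewritten queries are close in meaning, the same chunks tend to come back repeatedly, and merging naively would waste context-window space.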


Code

Storing Documents and Creating a Retriever

from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain.chains import LLMChain
from dotenv import load_dotenv
import os

load_dotenv()

pdf_path = Path("./nodejs.pdf")

loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

# ✂️ Split the PDF into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
split_docs = text_splitter.split_documents(documents=docs)

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

if not os.path.exists("chroma_store"):
    store = Chroma.from_documents(
        documents=split_docs,
        embedding=embeddings,
        persist_directory="chroma_store"
    )
    print("New Chroma DB created.")
else:
    # Reload the existing store so `store` is always defined
    store = Chroma(
        persist_directory="chroma_store",
        embedding_function=embeddings
    )
    print("Directory already exists. Loading existing store.")

# Convert vector store into a retriever to perform similarity-based search
retriever = store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10}  
)

Configuring the Model

from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500)


system_prompt = (
    "You are a helpful assistant that answers questions about Node.js.\n\n"
    "Context:\n{context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{input}"),
])

# Create a chain that combines documents and generates the answer
question_answer_chain = create_stuff_documents_chain(llm, prompt)

Generating Query Variations


# Prompt for generating query variations
query_variation_prompt = PromptTemplate.from_template(
    "Generate 5 diverse and useful variations of the following question for better document retrieval:\n\nQuestion: {question}\n\nVariations:"
)

# Chain to run the variation prompt
variation_chain = LLMChain(llm=llm, prompt=query_variation_prompt)

user_question = "Can you explain routing in nodejs briefly?"

# Generate variations
variation_response = variation_chain.invoke({"question": user_question})

# Extract the individual variations
query_variations = [line.strip("- ").strip() for line in variation_response['text'].split("\n") if line.strip()]

# Combine the original question and all generated variations
all_queries = [user_question] + query_variations

print("\n Generated Query Variations:")
for i, q in enumerate(query_variations, 1):
    print(f"{i}. {q}")
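As the sample output below shows, the model tends to return numbered lines wrapped in `**` bold markers, which the simple `strip("- ")` parsing above leaves behind. A slightly more defensive parser (a standard-library sketch, not part of the original chain) could clean those up:

```python
import re

def parse_variations(raw: str) -> list[str]:
    """Strip numbering, bullets, and bold markers from each line of the
    model's response so only the bare query text remains."""
    variations = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        line = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line)  # leading "1." / "-"
        line = line.replace("**", "").strip()                  # bold markers
        if line:
            variations.append(line)
    return variations

sample = "1. **Node.js routing basics**\n2. How are routes defined in Node.js?"
print(parse_variations(sample))
```

Cleaner queries embed better, so this small step can noticeably improve what the retriever brings back.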

Generated Queries

 Generated Query Variations:
1. **Node.js Routing: A concise explanation** (Focuses on brevity and uses a keyword likely found in documentation titles)
2. **How are routes defined and handled in a Node.js application?** (More specific about the mechanics of routing)
3. **What are the core concepts and components involved in Node.js routing (e.g., request, response, middleware)?** (Emphasizes key elements and encourages structured explanations)
4. **Best practices for implementing efficient and scalable routing in Node.js** (Shifts focus towards practical application and optimization)
5. **Comparison of different Node.js routing libraries/frameworks (e.g., Express.js, Hapi.js).** (Expands the scope to include common tools and encourages comparative analysis)

Combining All Queries

all_docs = []
for query in all_queries:
    docs = retriever.invoke(query)
    all_docs.extend(docs)
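Because the query variations overlap in meaning, this loop will often fetch the same chunk several times. One way to de-duplicate before generation is to key on `page_content`; the sketch below uses a minimal `Doc` stand-in for LangChain's `Document` class so it runs without the library installed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    # Stand-in for langchain_core.documents.Document
    page_content: str

def dedupe_docs(docs):
    """Drop repeated chunks, keeping the first occurrence of each page_content."""
    seen, unique = set(), []
    for d in docs:
        if d.page_content not in seen:
            seen.add(d.page_content)
            unique.append(d)
    return unique

all_docs = [Doc("chunk A"), Doc("chunk B"), Doc("chunk A"), Doc("chunk C")]
unique_docs = dedupe_docs(all_docs)
print(len(unique_docs))   # 3
```

Passing `unique_docs` instead of the raw `all_docs` to the answer chain keeps the context window from filling up with repeated text.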

Generating the Final Answer

response = question_answer_chain.invoke({
    "context": all_docs,
    "input": user_question
})
print("\nFinal Answer:\n", response)

Output
Final Answer:
 Routing in Node.js, typically using the Express.js framework, directs incoming HTTP requests to appropriate handler functions based on the URL path.  Think of it as a traffic controller for your web application.

Here's a breakdown:

1. **Request:** A client (like a web browser) sends an HTTP request to your server with a specific URL (e.g., `/users` or `/products/123`).

2. **Router:** Express.js uses a `Router` object to match the request's URL path to predefined routes.

3. **Routes:** Routes are defined using HTTP methods (GET, POST, PUT, DELETE, etc.) and URL patterns. For example:

   ```javascript
   const express = require('express');
   const router = express.Router();

   router.get('/users', (req, res) => { 
       // Handle GET request to /users
   });

   router.post('/products', (req, res) => {
       // Handle POST request to /products
   });

   module.exports = router;
   ```
.....

Wrapping It All Up

In a world full of nuanced queries and massive data pools, Fan-Out acts as your intelligent safety net. It ensures that no matter how a question is phrased, the system maximizes its chances of understanding and delivering a high-quality answer.

🚀 Ready to go deeper? I've broken down more such strategies in this article:

👉 Mastering RAG: Advanced Methods to Enhance Retrieval-Augmented Generation

If this helped you level up your RAG game, hit that ❤️ like, drop a 💬 comment, and don't forget to follow me for more AI + dev content! Got questions or just wanna nerd out? I'd love to chat.

Thanks for reading and keep building awesome stuff!
