Parallel Query Retrieval (Fan Out)


Parallel query retrieval, also known as fan-out retrieval, in Retrieval-Augmented Generation (RAG) involves generating multiple, slightly different queries based on the user's input and then running these queries in parallel against the vector database.
This approach aims to increase the likelihood of retrieving relevant documents by exploring different facets of the user's query.
Let's understand what parallel query retrieval actually is with an example:
Scene: Sharma Ji's Son and His IAS Preparation
Imagine Sharma ji's son Rajesh is preparing for the IAS exam. He has to make notes on one topic: "Indian Agriculture".
But Rajesh is sharp. He reasoned that if he asked only a single question, he might not get the complete picture.
So he came up with 4-5 different queries:
"History of Indian agriculture"
"Current problems faced by Indian farmers"
"Government schemes for agriculture"
"Technological developments in farming"
What did Rajesh do next? He sent all the queries at the same time to his four friends: Google, ChatGPT, NotesWala Yrr, and Library Uncle.
These are all parallel queries. Each source came back with information from a different angle. Rajesh then compiled all the data in one place and wrote a solid answer.
What happens in a RAG model?
Exactly what Rajesh did:
A user query comes in: "Tell me about Indian agriculture"
The RAG pipeline generates more than one query (rephrased or diversified variations)
All the queries are sent to the retrieval engine at the same time (e.g., Elasticsearch, FAISS, etc.)
Each query retrieves a different set of documents or passages
Everything is combined and passed to the model, which then generates a comprehensive and relevant answer
In short: if you are preparing for the IAS exam, a single question will not do. Ask the question in different ways, collect all the answers in one place, and then write one killer answer. That is exactly what a RAG model does with parallel query retrieval!
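The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not production code: `expand_query` and `search` are hypothetical stand-ins for the LLM call and the vector-store lookup, and the fan-out parallelism comes from a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_query(query: str) -> list[str]:
    # Hypothetical stand-in: a real system would ask an LLM for variations.
    return [query, f"history of {query}", f"current issues in {query}"]


def search(query: str) -> list[str]:
    # Hypothetical stand-in for something like retriever.similarity_search(query, k=3).
    return [f"doc about '{query}'"]


def fan_out_retrieve(user_query: str) -> list[str]:
    variations = expand_query(user_query)
    # Run every variation against the retriever in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(search, variations)
        merged = [doc for docs in results for doc in docs]
    # Deduplicate while preserving order.
    return list(dict.fromkeys(merged))


docs = fan_out_retrieve("Indian agriculture")
```

The real script below does the same thing with OpenAI for query expansion and Qdrant for retrieval.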
Let's look at the code now:
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from dotenv import load_dotenv
from openai import OpenAI
import os
import ast

# Load environment variables from .env file
load_dotenv()
apikey = os.environ["OPENAI_API_KEY"]

# Initialize OpenAI client
client = OpenAI(api_key=apikey)

# 1. Load and split the PDF
pdf_path = Path(__file__).parent / "node_js_sample.pdf"
loader = PyPDFLoader(str(pdf_path))
docs = loader.load()

# Split the document into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)

# 2. Create an embedder
embedder = OpenAIEmbeddings(
    model="text-embedding-3-large",
    api_key=apikey,
)

# Only run the below once to insert data into Qdrant
# vector_store = QdrantVectorStore.from_documents(
#     documents=split_docs,
#     embedding=embedder,
#     url="http://localhost:6333",
#     collection_name="learning_node_js",
# )

# Connect to the existing Qdrant vector store
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_node_js",
    embedding=embedder,
)
print("PDF Ingestion Complete!\n")

# 3. Take the user's question
user_query = input("Ask a question about Node.js: ")

# 4. Query expansion prompt
augmentation_prompt = f"""Generate 3 semantically different variations of this question for better retrieval:
"{user_query}"
Only return a Python list of 3 strings.
Example: ["hi", "hello", "how are you"]
"""

# Call OpenAI to expand the query
query_expansion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": augmentation_prompt}],
)

# 5. Parse the string output into an actual Python list
raw_response = query_expansion.choices[0].message.content.replace("`", "")
similar_queries = ast.literal_eval(raw_response)
print("Expanded Queries:\n", similar_queries)

# 6. Search for relevant docs for each query variation
all_relevant_docs = []
for q in similar_queries:
    docs = retriever.similarity_search(query=q, k=3)
    all_relevant_docs.extend(docs)

# 7. Deduplicate by content
unique_docs = list({doc.page_content: doc for doc in all_relevant_docs}.values())
context = "\n\n".join(doc.page_content for doc in unique_docs)

# 8. Send the combined context to OpenAI for final answer generation
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant knowledgeable in Node.js."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_query}"},
    ],
)

# 9. Display the response
answer = response.choices[0].message.content.replace("*", "").replace("`", "").replace("#", "")
print("\nAnswer:\n", answer)
Output:
(my-env) jassi@lappy:~/gen-ai-cohort$ python -u "/home/jassi/gen-ai-cohort/rag_1.py"
Ignoring wrong pointing object 268 0 (offset 0)
Ignoring wrong pointing object 309 0 (offset 0)
PDF Ingestion Complete!
Ask a question about Node.js: tell me about node js
Expanded Queries:
['What is Node.js?', 'Can you explain Node.js?', 'How does Node.js work?']
Answer:
Node.js is a powerful, open-source, cross-platform JavaScript runtime environment that allows developers to run JavaScript code server-side. It was created with the goal of enabling JavaScript developers to use the same language for both client-side and server-side programming, thereby allowing for greater consistency and efficiency in web application development.
Key Features of Node.js:
1. Built on V8 JavaScript Engine: Node.js uses Google's V8 engine, which executes JavaScript code at lightning speed. This engine compiles JavaScript into machine code, making it highly efficient.
2. Non-Blocking I/O: One of the most significant advantages of Node.js is its non-blocking I/O model. This means that operations such as reading files, accessing databases, and handling network requests do not block the execution of other operations. Instead, they can be handled asynchronously, allowing Node.js to manage multiple connections simultaneously.
3. Single Threaded: Node.js operates on a single-threaded event loop, which means it can handle many connections without creating new threads for each request. This makes it scalable and efficient, especially for I/O-intensive applications.
4. Event-Driven Architecture: Node.js uses an event-driven programming model, which means it responds to events rather than waiting for tasks to complete. This is ideal for developing applications that need to handle many simultaneous connections, like web servers.
5. Rich Ecosystem: Node.js has a vast ecosystem of libraries and frameworks available through npm (Node Package Manager), which provides access to a wide array of modules that simplify development.
6. Ideal for Real-time Applications: Its asynchronous nature and WebSocket support make Node.js particularly well-suited for building real-time applications, such as chat applications, online gaming, and collaborative tools.
7. Cross-Platform: Node.js is available for various operating systems, including Windows, macOS, and Linux, making it accessible to a broad audience.
Common Use Cases:
- Web Development: Suitable for building scalable web applications, RESTful APIs, and server-side rendering.
- Microservices: Ideal for a microservices architecture due to its lightweight and efficient nature.
- Real-Time Applications: Great for applications that require real-time data exchange, such as messaging apps or live updates.
- Command-Line Tools: Node.js can be used to create various CLI tools that leverage JavaScript.
In summary, Node.js offers a modern way to build network applications using JavaScript. Its event-driven, non-blocking architecture enables efficient handling of multiple operations concurrently, making it an attractive choice for developers looking to create fast and scalable applications.
(my-env) jassi@lappy:~/gen-ai-cohort$
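One fragile spot in the script is step 5: calling `ast.literal_eval` on the raw model text will crash if the model wraps the list in prose or a code fence. A more defensive parser, sketched here as an assumption rather than part of the original script, could extract the first list it finds and fall back to the original question:

```python
import ast
import re


def parse_query_list(raw: str, fallback: str) -> list[str]:
    """Parse a Python-style list of strings out of an LLM response.

    Falls back to the original query if nothing parseable is found.
    """
    # Strip common code-fence wrappers the model may add.
    cleaned = raw.replace("```python", "").replace("```", "").strip()
    # Grab the first [...] span, in case the model added prose around it.
    match = re.search(r"\[.*\]", cleaned, re.DOTALL)
    if match:
        try:
            parsed = ast.literal_eval(match.group(0))
            if isinstance(parsed, list) and all(isinstance(q, str) for q in parsed):
                return parsed
        except (ValueError, SyntaxError):
            pass
    return [fallback]


queries = parse_query_list('Sure! Here you go: ["a", "b", "c"]', "original question")
```

With this in place, a malformed expansion response degrades gracefully to plain single-query retrieval instead of crashing the script.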
So that's all! This was all about the Parallel Query Retrieval technique.
Hope I made it easier for you to understand.
Thank you so much for reading this blog!
I'll see you in my next one.
Till then, keep learning, keep growing!
#ChaiCode
#GENAI
Written by Jaskamal Singh