Multi-Query Techniques for Complex Information Retrieval 🚦

Table of contents
- Why Multi-Query Rocks 🌟
- How Multi-Query Works 🔧
- Step 1: Load and Split Documents (PREP WORK)
- Step 2: Index Documents (VECTOR STORE)
- Step 3: Generate Multi-Query Variations (KICK-IN 🔥)
- Step 4: Retrieve Documents with Multi-Query (KICK-IN CONTINUES 🚀)
- Step 5: Generate Answers (FINAL STEP—NO MORE QUERIES)
- Multi-Query in Action Summary 🚦
- Final Takeaway 🌟
Multi-query techniques supercharge RAG systems by expanding a user’s question into multiple perspectives—ensuring broader, deeper searches for better answers even when documents lack exact keywords.
Why Multi-Query Rocks 🌟
Improves Coverage 📚—Captures different angles of the query.
Finds Hidden Context 🔍—Picks up documents with synonyms or related ideas.
Boosts Recall 🔄—Grabs more relevant data while avoiding missed details.
Saves Time & Costs 💸—Fewer retries, better hits on the first try.
How Multi-Query Works 🔧
Step 1: Generate Diverse Queries
- Starts with 1 user question → Expands into 3–5 rephrased questions.
Example:
💬 Original Query: "What is LangSmith and why do we need it?"
✨ Generated Queries:
"What are the features of LangSmith?"
"How does LangSmith improve language models?"
"Why is LangSmith important for AI development?"
Step 2: Parallel Search 🧭
- Runs all queries at once → Combines results.
Step 3: Deduplicate Results 📌
- Removes duplicates → Keeps unique hits only.
Step 4: Context-Driven Answers ✍️
- Passes combined results to the LLM → Generates focused responses (the full loop is sketched below).
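To make that flow concrete before the step-by-step walkthrough, here is a minimal sketch of the whole loop. The four callables are hypothetical stand-ins for the real LangChain components built in Steps 1–5 below:
from typing import Callable, List

# Hypothetical outline of the multi-query flow. Each callable is a stand-in
# for a concrete component built in the steps below.
def multi_query_answer(
    question: str,
    generate_variations: Callable[[str], List[str]],      # 1. Expand into 3-5 rephrasings
    search: Callable[[str], List[str]],                   # 2. Fetch documents per variation
    deduplicate: Callable[[List[List[str]]], List[str]],  # 3. Keep unique hits only
    answer_with_context: Callable[[List[str], str], str], # 4. LLM answers from merged context
) -> str:
    queries = generate_variations(question)
    results = [search(q) for q in queries]
    unique_docs = deduplicate(results)
    return answer_with_context(unique_docs, question)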
Step 1: Load and Split Documents (PREP WORK)
Purpose:
- Load documents. 📚
- Split them into smaller chunks for better indexing and search later.
🛠️ Code Block
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load docs
docs = []
loaders = [TextLoader("/path/to/doc1.txt"), TextLoader("/path/to/doc2.txt")]
for loader in loaders:
    docs.extend(loader.load())
# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=60)
chunks = splitter.split_documents(docs)
💡 Multi-Query NOT here yet—This is just prep work to set up the data!
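As an optional sanity check (not part of the pipeline itself), you can peek at what the splitter produced:
# Optional: verify the split before indexing
print(f"Split {len(docs)} documents into {len(chunks)} chunks")
print(chunks[0].page_content[:200])  # Preview the first chunk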
Step 2: Index Documents (VECTOR STORE)
Purpose:
- Store chunks into a searchable vector database (Chroma) for fast retrieval.
🛠️ Code Block
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
# Create vector store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
💡 Multi-Query NOT here yet—We’re still setting up the storage for documents.
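If you want tighter control over retrieval, as_retriever accepts search settings. A small sketch; k=4 is an arbitrary example value, not a recommendation:
# Return the top 4 chunks per query instead of the default
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})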
Step 3: Generate Multi-Query Variations (KICK-IN 🔥)
Purpose:
- THIS is where the Multi-Query technique begins.
- The LLM rephrases the user's question into multiple variations to capture different angles.
🛠️ Code Block
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
# Template for generating query variations
template = """
Generate 3 variations of the question to capture different perspectives:
Question: {question}
Variations:
1.
2.
3.
"""
prompt_perspectives = ChatPromptTemplate.from_template(template)
# Create variations using LLM
generate_queries = (
    prompt_perspectives
    | ChatOpenAI(temperature=0)  # Temperature 0 for deterministic outputs
    | StrOutputParser()  # Convert the chat message into a plain string
    | (lambda x: [line.strip() for line in x.split("\n") if line.strip()])  # One query per non-empty line
)
# Example user query
question = "What is LangSmith, and why do we need it?"
query_variations = generate_queries.invoke({"question": question})
print(query_variations) # Output: 3 rephrased queries
🎯 Multi-Query STARTS HERE!
Generates multiple versions of the question (e.g., synonyms, rephrased formats).
Prepares for broader, deeper searches.
Example Output:
1. What features does LangSmith provide?
2. How does LangSmith improve AI workflows?
3. Why is LangSmith useful for AI applications?
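Depending on how the model formats its answer, each variation may keep its leading "1." numbering. A small cleanup sketch, assuming the numbered-list output shown above:
import re
# Strip leading "1." / "2)" style numbering from each generated query
query_variations = [re.sub(r"^\s*\d+[\.\)]\s*", "", q) for q in query_variations]
print(query_variations)  # Clean queries, ready for retrieval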
Step 4: Retrieve Documents with Multi-Query (KICK-IN CONTINUES 🚀)
Purpose:
- Use each variation of the query to retrieve results.
- Combine results while removing duplicates.
🛠️ Code Block
def get_unique_union(results):
    # Flatten nested lists of documents
    flat_results = [doc for result in results for doc in result]
    # Deduplicate results based on content
    seen = set()
    unique_docs = []
    for doc in flat_results:
        key = doc.page_content
        if key not in seen:
            unique_docs.append(doc)
            seen.add(key)
    return unique_docs
# Perform searches for each query variation
results = [retriever.get_relevant_documents(q) for q in query_variations]
# Merge and remove duplicates
unique_docs = get_unique_union(results)
print(f"Retrieved {len(unique_docs)} unique documents.")
🎯 Multi-Query IN ACTION here!
One search per variation: Fetches documents for each rephrased query (the list comprehension above runs them sequentially, though they could be parallelized).
Merge & Deduplicate: Combines results into one set without duplicates.
Step 5: Generate Answers (FINAL STEP—NO MORE QUERIES)
Purpose:
- Use the retrieved documents as context to answer the user’s question.
🛠️ Code Block
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
# Template for response generation
template = """
Use the following context to answer the question accurately:
Context: {context}
Question: {question}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
# Join the retrieved documents into a single context string
context = "\n\n".join(doc.page_content for doc in unique_docs)
# Create the final RAG pipeline
final_rag_chain = prompt | ChatOpenAI(temperature=0) | StrOutputParser()
# Generate the answer
answer = final_rag_chain.invoke({"context": context, "question": question})
print(answer)
💡 Multi-Query ENDS HERE!
Uses already retrieved documents to generate the final answer.
Doesn’t expand queries further—just builds the response.
Multi-Query in Action Summary 🚦
Kick-In Point #1 (Step 3):
- Generates query variations using the LLM.
- Captures different perspectives.
Kick-In Point #2 (Step 4):
- Uses all query variations to fetch documents.
- Merges results while removing duplicates.
Stop Point (Step 5):
- Uses retrieved documents to generate the answer.
Final Takeaway 🌟
Multi-Query expands searches to increase relevance and reduce misses.
It kicks in at Steps 3 & 4, but stops at Step 5 when the final response is built.
Think of it as casting a wider net (queries) before filtering and serving the catch (answers). 🎣