How to Optimize Queries in RAG 🚀

Anix Lynch
6 min read

What Is Pre-Retrieval Query Optimization? 🚀

Pre-retrieval query optimization is like training your research assistant before sending them into a library 📚. It ensures they know exactly what to look for to retrieve the most relevant data—no wasted time, no irrelevant books!


Why Does It Matter? 🤔

  1. Improved Precision 🎯

    • Focuses the search to retrieve relevant documents only.

    • Example: Instead of searching for “machine learning,” we refine the query to “applications of machine learning in healthcare.”

  2. Enhanced Relevance 🔍

    • Handles multi-intent queries by splitting questions into sub-queries.

    • Example:
      Query: "Explain AI and its impact on climate change."

      • Split Queries:

        • “What is AI?”

        • “How does AI impact climate change?”

  3. Reduces Ambiguity 🧩

    • Fixes vague questions using techniques like:

      • Step-Back Prompting: Clarifies intent with follow-up prompts.

      • HyDE (Hypothetical Document Embeddings): Predicts what the ideal document should look like.

    • Example: Query “growth” → Clarify whether it means economic growth or population growth (a minimal HyDE sketch follows this list).

  4. Leverages External Knowledge 🌎

    • Routes queries to domain-specific databases or external APIs based on context.

    • Example: If the query involves medical data, it routes to PubMed instead of general sources.

  5. Foundation for Accurate Outputs 📑

    • Prepares the LLM with clean, focused data, avoiding hallucinations or irrelevant results.

    • Example: Query: “Effects of meditation on stress levels.”

      • Finds research-backed evidence rather than blog opinions.
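
To make point 3 concrete, here is a minimal HyDE sketch, assuming an OpenAI completion model and a `vectorstore` like the one built in the implementation section below: instead of embedding the raw query, we embed an LLM-written hypothetical answer and retrieve real documents near it.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)

# 1. Ask the LLM to draft the "ideal" document for the query
hyde_prompt = PromptTemplate.from_template(
    "Write a short passage that would perfectly answer this question:\n{query}"
)
hyde_chain = LLMChain(llm=llm, prompt=hyde_prompt)
hypothetical_doc = hyde_chain.run({"query": "Effects of meditation on stress levels"})

# 2. Retrieve real documents that sit near the hypothetical one in
#    embedding space (assumes `vectorstore` from the implementation below)
docs = vectorstore.similarity_search(hypothetical_doc, k=4)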

Techniques for Optimization 🛠️

Technique | Purpose | Example
Multi-Query Retrieval 🧵 | Splits queries into sub-queries to cover multiple angles. | “Impact of AI on climate change” → “What is AI?” + “AI effects on climate.”
Decomposition 🔗 | Breaks down complex queries into manageable parts. | “Benefits and risks of AI” → Query benefits and risks separately.
Step-Back Prompting 🧑‍🏫 | Adds context before query execution to clarify intent. | Query: “How does it work?” → Step back: “What does ‘it’ refer to?”
HyDE (Hypothetical Embeddings) 📝 | Predicts ideal documents for embedding before retrieval. | Query: “Economic impact of AI” → Embed AI papers related to economy.
Semantic Routing 🚦 | Uses classifiers to route queries to specific knowledge bases. | Query: “Heart disease diagnosis” → Route to PubMed, not Wikipedia.
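
Semantic routing from the last row can be as small as an LLM classifier in front of several retrievers. Here is a minimal sketch, where the topic labels and the `retrievers` dictionary are assumptions for illustration:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0)

# Classify the query into one of a few known domains
route_prompt = PromptTemplate.from_template(
    "Classify this query as 'medical', 'finance', or 'general'. "
    "Answer with one word only.\nQuery: {query}"
)
router = LLMChain(llm=llm, prompt=route_prompt)

def route(query, retrievers):
    """Pick the retriever whose knowledge base matches the query's topic."""
    topic = router.run({"query": query}).strip().lower()
    return retrievers.get(topic, retrievers["general"])

# retrievers = {"medical": pubmed_retriever,   # hypothetical retrievers,
#               "finance": finance_retriever,  # one per knowledge base
#               "general": wiki_retriever}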

Real-World Use Case 🌍

Scenario: Financial Data Analysis 📈

Query: “How did Tesla’s stock price react to Elon Musk’s announcements?”

Optimization Steps:

  1. Multi-Query Retrieval → Break into:

    • “Tesla stock price history.”

    • “Timeline of Elon Musk’s announcements.”

  2. Semantic Routing → Route stock prices to Yahoo Finance API and announcements to News APIs.

  3. HyDE → Generate hypothetical documents about price trends after major news to refine matching.

  4. Final Output → Retrieves charts + commentary + summaries.
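
Stitched together, the steps could look like the sketch below. It reuses the hypothetical `route` helper and `hyde_chain` from the earlier sketches plus the `rag_chain` built in the implementation section that follows, so treat it as an outline rather than a complete program:

sub_queries = [
    "Tesla stock price history",
    "Timeline of Elon Musk's announcements",
]

retrieved = []
for sub_query in sub_queries:
    source = route(sub_query, retrievers)                # semantic routing
    hypothetical = hyde_chain.run({"query": sub_query})  # HyDE rewrite
    retrieved.extend(source.get_relevant_documents(hypothetical))

# Synthesize the final answer from the combined evidence
response = rag_chain.run({
    "context": "\n".join(doc.page_content for doc in retrieved),
    "question": "How did Tesla's stock price react to Elon Musk's announcements?",
})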


Why Is It Better? 🚀

Without optimization: Scattered results, irrelevant articles, and hallucinated answers.
With optimization: Laser-focused outputs, clear context, and factually accurate insights.


Multi-Query Techniques for Complex Information Retrieval 🔍

What Is Multi-Query?

Multi-query is a query expansion technique in RAG systems that creates multiple reformulations of the user’s question to enhance document retrieval. This improves relevance by considering different perspectives of the query.


Why Use Multi-Query? 🤖

  1. Better Coverage of Information 🛠️

    • Captures synonyms, variations, and related concepts of the query.

    • Example:

      • Original query: “What is LangSmith?”

      • Expanded queries:

        • “Define LangSmith.”

        • “Purpose of LangSmith.”

        • “Use cases of LangSmith.”

  2. Improves Retrieval Accuracy 🎯

    • Reduces missed results due to poor keyword matching.

    • Broader coverage ensures no relevant documents are left out.

  3. Ideal for Ambiguous Queries 🌐

    • Handles multi-intent queries by splitting them into sub-questions.

    • Example:

      • Query: “Explain AI and its impact on climate change.”

      • Split into:

        • “What is AI?”

        • “How does AI affect the environment?”


Step-by-Step Implementation 🚀

1. Import Necessary Modules

from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

2. Set Up API Keys 🔑

import os
os.environ["LANGCHAIN_API_KEY"] = "your_langchain_api_key"
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"

3. Prepare and Split Data 📄

from langchain.document_loaders import TextLoader

# Load and split documents
loader = TextLoader("example.txt")
documents = loader.load()

splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)

4. Index Documents with Vector Store 📚

from langchain.embeddings.openai import OpenAIEmbeddings

# Generate embeddings and store in Chroma
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()

5. Generate Multi-Query Variations 🧠

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Multi-query prompt template
template = """
Generate three variations of the following query to capture different aspects:
Query: {query}

Variations:
"""
prompt = PromptTemplate.from_template(template)

# LLM Chain for generating query variations
llm = OpenAI(temperature=0)  # Lower temperature ensures predictable output
query_chain = LLMChain(llm=llm, prompt=prompt)

# Generate variations
query = "What is LangSmith and why do we need it?"
variations = query_chain.run({"query": query})
print(variations)

6. Retrieve Documents Using Multi-Query 🔄

# Document objects aren't reliably hashable, so deduplicate by content
unique_docs = {}

for variation in variations.splitlines():
    if not variation.strip():
        continue  # skip blank lines in the LLM output
    for doc in retriever.get_relevant_documents(variation):
        unique_docs[doc.page_content] = doc

unique_docs = list(unique_docs.values())  # unique results across all variations
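
As an aside, LangChain also ships a built-in MultiQueryRetriever that bundles steps 5 and 6 (variation generation, per-variation retrieval, deduplication). A minimal sketch, with the exact import path depending on your LangChain version:

from langchain.retrievers.multi_query import MultiQueryRetriever

# Generates variations with the LLM, retrieves for each, and
# returns the deduplicated union of documents
mq_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)
unique_docs = mq_retriever.get_relevant_documents(query)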

7. Run the RAG Model 📝

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Prompt template for final answer generation
rag_prompt = """
Use the context below to answer the question. Be concise and limit to 3 sentences:
Context: {context}

Question: {question}

Answer:
"""
prompt = PromptTemplate.from_template(rag_prompt)

# Final RAG Chain
llm = OpenAI(temperature=0)
rag_chain = LLMChain(llm=llm, prompt=prompt)

# Combine retrieved context
context = "\n".join([doc.page_content for doc in unique_docs])

# Generate final answer
response = rag_chain.run({"context": context, "question": query})
print(response)

Key Features of Multi-Query Implementation

  1. Query Diversification 🌟

    • Automatically generates multiple reformulations of the query.
  2. Per-Variation Retrieval 🔄

    • Runs a retrieval pass for each query variation to fetch a broader document set (the loop can be parallelized for speed).
  3. Duplicate Removal 🧹

    • Ensures retrieved results are unique before generating the response.
  4. Dynamic Query Expansion 🧠

    • Captures synonyms, related terms, and perspectives automatically.
  5. Configurable LLM Chains 🔗

    • Uses prompt templates with a configurable LLM (here, OpenAI with temperature=0) for customization.

Sample Output 📝

Input Query:
“What is LangSmith and why do we need it?”

Generated Queries:

  1. “What does LangSmith do?”

  2. “Explain the features of LangSmith.”

  3. “Why is LangSmith useful in RAG systems?”

Final Answer:
LangSmith is a platform for tracing, debugging, and evaluating LLM applications such as RAG pipelines. It helps analyze how queries are processed, measure retrieval accuracy, and monitor pipelines at scale.


Pro Tips 🌟

  • Test with Ambiguous Queries: Experiment with multi-intent questions for better outputs.

  • Scale Vector Stores: Replace Chroma with Qdrant for high-performance storage if needed.

  • Monitor Query Performance: Use tools like LangSmith to analyze how queries are processed.

  • Fine-Tune Prompt Templates: Adjust prompts to fit domain-specific requirements.
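
The vector-store swap above is a small change in LangChain. A minimal sketch, where `location=":memory:"` and the collection name are illustrative and the `qdrant-client` package must be installed:

from langchain.vectorstores import Qdrant

# Same chunks and embeddings as step 4, different backend
vectorstore = Qdrant.from_documents(
    chunks,
    embeddings,
    location=":memory:",  # or a Qdrant server URL for production
    collection_name="rag_docs",
)
retriever = vectorstore.as_retriever()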

