🧩 Unlocking Smarter Retrieval: Query Decomposition in RAG

🚀 Introduction
Retrieval-Augmented Generation (RAG) has transformed how Large Language Models (LLMs) access and use external knowledge. But a single query, especially a vague or multi-step one, often isn't enough to retrieve the most relevant information.
That's where Query Decomposition comes in. It's a powerful enhancement to RAG pipelines that breaks complex questions down into smaller, more precise sub-queries.
In this post, we'll explore:
What query decomposition is
Why it matters in real-world applications
How it works (with diagrams and code)
Variants like Less Abstract Reasoning and Sequential Answer Composition
How it's different from methods like HyDE and Parallel Fan-out Retrieval
🤯 The Problem: RAG Falls Short on Complex Queries
In standard RAG:
```plaintext
User Query → Embed → Retrieve → Answer
```
This works well when:
The query is precise and self-contained
The documents closely match the query's keywords
But it fails when:
The query is multi-hop (requires reasoning over multiple facts)
The query is ambiguous or too short
Important context is spread across multiple documents
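For concreteness, here is a minimal sketch of that standard single-query loop; `embed_fn`, `search_fn`, and `llm_fn` are hypothetical stand-ins for your embedding model, vector store, and chat model:

```python
from typing import Callable, List

# A minimal sketch of the standard single-query RAG loop. The embed_fn,
# search_fn, and llm_fn parameters are hypothetical stand-ins for your
# embedding model, vector store, and chat model.
def standard_rag(
    query: str,
    embed_fn: Callable[[str], List[float]],
    search_fn: Callable[[List[float], int], List[str]],
    llm_fn: Callable[[str], str],
    k: int = 4,
) -> str:
    query_vec = embed_fn(query)        # embed the raw user query once
    chunks = search_fn(query_vec, k)   # retrieve the top-k chunks
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm_fn(prompt)              # one retrieval pass, one answer
```

Everything hinges on that single embedding of the raw query, which is exactly what breaks down for multi-hop or ambiguous questions.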
🧠 Solution: Query Decomposition
Query Decomposition uses an LLM to break the user's question into smaller, logically simpler sub-questions, processes each one individually, and finally aggregates their answers.
🔄 General Workflow:
```plaintext
User Query
    ↓
LLM → [SubQ1, SubQ2, SubQ3]
    ↓
SubQ1 → Retrieve → Answer1
SubQ2 + Answer1 → Retrieve → Answer2
SubQ3 + Answer1 + Answer2 → Retrieve → Answer3
    ↓
LLM(User Query + Answer1 + Answer2 + Answer3) → Final Answer
```
This is often called "Less Abstract" Query Decomposition, because it promotes step-by-step reasoning.
📘 Example from White Paper
❓ User Query:
"What is machine learning?"
🧠 Decomposed:
What is a machine?
What is learning?
What is machine learning?
Each of these can be retrieved + answered individually.
⚙️ Code Snippet (LangChain Style)
```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Step 1: Decompose the user's question into sub-questions
prompt = PromptTemplate.from_template("""
Break the user's question into simpler sub-questions that can be answered independently.
Return one sub-question per line.

Question: {question}
""")

llm = OpenAI()
decompose_chain = LLMChain(prompt=prompt, llm=llm)

# The chain returns plain text; split it into a list of sub-questions
raw_output = decompose_chain.run("What is machine learning?")
sub_questions = [line.strip() for line in raw_output.splitlines() if line.strip()]
```
You can then loop through `sub_questions`, retrieve relevant chunks for each, and generate individual answers using RAG, as in the sketch below.
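Here is a minimal sketch of that loop; `retrieve_fn` and `llm_fn` are hypothetical stand-ins for your retriever and chat model:

```python
from typing import Callable, List

# Answer each sub-question independently. retrieve_fn and llm_fn are
# hypothetical stand-ins for your retriever and chat model.
def answer_sub_questions(
    sub_questions: List[str],
    retrieve_fn: Callable[[str], List[str]],
    llm_fn: Callable[[str], str],
) -> List[str]:
    answers = []
    for sub_q in sub_questions:
        context = "\n\n".join(retrieve_fn(sub_q))  # per-sub-question retrieval
        answers.append(llm_fn(f"Context:\n{context}\n\nQuestion: {sub_q}"))
    return answers
```

Each sub-question gets its own focused retrieval pass, which is what keeps the context for every individual answer small and relevant.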
🔄 Workflow of Less Abstract Query Decomposition
🧠 Step 1: Decompose User Query
Use an LLM to break a complex query into smaller logical steps:
```plaintext
User Query → ["SubQ1", "SubQ2", "SubQ3"]
```
🔁 Step 2: Answer in Stages
You then work through the sub-queries one at a time:
2.1 For SubQ1:
🔍 Retrieve relevant chunks for SubQ1
💡 LLM:
Answer1 = LLM(SubQ1 + Context1)
2.2 For SubQ2:
🔍 Retrieve relevant chunks for SubQ2
💡 LLM:
Answer2 = LLM(SubQ2 + Context2 + Answer1)
2.3 For SubQ3:
🔍 Retrieve relevant chunks for SubQ3
💡 LLM:
Answer3 = LLM(SubQ3 + Context3 + Answer1 + Answer2)
🎯 Step 3: Final Answer
Finally, aggregate all the steps:
```plaintext
Final Answer = LLM(UserQuery + Answer1 + Answer2 + Answer3)
```
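Putting Steps 1-3 together, a minimal sketch of the staged loop might look like the following; `retrieve_fn` and `llm_fn` are hypothetical stand-ins as before, and the exact prompt wording is illustrative:

```python
from typing import Callable, List

# Sequential ("Less Abstract") decomposition: each stage sees every earlier
# answer. retrieve_fn and llm_fn are hypothetical stand-ins, as before.
def staged_answer(
    user_query: str,
    sub_questions: List[str],
    retrieve_fn: Callable[[str], List[str]],
    llm_fn: Callable[[str], str],
) -> str:
    answers: List[str] = []
    for sub_q in sub_questions:
        context = "\n\n".join(retrieve_fn(sub_q))
        history = "\n".join(answers)  # Answer1, Answer2, ... accumulated so far
        answers.append(llm_fn(
            f"Context:\n{context}\n\nEarlier answers:\n{history}\n\nQuestion: {sub_q}"
        ))
    # Final aggregation: the original query plus every intermediate answer
    return llm_fn(
        f"Question: {user_query}\n\nIntermediate answers:\n" + "\n".join(answers)
    )
```

The key design choice is that `history` grows with each stage, so SubQ3 is answered with both earlier answers in view, exactly as in the workflow diagram above.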
🧠 Advantages
✅ Handles multi-hop and compound questions
✅ More focused retrieval, less noise
✅ Enables sequential reasoning
✅ Works well with agent-style systems and intermediate reasoning chains
📚 Variants and Enhancements
🧬 Less Abstract Reasoning
Builds each answer step by step, feeding previous outputs into the next query.
🔀 Parallel Fan-out Retrieval
Generates multiple interpretations of the query in parallel and merges the results (often using RRF).
🧪 HyDE (Hypothetical Document Embedding) [will be covered in a later post]
Uses an LLM to generate a hypothetical answer, then embeds that answer and retrieves with it.
🔗 Combine with RRF
Use Reciprocal Rank Fusion to rank the sub-query retrievals and merge their contexts, as sketched below.
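As a sketch of the fusion step: RRF scores each document as the sum of 1/(k + rank) over the per-sub-query result lists, where k = 60 is the constant commonly used in the literature:

```python
from collections import defaultdict
from typing import Dict, List

# Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank(d)).
# ranked_lists holds one ranked list of document IDs per sub-query.
def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of several sub-query rankings float to the top of the fused list, even if no single ranking put them first.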
📌 When Should You Use Query Decomposition?
Use it when:
Queries are compound, open-ended, or vague
Information is scattered across multiple docs
You're building research assistants, multi-hop QA systems, or agentic pipelines
🧠 Final Thoughts
Query Decomposition isn't just a trick; it's a fundamental shift in how we make LLMs smarter with external knowledge. It allows retrieval to match the reasoning power of generation, creating a more reliable and scalable RAG experience.
If you're building serious GenAI apps, think beyond embeddings. Think decomposition, reasoning, and chaining.