🧩 Unlocking Smarter Retrieval: Query Decomposition in RAG

🚀 Introduction
Retrieval-Augmented Generation (RAG) has transformed how Large Language Models (LLMs) access and use external knowledge. But a single query, especially a vague or multi-step one, often isn't enough to retrieve the most relevant information.
That's where Query Decomposition comes in. It's a powerful enhancement to RAG pipelines that breaks complex questions down into smaller, more precise sub-queries.
In this post, we'll explore:
What query decomposition is
Why it matters in real-world applications
How it works (with diagrams and code)
Variants like Less Abstract Reasoning and Sequential Answer Composition
How it's different from methods like HyDE and Parallel Fan-out Retrieval
🤯 The Problem: RAG Falls Short on Complex Queries
In standard RAG:
```plaintext
User Query → Embed → Retrieve → Answer
```
This works well when:
The query is precise and self-contained
The documents closely match the query's keywords
But it fails when:
The query is multi-hop (requires reasoning over multiple facts)
The query is ambiguous or too short
Important context is spread across multiple documents
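For concreteness, here is a minimal sketch of that standard single-query loop; `embed_fn`, `search_fn`, and `llm_fn` are hypothetical stand-ins for your embedding model, vector store, and chat model:

```python
from typing import Callable, List

# A minimal sketch of the standard single-query RAG loop. The embed_fn,
# search_fn, and llm_fn parameters are hypothetical stand-ins for your
# embedding model, vector store, and chat model.
def standard_rag(
    query: str,
    embed_fn: Callable[[str], List[float]],
    search_fn: Callable[[List[float], int], List[str]],
    llm_fn: Callable[[str], str],
    k: int = 4,
) -> str:
    query_vec = embed_fn(query)        # embed the raw user query once
    chunks = search_fn(query_vec, k)   # retrieve the top-k chunks
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm_fn(prompt)              # one retrieval pass, one answer
```

Everything hinges on that single embedding of the raw query, which is exactly what breaks down for multi-hop or ambiguous questions.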
🧠 Solution: Query Decomposition
Query Decomposition uses an LLM to break the user's question into smaller, logically simpler sub-questions, processes each one individually, and finally aggregates their answers.
🔄 General Workflow:
```plaintext
User Query
    ↓
LLM → [SubQ1, SubQ2, SubQ3]
    ↓
SubQ1 → Retrieve → Answer1
SubQ2 + Answer1 → Retrieve → Answer2
SubQ3 + Answer1 + Answer2 → Retrieve → Answer3
    ↓
LLM(User Query + Answer1 + Answer2 + Answer3) → Final Answer
```
This is often called "Less Abstract" Query Decomposition, because it promotes step-by-step reasoning.
📘 Example from White Paper
❓ User Query:
"What is machine learning?"
🧠 Decomposed:
What is a machine?
What is learning?
What is machine learning?
Each of these can be retrieved + answered individually.
⚙️ Code Snippet (LangChain Style)
```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Step 1: Decompose the user's question into sub-questions
prompt = PromptTemplate.from_template("""
Break the user's question into simpler sub-questions that can be answered independently.
Return one sub-question per line.

Question: {question}
""")

llm = OpenAI()
decompose_chain = LLMChain(prompt=prompt, llm=llm)

# The chain returns plain text; split it into a list of sub-questions
raw_output = decompose_chain.run("What is machine learning?")
sub_questions = [line.strip() for line in raw_output.splitlines() if line.strip()]
```
You can then loop through `sub_questions`, retrieve relevant chunks for each, and generate individual answers using RAG, as in the sketch below.
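Here is a minimal sketch of that loop; `retrieve_fn` and `llm_fn` are hypothetical stand-ins for your retriever and chat model:

```python
from typing import Callable, List

# Answer each sub-question independently. retrieve_fn and llm_fn are
# hypothetical stand-ins for your retriever and chat model.
def answer_sub_questions(
    sub_questions: List[str],
    retrieve_fn: Callable[[str], List[str]],
    llm_fn: Callable[[str], str],
) -> List[str]:
    answers = []
    for sub_q in sub_questions:
        context = "\n\n".join(retrieve_fn(sub_q))  # per-sub-question retrieval
        answers.append(llm_fn(f"Context:\n{context}\n\nQuestion: {sub_q}"))
    return answers
```

Each sub-question gets its own focused retrieval pass, which is what keeps the context for every individual answer small and relevant.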
🔄 Workflow of Less Abstract Query Decomposition
🧠 Step 1: Decompose User Query
Use an LLM to break a complex query into smaller logical steps:
```plaintext
User Query → ["SubQ1", "SubQ2", "SubQ3"]
```
🔁 Step 2: Answer in Stages
You then work through the sub-queries one at a time:
2.1 For SubQ1:
🔍 Retrieve relevant chunks for SubQ1
💡 LLM:
Answer1 = LLM(SubQ1 + Context1)
2.2 For SubQ2:
🔍 Retrieve relevant chunks for SubQ2
💡 LLM:
Answer2 = LLM(SubQ2 + Context2 + Answer1)
2.3 For SubQ3:
🔍 Retrieve relevant chunks for SubQ3
💡 LLM:
Answer3 = LLM(SubQ3 + Context3 + Answer1 + Answer2)
🎯 Step 3: Final Answer
Finally, aggregate all the steps:
```plaintext
Final Answer = LLM(UserQuery + Answer1 + Answer2 + Answer3)
```
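Putting Steps 1-3 together, a minimal sketch of the staged loop might look like the following; `retrieve_fn` and `llm_fn` are hypothetical stand-ins as before, and the exact prompt wording is illustrative:

```python
from typing import Callable, List

# Sequential ("Less Abstract") decomposition: each stage sees every earlier
# answer. retrieve_fn and llm_fn are hypothetical stand-ins, as before.
def staged_answer(
    user_query: str,
    sub_questions: List[str],
    retrieve_fn: Callable[[str], List[str]],
    llm_fn: Callable[[str], str],
) -> str:
    answers: List[str] = []
    for sub_q in sub_questions:
        context = "\n\n".join(retrieve_fn(sub_q))
        history = "\n".join(answers)  # Answer1, Answer2, ... accumulated so far
        answers.append(llm_fn(
            f"Context:\n{context}\n\nEarlier answers:\n{history}\n\nQuestion: {sub_q}"
        ))
    # Final aggregation: the original query plus every intermediate answer
    return llm_fn(
        f"Question: {user_query}\n\nIntermediate answers:\n" + "\n".join(answers)
    )
```

The key design choice is that `history` grows with each stage, so SubQ3 is answered with both earlier answers in view, exactly as in the workflow diagram above.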
🧠 Advantages
✅ Handles multi-hop and compound questions
✅ More focused retrieval, less noise
✅ Enables sequential reasoning
✅ Works well with agent-style systems and intermediate reasoning chains
📚 Variants and Enhancements
🧬 Less Abstract Reasoning
Builds each answer step by step, feeding previous outputs into the next query.
🔀 Parallel Fan-out Retrieval
Generates multiple interpretations of the query in parallel and merges the results (often using RRF).
🧪 HyDE (Hypothetical Document Embedding) [will be covered in a later post]
Uses an LLM to generate a hypothetical answer, then embeds that answer and retrieves with it.
🔗 Combine with RRF
Use Reciprocal Rank Fusion to rank the sub-query retrievals and merge their contexts, as sketched below.
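As a sketch of the fusion step: RRF scores each document as the sum of 1/(k + rank) over the per-sub-query result lists, where k = 60 is the constant commonly used in the literature:

```python
from collections import defaultdict
from typing import Dict, List

# Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank(d)).
# ranked_lists holds one ranked list of document IDs per sub-query.
def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of several sub-query rankings float to the top of the fused list, even if no single ranking put them first.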
📌 When Should You Use Query Decomposition?
Use it when:
Queries are compound, open-ended, or vague
Information is scattered across multiple docs
You're building research assistants, multi-hop QA systems, or agentic pipelines
🧠 Final Thoughts
Query Decomposition isn't just a trick; it's a fundamental shift in how we make LLMs smarter with external knowledge. It allows retrieval to match the reasoning power of generation, creating a more reliable and scalable RAG experience.
If you're building serious GenAI apps, think beyond embeddings. Think decomposition, reasoning, and chaining.