🧩 Unlocking Smarter Retrieval: Query Decomposition in RAG

Akshay Kumar
4 min read

🚀 Introduction

Retrieval-Augmented Generation (RAG) has transformed how Large Language Models (LLMs) access and use external knowledge. But a single query, especially if it's vague or multi-step, often isn't enough to retrieve the most relevant information.

That's where Query Decomposition comes in. It's a powerful enhancement to RAG pipelines that breaks down complex questions into smaller, more precise sub-queries.

In this post, we'll explore:

  • What query decomposition is

  • Why it matters in real-world applications

  • How it works (with diagrams and code)

  • Variants like Less Abstract reasoning and Sequential Answer Composition

  • How it's different from methods like HyDE and Parallel Fan-out Retrieval


🤯 The Problem: RAG Falls Short on Complex Queries

In standard RAG:

```plaintext
User Query → Embed → Retrieve → Answer
```

This works well when:

  • The query is precise and answerable in a single step

  • The documents closely match the query's wording

But it fails when:

  • The query is multi-hop (requires reasoning over multiple facts)

  • The query is ambiguous or too short

  • Important context is spread across multiple documents
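
All three failure modes trace back to that single retrieval pass. For reference, here is a minimal sketch of the baseline pipeline; `retriever` and `llm` are hypothetical placeholders for an already-configured vector store retriever and model:

```python
# Minimal single-pass RAG sketch. `retriever` and `llm` are placeholder
# objects (e.g. a LangChain vector-store retriever and an LLM).
def standard_rag(query: str, retriever, llm) -> str:
    docs = retriever.get_relevant_documents(query)  # one embed-and-retrieve pass
    context = "\n\n".join(doc.page_content for doc in docs)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```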


🧠 Solution: Query Decomposition

Query Decomposition involves using an LLM to break the user's question into smaller, logically simpler sub-questions, then processing each one individually, and finally aggregating their answers.

πŸ” General Workflow:

```plaintext
User Query
   ↓
LLM → [SubQ1, SubQ2, SubQ3]
   ↓
SubQ1 → Retrieve → Answer1
SubQ2 + Answer1 → Retrieve → Answer2
SubQ3 + Answer1 + Answer2 → Retrieve → Answer3
   ↓
LLM(User Query + Answer1 + Answer2 + Answer3) → Final Answer
```

This is often called "Less Abstract" Query Decomposition, because each sub-question is less abstract than the original query, which promotes step-by-step reasoning.


πŸ” Example from White Paper

❓ User Query:

"What is machine learning ?"

🧠 Decomposed:

  1. What is a machine?

  2. What is learning?

  3. What is machine learning?

Each of these can be retrieved + answered individually.


βš™οΈ Code Snippet (LangChain Style)

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Step 1: Decompose the user's question into sub-questions
prompt = PromptTemplate.from_template("""
Break the user's question into simpler sub-questions that can be answered independently.
Return one sub-question per line.

Question: {question}
""")
llm = OpenAI()
decompose_chain = LLMChain(prompt=prompt, llm=llm)

raw_output = decompose_chain.run("What is machine learning?")
# The chain returns plain text, so split it into a list of sub-questions
sub_questions = [q.strip() for q in raw_output.splitlines() if q.strip()]
```

You can then loop through sub_questions, retrieve relevant chunks for each, and generate individual answers using RAG.
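
For instance, a hedged sketch of that loop, reusing the `llm` from above and assuming a `retriever` is already configured (the `answer_sub_question` helper and prompt wording are illustrative, not a fixed API):

```python
def answer_sub_question(sub_q: str, context: str) -> str:
    # Answer one sub-question strictly from its retrieved context
    return llm(f"Context:\n{context}\n\nQuestion: {sub_q}\nAnswer:")

answers = []
for sub_q in sub_questions:
    docs = retriever.get_relevant_documents(sub_q)  # focused retrieval per sub-question
    context = "\n\n".join(doc.page_content for doc in docs)
    answers.append(answer_sub_question(sub_q, context))
```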


πŸ” Workflow of Less Abstract Query Decomposition

🧠 Step 1: Decompose User Query

Use an LLM to break a complex query into smaller logical steps:

```plaintext
User Query → ["SubQ1", "SubQ2", "SubQ3"]
```

πŸ” Step 2: Answer in Stages

You iterate through each sub-query one at a time:

2.1 For SubQ1:

  • πŸ” Retrieve relevant chunks for SubQ1

  • πŸ’‘ LLM: Answer1 = LLM(SubQ1 + Context1)

2.2 For SubQ2:

  • πŸ” Retrieve for SubQ2

  • πŸ’‘ LLM: Answer2 = LLM(SubQ2 + Context2 + Answer1)

2.3 For SubQ3:

  • πŸ” Retrieve for SubQ3

  • πŸ’‘ LLM: Answer3 = LLM(SubQ3 + Context3 + Answer1 + Answer2)


🎯 Step 3: Final Answer

Finally, aggregate all steps:

```plaintext
Final Answer = LLM(UserQuery + Answer1 + Answer2 + Answer3)
```
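
Putting the three steps together, here is a hedged end-to-end sketch of the sequential loop, again using the `retriever` and `llm` placeholders from earlier (prompt wording is illustrative):

```python
def sequential_decomposition(user_query: str, sub_questions: list) -> str:
    qa_pairs = []  # accumulates (sub-question, answer) history
    for sub_q in sub_questions:
        docs = retriever.get_relevant_documents(sub_q)
        context = "\n\n".join(doc.page_content for doc in docs)
        # Feed every previous answer into the next step (the "less abstract" chain)
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
        answer = llm(
            f"Already answered:\n{history}\n\n"
            f"Context:\n{context}\n\nQuestion: {sub_q}\nAnswer:"
        )
        qa_pairs.append((sub_q, answer))
    # Step 3: aggregate all intermediate answers into the final answer
    notes = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    return llm(f"{notes}\n\nUsing the answers above, answer: {user_query}")
```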


🧠 Advantages

✅ Handles multi-hop and compound questions
✅ More focused retrieval, less noise
✅ Enables sequential reasoning
✅ Works well with agent-style systems and intermediate reasoning chains


πŸ” Variants and Enhancements

🧬 Less Abstract Reasoning

Builds each answer step-by-step, feeding previous outputs into the next query.

🔀 Parallel Fan-out Retrieval

Generates multiple interpretations of the query in parallel, retrieves for each, and merges the results (often fused with Reciprocal Rank Fusion).

🧪 HyDE (Hypothetical Document Embeddings) [will cover in a later post]

Uses an LLM to generate a hypothetical answer to embed and retrieve with.

🔀 Combine with RRF

Use Reciprocal Rank Fusion (RRF) to fuse the rankings from each sub-query's retrieval into a single, better-ordered context.
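
RRF itself is easy to sketch: each sub-query's retrieval contributes a score of 1 / (k + rank) per document (k ≈ 60 by convention), and documents are re-ranked by their summed score. The function below is a minimal illustration, not a library API:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # standard RRF scoring
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse the retrievals of three sub-queries
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_d"],
    ["doc_a", "doc_d", "doc_b"],
])
# fused → ["doc_b", "doc_a", "doc_d", "doc_c"]
```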


📌 When Should You Use Query Decomposition?

Use it when:

  • Queries are compound, open-ended, or vague

  • Information is scattered across multiple docs

  • You're building research assistants, multi-hop QA systems, or agentic pipelines


🧠 Final Thoughts

Query Decomposition isn't just a trick; it's a fundamental shift in how we make LLMs smarter with external knowledge. It allows retrieval to match the reasoning power of generation, creating a more reliable and scalable RAG experience.

If you're building serious GenAI apps, think beyond embeddings. Think decomposition, reasoning, and chaining.
