Supercharging RAG with Query Transformation and Parallel Query (Fan Out) Retrieval

Akshay Kumar

Retrieval-Augmented Generation (RAG) has become a powerful strategy to enhance the capabilities of large language models (LLMs). But while RAG goes a long way toward reducing hallucination and grounding answers in real data, it's not without its own limitations.

In this post, we’ll explore:

  • 🌱 What is RAG and its limitations?

  • 🔧 Advanced techniques to make RAG smarter

  • 🧠 Deep dive into Query Transformation

  • 🌐 How Parallel Query (Fan Out) Retrieval solves real-world search issues


🌱 What is RAG?

At its core, RAG combines two key components:

  1. Retrieval – fetching relevant information from external knowledge (like vector databases or documents).

  2. Generation – using LLMs to answer queries based on the retrieved content.

This setup grounds the model in factual, external data and allows dynamic interaction with knowledge sources like PDFs, websites, or documentation.
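To make this concrete, here's a minimal sketch of that two-step loop. It assumes the OpenAI Python SDK for embeddings and generation and uses a toy in-memory "vector store"; swap in whatever embedding model, vector database, and LLM you actually use.

```python
# Minimal RAG sketch: embed the query, fetch the closest chunks, answer from them.
# Assumes the OpenAI Python SDK; the toy in-memory "vector store" is illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Indexing: embed document chunks once, up front.
docs = [
    "The Node.js file system module provides an API for interacting with files.",
    "Express is a minimal web framework for Node.js.",
]
doc_vecs = embed(docs)

def retrieve(query, k=2):
    """Return the k chunks with the highest cosine similarity to the query."""
    q = embed([query])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def generate(query, context):
    """Answer the query using only the retrieved context."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

q = "What is the fs module?"
print(generate(q, "\n".join(retrieve(q))))
```

The retrieve and generate helpers defined here are reused in the later sketches.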


⚠️ But RAG Has Some Limitations

Despite its promise, basic RAG often struggles with:

  • Query mismatch – Users may phrase a query in a way that doesn't match document wording.

  • Under-retrieval – Top-k retrievers might miss semantically relevant chunks.

  • Ambiguity – Single queries may not capture all possible wordings or intents.

Take this example:

❓ User Query: "What is fs module in Node.js?"

If the documentation mostly uses “file system module” and not “fs,” your system may miss the most relevant chunks.
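You can see the effect even with a deliberately naive keyword-overlap "retriever" (purely a toy illustration; dense embeddings soften this gap but don't eliminate it):

```python
# Toy illustration of query mismatch: a naive keyword-overlap score barely
# connects a query that says "fs" to a chunk that only says "file system".
import re

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query, chunk):
    return len(terms(query) & terms(chunk))

chunk = "The file system module lets you read and write files."

print(overlap("What is fs module?", chunk))               # 1 -- only "module" matches
print(overlap("What is the file system module?", chunk))  # 4 -- far better alignment
```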


🧪 Enter Advanced RAG Techniques

To address these issues, modern RAG pipelines introduce smarter steps:

  1. Query Transformation – Enhance the original user query with richer keywords or alternative phrasings.

  2. Routing – Send queries to appropriate sources or index slices.

  4. Query Construction – Make long or vague queries more precise and well-structured.

  4. Indexing – Use hierarchical, hybrid, or topic-aware indexes.

  5. Retrieval – Improve top-k accuracy using multi-vector or ensemble methods.

  6. Generation – Combine retrieved data contextually to create accurate and coherent output.

Let’s focus on the first (and arguably most underrated) step: Query Transformation.


🧠 Query Transformation: Why and How?

A fair question: if the query is what's failing retrieval, why bring an LLM into it at all?

Because the weakness isn't the model's behavior; it's the wording of the query. We don't use the LLM to rewrite the system prompt. We use it to refine the user's prompt.

The goal is not to steer the model's behavior, but to generate search-optimized variants of the user's query.

📌 Example

Let’s revisit our Node.js example:

User Query: "What is fs module?"

A basic vector search may not yield much. But transforming this into variations like:

  • "File system module"

  • "Node.js fs module documentation"

  • "File system API in Node"

… dramatically improves the chance of retrieving semantically rich and relevant chunks.
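One straightforward way to produce those variants is to ask an LLM for them directly. A minimal sketch, again assuming the OpenAI Python SDK (the prompt wording and model name are just placeholders):

```python
# Query Transformation sketch: ask an LLM for search-optimized rewrites
# of the user's query. Model choice and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()

def transform_query(user_query, n=3):
    """Return n alternative phrasings of the query, one per line from the LLM."""
    prompt = (
        f"Rewrite the following question as {n} alternative search queries that use "
        "different but semantically equivalent wording. "
        f"Return one query per line and nothing else.\n\nQuestion: {user_query}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip("-•* ").strip() for line in lines if line.strip()]

print(transform_query("What is fs module?"))
# e.g. ['Node.js file system module', 'fs module documentation', 'file system API in Node']
```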


🔄 Parallel Query (Fan Out) Retrieval: Solving the Coverage Problem

Here’s where Parallel Query (Fan Out) Retrieval shines.

Instead of relying on a single query, it uses an LLM to generate multiple query variants. Each variant then goes through the retrieval pipeline independently, in parallel.

🔍 Diagram Breakdown

Let’s walk through the process (a code sketch follows the list):

  1. User Query ➝ Input to LLM.

  2. LLM generates multiple semantically relevant queries.

  3. Each query is embedded separately and sent to the vector store.

  4. Each returns a list of retrieved chunks.

  5. The results are merged and deduplicated using a filter_unique step.

  6. All merged chunks + original query are sent to the LLM for final answer generation.
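Putting those steps together, and reusing the transform_query, retrieve, and generate helpers from the earlier sketches (all of them illustrative names, not a fixed API), the fan-out pipeline might look like this:

```python
# Parallel Query (Fan Out) Retrieval sketch, reusing the transform_query,
# retrieve, and generate helpers defined in the earlier snippets.
def filter_unique(chunk_lists):
    """Merge per-query results and drop duplicate chunks, preserving order."""
    seen, merged = set(), []
    for chunks in chunk_lists:
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged

def fan_out_rag(user_query):
    # 1-2. The LLM generates multiple semantically related queries.
    variants = [user_query] + transform_query(user_query)
    # 3-4. Each variant is embedded and searched independently.
    results = [retrieve(v) for v in variants]
    # 5. Merge and deduplicate the retrieved chunks.
    context = filter_unique(results)
    # 6. Final generation from the merged chunks plus the original query.
    return generate(user_query, "\n".join(context))

print(fan_out_rag("What is fs module?"))
```

In a real pipeline the per-variant retrievals are independent, so they can run concurrently (for example with a thread pool or asyncio) rather than in the sequential loop shown here.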

💡 Why it works

This “fan out” method helps ensure:

  • Diverse coverage of semantically relevant documents

  • Reduction in recall errors from narrowly worded queries

  • More robust context for generation


🧵 Summary

  • Query Transformation – Rewriting user queries with better, document-aligned language

  • Fan Out Retrieval – Running multiple variants in parallel to improve recall

  • Filter + Merge – Deduplicating overlapping results and preserving unique information

  • Generation – Using the LLM to synthesize the final response from the rich context

✨ Final Thoughts

RAG is only as good as the documents it retrieves — and the documents it retrieves depend heavily on the quality of the query.

With Query Transformation and Parallel Query Retrieval, you're no longer bound to a single shot at relevance. You're fanning out and ensuring your model works with the best possible context.

Ready to level up your RAG pipeline? Start transforming your queries — literally!
