Advanced RAG Patterns

Ritik Gupta

Retrieval-Augmented Generation (RAG) has become one of the most popular techniques for enhancing LLMs with external knowledge. The idea is simple: retrieve relevant chunks of data from a knowledge base and feed them into the LLM along with the user’s query.

While this works well in many scenarios, basic RAG has some limitations. If the user’s query is vague, if the chunks are not properly aligned with intent, or if the retrieval ranking is weak, the final response can be inaccurate.

In this blog, we’ll explore advanced RAG patterns that tackle these issues step by step. For each method, we’ll look at:

  • The problem with basic RAG.

  • The solution introduced.

  • The limitations of the new method.

Let’s dive in!


1. Query Rewriting

The Problem:
In basic RAG, when a user query is not specific, the retriever may not fetch the right chunks. For example, if a user asks “Tell me about Apple”, should the system return results about the fruit, the tech company, or financial stock data?

The Solution:
We use Query Rewriting. The LLM rewrites the user query into a more precise, context-aware version before passing it to the retriever.

  • Example: “Tell me about Apple” → “Tell me about Apple Inc. as a technology company.”

  • Once rewritten, this refined query goes through embeddings → retrieval → generation, leading to better accuracy.

The Limitation:
Query rewriting depends heavily on the LLM’s reasoning. If the model misinterprets the intent, it may rewrite incorrectly and actually harm retrieval.
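The rewriting step can be sketched in a few lines. This is a minimal illustration, not a production implementation: `rewrite_query` and `stub_llm` are hypothetical names, and the stub stands in for a real LLM call.

```python
def rewrite_query(query: str, llm) -> str:
    """Ask an LLM to turn a vague query into a precise, context-aware one."""
    prompt = (
        "Rewrite the following search query to be specific and unambiguous, "
        f"preserving the user's intent:\n\n{query}"
    )
    return llm(prompt).strip()

# Stub standing in for a real model call (e.g. an API client).
def stub_llm(prompt: str) -> str:
    return "Tell me about Apple Inc. as a technology company."

rewritten = rewrite_query("Tell me about Apple", stub_llm)
```

The rewritten query, not the original, is what gets embedded and sent to the retriever.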


2. Chunk Evaluation

The Problem:
Even with rewritten queries, not all retrieved chunks are useful. Some may be irrelevant, outdated, or redundant, leading to noisy answers.

The Solution:
We add a Chunk Evaluation Layer. After retrieving candidate chunks, another LLM (or filter) evaluates each chunk for relevance, correctness, and quality before passing it to the generator. This ensures only high-confidence chunks are used.

The Limitation:
This approach increases latency and cost, as the LLM now needs to process and evaluate multiple chunks before answering.


3. Sub-Query Decomposition

The Problem:
Some queries are multi-faceted or complex. A single retrieval step may not cover all aspects.

The Solution:
We break the user query down into multiple sub-queries. Each sub-query retrieves relevant chunks independently, and the system then combines all the results into a coherent response.

  • Developer Use Case: Users sometimes paste raw console errors instead of framing a clean question. If the pasted error relates to Mongoose, for instance, a single raw retrieval may not be helpful. Instead, the system generates multiple focused sub-queries around that error:

    • “What does this Mongoose error mean?”

    • “Common causes of this Mongoose error.”

    • “How to fix this Mongoose error in Node.js.”

This way, the retriever gathers comprehensive, actionable chunks instead of just matching the error string literally.

The Limitation:
Sub-query decomposition may generate too many sub-queries, leading to unnecessary retrieval and higher processing overhead.
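The decompose–retrieve–merge loop above can be sketched as follows. Both `stub_decompose` and `stub_retrieve` are placeholders I've invented for illustration; a real system would use an LLM for decomposition and a vector store for retrieval.

```python
def answer_with_subqueries(query, decompose, retrieve):
    """Decompose a complex query, retrieve per sub-query, merge de-duplicated results."""
    sub_queries = decompose(query)
    merged, seen = [], set()
    for sq in sub_queries:
        for chunk in retrieve(sq):
            if chunk not in seen:  # the same chunk may match several sub-queries
                seen.add(chunk)
                merged.append(chunk)
    return sub_queries, merged

# Stub decomposition mirroring the Mongoose example above.
def stub_decompose(query):
    return [
        f"What does this Mongoose error mean? {query}",
        f"Common causes of this Mongoose error: {query}",
        f"How to fix this Mongoose error in Node.js: {query}",
    ]

# Stub retriever returning one fake document per sub-query.
def stub_retrieve(sub_query):
    return [f"doc for: {sub_query}"]

subs, merged = answer_with_subqueries("ValidationError", stub_decompose, stub_retrieve)
```

De-duplicating across sub-queries matters: overlapping sub-queries often hit the same chunks, and feeding duplicates to the generator wastes context window.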



4. Ranking the Chunks

The Problem:
Even after retrieving and filtering, the order of chunks matters. Basic retrieval may rank them poorly, and the LLM might give more weight to irrelevant chunks appearing earlier.

The Solution:
We introduce Chunk Re-Ranking. An additional ranking model or LLM evaluates chunks against the query and reorders them based on relevance score. The most important chunks are prioritized for the final response.

The Limitation:
Re-ranking adds extra computation and may still struggle when multiple chunks are equally relevant but conflict in information.
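Re-ranking is just a sort over a query-chunk relevance score. Here the scorer is a toy word-overlap function; a real re-ranker would be a cross-encoder model or an LLM judge (the names below are my own placeholders).

```python
def rerank(query, chunks, score):
    """Reorder retrieved chunks by relevance score, best first."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)

# Toy scorer: fraction of query words appearing in the chunk.
# A production system would use a cross-encoder or LLM grader instead.
def overlap_score(query, chunk):
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().replace(".", "").split())
    return len(q_words & c_words) / len(q_words)

chunks = [
    "Bananas are yellow.",
    "Apple Inc. makes the iPhone.",
]
ranked = rerank("apple iphone", chunks, overlap_score)
```

After re-ranking, the most relevant chunk appears first, so the LLM sees it early in the prompt rather than buried at the end.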


5. HyDE (Hypothetical Document Embeddings)

The Problem:
What if the query is so abstract or new that the retriever struggles to find relevant matches in the knowledge base?

The Solution:
Enter HyDE (Hypothetical Document Embeddings).

  • The LLM first generates a hypothetical answer to the query (even without retrieval).

  • This hypothetical text is then embedded and used for retrieval.

  • The retriever now finds chunks similar to the generated hypothesis, which often yields richer and more contextual results.

The Limitation:
If the initial hypothetical answer is misleading or wrong, retrieval may drift in the wrong direction.
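The HyDE flow swaps the raw query for a generated passage at the embedding step. This sketch uses stubs throughout (a canned LLM answer, a toy bag-of-characters embedding, and a fake index lookup); every function name here is a hypothetical stand-in for real components.

```python
def hyde_retrieve(query, llm, embed, search):
    """Embed a hypothetical answer instead of the raw query, then retrieve."""
    hypothetical = llm(f"Write a short passage that answers: {query}")
    vector = embed(hypothetical)          # embed the hypothesis, not the query
    return search(vector)

# Stub LLM returning a plausible hypothetical answer.
def stub_llm(prompt):
    return "RAG retrieves documents and feeds them to an LLM."

# Toy embedding (letter counts), for illustration only.
def stub_embed(text):
    return [text.lower().count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]

# Stub vector search returning a fixed chunk.
def stub_search(vector):
    return ["chunk about retrieval-augmented generation"]

results = hyde_retrieve("What is RAG?", stub_llm, stub_embed, stub_search)
```

The key design point is that document-to-document similarity is often stronger than query-to-document similarity, which is why embedding the hypothesis can outperform embedding the question.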


Conclusion

Basic RAG provides a foundation, but real-world applications need more robust patterns to handle vague queries, noisy chunks, complex tasks, and abstract searches.

  • Query Rewriting makes queries clearer.

  • Chunk Evaluation filters out noise.

  • Sub-Query Decomposition handles complexity.

  • Chunk Ranking improves precision.

  • HyDE expands retrieval power with imagination.

By combining these patterns, we can build smarter, more reliable RAG systems that bring us closer to production-grade AI applications.
