Query Translation (Advanced RAG)

Abdullah Farhan
5 min read

What is RAG?

RAG (Retrieval-Augmented Generation) is a framework that improves large language model responses by grounding them in external knowledge retrieved from a document corpus. It's typically broken down into three main stages: indexing the documents, retrieving relevant chunks at query time, and generating a grounded response.


🧱 RAG Pipeline Overview

          +----------------------+
          |   Original Document  |
          +----------------------+
                    |
                    v
         +-------------------------+
         | Chunk into smaller parts|
         +-------------------------+
                    |
                    v
         +-------------------------+
         |   Embed chunks (vectors)|
         +-------------------------+
                    |
                    v
         +-------------------------+
         | Store in a Vector Store |
         +-------------------------+

💡 At Query Time

+----------------------+          +---------------------------+
|     User Query       | ----->   |   Query Embedding         |
+----------------------+          +---------------------------+
                                             |
                                             v
                                 +-----------------------------+
                                 | Vector DB Similarity Search |
                                 +-----------------------------+
                                             |
                                             v
+-----------------------------+     +------------------------------+
| Relevant Chunks Retrieved   | --> | Prompt + Chunks → LLM Input  |
+-----------------------------+     +------------------------------+
                                                  |
                                                  v
                                    +------------------------------+
                                    |       Final Response         |
                                    +------------------------------+
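
Below is a minimal, self-contained sketch of this end-to-end pipeline. It substitutes a toy bag-of-words embedding and plain cosine similarity for a real embedding model and vector database, and call_llm is a hypothetical placeholder for whichever LLM client you use; the shape of the flow, not the components, is the point.

import math
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words Counter. A real pipeline would call an
    # embedding model (e.g. a sentence-transformer or an embeddings API).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Indexing time: chunk, embed, store.
chunks = [
    "RAG grounds LLM answers in retrieved documents.",
    "Vector stores index embeddings for similarity search.",
    "Chunking splits documents into retrievable pieces.",
]
vector_store = [(chunk, embed(chunk)) for chunk in chunks]

# Query time: embed the query, retrieve the closest chunks, build the prompt.
def retrieve(query, top_k=2):
    query_vec = embed(query)
    scored = sorted(vector_store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:top_k]]

query = "How does RAG use a vector store?"
context = "\n".join(retrieve(query))
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# final_response = call_llm(prompt)   # hypothetical LLM call produces the final response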

The Problem

While RAG is powerful, it can fail in real-world use cases when:

  • The user query is vague or abstract.

  • The corpus is sparse or incomplete.

  • The retrieved chunks aren’t truly relevant.

Garbage in → Garbage out
If the query is ambiguous, the result will be ambiguous too.


Advanced RAG Techniques

To solve these problems, Advanced RAG introduces enhancements in query handling and retrieval strategies.


1. Query Translation

User queries usually lie on a spectrum:

Type          | Example
--------------|---------------------------------------------------
Too Abstract  | "Tell me about cloud reliability"
Too Specific  | "Explain SLOs in a Spring Boot + Prometheus setup"

The Fix: Rewrite the Query

+---------------+       +------------------------+
| User Prompt   | --->  | LLM Rewrites Query     |
+---------------+       +------------------------+
                                |
                                v
              +--------------------------+
              | Multiple Refined Queries |
              +--------------------------+

Each refined query is run in parallel, generating multiple retrievals.
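
A rough sketch of that rewrite step, assuming a hypothetical call_llm(prompt) helper that returns the model's text output (any LLM client can be slotted in), plus the retrieve helper from the pipeline sketch above:

def generate_refined_queries(user_query, n=3):
    # Ask the LLM to reformulate the query from several angles, one per line.
    prompt = (
        f"Rewrite the following question as {n} differently phrased search queries, "
        f"one per line, each covering a different angle:\n\n{user_query}"
    )
    response = call_llm(prompt)   # hypothetical LLM call
    return [line.strip() for line in response.splitlines() if line.strip()]

# refined = generate_refined_queries("Tell me about cloud reliability")
# results = [retrieve(q) for q in refined]   # fan out retrieval per refined query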


2. RAG Fusion (a.k.a. Multi-Query Retrieval)

This approach involves:

  1. Generating multiple queries (e.g., using LLM).

  2. Retrieving results for each.

  3. Combining or intersecting results.
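
A naive merge step, for illustration, might intersect the per-query result sets and fall back to their union (reusing the hypothetical retrieve helper from the first sketch):

def fuse_by_intersection(queries, top_k=5):
    # One result set per generated query.
    result_sets = [set(retrieve(q, top_k)) for q in queries]
    common = set.intersection(*result_sets)   # keep only docs every query agrees on
    return common or set.union(*result_sets)  # fall back to the union when empty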

But intersection is not always optimal. So instead…


3. Reciprocal Rank Fusion (RRF)

Instead of filtering or intersecting results from each query, RRF assigns scores based on how highly a document appears across multiple ranked lists.

RRF Formula

For each ranked list:

score[doc_id] += 1 / (k + rank + 1)

Where:

  • rank is the zero-based position of the document in the list (matching Python's enumerate).

  • k is a tunable constant (commonly 60).

Example Code:

from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    # Accumulate a fused score for every document across all ranked lists.
    scores = defaultdict(float)

    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Documents that appear earlier (lower rank) contribute more.
            scores[doc_id] += 1 / (k + rank + 1)

    # Highest fused score first.
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

Sample Input

rankings = [
    ["doc1", "doc2", "doc3"],
    ["doc3", "doc2", "doc5"],
    ["doc2", "doc3", "doc7"]
]
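
Running the fusion over these lists gives the fused ranking directly:

print(reciprocal_rank_fusion(rankings))
# [('doc2', 0.0487), ('doc3', 0.0484), ('doc1', 0.0164),
#  ('doc5', 0.0159), ('doc7', 0.0159)]      # scores rounded to 4 decimal places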

Output:

Docs like doc2 and doc3 will rank higher since they appear more frequently and higher in the lists.


4. Query Decomposition

Break down the user query into subtasks or steps, and answer each part to improve accuracy and completeness.

Goal                 | Technique           | Tool
---------------------|---------------------|-----
More abstract query  | Step-Back Prompting | LLM
More specific query  | Chain-of-Thought    | LLM

Example:

User query: "How to optimize ML training time?"

LLM breaks it down:

  • Step 1: Identify bottlenecks

  • Step 2: Choose better hardware

  • Step 3: Use efficient models or frameworks

All the above steps + original query are passed to the LLM for a more guided and informative response.

Diagram:

User Query
   |
   v
+------------------+
|  Decomposer      |
+------------------+
   |      |      |
   v      v      v
SubQ1   SubQ2   SubQ3     (Sub-Queries)
   |      |      |
   v      v      v
 LLM    LLM    LLM        (Can be same or different LLMs)
   |      |      |
   +------|------+
          v
   +------------------+
   | Aggregator       |  (Combines sub-results)
   +------------------+
          |
          v
     Final Answer
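
A compact sketch of that decompose → answer → aggregate flow, again assuming a hypothetical call_llm(prompt) helper (each sub-question could also run its own retrieval step first):

def decompose(query, n=3):
    # Ask the LLM to break the query into independent sub-questions.
    prompt = f"Break this question into at most {n} sub-questions, one per line:\n\n{query}"
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def answer_with_decomposition(query):
    sub_questions = decompose(query)
    sub_answers = [call_llm(sub_q) for sub_q in sub_questions]   # answer each part
    # Aggregate: pass the original query plus all intermediate answers back to the LLM.
    notes = "\n".join(f"- {q}: {a}" for q, a in zip(sub_questions, sub_answers))
    return call_llm(f"Question: {query}\n\nNotes from sub-questions:\n{notes}\n\nFinal answer:")

# answer = answer_with_decomposition("How to optimize ML training time?")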

Step Back Prompting

What is it?
Step Back Prompting is a technique used in Advanced RAG (Retrieval-Augmented Generation) where we intentionally abstract a specific query into a more general form. This widens the search space, allowing the retrieval component to surface more relevant and contextually rich chunks, and improves the chances of getting an accurate, meaningful response from the LLM.


Why do it?
Sometimes, user queries are too specific or narrow, making it hard for the retriever to fetch meaningful results. By rephrasing or “stepping back,” we generalize the question just enough to bring in a broader range of relevant information.


Example 1
Original query:

"When was the last time a team from Canada won the Stanley Cup in 2002?"

This is quite narrow and might not match well with documents in the vector database.

Step Back version:

"Which years did a team from Canada win the Stanley Cup up to 2002?"

Now the question is broader and increases the likelihood of retrieving useful historical data, which can then be filtered or refined by the LLM to answer the original query.

Example 2
Original query:

"Where was person X born?"

Step Back version:

"What is the personal background or early life of person X?"

This generalization allows the retrieval engine to capture not just the birthplace but other potentially relevant biographical context, improving the final generation.
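
A minimal sketch of the step-back rewrite, using the same hypothetical call_llm and retrieve helpers: the broader question drives retrieval, while the original question drives the final answer.

def step_back(query):
    # Ask the LLM for a more general version of the question.
    prompt = (
        "Rewrite the question below as a broader, more general question "
        f"that would help gather background for answering it:\n\n{query}"
    )
    return call_llm(prompt).strip()

original = "Where was person X born?"
# broader = step_back(original)         # e.g. yields a question about X's early life
# context = retrieve(broader)           # retrieval runs on the broader, step-back query
# answer = call_llm(f"Context:\n{context}\n\nQuestion: {original}")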

Hypothetical Document Embeddings (HyDE)

What is it?
Hypothetical Document Embeddings (HyDE) is an advanced retrieval technique that works best with larger, more capable language models. Instead of directly embedding the user's original query, we first ask the LLM to generate a hypothetical answer or document based on the query. We then embed this generated text (not the original query) and use it to perform the vector search.


How it works:

  1. User Query →
    "What are the key principles of quantum computing?"

  2. LLM Generates a Hypothetical Answer →

    "Quantum computing leverages principles like superposition, entanglement, and quantum interference. It differs from classical computing..."

  3. Embedding the Generated Text →
    This richer and more detailed document is turned into an embedding vector.

  4. Vector Search →
    That vector is used to retrieve similar documents from the vector database.

  5. Final Generation →
    Retrieved chunks + original user query are passed to the LLM for the final response.
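
Those five steps compress into a few lines. The sketch below reuses the hypothetical call_llm helper and the toy embed/cosine/vector_store pieces from the first pipeline sketch; in a real system the embedding and similarity search would go through your embedding model and vector database.

def hyde_retrieve(query, top_k=3):
    # Steps 1-2: generate a hypothetical answer to the query.
    hypothetical_doc = call_llm(f"Write a short passage that answers:\n\n{query}")
    # Steps 3-4: embed the generated text (not the query) and search with that vector.
    doc_vec = embed(hypothetical_doc)
    scored = sorted(vector_store, key=lambda item: cosine(doc_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:top_k]]

# Step 5: retrieved chunks + the original query go to the LLM for the final response.
query = "What are the key principles of quantum computing?"
# context = "\n".join(hyde_retrieve(query))
# answer = call_llm(f"Context:\n{context}\n\nQuestion: {query}")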


Why use it?

  • Better retrieval performance for vague or complex queries.

  • Richer semantic representation compared to raw query embedding.

  • Useful when the user’s query is too short, abstract, or ambiguous.
