Advanced RAG: Query Translation Patterns

Tushar Parmar
4 min read

Introduction

Let's understand what RAG (Retrieval-Augmented Generation) means.

  • Retrieval: The goal of this step is to collect the most relevant documents/chunks for the query.

  • Augmented: Construct a well-structured prompt so that when the call is made to the LLM, it clearly understands its purpose, the context, and how it should respond.

  • Generation: This is where the LLM comes into play. When the model is given good context (provided by the “Retrieval” step) and has clear instructions (provided by the “Augmented” step), it will generate high-value responses for the user.

In this article, we focus on techniques for the “Retrieval” step.

What are Query Translation Patterns?

In Retrieval-Augmented Generation (RAG), the user’s query often needs rephrasing or expansion so that it can:

  • Match dense vector embeddings better

  • Improve semantic recall

  • Deal with ambiguous or under-specified queries

Query translation patterns are strategies to transform user queries into more retrieval-friendly representations.

Common Query Translation Patterns

  1. Parallel Query Retrieval (Fan Out)

  2. Reciprocal Rank Fusion

  3. Step Back Prompting

  4. CoT - Chain of Thought

  5. HyDE - Hypothetical Document Embeddings

1. Parallel Query Retrieval (Fan Out)

This technique generates multiple variations of a user's query to increase the chances of retrieving relevant documents. By "fanning out" the query into several parallel queries, the system can explore different aspects or interpretations of the original query, improving matches against dense vector embeddings and enhancing semantic recall.

This approach is particularly useful for dealing with ambiguous or under-specified queries, as it broadens the search scope and increases the likelihood of finding pertinent information.
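A minimal sketch of fan-out retrieval, assuming stubbed placeholders for the two external pieces: `generate_variants` (in practice an LLM rewrites the query) and `retrieve` (in practice a vector-store similarity search):

```python
from concurrent.futures import ThreadPoolExecutor

def generate_variants(query):
    # Placeholder: in practice an LLM rewrites the query into variations.
    return [query, f"explain {query}", f"{query} examples"]

def retrieve(query):
    # Placeholder: in practice this is a vector-store similarity search.
    corpus = {
        "what is rag": ["doc_rag_intro"],
        "explain what is rag": ["doc_rag_deep_dive"],
        "what is rag examples": ["doc_rag_examples"],
    }
    return corpus.get(query, [])

def fan_out_retrieve(query):
    variants = generate_variants(query)
    # Issue the variant queries in parallel ("fan out").
    with ThreadPoolExecutor() as pool:
        results = pool.map(retrieve, variants)
    # Merge, de-duplicating while preserving first-seen order.
    seen, merged = set(), []
    for docs in results:
        for doc in docs:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

The merged list can then be re-ranked (for example with Reciprocal Rank Fusion, described next) before being passed to the LLM as context.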

2. Reciprocal Rank Fusion

Reciprocal Rank Fusion (RRF) is a technique used to combine multiple ranked lists of documents to improve retrieval performance.

It is particularly useful in information retrieval systems where multiple sources or methods provide different rankings for the same set of documents. The RRF method assigns a score to each document based on its rank in each list, with higher-ranked documents receiving higher scores.

The Reciprocal Rank Fusion (RRF) score of a document is given by:

RRF(d) = Σ_{k=1}^{K} 1 / (rank_k(d) + β)

Where:

  • RRF(d) is the Reciprocal Rank Fusion score for document d.

  • K is the number of ranking lists being fused.

  • rank_k(d) is the 1-based rank of document d in the k-th ranking list.

  • β is a constant, typically set to 60, that controls the influence of lower-ranked documents.

from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = defaultdict(float)
    for ranked_list in ranked_lists:
        # Ranks are 1-based, matching rank_k(d) in the formula above;
        # k here plays the role of the constant β.
        for rank, doc in enumerate(ranked_list, start=1):
            scores[doc] += 1 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

3. Step Back Prompting

Step-Back Prompting comes from research by Google DeepMind.

Step-Back Prompting is a prompting approach that enables LLMs to perform abstraction: deriving high-level concepts and first principles from which accurate answers can then be derived.

Notice also the system prompt:

You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:

Question: While GPL (Gokuldham Premier League) was airing on the show, was the IPL popular worldwide at that time?

The step-back decomposition:

Q1. When did GPL (Gokuldham Premier League) appear in the show?

GPL episodes aired around 2012–2014 in the show.

Q2. When did the IPL start?

The IPL started in 2008.

Now that we know GPL aired around 2012–2014 and the IPL had already been popular since 2008, the original question can be answered: yes, the IPL was popular worldwide at that time.
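A sketch of how step-back prompting can be wired into retrieval. Here `llm` and `retrieve` are assumed callables (an LLM client and a document retriever), not a specific library API:

```python
STEP_BACK_PROMPT = (
    "You are an expert at world knowledge. Your task is to step back and "
    "paraphrase a question to a more generic step-back question, which is "
    "easier to answer.\n\n"
    "Question: {question}\n"
    "Step-back question:"
)

def answer_with_step_back(question, llm, retrieve):
    # 1. Ask the LLM for the more generic step-back question.
    step_back_q = llm(STEP_BACK_PROMPT.format(question=question))
    # 2. Retrieve context for BOTH the original and the step-back question.
    context = retrieve(question) + retrieve(step_back_q)
    # 3. Answer the original question grounded in the combined context.
    context_text = "\n".join(context)
    final_prompt = f"Context:\n{context_text}\n\nQuestion: {question}"
    return llm(final_prompt)
```

Retrieving for both questions is the key move: the broader step-back question pulls in background documents that the narrow original query would miss.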

4. Chain of Thought (CoT) Prompting

Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.

Prompt:

Think step by step to answer the user's question.
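A minimal sketch of assembling a few-shot CoT prompt; the worked example is made up here for illustration:

```python
# One worked example showing the reasoning style we want the model to imitate.
COT_EXAMPLE = (
    "Q: A shop has 12 apples and sells 5, then buys 8 more. How many apples now?\n"
    "A: Let's think step by step. 12 - 5 = 7 apples remain. "
    "7 + 8 = 15. The answer is 15.\n\n"
)

def build_cot_prompt(question):
    # Append the user's question and cue the model to reason step by step.
    return COT_EXAMPLE + f"Q: {question}\nA: Let's think step by step."
```

The trailing "Let's think step by step." cue nudges the model to emit intermediate reasoning before its final answer.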

5. HyDE: Hypothetical Document Embeddings

HyDE enhances retrieval by generating a mock-up hypothetical document for the initial query, embedding that document, and using its embedding (rather than the raw query's) to search the document base.

When Is It Helpful?

The HyDE method is highly useful when:

  • The performance of the retrieval step in your pipeline is not good enough (for example, low Recall metric).

  • Your retrieval step has a query as input and returns documents from a larger document base.
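A sketch of the HyDE flow. Here `llm`, `embed`, and `vector_search` are assumed callables standing in for an LLM client, an embedding model, and a vector-store nearest-neighbour search:

```python
def hyde_retrieve(query, llm, embed, vector_search):
    # 1. Generate a hypothetical document that answers the query.
    hypothetical_doc = llm(
        f"Write a short passage that answers this question: {query}"
    )
    # 2. Embed the hypothetical document instead of the raw query.
    doc_embedding = embed(hypothetical_doc)
    # 3. Find real documents whose embeddings are closest to it.
    return vector_search(doc_embedding)
```

The intuition: a hypothetical answer usually lies closer in embedding space to real answer documents than the short query itself does.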

Final Thoughts

In this blog, we explored some advanced ways to improve how queries are handled in RAG systems. These patterns help in making your AI assistant smarter by understanding user questions better and fetching more relevant information.

The main idea is simple: the better you translate and structure the query, the better your answers will be.

If you're building with RAG, try out these techniques and see how they work in your own projects. And as always, feel free to reach out if you have questions or want to geek out about this stuff — happy to chat!
