Reciprocal Rank Fusion: Query Transformation Technique

Garv
2 min read

đź“–Introduction

This article is part of the Advanced RAG series, which explains the various tenets and features of Advanced RAG systems. In this article, Reciprocal Rank Fusion (RRF), a query transformation/translation technique, is explained along with a diagram and code. RRF works on the same principle as Parallel Query Retrieval; the only difference is that the documents retrieved for each query are ranked based on their order and frequency of occurrence.


🔀What is RRF?

In a RAG system, there are three steps involved:

  1. Indexing the external knowledge documents in the vector store in the form of vector embeddings.

  2. Retrieving the relevant document chunks which are semantically similar to the user’s query.

  3. Generating the response to the user’s query with the LLM, based on the context fed to it, which is the result of the retrieval step.

To learn in more detail about the steps of a RAG System, see this article here.
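The three steps above can be sketched in a few lines of Python. This is a toy illustration, assuming a stand-in `embed()` based on word overlap; a real system would use an embedding model, a vector store, and an LLM, and all function names here are my own, not from any particular library:

```python
def embed(text):
    # Hypothetical "embedding": just the set of lowercase words.
    return set(text.lower().split())

def index(documents):
    # Step 1: store each chunk alongside its embedding.
    return [(doc, embed(doc)) for doc in documents]

def retrieve(query, store, top_k=2):
    # Step 2: rank chunks by overlap with the query "embedding".
    q = embed(query)
    ranked = sorted(store, key=lambda pair: -len(q & pair[1]))
    return [doc for doc, _ in ranked[:top_k]]

def generate(query, context):
    # Step 3: a real system would pass query + context to an LLM here.
    return f"Answer to {query!r} using context: {context}"
```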


Now, in the case of Advanced RAG, where the retrieval step is performed in the RRF fashion, what actually happens is the following:

  1. After the user’s query is received, multiple versions of the same query are created.

  2. For each query, a semantic search is performed in the vector store.

  3. Based on the results of each query, the relevant document chunks obtained are ranked based on their order and frequency of occurrence.
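The ranking in step 3 is the heart of RRF: each document scores the sum of 1 / (k + rank) over every result list it appears in, so chunks that rank high in many lists (order plus frequency of occurrence) float to the top. Here is a minimal sketch, using the conventional k = 60; the function name and signature are my own choice, not a standard API:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60, top_n=4):
    """Fuse several ranked result lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            # A document's score grows with every list it appears in,
            # and grows more the higher (smaller rank) it appears.
            scores[doc_id] += 1.0 / (k + rank)
    # Return the top_n unique documents by fused score.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

For example, a document ranked 2nd, 1st, and 1st across three lists beats one ranked 1st in only a single list, because it accumulates score from all three.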


⚡Effect of RRF Query Transformation

Since the user query is transformed into multiple versions, the number of relevant document chunks retrieved increases. After ranking, these chunks are augmented to the LLM’s context along with the original user prompt, leading to a more precise response.


📊💻Step By Step Working Through Diagram & Code

  1. From the user prompt, the LLM generates similar queries (3 queries here).

  2. Then, vector embeddings are computed and a semantic search is performed for each query to get the relevant data.

  3. Next, the relevant data is de-duplicated to keep only unique chunks, which are then ranked and prioritized to get the top n documents (here n = 4) out of all the fetched documents.

  4. Now, the original user prompt is passed with the context to the LLM to get the final response.
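The four steps above can be wired together end to end. In this sketch, `fake_llm_rewrite`, `fake_search`, and the word-overlap scoring are hypothetical stand-ins for real LLM and vector-store calls, used only to show the flow:

```python
from collections import defaultdict

def fake_llm_rewrite(prompt, n=3):
    # Step 1: an LLM would produce n paraphrases of the prompt.
    return [f"{prompt} (rewrite {i})" for i in range(1, n + 1)]

def fake_search(query, corpus, top_k=5):
    # Step 2: stand-in for embedding + semantic search; documents are
    # ranked by how many words they share with the query.
    words = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(words & set(d.lower().split())))[:top_k]

def rrf_retrieve(prompt, corpus, k=60, top_n=4):
    # Step 3: fuse the per-query rankings with RRF and keep the top n
    # unique chunks. Step 4 (not shown) would send the original prompt
    # plus this context to the LLM for the final response.
    scores = defaultdict(float)
    for query in fake_llm_rewrite(prompt):
        for rank, doc in enumerate(fake_search(query, corpus), start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```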


🏆Reciprocal Rank Fusion Output


  • Reciprocal Rank Fusion Code File

  • Advance RAG Article Series

  • Advance RAG Repository


🎯Conclusion

Through this article, you saw how to implement the Reciprocal Rank Fusion technique in your RAG system and make its responses more precise and efficient.
