In this module, we will study about reciprocal rank fusion technique for Query transformation

Reciprocal Rank Fusion

RAG’s most important step is Retrieval stage which can make or break the RAG pipeline. So techniques like Parallel Query, Reciprocal rank fusion, Query decomposition, if implemented properly, then give wonderful results.

Crux of Reciprocal Rank Fusion.

The main crux of reciprocal rank fusion is to rank documents based on its occurrence, and give rank after applying the above formula.

Let us discuss the Roadmap of this algorithm.

First, User provides a query which is then sent to different retriever models. Each retriever will give its own ranking of chunks or documents.
The Rankings from all retrievers will be combined using RRF formula mentioned above and a unified ranking is produced based on the RRF score.
The generative model or LLM uses the top rank documents to generate answer.

Discussion on Formula

Using 1/(rank + k) provides a proper distinguishing factor to segregate documents who are higher ranked to those who are lower ranked. For example, difference between rank 1 and rank 2 will be more significant then difference between rank 50 and rank 51.

For rank 1 : 1/(1+60) ≈ 0.0164

For rank 2 : 1/(2+60) ≈ 0.0161

For rank 50 : 1/(50+60) ≈ 0.0091

For rank 51 : 1/(51+60) ≈ 0.0091

This ensures that the document in upper rankings of majority of retrievers are favored in final ranking. This technique also ensures the idea of Diminishing returns, i.e. importance of documents whose score is less decreases non linearly as rank increases.

Optimum value of k

One of the major concern is the value of constant k, which is generally taken 60, This value is not fixed for the formula, but is generally taken as benchmark value as :

It is proven in researches as well that k=60 performs well across datasets.
Provides a good balance between significance of top-ranked and bottom ranked documents.
This value , in research, has been proved as more robust compared to other values.

Pseudo Code:

def reciprocal_rank_fusion(rank_list, k=60):
    """
    Combines ranked lists of documents using reciprocal rank fusion.

    Args:
        ranked_lists: A list of lists, where each inner list is a ranked list
                     of documents from a different retrieval system. Each document
                     should be uniquely identifiable (e.g., using a document ID).
        k: A constant value used in the reciprocal rank formula (default: 60).

    Returns:
        A dictionary where keys are document IDs and values are their fused scores.
    """

    final_score= {}

    for ranked_list in rank_list:
        for rank, document_id in enumerate(ranked_list):

            reciprocal_rank = 1 / (rank + k)

            if document_id not in final_score:
                fused_scores[document_id] = 0.0

            final_score[document_id] += reciprocal_rank

    return final_score

References for code taken from google, Complete code posted on my github repo.

Reciprocal Rank fusion(Advanced Rag Technique)