Multi-Query Retrieval

Nitesh Singh

It is difficult for a user to give the LLM a prompt that expresses exactly what they have in mind. In this article, we will discuss another technique for improving the retrieval step of an LLM chain, i.e., Multi-Query Retrieval. We will dive only into the retrieval part, as indexing and response generation were discussed in the previous articles, Introduction to RAG and HyDE (No Data No Problem).

What is Multi-Query Retrieval?

Multi-Query Retrieval is an advanced retrieval method in which a user query is transformed into multiple queries with different perspectives, and each of them is used to retrieve the relevant documents.

Example:

Suppose the user query is:

  • What is fs module?

The user query is transformed into several variations:

  • What functionalities does the fs module provide?

  • Can you explain the purpose of the fs module?

  • What are the main features of the fs module?

  • How does the fs module work in programming?

  • What is the role of the fs module in file handling?

Each of these queries will be used to retrieve documents, and the results will be combined to provide a complete picture of the fs module.

When to use?

This method is particularly suitable when:

  • User queries are unclear or ambiguous.

  • Expansion of the search space is required.

  • A single query cannot cover complete information.

  • Understanding user intent from multiple angles is necessary.

Generating and using multiple queries improves the likelihood of finding relevant information, reduces reliance on specific phrasing, and can yield more comprehensive results.

Working

The following are the steps in generating output with multi-query retrieval; a minimal sketch follows the list:

  1. Getting the user prompt.

  2. Generation of multiple queries using the LLM and the user prompt.

  3. Retrieval of relevant documents for each query.

  4. Union of the retrieved documents to obtain all the relevant documents without duplicates.

  5. Generation of the final output by the LLM using the relevant documents and the user prompt.
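
Before looking at the Langchain implementation, here is a minimal sketch of these steps in plain Python. The `generate_queries`, `retrieve`, and `generate_answer` helpers are toy stand-ins, not real APIs; the actual versions are built with Langchain in the next section.

```python
# Toy stand-ins for the LLM and vector-store calls; the real versions
# are implemented with Langchain in the next section.
def generate_queries(prompt):
    # Step 2: pretend the LLM produced five rephrasings of the prompt.
    return [f"{prompt} (variant {i})" for i in range(1, 6)]

def retrieve(query):
    # Step 3: pretend this is a similarity search over a vector store.
    corpus = [
        "fs is Node.js's built-in file system module.",
        "fs.writeFileSync writes data to a file synchronously.",
    ]
    return [text for text in corpus if "fs" in query]

def generate_answer(prompt, docs):
    # Step 5: pretend the LLM answers using the retrieved context.
    return f"Answer to {prompt!r} based on {len(docs)} document(s)."

def multi_query_retrieval(user_prompt):
    queries = generate_queries(user_prompt)                 # step 2
    results_per_query = [retrieve(q) for q in queries]      # step 3
    # Step 4: union of the per-query results, one copy of each document.
    unique_docs = list({d for docs in results_per_query for d in docs})
    return generate_answer(user_prompt, unique_docs)        # step 5

print(multi_query_retrieval("What is fs module?"))          # step 1
```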

Code Implementation

Python, Langchain, Qdrant, and OpenAI were used for the following code. As mentioned earlier, the primary focus of the implementation is the retrieval part.

In a Langchain chain, the output of each component must be compatible with the input of the next component.
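
For example, a prompt template produces a prompt value that a chat model accepts, and the model produces a message that a string parser accepts, which is what makes the `|` composition legal. A minimal sketch, assuming an OpenAI API key is configured in the environment (the model name is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Each stage's output type matches the next stage's input type:
# dict -> PromptValue -> AIMessage -> str
chain = (
    ChatPromptTemplate.from_template("Define {term} in one sentence.")
    | ChatOpenAI(model="gpt-3.5-turbo")
    | StrOutputParser()
)
print(chain.invoke({"term": "vector database"}))
```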

First of all, a prompt is written that instructs the LLM to generate multiple queries from a given query.

self.multi_query_prompt="""
            You are an AI language model assistant.
            Your task is to create five versions of the user's question to fetch documents from a vector database.
            By offering multiple perspectives on the user's question, your goal is to assist the user in overcoming some of the restrictions of distance-based similarity search.
            Give these alternative questions, each on a new line.
            Question: {question}
            Output:
        """

The prompt is converted into a suitable prompt template format to work in Langchain’s chain.

multi_query_prompt_template = ChatPromptTemplate.from_template(self.multi_query_prompt)
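
To inspect what the LLM will actually receive, the template can be rendered on its own. This is just a sanity check, not part of the chain:

```python
# Render the template with a sample question and print the final prompt text.
prompt_value = multi_query_prompt_template.invoke({"question": "What is fs module?"})
print(prompt_value.to_string())
```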

Here is the get_unique_documents function that performs the union of the documents. dumps converts each Document object to a string, and loads converts the strings back into Document objects. They are needed because Document objects are unhashable and cannot be placed in a set directly.

    def get_unique_documents(self, documents):
        # Flatten the per-query result lists and serialize each Document,
        # since Document objects are unhashable and cannot go into a set.
        flattened_docs = []
        for sublist in documents:
            for doc in sublist:
                flattened_docs.append(dumps(doc))

        # A set keeps exactly one copy of each serialized document.
        unique_docs = list(set(flattened_docs))

        # Deserialize the strings back into Document objects.
        relevant_docs = []
        for doc in unique_docs:
            relevant_docs.append(loads(doc))

        return relevant_docs
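
As a quick check of the union, here is a small hypothetical example with two overlapping result lists, assuming an instance of the class shown in full later:

```python
from langchain_core.documents import Document

mqr = Multi_Query_Retrieval()  # the class shown in full later

doc_a = Document(page_content="fs is Node.js's file system module.")
doc_b = Document(page_content="fs.writeFileSync writes data to a file.")

# Two queries returned overlapping results: doc_a appears in both lists.
unique = mqr.get_unique_documents([[doc_a, doc_b], [doc_a]])
print(len(unique))  # 2, since the duplicate copy of doc_a is removed
```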

A retrieval chain is then assembled. From langchain_openai, ChatOpenAI is used as the LLM, and the vector store is built with Qdrant. In the chain, StrOutputParser converts the LLM output into a string, a lambda splits that string into a list of questions, retriever.map() retrieves the relevant documents for each question, and finally the union of all the documents is taken.
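
Before the chain is assembled, `llm` and `retriever` need to exist. A sketch of that setup, assuming the langchain-qdrant integration package, a locally running Qdrant server, and an already-populated collection (the collection name, URL, and model are illustrative):

```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

llm = ChatOpenAI(model="gpt-3.5-turbo")

# Connect to an existing Qdrant collection and expose it as a retriever.
client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="docs",
    embedding=OpenAIEmbeddings(),
)
retriever = vector_store.as_retriever()
```

With those pieces in place, the chain itself is: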

retrieval_chain = (
            multi_query_prompt_template
            | llm
            | StrOutputParser()
            | (lambda x: [i for i in x.split("\n") if i.strip() != ""])
            | retriever.map()
            | self.get_unique_documents
        )

The retrieval chain is invoked, and the relevant documents are retrieved.

relevant_docs = retrieval_chain.invoke(
            {"question": user_prompt}
        )

The relevant documents are returned to the main function.

return relevant_docs

Here is the complete code for retrieval:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import loads, dumps

class Multi_Query_Retrieval:
    def __init__(self) -> None:
        self.multi_query_prompt="""
            You are an AI language model assistant.
            Your task is to create five versions of the user's question to fetch documents from a vector database.
            By offering multiple perspectives on the user's question, your goal is to assist the user in overcoming some of the restrictions of distance-based similarity search.
            Give these alternative questions, each on a new line.
            Question: {question}
            Output:
        """

    def get_unique_documents(self, documents):
        # Flatten the per-query result lists and serialize each Document,
        # since Document objects are unhashable and cannot go into a set.
        flattened_docs = []
        for sublist in documents:
            for doc in sublist:
                flattened_docs.append(dumps(doc))

        # A set keeps exactly one copy of each serialized document.
        unique_docs = list(set(flattened_docs))

        # Deserialize the strings back into Document objects.
        relevant_docs = []
        for doc in unique_docs:
            relevant_docs.append(loads(doc))

        return relevant_docs

    def get_relevant_docs(self, llm, retriever, user_prompt):
        multi_query_prompt_template = ChatPromptTemplate.from_template(self.multi_query_prompt)

        retrieval_chain = (
            multi_query_prompt_template
            | llm
            | StrOutputParser()
            | (lambda x: [i for i in x.split("\n") if i.strip() != ""])
            | retriever.map()
            | self.get_unique_documents
        )

        relevant_docs = retrieval_chain.invoke(
            {"question": user_prompt}
        )

        return relevant_docs
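
To put the class to work end to end, the retrieved documents are handed to a final answer-generation step (step 5 of the workflow). A sketch, assuming the `llm` and `retriever` objects from the setup above; the stuffing prompt is illustrative:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

user_prompt = "What is fs module?"

# Retrieve the union of documents across all generated query variants.
mqr = Multi_Query_Retrieval()
relevant_docs = mqr.get_relevant_docs(llm, retriever, user_prompt)

# Step 5: generate the final answer from the documents and the original prompt.
answer_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
answer_chain = answer_prompt | llm | StrOutputParser()
context = "\n\n".join(doc.page_content for doc in relevant_docs)
print(answer_chain.invoke({"context": context, "question": user_prompt}))
```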

Output:

The fs module in Node.js is a built-in core module that provides functions to work with the file system. It allows you to perform operations such as reading, writing, updating, and deleting files on your computer.

For example, you can use the fs module to write data to a file like this:

```javascript
const fs = require('fs')

fs.writeFileSync('notes.txt', 'I live in Philadelphia')
```

In this example, the script uses the `writeFileSync` function from the fs module to create (or overwrite) a file named "notes.txt" and write the string "I live in Philadelphia" into it. After running the script, you will see the new "notes.txt" file in your directory containing that message.

The fs module is essential for any Node.js application that needs to interact with the file system.

Conclusion

In this article, we discussed multi-query retrieval, its use cases, how it works, and a code implementation. You can get this code through GitHub.

Multi-Query Retrieval harnesses the power of diversity by retrieving multiple sets of documents based on varied interpretations of the original query. This technique can be further modified according to the use case. One such extension of multi-query retrieval is RAG fusion, which will be covered in a future article.
