Chain of Thought: Query Translation Technique


📖Introduction
This article is part of the Advanced RAG Series, an article series where various tenets and features of Advanced RAG systems are explained with visuals and code. In this article, you will learn about a Query Translation technique called Chain of Thought, also known as the Less Abstract Query Translation technique.
🔍What is Query Translation?
As explained in the RAG blog, a RAG system basically works in three steps: Indexing of the external knowledge data, Retrieval of the chunks relevant to the user query, and Generation of a response to the user's prompt based on the context fed to the LLM, which contains the output of the Retrieval step.
Now, an Advanced RAG system seeks to optimize the entire RAG pipeline to produce the best possible response to the user's query. Query Translation is one tenet of an Advanced RAG system (for more details about Advanced RAG systems and their various tenets, check out this blog here); it aims to translate the user's query so that it fetches better results during the Retrieval step and, as a result, generates a better response.
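The three steps and the translation hook can be sketched in plain Python. This is a toy illustration, not a real system: `embed` is a bag-of-words stand-in for an embedding model, the corpus is three hardcoded strings instead of a vector store, and `translate_query` is a hypothetical placeholder where any translation technique would plug in.

```python
# Minimal sketch of a RAG pipeline with a query-translation hook.
# All components here are toy stand-ins, not a real vector DB or LLM.

def embed(text):
    # Toy "embedding": bag-of-words counts (a real system uses a model).
    words = text.lower().split()
    return {w: words.count(w) for w in words}

def similarity(a, b):
    # Overlap score between two bag-of-words vectors.
    return sum(min(a.get(w, 0), b.get(w, 0)) for w in a)

CORPUS = [
    "RAG indexes external knowledge as vector embeddings",
    "retrieval finds the chunks most similar to the query",
    "generation answers the query using retrieved context",
]
INDEX = [(doc, embed(doc)) for doc in CORPUS]          # 1. Indexing

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: similarity(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]              # 2. Retrieval

def translate_query(query):
    # Placeholder for any query-translation technique (CoT, Step-Back, ...).
    return query

context = retrieve(translate_query("how does retrieval work in RAG?"))
print(context[0])                                      # 3. context fed to the LLM
```

Swapping the body of `translate_query` is all it takes to move between translation techniques; the rest of the pipeline stays unchanged.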
🔬Abstraction Level of The User Query
A user query can be placed on a scale of Abstraction, and depending on the use case, the level of abstraction can be increased or decreased to get a better response to the user's original query.
As can be seen in the image above:
In the more abstract version, the query is 🌨️goal-focused, ☀️open-ended, and 🧩requires understanding of context or intent.
In the less abstract version, the query is 🔧direct and technical, 🎯task-specific, and 🗂️requires minimal interpretation.
Chain of Thought is the Less Abstract way of Query Translation, while Step-Back Prompting is the More Abstract way. To understand Step-Back Prompting in detail, read this article here.
🧠Chain of Thought Query Translation Technique Explained
Chain of Thought breaks the user query down into multiple subqueries/subproblems, converts them into vector embeddings, and performs a similarity search for the first subquery. The output of that subquery is passed as context to the next one, and this continues until the last subquery; the final output, along with the original user query, is then fed as context to the LLM to generate the response. As you can infer, efficient CoT requires an LLM with a large context window, since context accumulates across subqueries.
Credits: Nakul Srivastav
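The sequential loop described above can be captured in a short function. This is a sketch under stated assumptions: `decompose`, `retrieve`, and `llm` are hypothetical callables that the caller supplies (in a real system they would wrap an LLM API and a vector store), and the prompt format is illustrative only.

```python
# Sketch of the sequential Chain of Thought loop. `decompose`, `retrieve`,
# and `llm` are placeholders supplied by the caller, not a real API.

def chain_of_thought_answer(query, decompose, retrieve, llm):
    """Answer `query` by solving its subqueries one after another."""
    context = ""
    for sub in decompose(query):          # 1. break the query into subqueries
        docs = retrieve(sub)              # 2. similarity search per subquery
        prompt = f"Context: {context}\nDocs: {docs}\nQuestion: {sub}"
        context = llm(prompt)             # 3. this answer seeds the next step
    # 4. final response: original query plus the accumulated context
    return llm(f"Context: {context}\nQuestion: {query}")
```

Because every answer is folded into the next prompt, the prompt grows with each subquery; this is exactly why the article notes that a large context window helps.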
⚡Effect of CoT Query Translation Technique
Since the user query is broken down, more relevant context is gathered by diving deep into the query and making it Less Abstract; this augmented context can then yield a more precise and efficient response to the user's query.
📊💻Step By Step Working Through Diagram & Code
Credits: Nakul Srivastav
- From the user’s query, the LLM breaks it down into simple sub-queries (3 here).
- Then, sequentially, vector embeddings are created and a similarity search is performed for the first subquery. After the relevant context is retrieved, it is provided to the LLM along with the first subquery to generate a response; that response is then set as context for the next subquery. This process continues until the last subquery.
- After the loop ends on the last subquery, the response generated from it is shown as the Final Output to the user.
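The three steps above can be run end to end as a toy script. Everything here is a deterministic stand-in so the sketch runs without API keys: `fake_llm` just echoes the last line of its prompt, `retrieve` is keyword overlap over a three-document corpus, and `decompose` returns hardcoded subqueries where a real LLM call would go.

```python
# End-to-end toy run of the three steps above. The "LLM" is a fake that
# echoes its prompt and the retriever is keyword overlap -- stand-ins
# chosen so the sketch runs deterministically without any API keys.

CORPUS = [
    "Transformers use self-attention over token embeddings.",
    "RAG augments an LLM with retrieved external context.",
    "Chain of Thought decomposes a problem into sequential steps.",
]

def fake_llm(prompt):
    # Placeholder LLM: returns the last line of the prompt as its "answer".
    return prompt.strip().splitlines()[-1]

def retrieve(subquery):
    # Keyword retriever: every doc sharing a word with the subquery.
    words = set(subquery.lower().split())
    return [doc for doc in CORPUS if words & set(doc.lower().split())]

def decompose(query):
    # Step 1: a real LLM would produce these; hardcoded for determinism.
    return [
        "what is chain of thought",
        "what is rag",
        "how does chain of thought help rag",
    ]

def cot_rag(query):
    context = ""
    for sub in decompose(query):          # Step 2: sequential subquery loop
        docs = retrieve(sub)
        context = fake_llm(f"{context}\n{' '.join(docs)}\nAnswer: {sub}")
    # Step 3: final response from the last context plus the original query
    return fake_llm(f"{context}\nFinal answer to: {query}")

print(cot_rag("How does CoT improve RAG responses?"))
```

To turn this into a working system, replace `fake_llm` with a real chat-model call, `retrieve` with a vector-store similarity search, and `decompose` with an LLM prompt that asks for subquestions.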
✨Chain of Thought Output
🔗Important Links
🎯Conclusion
Through this article, you saw how to implement the Chain of Thought Query Translation technique in your RAG system and make its responses more precise and efficient.
Written by

Garv
A person trying to learn and question things.