Advanced RAG: Routing Strategies

In the previous article in this series, we explored query translation patterns in a Retrieval-Augmented Generation (RAG) system, that is, how user queries are transformed to suit the system's requirements. Once a query has been translated, the next step is routing: directing the program flow to the appropriate data store based on the translated query. Routing ensures that the system consults the correct data source, which is essential for retrieving accurate and relevant information. Done well, it also improves performance, because the system searches only the data likely to contain the answer and can return precise results quickly.

IMPORTANCE OF ROUTING

Businesses have vast amounts of data scattered across different places, such as financial data, employee information, and technical documents, and this cannot be handled with the logic of a simple RAG application built over a single PDF file. The data needs to be identified, embedded, and stored in separate vector databases so that the AI application can use it to answer user queries. Stored this way, the data can reach terabytes, making it impractical and wasteful to search every database for every query. We therefore need a mechanism that intelligently identifies the part of the data likely to contain the answer the user is looking for, and routing provides it. Perplexity AI is often cited as an example: much of its product's strength is attributed to its RAG pipeline, with well-tuned query translation and query routing refined over time, rather than to a foundation model of its own. There have also been rumors that OpenAI's GPT-5 would itself act as a router.

TYPES OF ROUTING

There are two primary types of routing that are essential for efficiently managing and accessing large datasets:

LOGICAL ROUTING: This type of routing operates based on specific conditions, similar to how an if...else statement functions in a programming language. It involves directing the flow of data or queries based on predefined rules or conditions. For example, if a user query pertains to financial data, the system will route the query to the financial data store. This ensures that queries are handled appropriately and efficiently, reducing unnecessary processing and improving response times.
SEMANTIC ROUTING: Unlike logical routing, semantic routing focuses on the meaning and context of the user's query. It involves analyzing the content of the query to determine its intent and then directing it to the appropriate data source. This type of routing is crucial when dealing with complex queries that may not fit neatly into predefined categories. By understanding the semantics of the query, the system can provide more accurate and relevant responses, enhancing the overall user experience.

For both logical and semantic routing to function effectively, it is crucial to have data that is well-organized, properly segregated, and accurately labeled. This organization allows the routing mechanisms to quickly and accurately identify the relevant data sources, ensuring that user queries are answered efficiently and correctly.
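
To make the role of labels concrete, here is a minimal sketch in Python, assuming a tiny in-memory corpus in place of a real vector database. The `Document` class, the `CORPUS` contents, and the `search` function are hypothetical, and word overlap is a toy stand-in for embedding similarity. The point is the `label` filter: once documents carry labels, a search can be restricted to the relevant subset.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    # Labels such as {"python"} or {"finance"}; in a vector database these
    # would typically be stored as metadata next to each embedding.
    labels: set[str] = field(default_factory=set)


# Hypothetical in-memory corpus standing in for a real vector store.
CORPUS = [
    Document("Use the + operator to add two numbers in Python.", {"python"}),
    Document("Use the + operator to add two numbers in JavaScript.", {"javascript"}),
    Document("Python lists support append() and extend().", {"python"}),
    Document("Quarterly revenue grew compared to the previous quarter.", {"finance"}),
]


def words(text: str) -> set[str]:
    """Lower-cased word set; a toy stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, label: str | None = None, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query, optionally restricted to one label."""
    candidates = [d for d in CORPUS if label is None or label in d.labels]
    ranked = sorted(
        candidates,
        key=lambda d: len(words(query) & words(d.text)),
        reverse=True,
    )
    return [d.text for d in ranked[:k]]


query = "How to add two numbers in Python?"
print(search(query))                  # unfiltered: the JavaScript document also ranks highly
print(search(query, label="python"))  # filtered: only Python documents are considered
```

Without the filter, the JavaScript document ranks second for a Python question; with `label="python"`, it is never considered. Vector databases such as Qdrant and Pinecone support metadata filtering natively, so in practice the filter would be applied inside the database query rather than in Python.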

LOGICAL ROUTING

Assume our data sources are a set of PDF files, website links, and a database, all containing information about writing programs in JavaScript and Python. Now imagine a user asks the application:

"How to add two numbers in Python?"

Do we need to search the JavaScript section of our data sources? The LLM might find some semantically related information there, but it makes more sense to focus on the Python section to retrieve the relevant information. This also shows why data must be classified and stored with correct labels in a vector database such as Qdrant or Pinecone: routing relies on those labels, so they are crucial for a well-functioning AI application. If the data is correctly classified and stored, even a small language model (SLM) can handle routing, since it only has to choose a label for each query.
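
The if...else idea behind logical routing can be sketched as follows. This is a minimal illustration, assuming three hypothetical data sources (`python_docs`, `js_docs`, `general_docs`) and simple keyword rules in place of the SLM classifier mentioned above; the retrievers are placeholders that only report which source would be searched.

```python
from __future__ import annotations

import re
from typing import Literal

# Hypothetical data-source names for the Python/JavaScript example.
DataSource = Literal["python_docs", "js_docs", "general_docs"]

# Hypothetical keyword rules standing in for an SLM classifier.
PYTHON_KEYWORDS = {"python", "pip", "django", "pandas"}
JS_KEYWORDS = {"javascript", "js", "node", "npm", "react"}


def route(query: str) -> DataSource:
    """Pick one data source with if/else rules; unmatched queries go to general_docs."""
    words = set(re.findall(r"\w+", query.lower()))
    if words & PYTHON_KEYWORDS:
        return "python_docs"
    elif words & JS_KEYWORDS:
        return "js_docs"
    else:
        return "general_docs"


# Hypothetical per-source retrievers; a real system would query a separate
# collection or vector database for each label.
RETRIEVERS = {
    "python_docs": lambda q: f"[python_docs] results for {q!r}",
    "js_docs": lambda q: f"[js_docs] results for {q!r}",
    "general_docs": lambda q: f"[general_docs] results for {q!r}",
}


def answer_context(query: str) -> str:
    """Route the query, then search only the chosen source."""
    return RETRIEVERS[route(query)](query)


print(route("How to add two numbers in Python?"))  # python_docs
print(answer_context("How do I install a package with npm?"))
```

In a production system, the keyword check in `route` would typically be replaced by a call to an SLM or LLM constrained to return one of the allowed labels (for example through the structured-output or function-calling features many model APIs provide); the dispatch step after it stays the same.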

SEMANTIC ROUTING

Semantic routing works with a set of prebuilt templates. When a query is received, it is first processed by a large language model (LLM), which analyzes the query to understand its context and intent. Based on this analysis, the LLM selects the predefined template that best matches the nature of the query. The query is then rewritten according to the chosen template's requirements, and the resulting prompt is fed back into the LLM for further processing.

This method ensures that the query is handled in a way that maximizes relevance and accuracy, providing the user with the most precise and useful information possible. By leveraging semantic routing, we can enhance the efficiency and effectiveness of information retrieval in AI applications.
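
Below is a minimal sketch of template selection, with one substitution stated plainly: instead of asking an LLM to pick the template, it compares the query with a short description of each template and picks the most similar one, which is a common embedding-based variant of semantic routing. The `TEMPLATES` dictionary is hypothetical, and `embed` is a toy bag-of-words function standing in for a real embedding model.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical templates: (description used for matching, prompt template).
TEMPLATES = {
    "physics": (
        "physics questions about forces energy motion and particles",
        "You are a physics professor. Answer concisely.\n\nQuestion: {query}",
    ),
    "math": (
        "math questions about equations numbers algebra and calculus",
        "You are a mathematician. Show your steps.\n\nQuestion: {query}",
    ),
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_route(query: str) -> tuple[str, str]:
    """Pick the template whose description is most similar to the query and fill it in."""
    q = embed(query)
    name = max(TEMPLATES, key=lambda n: cosine(q, embed(TEMPLATES[n][0])))
    prompt = TEMPLATES[name][1].format(query=query)
    return name, prompt


name, prompt = semantic_route("What is the energy of a particle in motion?")
print(name)    # physics
print(prompt)
```

The returned `prompt` is what would be sent to the LLM in the final step. Swapping `embed` for a real embedding model leaves the rest of the function unchanged.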

USE OF KNOWLEDGE GRAPH DATABASE

Introducing a knowledge graph database into a RAG pipeline is valuable for routing, especially when the data has complex relationships. Vector databases work well for semantic similarity, but they can lose relational information, which is essential for understanding how pieces of data are connected. Graph databases excel in this area, and Neo4j is a popular choice: it supports graph functionality and can also store and search vector embeddings, offering a combined solution. Neo4j is queried with Cypher, a declarative, SQL-like query language. For the best LLM results, an application needs both vector embeddings and data relationships, which improves its performance and accuracy. We will take an in-depth look at knowledge graph databases in the next article.

REFERENCE

GenAI cohort by Hitesh Choudhary and Piyush Garg

#7 - Advanced RAG: Routing