Components of a RAG Application

Farhan Naqvi
2 min read

A RAG (Retrieval-Augmented Generation) application consists of three main components:

  1. Embedding Model: This model takes textual information (queries, documents, etc.) and transforms it into numerical representations called "embeddings." These embeddings capture the semantic meaning of the text in a high-dimensional space. Imagine them as unique addresses for each piece of information within a vast, multi-dimensional library. (A minimal embedding sketch follows this list.)

    • Why Embeddings?

      • Efficiency: Matching text queries to documents directly can be computationally expensive. Embeddings allow for efficient similarity searches in the vector database.

      • Semantic Understanding: Embeddings go beyond simple keyword matching. They capture the underlying meaning of the text, enabling RAG to identify relevant documents even if they don't use the exact same words as the query.

  2. Vector Database: The vector database stores the embeddings generated by the embedding model, usually alongside the original text or a reference back to it. It acts like the actual library where all the information (documents, articles, etc.) resides.

    • Function of the Vector Database:

      • Retrieval: When a user enters a query, the embedding model converts it into an embedding. The vector database then searches for documents whose embeddings are most similar to the query embedding, effectively retrieving the most relevant information based on semantic meaning. (See the retrieval sketch after this list.)
  3. Large Language Model (LLM): This model is given the user query together with the retrieved information and generates the final response.

    • How Does the LLM Use the Retrieved Information?

      • The LLM receives the user query along with the documents retrieved by the vector database. These documents provide the LLM with context that grounds its response to the query.

      • With the query and relevant information in hand, the LLM can generate a more comprehensive and informative response. It can leverage the retrieved information to provide factual grounding, answer complex questions, or complete specific tasks. (See the generation sketch after this list.)
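
To make the embedding step concrete, here is a minimal sketch of turning text into vectors. The sentence-transformers library and the all-MiniLM-L6-v2 model are assumptions chosen purely for illustration; any embedding model or embedding API works the same way in principle.

```python
# Minimal embedding sketch (assumes the sentence-transformers package is installed).
from sentence_transformers import SentenceTransformer

# Any embedding model will do; this small open-source model is just an example choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "RAG combines retrieval with generation.",
    "Vector databases store embeddings for fast similarity search.",
]

# Each text becomes a fixed-length vector, its "address" in embedding space.
doc_embeddings = model.encode(documents)                            # shape: (2, 384)
query_embedding = model.encode("What does a vector database do?")   # shape: (384,)
print(doc_embeddings.shape, query_embedding.shape)
```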

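Next, a sketch of the retrieval step, continuing the toy example above. A real application would use a dedicated vector database such as FAISS, Chroma, or Pinecone, but the core operation, finding the stored vectors most similar to the query vector, can be shown with plain NumPy cosine similarity:

```python
import numpy as np

def retrieve(query_embedding, doc_embeddings, documents, top_k=2):
    """Return the top_k documents whose embeddings are most similar to the query."""
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_embedding / np.linalg.norm(query_embedding)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d @ q                      # one similarity score per stored document
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

# Continuing the example above: the second document should rank first for this query.
top_docs = retrieve(query_embedding, doc_embeddings, documents, top_k=1)
print(top_docs)
```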

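Finally, a sketch of the generation step: the retrieved documents are placed into the prompt as context, and the LLM answers the query grounded in that context. The OpenAI-style client and model name below are assumptions for illustration; any LLM that accepts a text prompt can play this role.

```python
# Generation sketch, assuming the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

query = "What does a vector database do?"
context = "\n\n".join(top_docs)  # documents returned by the retrieval step above

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

# The model name is just an example; swap in whichever LLM you have access to.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```
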
Here's an analogy:

Imagine a librarian (vector database) with a vast library organized using vector embeddings. When you ask a question (user query), the librarian quickly retrieves the most relevant books (retrieved documents) based on their content (embeddings). With these books in hand (retrieved information), a researcher (LLM) can then analyze the information and provide you with a well-supported and informative answer.

