Intro to RAG (Retrieval Augmented Generation)

Shikha Lodhi
2 min read

Since pre-trained Large Language Models (LLMs) have a knowledge cutoff and can't access real-time or updated information, Retrieval-Augmented Generation (RAG) helps improve their accuracy by integrating external knowledge sources. This allows the LLM to generate more relevant and up-to-date responses.

Why We Don’t Feed All the Data to LLMs

LLMs (Large Language Models) have a limited context window, just like humans can only remember so much at one time.

If you try to give an entire data source (like a long PDF or a large document) to the LLM, it may not fit within the context window at all, and even when it does, burying the relevant passage in a mass of unrelated text can make the response less accurate or confusing.

Instead, a better approach is to:

  • Analyze the user's query

  • Retrieve only the most relevant chunks of data related to that query

  • Insert just those relevant parts into the system prompt

  • Let the LLM respond based on that focused context

This method not only gives more accurate and relevant answers, but also does it faster and more efficiently.
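The retrieve-then-prompt flow above can be sketched in a few lines of Python. The word-overlap retriever here is a deliberately simple stand-in for illustration; a real system ranks chunks with vector embeddings, as described later in this article.

```python
def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks sharing the most words with the query.

    Toy retriever: scores each chunk by word overlap with the query.
    """
    query_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(query: str, chunks: list[str]) -> str:
    # Insert only the most relevant chunks into the system prompt.
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "RAG retrieves relevant chunks before generation.",
    "Bananas are rich in potassium.",
]
prompt = build_prompt("What does RAG retrieve?", chunks)
```

Note that only the matching chunk reaches the prompt; the unrelated one is filtered out before the LLM ever sees it, which is the whole point of retrieval.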

Example:

Imagine you have a large data source, like a PDF file.

At first, you might think of converting the PDF into plain text and putting it all into the system prompt of a Large Language Model (LLM), so it can answer questions based on that content.

But if the PDF is too large, it's not practical to include the entire content in the prompt. That’s where RAG (Retrieval-Augmented Generation) comes in.

How RAG Works (Step-by-Step):

  1. Data Source - You start with a PDF containing a large number of pages.

  2. Indexing - Split the PDF into pages (or smaller chunks) so each piece can be processed and retrieved independently.

  3. Vector Embeddings - Generate embeddings that capture the semantic meaning of each indexed page, and store them in a vector database keyed by page number.

  4. Process User Query - Generate vector embeddings for the user's query in the same way.

  5. Similarity Search - Compare the query embedding against the stored page embeddings, and retrieve only the pages whose semantic meaning is most similar to the query.

  6. LLM Query - Insert the retrieved pages into the system prompt, and let the LLM answer the user's query from that data.
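The six steps above can be sketched end to end. To keep the example self-contained, a normalized bag-of-words dictionary stands in for a real embedding model, a plain Python dict stands in for the vector database, and the final LLM call is represented by the prompt it would receive.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    # Toy "embedding": a normalized bag-of-words vector stored as a dict.
    # A real pipeline would call an embedding model here instead.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


# Steps 1-3: index the "pages" and store their embeddings keyed by
# page number (a plain dict stands in for the vector database).
pages = {
    1: "RAG combines retrieval with generation.",
    2: "Photosynthesis converts sunlight into energy.",
}
index = {num: embed(text) for num, text in pages.items()}


def build_prompt(query: str, top_k: int = 1) -> str:
    # Step 4: embed the user query.
    q = embed(query)
    # Step 5: similarity search -- keep only the most similar pages.
    best = sorted(index, key=lambda n: cosine(q, index[n]), reverse=True)[:top_k]
    # Step 6: insert the retrieved pages into the system prompt
    # (the actual LLM call is omitted here).
    context = "\n".join(pages[n] for n in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("How does RAG work with retrieval?")
```

In practice you would swap `embed` for a real embedding model and the dict for a vector database, but the shape of the pipeline stays the same.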

GitHub Link: A Basic RAG Functionality Code
