RAG Unlocked: The Secret Sauce Behind Smarter AI

Trilochan Sahoo
5 min read

“Wherever you look, there you’ll find me.”

This quote has never been truer than in today’s AI-driven world. Everyone has heard the buzz around AI; new changes are happening in every sector, from tiny retail shops to large healthcare systems, even in space exploration. We use it in everyday scenarios too. Yes, we’ve all used ChatGPT, the most famous AI, at least once to fix text or code 😅 (OpenAI, the OG 👑). Among the many buzzwords in AI, one of the most common terms is RAG. So today, let’s understand what it really is, in a simple and fun way.

Let’s start with a small story.

It’s story time…🏔️

Chatur and Rancho are two friends. Chatur, the front-bencher🤓, pays close attention when the teacher explains the topic before a test. Rancho, the back-bencher😎, skips the lecture and just tries to figure things out on his own. When the results come in, Chatur scores high, while Rancho fails.

Now, why this story? Think of Chatur and Rancho as two AI models. Both can generate answers, but Chatur has the advantage of listening to the latest information (like retrieval from a knowledge base), while Rancho relies only on what he already memorized.

That’s exactly where RAG comes in 🗝️. Large Language Models (LLMs) are trained on massive datasets and billions of parameters, but sometimes they generate answers that sound right without being accurate. RAG helps optimize their output by connecting them to a specific knowledge base. In simple terms, it’s like giving the model a textbook before the test, so its answers stay relevant, accurate, and useful.

Why is it needed????🤔

Without RAG, LLMs run into a few issues:

  • They often give outdated or overly generic answers when the user expects a specific and current response.

  • It might present false or off-topic information when it doesn’t know.

  • Sometimes the response comes from non-authoritative or unreliable sources.

  • There is a huge chance of creating inaccurate responses due to terminology confusion, where different training sources use the same terminology to talk about different things.

So when RAG comes into the picture with its powers 💪:

  • No need to train the LLM on new data; it is highly cost-effective💵.

  • Access to the latest research, statistics, and news📰.

  • Enhanced user trust 😇, since the model can provide valid documents and sources.

  • Fine-grained control: developers can choose which sources the LLM relies on, adapt to new requirements, and even restrict sensitive info by authorization level🔐.

Basically, if plain AI is like Iron Man 🦸‍♂️ without Jarvis, then RAG is when Jarvis connects him to the entire internet. Smarter, faster, and a lot less likely to hallucinate.

How does it actually work with AI???🤔

It has two main parts: Data Preparation and Retrieval & Generation.

  1. Data Preparation

First comes Data Preparation. Ultimately, we need a way to feed fresh knowledge to the LLM. For this, we upload raw data sources; they can be anything: text, PDFs, docs, XML files, photos, video subtitles, etc. Once the data is collected, the valuable information is extracted from the raw data.

The extracted data is then split into chunks, converted into embeddings, and stored in a vector database with proper indexing.

So much jargon🥴🥱… don’t worry, let's understand each step.

What is Chunking???

Chunking is nothing but dividing large text into smaller pieces. But again, why? There are two main reasons chunking is necessary. First, an LLM can only process a limited amount of text at once; chunking breaks the data into segments the LLM can handle.

Second, chunked data can be properly indexed, so the pieces most relevant to the user’s input can be retrieved, helping the LLM give more relevant and accurate responses.

Furthermore, it isn’t enough to just split the data into chunks; each chunk must contain information that is useful for search. If a chunk holds a sentence that makes no sense without its surrounding context, it may never be matched during search. To fix this, RAG pipelines use overlapping chunks, so a thought or concept that spans the boundary of two chunks still appears whole somewhere and can be understood by the RAG system.
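To make that concrete, here’s a minimal sketch of overlapping chunking in Python. The chunk size and overlap values are made-up illustrative numbers, not fixed rules; real pipelines often split on sentences or tokens rather than raw characters.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into ~chunk_size-character pieces, where each chunk
    re-includes the last `overlap` characters of the previous one,
    so an idea that spans a boundary still appears whole somewhere."""
    chunks, start = [], 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks

doc = "RAG connects an LLM to a fresh knowledge base. " * 100  # toy document
pieces = chunk_text(doc)
print(f"{len(pieces)} chunks, each overlapping its neighbor by 100 characters")
```

The overlap is what rescues a sentence that straddles two chunks: it simply lives in both, so at least one copy stays searchable with its context intact.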

What is Indexing in RAG???

Think of it like building a super-efficient catalog 📒 (the index) for a store (the knowledge base): instead of checking every item one by one, you can instantly pinpoint the exact location 📍🗺️ of what you need (the chunks most relevant to the user’s query). Different database systems use various algorithms to store, index, and retrieve the data at high speed.
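As a sketch of the idea (assuming the embeddings already exist): the “catalog” below is just a NumPy array searched by cosine similarity. A real vector database swaps this linear scan for an approximate nearest-neighbor index, which is where the instant-pinpoint speed comes from.

```python
import numpy as np

# Toy catalog: one embedding row per chunk. The dimensions and data are
# made up for illustration; a real DB stores vectors from an embedding model.
embeddings = np.random.rand(1000, 384).astype("float32")
chunks = [f"chunk #{i}" for i in range(1000)]

def search(query_vec: np.ndarray, top_k: int = 3) -> list[str]:
    """Return the chunks whose embeddings are most similar to the query."""
    scores = embeddings @ query_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(scores)[::-1][:top_k]  # highest cosine similarity first
    return [chunks[i] for i in best]

print(search(np.random.rand(384).astype("float32")))
```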

  2. Retrieval & Generation

Now comes the fun part: the actual Q&A process. When we ask a question, it is first converted into an embedding. That embedding goes to the vector database, which matches it against the stored chunks and retrieves the most relevant ones. These chunks, along with the original query, then go to the LLM, which generates a grounded, accurate answer and returns it as the response.
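Putting both halves together, the Q&A loop looks roughly like the sketch below. `embed()` and `generate()` are hypothetical stand-ins for whatever embedding model and LLM you use, and `search()` is the toy lookup from the indexing section above; the prompt structure is the part that matters.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real pipeline calls an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.random(384).astype("float32")

def generate(prompt: str) -> str:
    """Placeholder: swap in a call to your LLM of choice here."""
    return f"[LLM answer grounded in {prompt.count('---') + 1} retrieved chunks]"

def answer(question: str) -> str:
    query_vec = embed(question)           # 1. embed the user's question
    context = search(query_vec, top_k=3)  # 2. retrieve relevant chunks (see above)
    prompt = (
        "Answer using ONLY the context below.\n\n"
        "Context:\n" + "\n---\n".join(context) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)               # 3. let the LLM generate the response
```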


So that’s what makes up RAG: Retrieval-Augmented Generation.

The difference between an AI that guesses… and an AI that actually knows. 😉

So that’s RAG in action, basically giving your AI a memory boost and a reality check at the same time. Next time you see an AI answering like a pro with the latest info, just know RAG is working quietly in the background like the unsung hero. 🚀
