Unlocking the Power of RAG: Enhancing AI with Retrieval-Augmented Generation


Terms Covered: RAG, Ingestion/Indexing, Retrieval, Context Window, Chunking, Embeddings, Vector Embeddings, LLM Model, Vector Databases, Metadata, Prompt, LangChain, RAG Tools
What is RAG ❓
Retrieval-Augmented Generation
It's basically Retrieving (fetching) relevant information from some external data source and providing it as context to Augment (extend) the model's base knowledge, so the LLM can Generate an improved output response.
RAG is most useful when you want a response based on information the model doesn't have access to - real-time data, your personal data, data that appeared after the model's knowledge cutoff, etc.
Use-Case Examples:
You want to know about React 19, but your LLM model is only trained up to React 18.
You want to analyze and use data from your organization, which is stored in some kind of database.
You want to ask questions about a big PDF file - you don't have time to go through it completely, but you need information from it.
Why RAG ❓
For LLM models, we have two approaches to make them usable for any business/application layer:
Fine-Tuning - take a base LLM model and train it further on our data
RAG - provide relevant data as context/prompt to the LLM for our use case, on the spot
| Approach | Fine-Tuning | RAG |
| --- | --- | --- |
| Changes | We make changes at the data/model layer by retraining on our data | We don't change the model at all; we just provide different context based on the user query, on the spot |
| Works at | Data layer | Application layer |
| Pros | Conceptually simple - a fixed set of data to provide for training | No extra training cost, works with real-time data, and any change written to the database is usable immediately |
| Cons | Expensive, time- and resource-consuming, not real-time; changes written to your database aren't reflected until you retrain | Fetching relevant content can get complex as the data sources and data size grow |
That's why RAG is application-focused and usable for most companies.
How RAG Works?
Its usage process has several steps, but mainly three parts: INGESTION | RETRIEVAL | GENERATION
Inputs: External data sources - PDF, image, text, database, videos, books, web, etc. + user prompt
INGESTION / INDEXING:
First we load our data, then:
Splitting: CHUNKING
First we break the data down into chunks - this process is called Chunking. Why do we do that?
To break down bigger data. All LLMs have their own Context Window - how much context they can hold at a particular moment. It's similar to our memory capacity: how much we can remember about a topic at any point (and that depends on the individual). Similarly, LLMs have their own context window limits, so a model cannot use terabytes of data as context, and the useful context also changes with each user prompt. So we chunk the data down to make it easier for the model to get only the specific pieces relevant to the user's prompt.
Chunking is the most important step - you have to decide carefully and smartly how to chunk your data. If the chunks are too big, you can push irrelevant text into the context and lead the model to hallucinate; if they are too small, you can lose information.
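For example, here's a minimal chunking sketch using LangChain's RecursiveCharacterTextSplitter (the file name, chunk size, and overlap are just illustrative values, and the exact import path can vary with your LangChain version):

```python
# Minimal chunking sketch (illustrative sizes; tune them for your data)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # max characters per chunk
    chunk_overlap=50,  # overlap preserves context across chunk boundaries
)

with open("my_document.txt") as f:  # hypothetical source file
    text = f.read()

chunks = splitter.split_text(text)
print(f"Split into {len(chunks)} chunks")
```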
Embedding:
Here we create embeddings of those chunks and store them along with some metadata, so that retrieval becomes feasible when it's time to fetch data.
These embeddings can be as simple as plain text keys, or as complex as vector embeddings - or even graph structures that themselves point back to vector embeddings.
Vector embeddings are the most common choice. If your data is small enough that plain text search already works, vector embeddings may not be the optimal technique; for anything bigger, they are, because they capture semantic meaning as points in a high-dimensional space.
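As a rough sketch, creating vector embeddings for those chunks could look like this with the OpenAI Embeddings API (the model name and sample chunks are just for illustration; any embedding provider works the same way conceptually):

```python
# Turn text chunks into vector embeddings (assumes OPENAI_API_KEY is set)
from openai import OpenAI

openai_client = OpenAI()

chunks = ["React 19 introduces ...", "The new compiler ..."]  # example chunks

response = openai_client.embeddings.create(
    model="text-embedding-3-small",  # illustrative model choice
    input=chunks,
)

vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of", len(vectors[0]), "dimensions")
```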
Storing: Vector Database
Here we store those embeddings for later use. For that, we have multiple vector databases available, like Pinecone, Qdrant, Chroma, etc.
Till here, we are done with the INGESTION/INDEXING part - basically ingesting the external data - and we are ready for the user prompt now.
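Continuing the sketches above, storing those embeddings (plus the original text and some metadata) in Chroma might look like this - the collection name and metadata fields are made up for illustration:

```python
# Store chunks + embeddings + metadata in a vector database (Chroma)
import chromadb

chroma_client = chromadb.Client()  # in-memory; use PersistentClient for disk
collection = chroma_client.create_collection("my_docs")

collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,      # keep the original text so we can return it later
    embeddings=vectors,    # vectors from the embedding step above
    metadatas=[{"source": "my_document.txt", "chunk": i} for i in range(len(chunks))],
)
```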
RETRIEVAL:
On receiving the user prompt, we create vector embeddings of it too, and send them to the vector database (where we have our pre-organized useful data) to search for relevant data. We take the relevant matches out of the database.
Then it depends: if the metadata of those vector embeddings already contains the data directly, we are ready with the relevant context to pass to our model. If not, we use that metadata to look up the actual relevant data in our original data source, and use that as the context.
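In code, the retrieval step could look roughly like this (continuing the ingestion sketches above; the question and `n_results` are illustrative):

```python
# Embed the user prompt with the SAME model used at ingestion,
# then ask the vector database for the closest chunks
question = "What changed in React 19?"  # example user prompt

q_vector = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=[question],
).data[0].embedding

results = collection.query(
    query_embeddings=[q_vector],
    n_results=3,  # top-3 most similar chunks
    include=["documents", "metadatas"],
)

relevant_chunks = results["documents"][0]  # the retrieved context
```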
GENERATION:
After fetching the useful context, we provide it to our LLM along with the user query. Now the LLM has its own pre-trained knowledge + the useful knowledge provided from the external data + the user query - it just has to create its response. That's how the whole RAG pipeline works.
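A minimal generation sketch, assuming `relevant_chunks` and `question` from the retrieval step above (the model name and prompt wording are just one way to do it):

```python
# Stuff the retrieved chunks into the prompt as context and generate
completion = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": "Answer ONLY from the context below.\n\nContext:\n"
                       + "\n\n".join(relevant_chunks),
        },
        {"role": "user", "content": question},
    ],
)

print(completion.choices[0].message.content)
```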
🚀 Useful Tools for RAG:
1. LangChain
What it does: Framework that connects LLMs with external data (documents, databases, APIs) easily.
Why it's useful: Simplifies building RAG pipelines — from retrieval to generation — using modular components like retrievers, chains, and agents.
2. LlamaIndex (formerly GPT Index)
What it does: Helps structure your private or custom data into indices that LLMs can query easily.
Why it's useful: Makes it super easy to connect LLMs to your personal or enterprise datasets for custom retrieval in RAG workflows.
3. FAISS (Facebook AI Similarity Search)
What it does: A library for fast, efficient similarity search across massive document embeddings.
Why it's useful: Critical for quickly finding the most relevant documents when you're doing retrieval at scale.
4. Pinecone
What it does: Fully managed vector database service built for production-grade similarity search.
Why it's useful: Saves time on infrastructure; scalable, real-time document retrieval with very low latency.
5. Chroma
What it does: Open-source embedding database designed for AI applications.
Why it's useful: Easy to use locally during prototyping or small-scale RAG projects without needing cloud services.
6. Haystack (by deepset)
What it does: Framework for building end-to-end search and question-answering systems over your data.
Why it's useful: Focused on retrieval-augmented pipelines, offers fine-tuning, and works well for complex RAG tasks like document QA or multi-hop retrieval.
7. OpenAI Embeddings API
What it does: Generates vector representations of text for semantic similarity searches.
Why it's useful: Essential for converting your documents into embeddings that you can later retrieve relevant pieces from in RAG.
8. Qdrant
What it does: Open-source, high-performance vector database designed for storing and searching embeddings.
Why it's useful: Provides efficient and scalable semantic search with filtering support, making it ideal for complex RAG systems where you might want fine-grained retrieval (for example, filter by metadata like date, category, tags while searching).
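To see how a few of these tools fit together, here's a minimal end-to-end RAG sketch with LangChain + Chroma + OpenAI (the file name, chunk sizes, and models are illustrative, and import paths can differ between LangChain versions):

```python
# Ingestion -> Retrieval -> Generation in a few lines (a sketch, not production code)
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma

# 1. INGESTION: load, chunk, embed, store
docs = PyPDFLoader("report.pdf").load()  # hypothetical PDF
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 2. RETRIEVAL: find the chunks most similar to the question
question = "What are the key findings of the report?"
relevant = vectordb.similarity_search(question, k=3)

# 3. GENERATION: answer using only the retrieved chunks as context
context = "\n\n".join(doc.page_content for doc in relevant)
llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```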
Summary:
What RAG Solves:
How RAG Works?
Conclusion:
I just explained what I understood about RAG! If you found it useful, don't forget to like this article and
follow me for more such informative articles.
Credits:
I am very grateful to ChaiCode for providing all this knowledge, insights, and deep learning about AI: Piyush Garg, Hitesh Choudhary
If you want to learn too, you can Join here → Cohort || Apply from ChaiCode & Use NAKUL51937 to get 10% off
Thanks:
Feel free to comment your thoughts - I love hearing feedback and improving. This is my third article :)
Thank you for giving your precious time to read this article.
Connect:
Let’s learn something Together: LinkedIn
If you would like, you can check out my Portfolio