🧠 Understanding RAG (Retrieval-Augmented Generation) for Smarter LLMs

🤔 Why do we even need RAG?
Large Language Models (LLMs) like GPT, Gemini, Claude, etc. are amazing — but they come with limitations:
They are trained on general data (mostly internet-based).
They don’t know your business-specific or custom data (e.g., internal docs, product DB, PDFs).
So if you ask them:
“What's the refund policy in my internal employee handbook?”
The LLM will shrug (metaphorically 🤷) because it has never seen your PDF.
That's where RAG comes in. It augments LLMs with retrieved real-world data.
🍭 RAG Simplified
RAG = Retrieval + Generation
Or simply put:
👉 "RAG is LLM ko haath paair dena" (RAG is helping the LLM with external context so it can actually give you meaningful answers.)
For example, if you want to chat with a specific PDF, the LLM alone can’t do it. But RAG makes it possible 💡
🛠️ Simple Example of RAG
Let’s say we have this scenario:
You have 10 rows in a product database.
You can put those 10 rows along with the user's question into the prompt and ask the LLM to generate a smart response using those rows.
prompt = f"""
Here are the 10 rows of product info:
{db_rows}
User question: {user_query}
"""
But... ⚠️
There’s a problem — prompt/token size limit. What if you had 10,000 rows? You can’t fit that into the prompt. That's when real RAG magic starts 🔮
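Before we get there, a quick back-of-the-envelope check shows why prompt stuffing doesn't scale. This sketch uses OpenAI's tiktoken purely as a stand-in tokenizer, and the product row is made up, so treat the numbers as illustrative:

```python
# Rough token math for prompt stuffing (tiktoken used only as a stand-in tokenizer).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# A made-up product row, just to get a per-row token estimate.
row = "id=42 | name=Wireless Mouse | price=19.99 | stock=120 | category=Accessories"
tokens_per_row = len(enc.encode(row))

for n_rows in (10, 1_000, 10_000):
    print(f"{n_rows:>6} rows ≈ {n_rows * tokens_per_row:>7} tokens")

# 10 rows fits in any prompt; 10,000 rows runs into hundreds of thousands of tokens,
# which blows past typical context windows.
```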
🚀 Overcoming the Token Size Limit
Instead of stuffing everything in the prompt, RAG works smartly:
✅ It converts your data into semantic vectors (embeddings)
✅ Then finds the most relevant data chunks based on user query
✅ And gives only that to the LLM
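Here's a toy sketch of the "semantic vector" idea. The vectors below are made up by hand (a real system gets them from an embedding model), but the cosine-similarity ranking is exactly how the relevant chunk gets picked:

```python
# Toy semantic search: hand-made vectors stand in for real embeddings.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a real pipeline these come from an embedding model (e.g. Gemini embeddings).
chunk_vectors = {
    "Chunk about shipping times": np.array([0.9, 0.1, 0.0]),
    "Chunk about refund policy":  np.array([0.1, 0.9, 0.2]),
    "Chunk about office hours":   np.array([0.0, 0.2, 0.9]),
}
query_vector = np.array([0.2, 0.8, 0.1])  # pretend this is the embedded user question

best_chunk = max(chunk_vectors, key=lambda name: cosine_similarity(query_vector, chunk_vectors[name]))
print(best_chunk)  # → Chunk about refund policy
```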
Let’s break it down with a PDF example.
📝 Example: Chat with PDF — Two Approaches
🔴 Approach 1: Dump entire PDF into prompt
PDF → TEXT
USER_PROMPT + SYSTEM_PROMPT(TEXT) → LLM
Issue: If the PDF is large, the token limit will explode 💥
Not scalable, not efficient.
🟢 Approach 2: Chunk PDF + Semantic Search
PDF → Chunk1, Chunk2, Chunk3, ...
Relevant Chunk = Chunk2 ← (semantic search)
USER_PROMPT + SYSTEM_PROMPT(Chunk2) → LLM
Now, you only send the relevant chunk to the model!
Fewer tokens, more accurate answers. Win-win 🏆
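Before reaching for LangChain, a bare-bones chunker helps demystify what "Chunk1, Chunk2, Chunk3" actually means. This is a simplified character-based sketch (the sizes and sample text are arbitrary); real splitters like RecursiveCharacterTextSplitter are smarter about sentence and paragraph boundaries:

```python
# Naive character-based chunking with overlap, just to show the mechanics.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # overlap so sentences aren't cut off between chunks
    return chunks

pdf_text = "Our refund policy allows returns within 30 days of purchase. " * 50  # stand-in PDF text
chunks = chunk_text(pdf_text)
print(f"{len(chunks)} chunks, each up to 500 characters")
```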
🧠 Semantic Chunk Retrieval + Manual Prompt (Real RAG)
Want to see real RAG in action? Here's the core idea:
We’ll use LangChain to:
🧾 Load and split the PDF
🔍 Store chunks in Qdrant
🧲 Retrieve relevant chunks based on user query
🧠 Pass only the relevant context to Gemini LLM
💡 Full Code Example:
```python
import os

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Qdrant
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

# Gemini calls read the API key from the environment:
# os.environ["GOOGLE_API_KEY"] = "your-api-key"

# === 1. Load and Chunk PDF ===
loader = PyPDFLoader("sample.pdf")
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(pages)

# === 2. Generate Embeddings ===
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# === 3. Store in Qdrant ===
vectorstore = Qdrant.from_documents(
    documents=chunks,
    embedding=embedding_model,
    url="http://localhost:6333",
    collection_name="rag_chunks",
)

# === 4. Retrieve Relevant Chunks ===
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

query = "What is the refund policy?"
relevant_docs = retriever.get_relevant_documents(query)

# Merge the top 3 chunks into one context
semantic_context = "\n".join([doc.page_content for doc in relevant_docs])

# === 5. Feed to Gemini Manually ===
llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.2)

final_prompt = f"""
You are a helpful assistant. Use the context below to answer the user's question.

Context:
{semantic_context}

Question:
{query}
"""

response = llm.invoke(final_prompt)
print("🧠 Answer:", response.content)
```
🧪 Output:
```
🧠 Answer: The refund policy allows returns within 30 days...
```
With this approach, you're doing full RAG manually — perfect for learning and building production apps.
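One setup note on the code above: it assumes a Qdrant instance is already running and reachable at http://localhost:6333 (the official Qdrant Docker image works for local testing), and that a GOOGLE_API_KEY environment variable is set for the Gemini embedding and chat calls. Both are environment assumptions, not part of the RAG logic itself.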
💻 Full-Stack ChatPDF Project:
I’ve built a full-stack ChatPDF project using:
LangChain
Gemini API
Semantic Search
File Upload Support
Check it out on GitHub 👇
🔗 https://github.com/r00tshaim/chat-pdf
Let's connect on LinkedIn!
🔗 https://www.linkedin.com/in/shaimkhanusiya/
💬 Wrapping Up
RAG is not just a buzzword — it's the practical solution for making LLMs work with your data.
So next time someone asks,
“Can GPT read my company handbook?”
You say:
“Yes, but only if we give it hands and feet (haath-pair) via RAG 💪”