Understanding RAG (Retrieval-Augmented Generation) for Smarter LLMs

Why do we even need RAG?
Large Language Models (LLMs) like GPT, Gemini, and Claude are amazing, but they come with limitations:
They are trained on general data (mostly internet-based).
They don't know your business-specific or custom data (e.g., internal docs, product DB, PDFs).
So if you ask them:
"What's the refund policy in my internal employee handbook?"
The LLM will shrug (metaphorically) because it has never seen your PDF.
That's where RAG comes in. It augments LLMs with retrieved real-world data.
RAG Simplified
RAG = Retrieval + Generation
Or simply put:
"RAG is giving the LLM hands and feet" (in other words, RAG supplies the LLM with external context so it can actually give you meaningful answers).
For example, if you want to chat with a specific PDF, the LLM alone can't do it. RAG makes it possible.
Simple Example of RAG
Let's say we have this scenario:
You have 10 rows in a product database.
You can put those 10 rows along with the user's question into the prompt and ask the LLM to generate a smart response using those rows.
# db_rows: the 10 rows fetched from your product table (e.g., a list of dicts)
# user_query: the question the user typed in
prompt = f"""
Here are the 10 rows of product info:
{db_rows}
User question: {user_query}
"""
But...
There's a problem: the prompt/token size limit. What if you had 10,000 rows? You can't fit all of that into the prompt. That's when the real RAG magic starts.
Overcoming the Token Size Limit
Instead of stuffing everything into the prompt, RAG works smarter:
- It converts the data into semantic vectors (embeddings)
- Then it finds the most relevant data chunks based on the user's query
- And it gives only those chunks to the LLM (see the sketch below)
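To make the "semantic vectors" step concrete, here is a minimal sketch of similarity-based retrieval using Google's embedding model and plain cosine similarity. It assumes you have numpy and langchain-google-genai installed and a GOOGLE_API_KEY set; the chunk texts are made-up examples.

import numpy as np
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Toy chunks standing in for pieces of your own data
chunks = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is open Monday to Friday, 9am to 6pm.",
    "Shipping takes 5-7 business days for domestic orders.",
]

embedder = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
chunk_vectors = embedder.embed_documents(chunks)   # one vector per chunk
query_vector = embedder.embed_query("What is the refund policy?")

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pick the chunk whose vector is closest to the query vector
scores = [cosine(query_vector, v) for v in chunk_vectors]
print(chunks[int(np.argmax(scores))])   # -> the refund-policy chunk

Vector databases like Qdrant do exactly this lookup for you, just at a much larger scale.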
Let's break it down with a PDF example.
Example: Chat with a PDF (Two Approaches)
Approach 1: Dump the entire PDF into the prompt
PDF → TEXT
USER_PROMPT + SYSTEM_PROMPT(TEXT) → LLM
Issue: if the PDF is large, the token limit will explode.
Not scalable, not efficient.
Approach 2: Chunk the PDF + Semantic Search
PDF → Chunk1, Chunk2, Chunk3, ...
Relevant chunk = Chunk2 (found via semantic search)
USER_PROMPT + SYSTEM_PROMPT(Chunk2) → LLM
Now you only send the relevant chunk to the model!
Fewer tokens, more accurate answers. Win-win.
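In case the chunking step feels abstract, here's a minimal sketch of a character-based chunker with overlap. The sizes are arbitrary for illustration; LangChain's RecursiveCharacterTextSplitter (used in the full example below) does a smarter version of the same thing.

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping, fixed-size character chunks."""
    step = chunk_size - overlap
    return [
        text[start:start + chunk_size]
        for start in range(0, len(text), step)
        if text[start:start + chunk_size].strip()
    ]

# Tiny stand-in for text extracted from a PDF
pdf_text = "Refunds are accepted within 30 days of purchase. " * 200
print(len(chunk_text(pdf_text)), "chunks")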
Semantic Chunk Retrieval + Manual Prompt (Real RAG)
Want to see real RAG in action? Here's the core idea:
We'll use LangChain to:
- Load and split the PDF
- Store chunks in Qdrant
- Retrieve relevant chunks based on the user's query
- Pass only the relevant context to the Gemini LLM
Full Code Example:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Qdrant
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

# Gemini calls need an API key, e.g. os.environ["GOOGLE_API_KEY"] = "..."
# === 1. Load and Chunk PDF ===
loader = PyPDFLoader("sample.pdf")
pages = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(pages)
# === 2. Generate Embeddings ===
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
# === 3. Store in Qdrant (assumes a Qdrant instance running at localhost:6333) ===
vectorstore = Qdrant.from_documents(
    documents=chunks,
    embedding=embedding_model,
    url="http://localhost:6333",
    collection_name="rag_chunks"
)
# === 4. Retrieve Relevant Chunks ===
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
query = "What is the refund policy?"
relevant_docs = retriever.get_relevant_documents(query)
# Merge the top 3 chunks into one context
semantic_context = "\n".join([doc.page_content for doc in relevant_docs])
# === 5. Feed to Gemini Manually ===
llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.2)
final_prompt = f"""
You are a helpful assistant. Use the context below to answer the user's question.
Context:
{semantic_context}
Question:
{query}
"""
response = llm.invoke(final_prompt)
print("Answer:", response.content)
Output:
Answer: The refund policy allows returns within 30 days...
With this approach, you're doing full RAG manually, which is perfect for learning and a solid foundation for production apps.
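If you want to reuse this flow (say, behind an API route), here's a small sketch that wraps the retrieval and generation steps into one function. It reuses the retriever and llm objects from the code above; the name ask_pdf is just for illustration.

def ask_pdf(question: str) -> str:
    """Retrieve the relevant chunks for a question and answer it with Gemini."""
    docs = retriever.get_relevant_documents(question)
    context = "\n".join(doc.page_content for doc in docs)
    prompt = f"""You are a helpful assistant. Use the context below to answer the user's question.
Context:
{context}
Question:
{question}"""
    return llm.invoke(prompt).content

print(ask_pdf("How long do refunds take to process?"))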
Full-Stack ChatPDF Project:
I've built a full-stack ChatPDF project using:
LangChain
Gemini API
Semantic Search
File Upload Support
Check it out on GitHub:
https://github.com/r00tshaim/chat-pdf
Let's connect on LinkedIn!
https://www.linkedin.com/in/shaimkhanusiya/