📚 RAG (Retrieval-Augmented Generation): A Simple Guide to Smarter AI


🤖 What in the World is RAG?
Imagine trying to answer a question in an exam... but your textbook is locked in your locker. 😬 That's traditional AI.
Now imagine the AI breaks into the locker, pulls out the book, finds the exact page, and quotes it to you mid-answer. That's RAG: ninja-level response generation with backup.
In plain English: it's like ChatGPT having a superpower to Google stuff while talking to you. Cool, right?
🪟 Context Window: Not a Real Window, But Close
🔍 What's a Context Window?
It's how much info the AI can read at once, like its attention span. Some models are goldfish (tiny windows), others are elephants (big windows). But all have limits.
💡 Think of it like this: trying to read 1,000 rows of Excel in one glance? Nah. But just the 40 rows you need? Much better.
🧠 Why It Matters
Dumping your entire life story into the prompt might overwhelm the AI. It's like asking someone what time it is after handing them the entire history of clocks.
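Curious how close you are to the limit? Here's a minimal sketch using tiktoken to count tokens before you send a prompt (assuming a recent tiktoken release that knows gpt-4o; the 8,000-token budget is just an example number, not the model's real window):

```python
# Minimal sketch: count tokens so you know whether a prompt fits your budget.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o")

def fits_in_window(text: str, budget: int = 8_000) -> bool:
    # budget is an arbitrary example; real windows are model-specific.
    return len(encoding.encode(text)) <= budget

prompt = "Here is my entire life story... " * 2000
print(fits_in_window(prompt))  # False: time to retrieve only what you need
```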
🔧 Types of RAG: API vs. File Feeds
🌐 API-Based RAG
This is like calling your friend Google every time you forget something. The AI makes real-time calls to get fresh data. Great for fast-changing stuff (news, stock prices, memes).
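A rough sketch of the idea (the stock endpoint below is a made-up placeholder, not a real API):

```python
# Rough sketch of API-based RAG: fetch fresh data at question time,
# then inject it into the prompt before calling the model.
import requests

def build_prompt(question: str) -> str:
    # Hypothetical endpoint; swap in whatever live data source you use.
    price = requests.get("https://api.example.com/stock/AAPL").json()["price"]
    context = f"AAPL is currently trading at ${price}."
    return f"Context: {context}\n\nQuestion: {question}"

print(build_prompt("Is Apple's stock up today?"))
```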
📁 File-Based RAG
This one's the quiet, nerdy type. It reads documents (PDFs, text files), stores smart versions of them, and pulls them up when needed. Perfect for stuff that doesn't change often, like manuals and laws.
🧩 How RAG Works: The 3-Stage Magic Show
🎩 Spoiler: no rabbits, just embeddings.
Part 1: Indexing (The Prep Work)
You've got a stack of documents. What now?
- Chunk them up → Break documents into small readable bites (not actual snacks).
- Embed them → Give each chunk a smart label (like a DNA tag).
- Store them → Save them in a vector database (think smart filing cabinet).
📦 Data → 🍰 Chunks → 🧠 Embeddings → 🗄️ Vector DB
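Here's a toy version of that pipeline in plain Python. The embed() stub and the plain list are stand-ins for a real embedding model and vector DB; the real LangChain version comes later in this post:

```python
# Toy indexing pipeline: chunk -> embed -> store.
def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    # Overlapping fixed-size chunks so context isn't cut mid-thought.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(chunk_text: str) -> list[float]:
    # Stand-in: a real embedding model returns hundreds of floats.
    return [float(ord(c)) for c in chunk_text[:8]]

document = "RAG indexes your documents so the model can look things up later. " * 20
vector_db = [{"text": c, "vector": embed(c)} for c in chunk(document)]
print(f"Stored {len(vector_db)} chunks in our pretend vector DB.")
```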
Part 2: Retrieval (The Scavenger Hunt)
Someone asks, "How do I make RAG work?"
- The AI makes an embedding of the question.
- It digs through the vector DB.
- It finds the juiciest, most relevant chunks.
🧠 Query → 🎯 Search → 📄 Top Chunks
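Under the hood, that "digging" is usually cosine similarity between the query vector and every stored chunk vector. A minimal sketch with made-up 3-D vectors (real embeddings have hundreds or thousands of dimensions):

```python
# Minimal retrieval sketch: score chunks by cosine similarity, keep the best.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

db = [
    {"text": "Chunk docs, embed them, store them in a vector DB.", "vector": [0.9, 0.1, 0.2]},
    {"text": "Bananas are rich in potassium.", "vector": [0.1, 0.9, 0.3]},
]
query_vector = [0.8, 0.2, 0.1]  # pretend the embedding model produced this
best = max(db, key=lambda row: cosine(query_vector, row["vector"]))
print(best["text"])  # the RAG chunk wins
```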
Part 3: Ask the Model (The Grand Finale)
All that chunked, embedded, and retrieved goodness goes into the prompt.
- The system prompt is filled with the relevant info.
- The user's question is added.
- The LLM works its magic like a caffeinated librarian.
✨ Context + Query → 🤖 LLM → 💬 Answer
🧰 LangChain: The Cool Assistant Who Knows Everyone
Think of LangChain as the social butterfly of the AI world. It's not the brain, but it knows how to connect the brain to books, PDFs, APIs, and even that dusty archive folder you forgot about.
🚀 Let's Build One: Practical RAG with LangChain
Here's how to go from "I have a PDF" to "My AI gives smart answers":
```bash
pip install langchain langchain-community langchain-openai langchain-qdrant langchain-text-splitters qdrant-client pypdf tiktoken
```
- Load PDF → Open sesame!
```python
# 1. Load PDF → Open sesame!
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader

pdf_path = Path(__file__).parent / "example.pdf"
loader = PyPDFLoader(file_path=str(pdf_path))
docs = loader.load()
```
- Split Text → Break it down into bite-sized pieces.
```python
# 2. Split Text → Break it down into bite-sized pieces.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
```
- Create Embeddings → Turn words into vector soup.
```python
# 3. Create Embeddings → Turn words into vector soup.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # Assumes OPENAI_API_KEY is set in env
```
- Run Qdrant → Fire up your vector database in Docker.
```yaml
# 4. Run Qdrant → Fire up your vector database.
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
```
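Save that as docker-compose.yml and run `docker compose up -d`; Qdrant will be listening on port 6333.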
- Ingest Data → Feed the chunks into the DB.
```python
# 5. Ingest Data → Feed the chunks into the DB.
from langchain_qdrant import QdrantVectorStore

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    url="http://localhost:6333",
    collection_name="example",
    embedding=embeddings,
)
```
- Retrieve Chunks → Find what matters.
```python
# 6. Retrieve Chunks → Find what matters.
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="example",  # same collection we just ingested into
    embedding=embeddings,
)
```
- Ask the Model → Send relevant data to the LLM and get smart replies.
```python
# 7. Ask the Model → Send relevant data to the LLM and get smart replies.
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
)  # Reads OPENAI_API_KEY from the environment

print("Ask a question (type 'exit' to quit):")
while True:
    query = input(">>> ")
    if query.lower() in ["exit", "quit"]:
        break

    # Retrieve the top 4 most relevant chunks
    results = retriever.similarity_search(query=query, k=4)
    context = "\n\n".join(doc.page_content for doc in results)

    # Build the system prompt around the retrieved context
    system_prompt = f"""You are an expert assistant for the example PDF. Answer the user's question using the following context extracted from the PDF. Be detailed, accurate, and relevant.

Rules:
- Only answer from the information provided in the context.
- If the user asks about something not in the context, reply in a kind, funny way that you don't have that knowledge.

Context:
{context}
"""

    response = chat_model.invoke([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ])

    print("\nAnswer:\n")
    print(response.content)
```
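To try it, export OPENAI_API_KEY, make sure Qdrant is still running, and run the script; then ask anything that's actually in your PDF.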
- 🎉 Boom! You just built a mini ChatGPT that actually knows your documents.
🎓 Final Thoughts
RAG isn't magic. It's just very, very smart recycling of your own info. It makes your AI model more like Sherlock Holmes: sharp, informed, and kinda charming.
And if you're ever confused, just remember:
📦 Chunk it.
🧠 Embed it.
🔍 Retrieve it.
💬 Answer like a boss.
Thank you for reading our article! We appreciate your support and encourage you to follow us for more engaging content. Stay tuned for exciting updates and valuable insights in the future. Don't miss out on our upcoming articlesβstay connected and be part of our community!
YouTube : youtube.com/@mycodingjourney2245
LinkedIn : linkedin.com/in/nidhi-jagga-149b24278
GitHub : github.com/nidhijagga
HashNode : https://mycodingjourney.hashnode.dev/
A big shoutout to Piyush Garg and Hitesh Choudhary for kickstarting the GenAI Cohort and breaking down the world of Generative AI in such a simple, relatable, and impactful way! 🙌
Your efforts are truly appreciated; learning GenAI has never felt this fun and accessible. 😄
#ChaiCode #ChaiAndCode #GenAI #LangChainMagic #QdrantPower #RAGified #AIProjects #PromptEngineering #LLMDev #VectorDBVibes #PythonWithChai #AIthatRetrieves #OpenAIandChill #ChunkItEmbedItSlayIt #EmbeddingsFTW #ContextMatters #RetrievalGang #SmartChatbots #CodeWithChai