📚 RAG (Retrieval-Augmented Generation): A Simple Guide to Smarter AI


🤖 What in the World is RAG?
Imagine trying to answer a question in an exam... but your textbook is locked in your locker. 😬 That's traditional AI.
Now imagine the AI breaks into the locker, pulls out the book, finds the exact page, and quotes it to you mid-answer. That's RAG: ninja-level response generation with backup.
In plain English: it's like ChatGPT having a superpower to Google stuff while talking to you. Cool, right?
🪟 Context Window: Not a Real Window, But Close
🔍 What's a Context Window?
It's how much info the AI can read at once, like its attention span. Some models are goldfish (tiny windows), others are elephants (big windows). But all have limits.
💡 Think of it like this: trying to read 1,000 rows of Excel in one glance? Nah. But just the 40 rows you need? Much better.
🧠 Why It Matters
Dumping your entire life story into the prompt might overwhelm the AI. It's like asking someone what time it is after handing them the entire history of clocks.
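Curious how close you are to the limit? Here's a minimal sketch using tiktoken to count tokens before you send a prompt (assuming a recent tiktoken release that knows gpt-4o; the 8,000-token budget is just an example number, not the model's real window):

```python
# Minimal sketch: count tokens so you know whether a prompt fits your budget.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o")

def fits_in_window(text: str, budget: int = 8_000) -> bool:
    # budget is an arbitrary example; real windows are model-specific.
    return len(encoding.encode(text)) <= budget

prompt = "Here is my entire life story... " * 2000
print(fits_in_window(prompt))  # False: time to retrieve only what you need
```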
🔧 Types of RAG: API vs. File Feeds
🌐 API-Based RAG
This is like calling your friend Google every time you forget something. The AI makes real-time calls to get fresh data. Great for fast-changing stuff (news, stock prices, memes).
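A rough sketch of the idea (the stock endpoint below is a made-up placeholder, not a real API):

```python
# Rough sketch of API-based RAG: fetch fresh data at question time,
# then inject it into the prompt before calling the model.
import requests

def build_prompt(question: str) -> str:
    # Hypothetical endpoint; swap in whatever live data source you use.
    price = requests.get("https://api.example.com/stock/AAPL").json()["price"]
    context = f"AAPL is currently trading at ${price}."
    return f"Context: {context}\n\nQuestion: {question}"

print(build_prompt("Is Apple's stock up today?"))
```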
📁 File-Based RAG
This one's the quiet, nerdy type. It reads documents (PDFs, text files), stores smart versions of them, and pulls them up when needed. Perfect for stuff that doesn't change often, like manuals and laws.
🧩 How RAG Works: The 3-Stage Magic Show
🎩 Spoiler: no rabbits, just embeddings.
Part 1: Indexing (The Prep Work)
You've got a stack of documents. What now?
- Chunk them up → Break documents into small readable bites (not actual snacks).
- Embed them → Give each chunk a smart label (like a DNA tag).
- Store them → Save them in a vector database (think smart filing cabinet).
📦 Data → 🍰 Chunks → 🧠 Embeddings → 🗄️ Vector DB
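Here's a toy version of that pipeline in plain Python. The embed() stub and the plain list are stand-ins for a real embedding model and vector DB; the real LangChain version comes later in this post:

```python
# Toy indexing pipeline: chunk -> embed -> store.
def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    # Overlapping fixed-size chunks so context isn't cut mid-thought.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(chunk_text: str) -> list[float]:
    # Stand-in: a real embedding model returns hundreds of floats.
    return [float(ord(c)) for c in chunk_text[:8]]

document = "RAG indexes your documents so the model can look things up later. " * 20
vector_db = [{"text": c, "vector": embed(c)} for c in chunk(document)]
print(f"Stored {len(vector_db)} chunks in our pretend vector DB.")
```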
Part 2: Retrieval (The Scavenger Hunt)
Someone asks, "How do I make RAG work?"
- The AI makes an embedding of the question.
- It digs through the vector DB.
- It finds the juiciest, most relevant chunks.
🧠 Query → 🎯 Search → 📄 Top Chunks
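Under the hood, that "digging" is usually cosine similarity between the query vector and every stored chunk vector. A minimal sketch with made-up 3-D vectors (real embeddings have hundreds or thousands of dimensions):

```python
# Minimal retrieval sketch: score chunks by cosine similarity, keep the best.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

db = [
    {"text": "Chunk docs, embed them, store them in a vector DB.", "vector": [0.9, 0.1, 0.2]},
    {"text": "Bananas are rich in potassium.", "vector": [0.1, 0.9, 0.3]},
]
query_vector = [0.8, 0.2, 0.1]  # pretend the embedding model produced this
best = max(db, key=lambda row: cosine(query_vector, row["vector"]))
print(best["text"])  # the RAG chunk wins
```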
Part 3: Ask the Model (The Grand Finale)
All that chunked, embedded, and retrieved goodness goes into the prompt.
- The system prompt is filled with the relevant info.
- The user's question is added.
- The LLM works its magic like a caffeinated librarian.
✨ Context + Query → 🤖 LLM → 💬 Answer
🧰 LangChain: The Cool Assistant Who Knows Everyone
Think of LangChain as the social butterfly of the AI world. It's not the brain, but it knows how to connect the brain to books, PDFs, APIs, and even that dusty archive folder you forgot about.
🚀 Let's Build One: Practical RAG with LangChain
Here's how to go from "I have a PDF" to "My AI gives smart answers":
```bash
pip install langchain langchain-community langchain-openai langchain-qdrant langchain-text-splitters qdrant-client pypdf tiktoken
```
- Load PDF → Open sesame!
```python
# 1. Load PDF → Open sesame!
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader

pdf_path = Path(__file__).parent / "example.pdf"
loader = PyPDFLoader(file_path=str(pdf_path))
docs = loader.load()
```
- Split Text → Break it down into bite-sized pieces.
```python
# 2. Split Text → Break it down into bite-sized pieces.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
```
- Create Embeddings → Turn words into vector soup.
```python
# 3. Create Embeddings → Turn words into vector soup.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # Assumes OPENAI_API_KEY is set in env
```
- Run Qdrant → Fire up your vector database in Docker.
```yaml
# 4. Run Qdrant → Fire up your vector database.
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
```
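Save that as docker-compose.yml and run `docker compose up -d`; Qdrant will be listening on port 6333.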
- Ingest Data → Feed the chunks into the DB.
```python
# 5. Ingest Data → Feed the chunks into the DB.
from langchain_qdrant import QdrantVectorStore

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    url="http://localhost:6333",
    collection_name="example",
    embedding=embeddings,
)
```
- Retrieve Chunks → Find what matters.
```python
# 6. Retrieve Chunks → Find what matters.
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="example",  # same collection we just ingested into
    embedding=embeddings,
)
```
- Ask the Model → Send relevant data to the LLM and get smart replies.
```python
# 7. Ask the Model → Send relevant data to the LLM and get smart replies.
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
)  # Reads OPENAI_API_KEY from the environment

print("Ask a question (type 'exit' to quit):")
while True:
    query = input(">>> ")
    if query.lower() in ["exit", "quit"]:
        break

    # Retrieve the top 4 most relevant chunks
    results = retriever.similarity_search(query=query, k=4)
    context = "\n\n".join(doc.page_content for doc in results)

    # Build the system prompt around the retrieved context
    system_prompt = f"""You are an expert assistant for the example PDF. Answer the user's question using the following context extracted from the PDF. Be detailed, accurate, and relevant.

Rules:
- Only answer from the information provided in the context.
- If the user asks about something not in the context, reply in a kind, funny way that you don't have that knowledge.

Context:
{context}
"""

    response = chat_model.invoke([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ])

    print("\nAnswer:\n")
    print(response.content)
```
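To try it, export OPENAI_API_KEY, make sure Qdrant is still running, and run the script; then ask anything that's actually in your PDF.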
- 🎉 Boom! You just built a mini ChatGPT that actually knows your documents.
🎓 Final Thoughts
RAG isn't magic. It's just very, very smart recycling of your own info. It makes your AI model more like Sherlock Holmes: sharp, informed, and kinda charming.
And if you're ever confused, just remember:
📦 Chunk it.
🧠 Embed it.
🔍 Retrieve it.
💬 Answer like a boss.
Thank you for reading our article! We appreciate your support and encourage you to follow us for more engaging content. Stay tuned for exciting updates and valuable insights in the future. Don't miss out on our upcoming articlesβstay connected and be part of our community!
YouTube : youtube.com/@mycodingjourney2245
LinkedIn : linkedin.com/in/nidhi-jagga-149b24278
GitHub : github.com/nidhijagga
HashNode : https://mycodingjourney.hashnode.dev/
A big shoutout to Piyush Garg and Hitesh Choudhary for kickstarting the GenAI Cohort and breaking down the world of Generative AI in such a simple, relatable, and impactful way! 🙌
Your efforts are truly appreciated; learning GenAI has never felt this fun and accessible. 😄
#ChaiCode #ChaiAndCode #GenAI #LangChainMagic #QdrantPower #RAGified #AIProjects #PromptEngineering #LLMDev #VectorDBVibes #PythonWithChai #AIthatRetrieves #OpenAIandChill #ChunkItEmbedItSlayIt #EmbeddingsFTW #ContextMatters #RetrievalGang #SmartChatbots #CodeWithChai