RAG Demystified: Create your own Simple RAG


Have you ever wanted to chat with your PDFs like they're your personal assistants? Whether you're skimming cheat sheets, research papers, or documentation, this blog shows you how to build an intelligent chatbot that answers questions from any PDF using Retrieval-Augmented Generation (RAG).
This is your complete guide to setting it up on Windows, with LangChain, Qdrant, and Google Gemini – all in under 100 lines of Python!
🚀 What You'll Build
A terminal-based chatbot where you can ask:
What is React?
Summarize the useEffect hook.
And get intelligent, accurate responses drawn from the actual content of the PDF – thanks to RAG.
🔍 What is RAG (Retrieval-Augmented Generation)?
RAG stands for:
Retrieval – search for the most relevant chunks from your PDF using a vector database.
Augmented – inject those chunks into the prompt.
Generation – let a language model generate context-aware responses based on those chunks.
This architecture helps LLMs like Gemini give accurate answers without hallucinating.
🧠 The Two Main Phases
📥 Indexing Phase (One-Time)
Load PDF
Split it into manageable chunks
Embed chunks into vector format
Store them in Qdrant vector DB
💬 Retrieval Phase (Each Query)
Accept a question
Search Qdrant for similar chunks
Combine them with your query
Let Gemini generate a response (see the sketch below)
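Before wiring up real libraries, here's a toy, dependency-free sketch of that flow. The retrieve function is a crude word-overlap stand-in for vector search (not what we'll actually use); the real embedding, storage, and generation steps are built below.

# Toy illustration of the RAG flow -- no real embeddings or LLM yet;
# the genuine versions are wired up step by step below.

chunks = [
    "React is a JavaScript library for building user interfaces.",
    "useEffect lets you run side effects after a component renders.",
]

def tokens(text):
    # Crude tokenizer: lowercase and strip basic punctuation
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(query, chunks, k=1):
    # Stand-in for vector search: rank chunks by word overlap with the query
    return sorted(chunks, key=lambda c: len(tokens(query) & tokens(c)), reverse=True)[:k]

query = "What is React?"
context = "\n".join(retrieve(query, chunks))           # Retrieval
prompt = f"Context:\n{context}\n\nQuestion: {query}"   # Augmentation
print(prompt)  # Generation: this is the prompt a real LLM would answer from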
🧱 Project Structure
pdf-chat-rag/
├── .venv/                    # Python virtual environment
├── docker-compose.yml        # Qdrant database config
├── requirements.txt          # Python dependencies
├── main.py                   # Core app logic
├── pdfs/
│   └── React CheatSheet.pdf  # Add the PDF you want to chat with
└── .env                      # Store your Google API key here
🧭 Step-by-Step Setup
✅ Step 1: Create Project & Activate Virtual Env
python -m venv .venv
.venv\Scripts\activate
✅ Step 2: Install Dependencies
Create requirements.txt and add all the dependencies:
langchain
langchain-community
langchain-google-genai
langchain-qdrant
qdrant-client
python-dotenv
pypdf
Run the following command to install all dependencies:
pip install -r requirements.txt
✅ Step 3: Add Your API Key
Create a .env file:
GOOGLE_API_KEY=your_google_genai_api_key
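A missing or misnamed key is the most common setup mistake, so it can pay to fail fast. Here's a small optional check using python-dotenv (the same loading code appears in main.py later):

import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment
if not os.getenv("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is missing -- check your .env file")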
✅ Step 4: Start Qdrant with Docker
Create a docker-compose.yml file and add the Qdrant config:
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
Open a new terminal and start Qdrant:
docker-compose up
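Before moving on, you can confirm Qdrant is actually reachable on port 6333 with a quick optional check using qdrant-client (already in requirements.txt):

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # prints an empty collection list on a fresh instance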
✅ Step 5: Add a PDF
Put your file in the pdfs/ folder:
pdfs/React CheatSheet.pdf
✅ Step 6: Code the App – main.py
Import the dependencies
from pathlib import Path
import os

from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore
from langchain_core.prompts import PromptTemplate

load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
Load the PDF & Split It into Chunks
pdf_path = Path("pdfs/React CheatSheet.pdf")
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
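It's worth a quick sanity check on the split before embedding anything. You can add these two lines temporarily after the splitting code:

print(f"Loaded {len(docs)} pages, split into {len(split_docs)} chunks")
print(split_docs[0].page_content[:200])  # preview the first chunk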
Create Vector Embeddings
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key=GOOGLE_API_KEY,
)
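To verify the embedder is wired up correctly before bulk-indexing, you can embed a single query; text-embedding-004 should return a 768-dimensional vector (a quick optional check):

sample = embedder.embed_query("What is React?")
print(len(sample))  # expect 768 for text-embedding-004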
Store the Chunks in the Database
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedder,
    url="http://localhost:6333",
    collection_name="pdf_rag_chat",
)
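One thing to note: from_documents re-embeds and re-uploads the whole PDF every time main.py runs. Once the collection exists, you could connect to it instead; here's a sketch using langchain-qdrant's from_existing_collection (guard it however fits your workflow):

# After the first run, reuse the existing collection instead of re-indexing
vector_store = QdrantVectorStore.from_existing_collection(
    embedding=embedder,
    url="http://localhost:6333",
    collection_name="pdf_rag_chat",
)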
Set Up LLM
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=GOOGLE_API_KEY,
)
Define the SYSTEM PROMPT
SYSTEM_PROMPT = """You are a smart PDF assistant. Answer user queries using only the provided PDF excerpts.
- For **summaries**, give a brief overview of key points.
- For **specific questions**, extract and present relevant info directly.
- For **explanations**, start with a simple overview, then add detail if needed.
- If the info isn't in the excerpts, reply: "The PDF does not contain this information."
Be clear, concise, and avoid unnecessary jargon. Structure your answers to match the user's intent.
If the query is unclear, ask the user to clarify it."""

prompt_template = PromptTemplate(
    input_variables=["query", "excerpts"],
    template=SYSTEM_PROMPT + "\n\nUser Query: {query}\n\nRelevant PDF Excerpts:\n{excerpts}\n\nAssistant:",
)
Create an Interactive Chat Loop
while True:
    query = input("Ask a question about the PDF (or type 'exit' to quit): ")
    if query.lower() in ["exit", "quit"]:
        print("Goodbye!")
        break

    # Retrieve the 3 most relevant chunks from Qdrant
    results = vector_store.similarity_search(query, k=3)
    context = "\n\n---\n\n".join(doc.page_content for doc in results)

    # Augment the prompt with the retrieved excerpts and let Gemini answer
    full_prompt = prompt_template.format(query=query, excerpts=context)
    response = llm.invoke(full_prompt)
    print("\nAssistant:", response.content, "\n")
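If answers look off, the retrieval step is usually the culprit rather than the model. You can inspect what Qdrant actually returns for a query with similarity_search_with_score (a debugging sketch, not part of the app):

for doc, score in vector_store.similarity_search_with_score("What is React?", k=3):
    print(f"score={score:.3f} | {doc.page_content[:100]}")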
Once everything is set up, run the following command:
python main.py
Once it's running, try asking questions related to your PDF.
🎉 Congrats! 🎉
You've just built a full RAG pipeline that brings your static PDFs to life using cutting-edge AI:
✅ LangChain to glue everything together
✅ Qdrant for fast semantic search
✅ Gemini for smart responses
✅ All running locally on your Windows machine
This setup works for cheat sheets, textbooks, resumes, reports, and more.
🤝 Let's Connect
Enjoyed this tutorial? Share it, fork it, and follow for more on LangChain, LLMs, and full-stack AI.
Happy Building! π οΈ
Here is the GitHub repository for the code: https://github.com/Vedu-07/Generative_AI
