Interact with Your PDF: A RAG Approach

Have you ever wished you could just ask questions to a PDF instead of reading the whole thing?
That’s exactly what RAG (Retrieval-Augmented Generation) lets you do!
In this blog, we’ll first understand RAG in simple terms, and then I’ll show you how to actually build it using LangChain, OpenAI/Gemini, and Qdrant — all step-by-step!
What is RAG (Retrieval-Augmented Generation)?
RAG is a smart way of making AI models even more useful:
Instead of expecting the AI to "know everything," RAG allows it to retrieve external information before generating a response.
Think of it like giving your AI access to a private library.
When you ask a question, it quickly finds the right book, reads the right page, and gives you an answer — instead of just guessing.
In short: take whatever data is relevant and put it into the prompt.
RAG Explained with a Simple Example: Talking to a PDF
Imagine you have a PDF file — like your lecture notes — and instead of reading through hundreds of pages, you simply want to ask it questions like:
"What is supervised learning?"
"Explain the bias-variance tradeoff."
Wouldn’t it be great if the system could understand your question, find the right information from the PDF, and give you a clean answer?
That’s exactly what RAG helps you do.
Here’s a simple breakdown of how it works behind the scenes:
Step 1: Preparing the PDF
Before you start asking questions, you need to prepare the document:
| Step | Action |
| --- | --- |
| 1 | Indexing: Read and extract text from the PDF |
| 2 | Chunking: Split the text into smaller pieces |
| 3 | Embedding: Turn each chunk into a numerical vector |
| 4 | Storing: Save the vectors into a vector database |
Step 2: Answering Questions (Query Phase)
When you ask something:
| Step | Action |
| --- | --- |
| 1 | User Query: You type a question |
| 2 | Embedding: Convert your query into a vector |
| 3 | Searching: Find similar chunks in the database |
| 4 | Retrieving: Pull the most relevant text chunks |
| 5 | Generating: Pass them to a Large Language Model (LLM) to generate a final answer |
Now Let's Build It: Talk to Your PDF Using LangChain
Before we dive into building, let’s quickly understand what LangChain is and why we are using it.
What is LangChain?
LangChain is an open-source framework designed to help developers easily build applications that combine Large Language Models (LLMs) with external data sources like documents, APIs, databases, and more.
Think of LangChain as the connective tissue —
it helps you load data, process it, retrieve relevant pieces, and then interact intelligently with an LLM like GPT-4, Gemini, or others.
In simple words:
LangChain gives you the tools to build smart, retrieval-based AI systems without reinventing the wheel.
Why are we using LangChain here?
It makes loading PDFs, splitting text, generating embeddings, storing vectors, and retrieving information very easy.
It provides ready-to-use integrations with models like OpenAI, Gemini, and vector databases like Qdrant, FAISS, etc.
It lets you quickly set up retrieval-augmented generation (RAG) workflows with minimal code.
Step 1: Load the PDF
First, we need to load the PDF file.
This will extract all the text from the PDF.
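Here is a minimal sketch using LangChain's PyPDFLoader (the file name is just a placeholder; swap in your own PDF):

from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader

pdf_path = Path(__file__).parent / "BTP_Report.pdf"  # replace with your PDF
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()  # one Document per page, with text and metadata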
Step 2: Split the Text into Chunks
We don’t want to work with huge blocks of text.
Let's split the text into manageable chunks.
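A small sketch using RecursiveCharacterTextSplitter; the chunk size and overlap below match the reference code at the end and are reasonable starting values:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk
    chunk_overlap=200,  # overlap so context isn't cut mid-thought
)
split_docs = text_splitter.split_documents(documents=docs)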
Step 3: Create Embeddings
Now, we turn each chunk into an embedding — a special numeric representation.
You can choose between OpenAI or Gemini embeddings:
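For example, pick one of the two options below. The OpenAI model name shown is a common choice (an assumption, not from the reference code); the Gemini setup mirrors the reference code at the end:

from langchain_openai import OpenAIEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Option A: OpenAI embeddings (reads OPENAI_API_KEY from the environment)
embedder = OpenAIEmbeddings(model="text-embedding-3-small")

# Option B: Gemini embeddings
embedder = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="",  # replace with your Gemini API key
)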
Step 4: Store Embeddings in a Vector Database (Qdrant)
Let's store all those embeddings into Qdrant, a powerful vector database.
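A sketch that mirrors the reference code, assuming a Qdrant instance is running locally on port 6333 (for example via Docker):

from langchain_qdrant import QdrantVectorStore

# Create the collection first, then embed and upload the chunks
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="learning_langchain2",
    embedding=embedder,
)
vector_store.add_documents(documents=split_docs)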
Step 5: Create a Retriever
Instead of manually searching, we create a Retriever from the vector database.
The Retriever will automatically find the most relevant chunks for any user query.
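In the reference code we reconnect to the existing collection and call similarity_search directly; the sample question here is just an illustration:

retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain2",
    embedding=embedder,
)

# The query is embedded and the closest chunks are returned
relevant_chunks = retriever.similarity_search(query="What is supervised learning?")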
Step 6: Set Up the LLM with a Custom System Prompt
We now connect the retriever to a language model.
Here, we use Gemini through its OpenAI-compatible endpoint, authenticated with an API key.
Additionally, we set a system prompt to instruct the AI to always behave in a helpful, concise, and user-friendly manner.
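A condensed sketch of the idea: Gemini is called through its OpenAI-compatible endpoint, and the retrieved chunks are injected into the system prompt. The full version, with the step-by-step JSON prompt, is in the reference code at the end.

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),  # assumes the key is set in your environment
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

SYSTEM_PROMPT = f"""You are a helpful assistant. Answer only from the context below.
Context: {relevant_chunks}"""

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is this PDF about?"},
    ],
)
print(response.choices[0].message.content)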
Before running the code, make sure you have installed all of these packages. Here's what each import does:
Path: to work with file paths easily.
PyPDFLoader: to load and read PDF files.
RecursiveCharacterTextSplitter: to break the text into smaller chunks.
OpenAIEmbeddings: to create embeddings using OpenAI models.
QdrantVectorStore: to store and search embeddings in Qdrant.
QdrantClient: to connect to the Qdrant server.
GoogleGenerativeAIEmbeddings: to create embeddings using Google's Gemini models.
OpenAI: to interact with OpenAI-compatible models (for answering questions).
os and json: to handle environment variables and data formatting.
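If anything is missing, you can install it with pip. The package names below correspond to the imports above (exact versions are up to you):

pip install langchain-community langchain-text-splitters langchain-openai langchain-qdrant qdrant-client langchain-google-genai openai pypdf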
Visual Flow of the System
[PDF] → [Load] → [Chunk] → [Embed] → [Store in Qdrant]
↓
[User Query] → [Embed] → [Search Qdrant] → [Retrieve Relevant Chunks] → [LLM] → [Answer]
Final Words
With RAG and LangChain, you’re not just building chatbots —
You’re creating intelligent systems that can read, search, and answer from any document in real time!
Whether it's a 10-page college assignment or a 1000-page company policy book — you can now just ask and get accurate answers.
Welcome to the future of document interaction!
Full Code (for Reference if You Get Stuck):
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from openai import OpenAI
import os
import json
pdf_path = Path(__file__).parent / "BTP_Report.pdf"

# Step 1: Load the PDF
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()

# Step 2: Split the text into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)

# Step 3: Create embeddings (Gemini embeddings are used here)
embedder = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=""  # Replace with your actual Gemini API key
)

# Step 4: Create the Qdrant collection, then embed and store the chunks
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="learning_langchain2",
    embedding=embedder
)
vector_store.add_documents(documents=split_docs)

# Step 5: Reconnect to the existing collection to use it as a retriever
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain2",
    embedding=embedder
)

# Step 6: Talk to Gemini through its OpenAI-compatible endpoint
api_key = os.getenv("GEMINI_API_KEY")
client = OpenAI(
    api_key=api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
while True:
    user_query = input('> ')

    # Embed the query and pull the most relevant chunks from Qdrant
    relevant_chunks = retriever.similarity_search(
        query=user_query
    )

    SYSTEM_PROMPT = f"""
    You are a helpful AI Assistant who responds based on the available context.
    Context: {relevant_chunks}

    You work in start, plan, action, observe mode.
    For the given user query and available tools, plan the step by step execution and,
    based on the planning, select the relevant context.
    Wait for the observation and, based on the observation from the context, resolve the user query.

    Rules:
    - Follow the Output JSON Format.
    - Always perform one step at a time and wait for the next input.
    - Carefully analyse the user query.

    Output JSON Format:
    {{
        "step": "string",
        "content": "string",
        "input": "The input query of the user"
    }}

    Example:
    User Query: What is the content of this pdf?
    Output: {{ "step": "plan", "content": "The user is interested in the content of this pdf" }}
    Output: {{ "step": "plan", "content": "From the available context I should read Context: {relevant_chunks}" }}
    Output: {{ "step": "output", "content": "The pdf describes Artificial Intelligence" }}
    """

    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': user_query},
    ]

    # Keep asking the model for the next step until it produces the final output
    while True:
        response = client.chat.completions.create(
            model='gemini-2.0-flash',
            response_format={"type": "json_object"},
            messages=messages,
        )
        parsed_output = json.loads(response.choices[0].message.content)
        messages.append({'role': 'assistant', 'content': json.dumps(parsed_output)})

        if parsed_output['step'] == 'plan':
            print(f"🧠: {parsed_output.get('content')}")
            continue

        if parsed_output['step'] == 'output':
            print(f"🤖: {parsed_output.get('content')}")
            break