Building First RAG Application With Python, LangChain & Gemini

Mihir Adarsh · 4 min read

Hey there, welcome to my blog. If you are here, you are probably curious about how to build Retrieval-Augmented Generation (RAG) applications. It is easy to get started with, yet can grow complex. Let's dive in.

So, we will be using the Gemini API, LangChain, Python and Qdrant (a vector database) to build a simple PDF chatbot. The user can ask the chatbot questions about the PDF, and the RAG agent will answer those queries.

TL;DR: This is a simple terminal-based application, but it can be integrated into a backend and frontend as well.

Figure: Chain of thought for building a simple PDF chatbot, one of the most basic RAG applications we can build.

Step 1: Set up the virtual environment:

python -m venv .venv
source .venv/Scripts/activate   # on Linux/macOS: source .venv/bin/activate

This should activate the virtual environment. Keeping our packages inside a virtual environment is good software engineering practice.

Step 2: Install necessary packages:

- python-dotenv (to load the .env file with API keys and the Qdrant URL)
- langchain_community (for the PDF loader wrappers)
- langchain_text_splitters (to split the text into chunks)
- langchain_google_genai (we are using the Gemini API and Google's Gemini models)
- langchain_qdrant (for Qdrant, an open-source vector database)
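All of these are on PyPI (note the hyphens in the install names), and pypdf is needed under the hood by PyPDFLoader. An install command along these lines should work:

pip install python-dotenv langchain-community langchain-text-splitters langchain-google-genai langchain-qdrant pypdf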

Step 3: Load the PDF file as the data source:

Here, I am loading a PDF file of a Node.js handbook.

from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader

pdf_file_path = Path(__file__).parent / "node_handbook.pdf"
loader = PyPDFLoader(str(pdf_file_path))
docs = loader.load()  # returns one Document per page of the PDF

Step 4: Split the document (docs) into chunks:

We will split based on characters using the recursive character splitter. Each chunk will be roughly 1000 characters long, and consecutive chunks will overlap by 200 characters so that context is not lost at chunk boundaries. We then feed the document ‘docs’ to the splitter and get back the list of chunks.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents=docs)
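It can help to sanity-check the split before embedding anything. The exact numbers depend on your PDF, so treat this as a quick illustration:

print(len(docs))          # number of pages loaded by PyPDFLoader
print(len(split_docs))    # number of ~1000-character chunks after splitting
print(split_docs[0].page_content[:200])  # preview of the first chunk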

Step 5: Create embeddings and store in a vector db:

We need to create vector embeddings. Here, we will use Google’s ‘text-embedding-004‘ model.

We will also use Qdrant to store the vector embeddings, which will later be searched to answer the user’s query.
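Before running this step, the .env file needs the credentials the code reads. The variable names below simply mirror the os.getenv calls in this post, and GOOGLE_API_KEY is the key that the langchain_google_genai classes pick up by default; the values are placeholders. If you run Qdrant locally (for example with docker run -p 6333:6333 qdrant/qdrant), the URL would be http://localhost:6333 and the API key can be left empty.

GOOGLE_API_KEY=your_gemini_api_key
QDRANT_URL=https://your-qdrant-instance-url:6333
QDRANT_API_KEY=your_qdrant_api_key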

import os
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

load_dotenv() # to load the API Key and QDrant db URL

embedder = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,                 # embed and upload the chunks in one step
    collection_name="pdf_content_collection",
    embedding=embedder,
    url=os.getenv("QDRANT_URL"),          # let langchain-qdrant create the client
    api_key=os.getenv("QDRANT_API_KEY"),
)

Step 6: Retrieve relevant chunks from the existing collection in the vector DB:

Now that our vector embeddings are created and stored in the collection named “pdf_content_collection”, we need to perform a similarity search based on the user’s query.

retriever = QdrantVectorStore.from_existing_collection(
    collection_name="pdf_content_collection",
    embedding=embedder,
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

relevant_chunks = retriever.similarity_search(
    query="What is http module in JS?"  # user query
)
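By default, similarity_search returns the top few matching chunks (you can pass k to control how many). A quick way to see what was retrieved, assuming the usual page metadata that PyPDFLoader adds:

for chunk in relevant_chunks:
    # each chunk is a Document; PyPDFLoader stores the page number in metadata
    print(chunk.metadata.get("page"), chunk.page_content[:80])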

Step 7: Chat with the LLM to generate the final answer:

Now that we have our relevant chunks of information (a more precise context), we just need to feed this context along with the user query to get the desired response.

We will be using the Google Gemini model (gemini-2.0-flash-001) as the LLM. We also need to provide a system prompt with the context injected.

# join the retrieved chunks into one context string
# (metadata is a plain dict, so use .get() rather than getattr())
page_content = "\n".join(
    chunk.page_content.replace("\t", " ") for chunk in relevant_chunks
)

SYSTEM_PROMPT = f"""
You are a helpful AI Assistant who responds based on the available context.

Context:
Author: {relevant_chunks[0].metadata.get("author", "Unknown")}
Page Content:
{page_content}
"""

from langchain_google_genai import ChatGoogleGenerativeAI

llmModel = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-001",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

messages = [
    ("system", SYSTEM_PROMPT),
    ("human", "What is http module in JS?"),
]
llm_response = llmModel.invoke(messages)
print("Message Content: ", llm_response.content)

And there we are, our response is ready:

Message Content:  The http module in Node.js provides functions and classes to build an HTTP server. It is a key module for Node networking.

It can be included using:
```javascript
const http = require('http')
```

The module provides properties, methods, and classes, including:

**Properties:**

*   `http.METHODS`: Lists all the supported HTTP methods.
*   `http.STATUS_CODES`
*   `http.globalAgent`

**Methods:**

*   `http.createServer()`: Returns a new instance of the `http.Server` class.
*   `http.request()`: Makes an HTTP request to a server, creating an instance of the `http.ClientRequest` class.
*   `http.get()`: Similar to `http.request()`, but automatically sets the HTTP method to GET and calls `req.end()` automatically.

**Classes:**

*   `http.Agent`: Manages connection persistence and reuse for HTTP clients.
*   `http.ClientRequest`
*   `http.Server`
*   `http.ServerResponse`
*   `http.IncomingMessage`

That’s how you build an extremely simple RAG chatbot/agent.
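Since everything runs in the terminal, a small loop is enough to turn the pieces above into an interactive chatbot. This is just a sketch that reuses the retriever and llmModel defined in the previous steps:

while True:
    user_query = input("\nAsk something about the PDF (or type 'exit'): ")
    if user_query.strip().lower() == "exit":
        break

    # retrieve the chunks most similar to this query
    chunks = retriever.similarity_search(query=user_query)
    context = "\n".join(chunk.page_content.replace("\t", " ") for chunk in chunks)

    system_prompt = (
        "You are a helpful AI Assistant who responds based on the available context.\n\n"
        f"Context:\n{context}"
    )
    response = llmModel.invoke([("system", system_prompt), ("human", user_query)])
    print("\n" + response.content)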

Refer to the complete codebase: GitHub Link

If you liked my article, do share your comments below. Happy Coding :)
