How to Create a PDF Chatbot Using RAG, Chunking, and Vector Search


Interacting with documents has evolved dramatically. Tools like Perplexity, ChatGPT, Claude, and NotebookLM have revolutionized how we engage with PDFs and technical content. Instead of tediously scrolling through pages, we can now receive instant summaries, answers, and explanations. But have you ever wondered what happens behind the scenes?
Let me guide you through creating your own PDF chatbot using Python, LangChain, FAISS, and a local LLM like Mistral. This isn't about building a competitor to established solutions; it's a practical learning journey to understand fundamental concepts like chunking, embeddings, vector search, and Retrieval-Augmented Generation (RAG).
Understanding the Technical Foundation
Before diving into code, let's understand our technology stack. We'll use Python with Anaconda for environment management, LangChain as our framework, Ollama running Mistral as our local language model, FAISS as our vector database, and Streamlit for our user interface.
Harrison Chase launched LangChain in 2022. It simplifies application development with language models and provides the tools to process documents, create embeddings, and build conversational chains.
FAISS (Facebook AI Similarity Search) specializes in fast similarity searches across large volumes of text embeddings. We'll use it to store our PDF text sections and efficiently search for matching passages when users ask questions.
Ollama is a local LLM runtime server that allows us to run models like Mistral directly on our computer without a cloud connection. This gives us independence from API costs and internet requirements.
Streamlit enables us to quickly create a simple web application interface using Python, making our chatbot accessible and user-friendly.
Setting Up the Environment
Let's start by preparing our environment:
1. Ensure Python is installed (at least version 3.7). We'll use Anaconda to create a dedicated environment with conda create -n pdf-chatbot python=3.10 and activate it with conda activate pdf-chatbot.
2. Create a project folder with mkdir pdf-chatbot and navigate to it using cd pdf-chatbot.
3. Create a requirements.txt file in this directory with the packages listed below.
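A plausible requirements.txt, inferred from the imports we use later in this guide, looks like this:

# packages inferred from the imports used in this guide
langchain
langchain-community
pypdf
sentence-transformers
faiss-cpu
streamlit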
4. Install all required packages with pip install -r requirements.txt.
5. Install Ollama from the official download page, then verify the installation by checking the version with ollama --version.
6. In a separate terminal, activate your environment and run Ollama with the Mistral model using ollama run mistral.
Building the Chatbot: A Step-by-Step Guide
We aim to create an application that lets users ask questions about a PDF document in natural language and receive accurate answers based on the document's content rather than general knowledge. We'll combine a language model with intelligent document search to achieve this.
Structuring the Project
We'll create three separate files to maintain a clean separation between logic and interface:
chatbot_core.py - Contains the RAG pipeline logic
streamlit_app.py - Provides the web interface
chatbot_terminal.py - Offers a terminal interface for testing
The Core RAG Pipeline
Let's examine the heart of our chatbot in chatbot_core.py:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOllama
from langchain.chains import ConversationalRetrievalChain
def build_qa_chain(pdf_path="example.pdf"):
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()[1:]  # Skip page 1 (element 0), which contains only an image
    splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    docs = splitter.split_documents(documents)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    db = FAISS.from_documents(docs, embeddings)
    retriever = db.as_retriever()
    llm = ChatOllama(model="mistral")
    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True
    )
    return qa_chain
This function builds a complete RAG pipeline through several crucial steps:
Loading the PDF: We use PyPDFLoader to read the PDF into document objects that LangChain can process. We skip the first page since it contains only an image.
Chunking: We split the document into smaller sections of 500 characters with 100-character overlaps. This chunking is necessary because language models like Mistral can't process entire documents at once. The overlap preserves context between adjacent chunks.
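To get a feel for what the splitter does, here is a small standalone sketch. It uses a made-up sample string instead of our PDF, and passes separator=" " so the demo text actually gets split (the pipeline above relies on the splitter's defaults):

from langchain.text_splitter import CharacterTextSplitter

sample_text = "LangChain splits long documents into smaller, overlapping pieces. " * 40
splitter = CharacterTextSplitter(separator=" ", chunk_size=500, chunk_overlap=100)
chunks = splitter.split_text(sample_text)

print(len(chunks))       # number of chunks produced
print(chunks[0][-80:])   # the tail of the first chunk...
print(chunks[1][:80])    # ...roughly reappears at the start of the second (the overlap)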
Creating Embeddings: We convert each text chunk into a mathematical vector representation using HuggingFace's all-MiniLM-L6-v2 model. These embeddings capture the semantic meaning of the text, allowing us to find similar passages later.
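As a quick illustration of what an embedding is (not part of the pipeline above, just a sketch with a made-up sentence), you can embed a single query and inspect the result:

from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector = embeddings.embed_query("How do I reset my password?")
print(len(vector))  # 384 numbers for this model
# Sentences with similar meaning map to vectors that lie close together,
# which is exactly what the similarity search in the next step exploits.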
Building the Vector Database: We store our embeddings in a FAISS vector database specializing in similarity searches. FAISS enables us to find text chunks that match a user's query quickly.
Creating a Retriever: The retriever acts as a bridge between user questions and our vector database. When someone asks a question, the system creates a vector representation of that question and searches the database for the most similar chunks.
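For example, with the db object from build_qa_chain you could query the vector store directly; the question string here is just a hypothetical example:

retriever = db.as_retriever(search_kwargs={"k": 3})  # return the 3 most similar chunks
docs = retriever.get_relevant_documents("What does the document say about pricing?")
for doc in docs:
    print(doc.metadata.get("page"), doc.page_content[:120])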
Integrating the Language Model: We use the locally running Mistral model through Ollama to generate natural language responses based on the retrieved text chunks.
Building the Conversational Chain: Finally, we create a conversational retrieval chain that combines the language model with the retriever, enabling back-and-forth conversation while maintaining context.
This approach represents the essence of RAG: improving model outputs by enhancing the input with relevant information from an external knowledge source (in this case, our PDF).
Creating the User Interface
Next, let's look at our Streamlit interface in streamlit_app.py:
import streamlit as st
from chatbot_core import build_qa_chain
st.set_page_config(page_title="PDF Chatbot", layout="wide")
st.title("Chat with your PDF")

qa_chain = build_qa_chain("example.pdf")

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

question = st.text_input("What would you like to know?", key="input")

if question:
    result = qa_chain({
        "question": question,
        "chat_history": st.session_state.chat_history
    })
    st.session_state.chat_history.append((question, result["answer"]))

for i, (q, a) in enumerate(st.session_state.chat_history[::-1]):
    st.markdown(f"**Question {len(st.session_state.chat_history) - i}:** {q}")
    st.markdown(f"**Answer:** {a}")
This interface provides a simple way to interact with our chatbot. It sets up a Streamlit page, builds our QA chain using the specified PDF, initializes a chat history, creates an input field for questions, processes those questions through our QA chain, and displays the conversation history.
Terminal Interface for Testing
We also create a terminal interface in chatbot_terminal.py for testing purposes:
from chatbot_core import build_qa_chain
qa_chain = build_qa_chain("example.pdf")
chat_history = []
print("π§ PDF-Chatbot started! Enter 'exit' to quit.")
while True:
query = input("\nβ Your questions: ")
if query.lower() in ["exit", "quit"]:
print("π Chat finished.")
break
result = qa_chain({"question": query, "chat_history": chat_history})
print("\n㪠Answer:", result["answer"])
chat_history.append((query, result["answer"]))
print("\nπ Source β Document snippet:")
print(result["source_documents"][0].page_content[:300])
This version lets us interact with the chatbot through the terminal, showing answers and the source text chunks used to generate those answers. This transparency is valuable for learning and debugging.
Running the Application
To launch the Streamlit application, we run streamlit run streamlit_app.py in our terminal. The app opens automatically in a browser, where we can ask questions about our PDF document.
Future Improvements
While our current implementation works, several enhancements could make it more practical and user-friendly:
Performance Optimization: The current setup might take around two minutes to respond. We could improve this with a faster LLM or additional computing resources.
Public Accessibility: Our app runs locally, but we could deploy it on Streamlit Cloud to make it publicly accessible.
Dynamic PDF Upload: Instead of hardcoding a specific PDF, we could add an upload button to process any PDF the user chooses (see the sketch after this list).
Enhanced User Interface: Our simple Streamlit app could benefit from better visual separation between questions and answers and from displaying PDF sources for answers.
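As a rough sketch of the upload idea (not part of the current code), streamlit_app.py could use Streamlit's file_uploader and write the upload to a temporary file so PyPDFLoader can read it from disk:

import tempfile
import streamlit as st
from chatbot_core import build_qa_chain

uploaded = st.file_uploader("Upload a PDF", type="pdf")
if uploaded is not None:
    # Persist the upload to disk, since PyPDFLoader expects a file path
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded.getbuffer())
        pdf_path = tmp.name
    qa_chain = build_qa_chain(pdf_path)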
The Power of Understanding
Building this PDF chatbot yourself provides deeper insight into the key technologies powering modern AI applications. You gain practical knowledge of how these systems function by working through each step, from chunking and embeddings to vector databases and conversational chains.
This approach's power lies in its combination of local LLMs and document-specific knowledge retrieval. By focusing the model only on relevant content from the PDF, we reduce the likelihood of hallucinations while providing accurate, contextual answers.
This project demonstrates how accessible these technologies have become. With open-source tools like Python, LangChain, Ollama, and FAISS, anyone with basic programming knowledge can build a functional RAG system that brings documents to life through conversation.
As you experiment with your implementation, you'll develop a more intuitive understanding of what makes modern AI document interfaces work, preparing you to build more sophisticated applications in the future. The field is evolving rapidly, but the fundamental concepts you've learned here will remain relevant as AI continues transforming how we interact with information.