Implementing a Simple RAG Pipeline in Python


🚀 Day 33 of #100DaysOfAIEngineering
Introduction
In today’s edition of #100DaysOfAIEngineering, we are implementing a Retrieval-Augmented Generation (RAG) pipeline using LangChain in Python. RAG combines information retrieval with text generation to provide more contextually aware and factually accurate AI-generated responses. Our goal is to build a simple yet effective RAG system that can analyze a PDF and allow users to query its contents interactively.
This guide will walk you through setting up your environment, processing PDFs, embedding text chunks, storing them in a vector database, and using Google Gemini for intelligent Q&A interactions.
Step 1: Setting Up the Environment
Before implementing the RAG pipeline, we need to install the necessary dependencies. Open Google Colab and run the following:
!pip install langchain pypdf sentence-transformers faiss-cpu google-generativeai
!pip install PyMuPDF langchain_community
!pip install python-dotenv langchain_google_genai
To manage the API key securely, store it in an .env file instead of hardcoding it in the notebook. (Note: in Colab, each ! command runs in its own shell, so the source activation below does not persist between cells; the virtual environment is optional, and the .env file is what dotenv actually reads.)
!virtualenv env
!source env/bin/activate
!echo "API_KEY=YOUR_KEY_" > env/.env
Load the API key using dotenv:
import os
from dotenv import load_dotenv
load_dotenv("env/.env")
print(os.getenv("API_KEY"))
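As a quick optional check, you can point the Gemini SDK at the loaded key and list the models it can access; this is just a sketch to confirm the key is valid before building the rest of the pipeline:
import google.generativeai as genai
genai.configure(api_key=os.getenv("API_KEY"))
for m in genai.list_models():
    print(m.name)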
Step 2: Loading and Processing the PDF
We use PyMuPDF (fitz) to extract text from the uploaded PDF and split it into manageable chunks for efficient retrieval.
from google.colab import files
import fitz
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
pdf = files.upload()
pdf_path = list(pdf.keys())[0]
doc = fitz.open(pdf_path)
full_text = "\n".join([page.get_text("text") for page in doc])
documents = [Document(page_content=full_text)]
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)
print(f"Processed {len(chunks)} text chunks.")
This prepares the text for embedding and retrieval.
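If you later want answers that can point back to a specific page, a small optional variation is to create one Document per page with the page number stored as metadata; the splitter preserves that metadata on every chunk:
documents = [
    Document(page_content=page.get_text("text"), metadata={"page": i + 1})
    for i, page in enumerate(doc)
]
chunks = text_splitter.split_documents(documents)  # each chunk keeps its source page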
Step 3: Embedding and Storing the Chunks
We use a Hugging Face sentence-transformers model to generate vector embeddings and store them in FAISS, an efficient vector similarity search library.
from langchain_community.embeddings import HuggingFaceEmbeddings  # community imports; the bare langchain paths are deprecated
from langchain_community.vectorstores import FAISS
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embedding_model)
vector_store.save_local("faiss_index")
This enables quick similarity searches on the embedded text.
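To confirm retrieval works before wiring in the LLM, you can reload the saved index and run a quick similarity search. Note that recent langchain_community versions require the allow_dangerous_deserialization flag when loading a locally pickled index you created yourself; older versions do not accept it:
loaded_store = FAISS.load_local("faiss_index", embedding_model, allow_dangerous_deserialization=True)
results = loaded_store.similarity_search("What is this document about?", k=3)
for r in results:
    print(r.page_content[:200], "\n---")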
Step 4: Setting Up the RAG Pipeline
We integrate Google Gemini as our LLM and connect it to the vector retriever. Conversational memory helps maintain context across multiple queries.
import google.generativeai as genai
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
retriever = vector_store.as_retriever()
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", google_api_key=os.getenv("API_KEY"))
qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
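Before starting the interactive loop, it helps to fire one test query to confirm the chain works end to end. (If you want more or fewer chunks per query, you can also build the retriever with vector_store.as_retriever(search_kwargs={"k": 4}); 4 is an arbitrary choice.)
# A single test question; the chain returns a dict with an "answer" key
result = qa_chain({"question": "Give me a one-sentence summary of this PDF."})
print(result["answer"])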
Step 5: Interactive Chat with the PDF
Finally, we create a chatbot interface where users can ask questions about the PDF.
while True:
    query = input("\nTalk To The PDF: ")
    if query.lower() in ["exit", "quit"]:
        print("Exiting conversation. Goodbye!")
        break
    # Memory already supplies chat_history, so only the question is passed
    response = qa_chain.run({"question": query})
    print("\n🤖 AI:", response)
This allows users to interact with the uploaded PDF dynamically, retrieving insights and generating context-aware answers.
Key Takeaways
✅ Retrieval-Augmented Generation (RAG) enhances AI responses by combining retrieval with generation.
✅ FAISS enables efficient document searching by storing text embeddings.
✅ Google Gemini provides intelligent and context-aware responses.
✅ LangChain simplifies the integration of document retrieval and LLM-based Q&A systems.
✅ Adding citations and refining outputs further improve the accuracy and reliability of responses.
Today, we built a fully functional RAG pipeline in Python using LangChain, Hugging Face, FAISS, and Google Gemini. This system can process PDFs, embed and store information, retrieve relevant passages, and generate intelligent responses based on user queries.
Here’s the Google Colab Notebook
In the next edition of #100DaysOfAIEngineering, we will explore fine-tuning RAG models and handling edge cases like ambiguous queries, hallucinations, and multi-document summarization.
Stay tuned! 🚀