Implementing a Simple RAG Pipeline in Python

Paul Fruitful
3 min read

🚀 Day 33 of #100DaysOfAIEngineering

Introduction

In today’s edition of #100DaysOfAIEngineering, we are implementing a Retrieval-Augmented Generation (RAG) pipeline using LangChain in Python. RAG combines information retrieval with text generation to provide more contextually aware and factually accurate AI-generated responses. Our goal is to build a simple yet effective RAG system that can analyze a PDF and allow users to query its contents interactively.

This guide will walk you through setting up your environment, processing PDFs, embedding text chunks, storing them in a vector database, and using Google Gemini for intelligent Q&A interactions.


Step 1: Setting Up the Environment

Before implementing the RAG pipeline, we need to install the necessary dependencies. Open Google Colab and run the following:

!pip install langchain pypdf sentence-transformers faiss-cpu google-generativeai
!pip install PyMuPDF langchain_community
!pip install python-dotenv langchain_google_genai

To keep the API key out of the notebook itself, store it in a .env file. (Note that in Colab each `!` command runs in its own shell, so the virtualenv activation below has no lasting effect; the step that matters here is writing the key into env/.env.)

!virtualenv env
!source env/bin/activate
!echo "API_KEY=YOUR_KEY_" > env/.env

Load the API key using dotenv:

import os
from dotenv import load_dotenv
load_dotenv("env/.env")
print(os.getenv("API_KEY"))

Step 2: Loading and Processing the PDF

We use PyMuPDF (fitz) to extract text from the uploaded PDF and split it into manageable chunks for efficient retrieval.

from google.colab import files
import fitz  # PyMuPDF
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

pdf = files.upload()
pdf_path = list(pdf.keys())[0]  

doc = fitz.open(pdf_path)
full_text = "\n".join([page.get_text("text") for page in doc])

documents = [Document(page_content=full_text)]

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)

print(f"Processed {len(chunks)} text chunks.")

This prepares the text for embedding and retrieval.
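As an optional check (not required for the pipeline), you can peek at the first chunk to confirm the splitter produced sensible pieces:

# Optional: inspect the first chunk and the average chunk size
print(chunks[0].page_content[:300])
avg_len = sum(len(c.page_content) for c in chunks) / len(chunks)
print(f"Average chunk length: {avg_len:.0f} characters")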


Step 3: Embedding and Storing the Chunks

We use Hugging Face’s sentence-transformers model to generate vector embeddings and store them using FAISS, an efficient vector search library.

from langchain.embeddings import HuggingFaceEmbeddings  # on newer LangChain versions: langchain_community.embeddings
from langchain.vectorstores import FAISS  # on newer LangChain versions: langchain_community.vectorstores

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embedding_model)
vector_store.save_local("faiss_index")

This enables quick similarity searches on the embedded text.
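As a quick check, you can query the store directly with a raw similarity search. The sketch below also reloads the saved index, the way a fresh session would; the query string is just an example, and note that recent versions of the FAISS wrapper require allow_dangerous_deserialization=True in load_local (older versions do not accept that argument):

# Reload the persisted index (as a later session would)
vector_store = FAISS.load_local("faiss_index", embedding_model, allow_dangerous_deserialization=True)

# Fetch the 3 chunks most similar to an example query
results = vector_store.similarity_search("What is the main topic of this document?", k=3)
for doc in results:
    print(doc.page_content[:200])
    print("---")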


Step 4: Setting Up the RAG Pipeline

We integrate Google Gemini as our LLM and connect it to the vector retriever. Conversational memory helps maintain context across multiple queries.

import google.generativeai as genai
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

retriever = vector_store.as_retriever()
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", google_api_key=os.getenv("API_KEY"))
qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
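Before wiring up the interactive loop, it can help to sanity-check the chain with a single question (the question below is just a placeholder). On recent LangChain versions chains expose .invoke(), and the attached memory records the exchange as chat history for follow-ups:

# Ask one question; memory stores it so follow-up questions keep context
result = qa_chain.invoke({"question": "What is this document about?"})
print(result["answer"])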

Step 5: Interactive Chat with the PDF

Finally, we create a chatbot interface where users can ask questions about the PDF.

while True:
    query = input("\nTalk To The PDF: ")
    if query.lower() in ["exit", "quit"]:
        print("Exiting conversation. Goodbye!")
        break

    # The attached memory supplies chat_history automatically,
    # so we only need to pass the question
    response = qa_chain.run({"question": query})
    print("\n🤖 AI:", response)

This allows users to interact with the uploaded PDF dynamically, retrieving insights and generating context-aware answers.


Key Takeaways

Retrieval-Augmented Generation (RAG) enhances AI responses by combining retrieval with generation.

FAISS enables efficient document searching by storing text embeddings.

Google Gemini provides intelligent and context-aware responses.

LangChain simplifies the integration of document retrieval and LLM-based Q&A systems.

Returning the retrieved source passages alongside each answer (a lightweight form of citation) makes responses easier to verify; a sketch of this follows below.
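Here is a minimal sketch of that idea, reusing the llm and retriever from Step 4; it is an optional extension rather than part of the pipeline above. Once the chain returns source documents as well as an answer, the memory needs output_key="answer" so it knows which output to store:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Memory must know which output to store once the chain returns sources too
cited_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key="answer")

cited_chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=cited_memory,
    return_source_documents=True,
)

result = cited_chain.invoke({"question": "What is this document about?"})
print(result["answer"])
for doc in result["source_documents"]:
    print("Source:", doc.page_content[:150], "...")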


Today, we built a fully functional RAG pipeline in Python using LangChain, Hugging Face, FAISS, and Google Gemini. This system can process PDFs, embed and store information, retrieve relevant passages, and generate intelligent responses based on user queries.

Here’s the Google Colab Notebook

In the next edition of #100DaysOfAIEngineering, we will explore fine-tuning RAG models and handling edge cases like ambiguous queries, hallucinations, and multi-document summarization.

Stay tuned! 🚀
