AI-Powered Banking Chatbot: Build with LangChain, LangDB.ai & RAG (Part 2)

Dishant Gandhi

In Part 1 of our series, we built a LangChain-powered conversational AI for banking FAQs using LangDB.ai. Now, in Part 2, we'll integrate ChromaDB for Retrieval-Augmented Generation (RAG), enhancing the chatbot's ability to provide precise answers based on uploaded documents.

🚀 What We'll Cover:

  1. Understanding the RAG Pipeline.

  2. Setting up ChromaDB for vector storage.

  3. Embedding and storing documents.

  4. Retrieving context from documents for accurate responses.

  5. Querying the chatbot for contextually rich answers.

💡
Find all source code and starter pack by clicking here.
💡
Don’t forget to star us⭐

Alternatively, you can also follow along with our YouTube tutorial.


🤖 What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a powerful approach that enhances the capabilities of large language models (LLMs) by providing them with external knowledge. Instead of relying solely on pre-trained knowledge, RAG retrieves relevant documents from a vector database and uses them as context to generate accurate and informed responses.

Key Components of RAG:

  1. Retriever: Searches for relevant documents based on user queries.

  2. Generator: Generates a response using both the retrieved context and the LLM's internal knowledge.

  3. Memory: Retains conversation history for continuity.

💡
This approach ensures that the chatbot can answer user queries with up-to-date, domain-specific information, such as banking FAQs or interest rates.
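
Before wiring up ChromaDB, here is a minimal, self-contained sketch of that retrieve-then-generate loop. The toy keyword retriever and the hard-coded sample documents are placeholders for illustration only; the rest of this post replaces them with ChromaDB embeddings and the LangChain chain from Part 1.

# Toy illustration of the RAG flow: retrieve relevant text, then feed it to the generator.
DOCUMENTS = [
    "Home loan interest rates start at 8.5% per annum for salaried applicants.",
    "Savings accounts require a minimum monthly balance of 1,000.",
    "Fixed deposits between 1 and 5 years earn 6.5% to 7.5% interest.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query (ChromaDB will do this semantically)."""
    query_words = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda doc: len(query_words & set(doc.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str], history: str = "") -> str:
    """Generator input: retrieved context + conversation history + the user's question."""
    joined_context = "\n".join(context)
    return f"Context:\n{joined_context}\n\nConversation History:\n{history}\n\nQuestion: {query}"

print(build_prompt("What is the home loan interest rate?", retrieve("home loan interest rate")))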

What Are Vector Databases?

A vector database stores information as vector embeddings—numerical representations of text, images, or other data. These embeddings allow for efficient similarity searches, enabling the retriever to find the most relevant documents based on user queries.

Why Use a Vector Database like ChromaDB?

  1. Fast Retrieval: Quickly finds relevant information, even in large datasets.

  2. Contextual Matching: Retrieves documents based on semantic meaning, not just keywords.

  3. Efficient Storage: Compact and scalable storage of embeddings.

💡
In our project, we'll use ChromaDB to store and retrieve banking-related documents.
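
As a quick illustration of contextual matching, the sketch below builds a throwaway in-memory Chroma store from two sample sentences (placeholders, not real bank data) using the same sentence-transformers model we configure later, and shows that a query with no overlapping keywords still retrieves the semantically related document.

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# In-memory store built from two sample sentences
demo_store = Chroma.from_texts(
    [
        "Home loan interest rates start at 8.5% per annum.",
        "Savings accounts require a minimum balance of 1,000.",
    ],
    embedding=embeddings,
)

# "mortgage rate" shares no keywords with the stored texts, but the home loan
# sentence is returned because its embedding is the closest semantically.
print(demo_store.similarity_search("mortgage rate", k=1)[0].page_content)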

How Does Embedding Work?

Embedding converts text into high-dimensional vectors that capture semantic meaning. For example, the phrases "home loan interest rate" and "mortgage rate" would have similar vector representations, enabling efficient retrieval.

Embedding Process:

  1. Text Input: Extracted from uploaded documents (e.g., PDF FAQs).

  2. Vectorization: Text is converted into embeddings using a model like sentence-transformers.

  3. Storage: Embeddings are stored in ChromaDB for future retrieval.

💡
This process ensures that the chatbot can search and find relevant information based on user queries.
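
To make this concrete, here is a small sketch (assuming the same all-MiniLM-L6-v2 model used throughout this post) that embeds a few phrases and compares them with cosine similarity; related banking phrases score noticeably higher than unrelated ones.

import numpy as np
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

home_loan = embeddings.embed_query("home loan interest rate")
mortgage = embeddings.embed_query("mortgage rate")
debit_pin = embeddings.embed_query("how do I reset my debit card PIN")

print(cosine_similarity(home_loan, mortgage))   # higher: related phrases
print(cosine_similarity(home_loan, debit_pin))  # lower: different topic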

Setting Up ChromaDB

First, ensure that ChromaDB and PyPDF are installed:

pip install chromadb pypdf

Import necessary modules and initialize ChromaDB:

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

CHROMA_DB_DIR = "chroma"

# Initialize ChromaDB and Embeddings
def initialize_chromadb():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={'device': 'cpu'})
    vector_store = Chroma(persist_directory=CHROMA_DB_DIR, embedding_function=embeddings)
    return vector_store

# Initialize ChromaDB
vector_db = initialize_chromadb()

What This Does:

  • Embeddings: Converts text into searchable vectors.

  • Vector Store: Stores these vectors for retrieval.

  • Persistence: Ensures data isn't lost after restarting the app.
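
As a quick sanity check (the two sample sentences below are placeholders), you can add a couple of texts to the freshly initialized store and query them straight back:

# Add a couple of sample texts and query them back
vector_db.add_texts([
    "Personal loans are offered at interest rates starting from 10.5% per annum.",
    "Home loan interest rates start at 8.5% for tenures up to 20 years.",
])

results = vector_db.similarity_search("What is the home loan rate?", k=1)
print(results[0].page_content)

# Because persist_directory is set, recent Chroma versions write the collection
# to the "chroma" folder automatically, so it survives an app restart.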


Uploading and Processing PDF

Let's allow users to upload a PDF containing interest rates and banking FAQs.

st.sidebar.title("Options")
uploaded_file = st.sidebar.file_uploader("Upload PDF", type="pdf")

File Upload: Users upload PDFs via Streamlit's sidebar.

import os
import tempfile

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def process_pdf(file):
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_file_path = os.path.join(temp_dir, file.name)
        with open(temp_file_path, "wb") as temp_file:
            temp_file.write(file.getbuffer())

        pdf_loader = PyPDFLoader(temp_file_path)
        documents = pdf_loader.load()

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80, length_function=len)
        chunks = text_splitter.split_documents(documents)

        return chunks

Explanation:

  • Temporary File: The uploaded file is written to a temporary directory so PyPDFLoader can read it from disk.

  • PDF Loading: PyPDFLoader extracts the text content from the uploaded PDF.

  • Text Splitting: RecursiveCharacterTextSplitter breaks the documents into 800-character chunks with an 80-character overlap (see the short sketch below).
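
If you are curious how the splitter behaves on its own, this small sketch (using a dummy string rather than a real PDF) shows the chunks it produces with the same settings:

from langchain_text_splitters import RecursiveCharacterTextSplitter

sample_text = "Home loan FAQ: rates, tenure, and eligibility. " * 40  # roughly 1,900 characters of dummy text
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80, length_function=len)

chunks = splitter.split_text(sample_text)
print(len(chunks), [len(chunk) for chunk in chunks])
# Each chunk stays under 800 characters, and consecutive chunks share up to
# 80 characters so sentences cut at a boundary are not lost.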

Storing Documents in ChromaDB

Once the document is processed, we'll convert the chunks into vector embeddings and store them in ChromaDB.

if uploaded_file:
    user_vector_store_dir = CHROMA_DB_DIR
    user_chunks = process_pdf(uploaded_file)
    vector_db.add_documents(user_chunks)
    st.sidebar.success(f"Processed {len(user_chunks)} chunks from uploaded PDF.")

  • Document Conversion: process_pdf returns the PDF text as LangChain Document chunks.

  • Vector Storage: add_documents embeds each chunk and stores it in ChromaDB.


Querying ChromaDB for Contextual Answers

Now, let's update the chatbot to search the vector store for relevant context when a user asks a question.

if send_button:
    user_input = st.session_state.user_input.strip()  # Ensure the input is not empty or just whitespace    
    if user_input:
        context = ""
        # Retrieve relevant context from ChromaDB
        try:
            search_results = vector_db.similarity_search(user_input, k=3)
            for result in search_results:
                context += result.page_content + "\n\n"
        except Exception as e:
            st.error(f"Error retrieving context from ChromaDB: {e}")

What This Does:

  • Similarity Search: Retrieves the top 3 relevant document chunks from ChromaDB.

  • Contextual Response: Uses retrieved content to generate a precise, context-aware answer.


With retrieval wired in, we can now chat with our banking assistant and get answers grounded in the uploaded document.

Complete Code Snippet with RAG

import os
import tempfile
from os import getenv

import streamlit as st
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
import requests
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Constants
PROMPT_TEMPLATE = """
You are a banking assistant specializing in answering FAQs about loans, interest rates, and general banking services.
If the user greets, respond with a greeting. If the user asks a question, provide an answer.
Use the following context too for answering questions:

{context}

Conversation History: 
{history}

---


Answer the question based on the above context: {query}

"""

CHROMA_DB_DIR = "chroma"
LANGDB_API_URL = "https://api.us-east-1.langdb.ai/your-project-id/v1"  # Replace with your LANGDB project id
os.environ["LANGDB_API_KEY"] = "your-api-key"

st.set_page_config(page_title="Banking Assistant", layout="wide")
st.title("Banking FAQ Assistant")
st.write("Ask questions about banking services, loan options, and interest rates!")

# Initialize ChromaDB and Embeddings
def initialize_chromadb():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2", model_kwargs={'device': 'cpu'})
    vector_store = Chroma(persist_directory=CHROMA_DB_DIR, embedding_function=embeddings)
    return vector_store

# Initialize ChromaDB and LangChain LLM
vector_db = initialize_chromadb()
# Initialize LangChain LLM
llm = ChatOpenAI(
    base_url=LANGDB_API_URL,
    api_key=getenv("LANGDB_API_KEY"),
    model="gpt-3.5-turbo",  # Replace with the specific model name you are using
    timeout=10  # Add a timeout of 10 seconds
)

# Memory for conversation history
memory = ConversationBufferMemory(
    memory_key="history",
    return_messages=True,
    input_key="query",
)

# Prompt Template for LangChain
prompt_template = PromptTemplate(
    input_variables=["context", "history", "query"],
    template=PROMPT_TEMPLATE
)

# LangChain LLM Chain
chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory)

st.sidebar.title("Options")
uploaded_file = st.sidebar.file_uploader("Upload PDF", type="pdf")

def process_pdf(file):
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_file_path = os.path.join(temp_dir, file.name)
        with open(temp_file_path, "wb") as temp_file:
            temp_file.write(file.getbuffer())

        pdf_loader = PyPDFLoader(temp_file_path)
        documents = pdf_loader.load()

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80, length_function=len)
        chunks = text_splitter.split_documents(documents)

        return chunks

if uploaded_file:
    user_vector_store_dir = CHROMA_DB_DIR
    user_chunks = process_pdf(uploaded_file)
    vector_db.add_documents(user_chunks)
    st.sidebar.success(f"Processed {len(user_chunks)} chunks from uploaded PDF.")

# Chatbox implementation
st.subheader("Chatbox")

# Container for chat messages
chat_container = st.container()

# Function to display chat messages
def display_message(message, is_user=True):
    if is_user:
        chat_container.markdown(f"<div style='text-align: right; padding: 10px; border-radius: 10px; margin: 5px;'>{message}</div>", unsafe_allow_html=True)
    else:
        chat_container.markdown(f"<div style='text-align: left; padding: 10px; border-radius: 10px; margin: 5px;'>{message}</div>", unsafe_allow_html=True)

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
with chat_container:
    for chat in st.session_state.messages:
        display_message(chat['content'], is_user=chat['is_user'])

# User Input Section
user_input = st.text_input("Enter your query:", key="user_input")
send_button = st.button("Send")

if send_button:
    user_input = st.session_state.user_input.strip()  # Ensure the input is not empty or just whitespace
    if user_input:
        context = ""
        # Retrieve relevant context from ChromaDB
        try:
            search_results = vector_db.similarity_search(user_input, k=3)
            for result in search_results:
                context += result.page_content + "\n\n"
        except Exception as e:
            st.error(f"Error retrieving context from ChromaDB: {e}")
        try:
            response = chain.run(context=context, query=user_input)
            # Update conversation memory
            st.session_state.messages.append({"role": "user", "content": user_input, "is_user":True})
            st.session_state.messages.append({"role": "assistant", "content": response, "is_user":False})
            st.rerun()
        except requests.exceptions.Timeout:
            st.error("The request to the LLM timed out. Please try again.")
        except Exception as e:
            st.error(f"Error generating response: {e}")
    else:
        st.warning("Please enter a valid query.")
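
Assuming you save the full script above as app.py (any filename works), you can launch the assistant locally with Streamlit. Remember to replace the LangDB project ID and API key placeholders first.

streamlit run app.py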

Final Thoughts: Smarter Banking FAQ Chatbot with RAG

With ChromaDB integrated, our chatbot can now answer questions based on uploaded documents, ensuring accurate, contextually relevant responses. This powerful RAG pipeline makes the chatbot adaptable for real-world banking use cases.

💡 Key Takeaways:

  1. Enhanced Accuracy: Queries are answered using context retrieved from the uploaded documents.

  2. Efficient Retrieval: ChromaDB ensures fast and relevant search results.

  3. Seamless User Experience: Users receive precise answers without delays.

💡
Bonus: Get started with the complete source code and experiment with LangChain’s advanced features!
💡
Don’t forget to star our GitHub repo!