Building a NotebookLM Mini-Clone Powered by Amazon Bedrock


Chatbots are valuable tools for giving customers and employees access to internal documents and information. When powered by a large language model (LLM), a chatbot can enhance the user experience and minimize the need for human involvement. However, LLMs have limitations, as they rely on pre-trained data and may not always provide contextually relevant responses.
To overcome these limitations, businesses can implement the Retrieval-Augmented Generation (RAG) technique. RAG combines retrieval-based and generative AI models to deliver more accurate and context-aware responses. It works by representing document content and user queries as vector embeddings, which are then retrieved and passed to an LLM to generate responses with enhanced context.
In this article, I will deploy a PDF chatbot application that utilizes retrieval-augmented generation to answer queries based on embeddings created from PDF documents. The chatbot will use the LangChain framework and Amazon Bedrock to generate embeddings and responses. The user interface will be built with Streamlit, and the application will be deployed as an Amazon ECS Service. You will configure the system and interact with the chatbot to evaluate its functionality.
Configuring AWS Credentials
Introduction
In this step, you will configure the AWS credentials needed to use the AWS SAM CLI.
Instructions
Use the instructions here to download and set up the AWS SAM CLI for your AWS account:
https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html
In the terminal, enter the following set of commands to configure the AWS account credentials:
aws configure set aws_access_key_id AKIA4AEEWDFAKDIDBFJEA && aws configure set aws_secret_access_key yadgahdadsdysdsdbsdjsdndjcjs && aws configure set default.region us-west-2
Note: Replace the values with your own access key ID and secret access key.
Enter aws configure list to confirm the credentials have been set correctly.
Before moving on to the next lab step, download the project folder here. Any directories and files referenced in this lab will appear in this folder.
Deploying the PDF Embedding Solution
Introduction
In this step, you will deploy the PDF embedding solution using AWS SAM. At a high level, the solution extracts metadata from PDF documents and generates embeddings using LangChain and Amazon Bedrock.
The PDF chatbot application will use the embeddings to answer user prompts based on the content of the PDF documents.
The following diagram shows the architecture of the embedding solution:
The solution consists of the following components:
An S3 bucket to store the PDF documents
A DynamoDB table to store the metadata and status of the uploaded documents
A Lambda function to extract metadata from the PDF documents
A Lambda function to generate embeddings from the PDF documents
An SQS queue to store the PDF document processing requests
The embedding process is summarized below:
A PDF document is uploaded to the S3 bucket and an S3 event triggers the ExtractMetadata Lambda function.
The ExtractMetadata Lambda function extracts metadata from the PDF document and sends a message to the SQS queue with the metadata.
The GenerateEmbeddings Lambda function reads the message from the SQS queue and generates embeddings from the PDF document using LangChain and Amazon Bedrock. The embeddings are stored in the same S3 bucket.
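To make the flow concrete, here is a minimal sketch of what the GenerateEmbeddings function could look like. This is not the lab's packaged code: the queue message field, environment variable name, and chunking parameters are assumptions, and it uses the langchain-aws and langchain-community integrations.

```python
# Hypothetical sketch of the GenerateEmbeddings Lambda handler, not the lab's
# exact code. BUCKET_NAME and the "document_key" message field are assumptions.
import json
import os

import boto3
from langchain_aws import BedrockEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

s3 = boto3.client("s3")
BUCKET_NAME = os.environ["BUCKET_NAME"]  # assumed environment variable

def handler(event, context):
    for record in event["Records"]:  # one SQS record per processing request
        message = json.loads(record["body"])
        key = message["document_key"]  # hypothetical metadata field

        # Download the PDF so the loader can read it from local disk
        local_pdf = f"/tmp/{os.path.basename(key)}"
        s3.download_file(BUCKET_NAME, key, local_pdf)

        # Split the document into chunks and embed them with Amazon Titan
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        chunks = splitter.split_documents(PyPDFLoader(local_pdf).load())
        embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
        index = FAISS.from_documents(chunks, embeddings)

        # save_local writes index.faiss and index.pkl; store them under a
        # folder named after the PDF, e.g. random.pdf/index.faiss
        index.save_local(f"/tmp/{key}")
        for name in ("index.faiss", "index.pkl"):
            s3.upload_file(f"/tmp/{key}/{name}", BUCKET_NAME, f"{key}/{name}")
```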
Instructions
Download the source project folder here
Create an S3 bucket with the name deploy-assets-[random five letters] and allow all public access.
Click the Explorer tab to open the file explorer:
Open the samconfig.toml file in your editor, then paste in the following configuration:
```toml
version = 0.1

[default.global.parameters]
stack_name = "ChatbotStack"

[default.deploy.parameters]
region = "us-west-2"
s3_bucket = "deploy-assets-*****"
s3_prefix = "ntbk-lm"
confirm_changeset = true
capabilities = "CAPABILITY_IAM"
tags = "project=\"ntbk-lab\" stage=\"development\""
```
The configuration above provides the AWS SAM CLI with the deployment parameters for your SAM application.
In the terminal, run the following command to deploy the application:
sam deploy --resolve-image-repos
Enter y when prompted to confirm the deployment:
The deployment may take up to two minutes to complete.
Stepping Through the PDF Chatbot Application Code
Introduction
In this step, you will walk through the Streamlit application code that provides a user interface and logic for the PDF chatbot application.
The application code has already been built and packaged into a Docker image. The Docker image will be deployed to an Amazon ECS Service using the AWS Serverless Application Model (SAM) CLI.
Conversational Retrieval Chains are used to create chatbots that can interact with PDF documents. These chains also allow for follow-up questions and context-aware responses, making them a viable solution for RAG-based chatbots.
Retrievers are LangChain components that retrieve documents from a vector store based on a prompt. The vector store, or index, for each uploaded PDF file, is created using the embedding solution in the previous lab step. The index files will be loaded into the chatbot application's memory and used to retrieve relevant documents based on the user prompt.
Instructions
Expand the src/pdf_chatbot directory in the Explorer tab and open the Main.py file:
This file contains the Streamlit application code that will be deployed to the ECS Service.
The streamlit framework is used to render the user interface and interact with the chatbot application. The langchain framework will be used along with boto3 to interact with the Amazon Bedrock service.
The notable LangChain components used in the application include:
LangChain Hub: LangChain Hub is a version control system for LLM prompts. You can import prompts to use in your applications. You will use a RAG prompt to generate responses for the chatbot.
BedrockLLM: LangChain's BedrockLLM is used to generate responses for the chatbot. The amazon.titan-text-express-v1 model will be used to generate text responses for the chatbot.
BedrockEmbeddings: This class generates embeddings for the uploaded PDF documents. For this lab, the amazon.titan-embed-text-v1 model will be used to generate embeddings for the uploaded documents.
FAISS: This class is used as the vector store to store the embeddings and create the index. FAISS is a library for efficient similarity search and clustering of dense vectors.
ConversationalRetrievalChain: A conversational retrieval chain uses a vector store to represent the documents in the embedding space and uses a retrieval model to retrieve the most relevant documents based on the query.
LangChain Debugging: Calling the set_debug function with True enables debugging mode for the LangChain framework.
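Taken together, the top of Main.py plausibly looks something like the snippet below. The exact package layout is an assumption (BedrockLLM and BedrockEmbeddings ship in langchain-aws, FAISS in langchain-community), so the packaged application may organize these differently.

```python
# Illustrative imports for the components listed above; a sketch, not the
# lab's exact Main.py.
import boto3
import streamlit as st
from langchain import hub
from langchain.chains import ConversationalRetrievalChain
from langchain.globals import set_debug
from langchain_aws import BedrockEmbeddings, BedrockLLM
from langchain_community.vectorstores import FAISS

set_debug(True)  # verbose LangChain logging while working through the lab
```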
The bucket_name variable references the S3 bucket that stores the PDF documents and their embeddings.
Three helper functions are defined in the Main.py file:
upload_to_s3: This function iterates through a list of objects and uploads them to the S3 bucket. The st.toast method is a Streamlit method that displays a message banner at the top of the Streamlit application.
stream_response: This function simulates a streaming response in the Streamlit application.
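As a reference, here is a hedged sketch of how these helpers might be implemented; the bucket variable, toast message, and streaming delay are assumptions rather than the lab's exact code:

```python
import os
import time

import boto3
import streamlit as st

s3 = boto3.client("s3")
bucket_name = os.environ["BUCKET_NAME"]  # injected by the ECS task definition

def upload_to_s3(files):
    """Upload each file-like object returned by st.file_uploader to S3."""
    for file in files:
        s3.upload_fileobj(file, bucket_name, file.name)
        st.toast(f"Uploaded {file.name}")  # banner at the top of the app

def stream_response(payload):
    """Yield the response word by word to simulate streaming in the UI."""
    for word in payload.split():
        yield word + " "
        time.sleep(0.04)
```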
The generate_chat function contains the main logic for the chatbot application. This function employs the retrieval-augmented generation technique to generate responses for the chatbot. It accepts a user prompt and a file_name to begin the conversation. The function performs the following steps:
36-37: The embedding solution referenced in the previous lab step creates a folder with the index files for each document. The index files generated for the uploaded PDF documents are loaded into memory.
40-44: The BedrockLLM is initialized with the amazon.titan-text-express-v1 model and AWS credentials. The Docker image used to deploy this application will have IAM roles attached to it through the Amazon ECS task role, removing the need to hardcode the AWS credentials.
47-51: The BedrockEmbeddings class is initialized with the amazon.titan-embed-text-v1 model. This embeddings object will be used to embed user prompts.
52-57: The index.faiss and index.pkl files are loaded into a FAISS index object. The object can then be used as a retriever to fetch documents from the index vector store relevant to the user prompt.
60: The hub.pull method is used to pull the RAG prompt from the LangChain Hub. The rlm/rag-prompt is used to generate responses for the chatbot.
63-68: The ConversationalRetrievalChain is initialized with the llm, retriever, and rag_prompt. The return_source_documents parameter is set to False to return the generated response only.
71-72: The conversation chain is invoked by passing in the question and chat_history arguments. Chat history is used to maintain the context of the conversation; however, it is not used in this lab.
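Condensed into code, the walkthrough above corresponds roughly to the following sketch. The comments reference the same code lines; details such as the retriever settings and the combine_docs_chain_kwargs wiring are assumptions rather than the lab's exact implementation:

```python
import os

import boto3
from langchain import hub
from langchain.chains import ConversationalRetrievalChain
from langchain_aws import BedrockEmbeddings, BedrockLLM
from langchain_community.vectorstores import FAISS

s3 = boto3.client("s3")
bucket_name = os.environ["BUCKET_NAME"]

def generate_chat(prompt, file_name):
    # 36-37: load the document's index files into memory
    index_dir = f"/tmp/{file_name}"
    os.makedirs(index_dir, exist_ok=True)
    for name in ("index.faiss", "index.pkl"):
        s3.download_file(bucket_name, f"{file_name}/{name}", f"{index_dir}/{name}")

    # 40-44: Titan text model; the ECS task role supplies credentials
    llm = BedrockLLM(model_id="amazon.titan-text-express-v1")

    # 47-51: embeddings object used to embed the user prompt
    embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")

    # 52-57: load the FAISS index and expose it as a retriever
    vector_store = FAISS.load_local(
        index_dir, embeddings, allow_dangerous_deserialization=True
    )
    retriever = vector_store.as_retriever()

    # 60: pull the shared RAG prompt from the LangChain Hub
    rag_prompt = hub.pull("rlm/rag-prompt")

    # 63-68: assemble the conversational retrieval chain
    conversation = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        combine_docs_chain_kwargs={"prompt": rag_prompt},
        return_source_documents=False,
    )

    # 71-72: invoke with the question; chat history stays empty in this lab
    result = conversation.invoke({"question": prompt, "chat_history": []})
    return result["answer"]
```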
The remaining code in the application is responsible for rendering the Streamlit application and handling user interactions:
The Chatbot application will utilize a Streamlit sidebar to upload and select PDF documents from the Amazon S3 bucket.
84-87: The first container in the sidebar uses the st.file_uploader method to upload one or more PDF documents to the S3 bucket. The upload_to_s3 helper function is called to upload the documents to the S3 bucket when the Upload button is clicked.
90-100: The second container in the sidebar calls the s3.list_objects_v2 method to retrieve all the objects in the bucket. The object list is filtered to only include the PDF file names.
102: The st.selectbox method renders a select box to display a list of uploaded PDF document names. The selected_obj variable stores the selected PDF document name.
Although the S3 bucket contains the original PDF documents and their embeddings, the generate_chat function only requires the PDF name. Remember that the index.faiss and index.pkl files are identified by the PDF name, for example, random.pdf/index.faiss and random.pdf/index.pkl.
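For orientation, here is a sketch of the sidebar logic just described; the widget labels and the PDF filter are assumptions, and it reuses the s3 client, bucket_name, and upload_to_s3 helper from the earlier sketches:

```python
with st.sidebar:
    # 84-87: upload one or more PDFs, pushed to S3 when Upload is clicked
    with st.container():
        files = st.file_uploader(
            "Upload a PDF file", type="pdf", accept_multiple_files=True
        )
        if st.button("Upload") and files:
            upload_to_s3(files)

    # 90-100: list the bucket's objects and keep only the PDF file names
    with st.container():
        response = s3.list_objects_v2(Bucket=bucket_name)
        pdf_names = [
            obj["Key"]
            for obj in response.get("Contents", [])
            if obj["Key"].endswith(".pdf")
        ]

    # 102: the selected PDF name drives the rest of the application
    selected_obj = st.selectbox("Select a PDF file", pdf_names, index=None)
```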
107: If the selected_obj is None, a caption is rendered prompting the user to select a PDF file.
109-112: Once a file is selected, the chat_history state is initialized in the Streamlit session state.
114-117: To render the chat history to the screen, this for loop iterates through each message in the session state chat_history and renders the message role and content to the screen. As you and the chatbot interact, the chat history will be updated and rendered in the application. Each message object in the chat_history state contains a role, content, and file attribute. The file attribute is used to identify the PDF document that the message is associated with.
119-120: The st.chat_input method renders a text input box for you to enter a prompt. The if prompt := ... syntax assigns the prompt variable and checks that the prompt is not empty. When you enter a prompt, a message object is initialized and appended to the chat_history state.
122: The chatbot renders your input prompt to the screen with the st.write method.
125: The generate_chat function is called with the prompt and selected_obj arguments.
127: The payload is passed to the stream_response helper function, then the response is rendered to the screen with the st.write_stream method.
130: The chatbot initializes and appends the assistant role's response to the chat_history session state with the payload as its content.
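Putting those pieces together, the chat flow might look like the sketch below, building on the earlier sketches; the prompts and captions are assumptions:

```python
if selected_obj is None:
    # 107: nudge the user to pick a document first
    st.caption("Select a PDF file to start chatting.")
else:
    # 109-112: initialize chat history once per session
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = []

    # 114-117: replay prior messages for the selected file
    for message in st.session_state.chat_history:
        if message["file"] == selected_obj:
            with st.chat_message(message["role"]):
                st.write(message["content"])

    # 119-122: accept a new prompt and echo it to the screen
    if prompt := st.chat_input("Ask a question about the PDF"):
        st.session_state.chat_history.append(
            {"role": "user", "content": prompt, "file": selected_obj}
        )
        st.chat_message("user").write(prompt)

        # 125-127: generate the RAG response and stream it to the screen
        payload = generate_chat(prompt, selected_obj)
        with st.chat_message("assistant"):
            st.write_stream(stream_response(payload))

        # 130: persist the assistant's reply in the session state
        st.session_state.chat_history.append(
            {"role": "assistant", "content": payload, "file": selected_obj}
        )
```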
Open the pdf_chatbot/Dockerfile file in the editor:
The Dockerfile contains the instructions to build the Docker image for the Streamlit application. The Docker image will be built with the Python dependencies defined in the pdf_chatbot/requirements.txt file.
The python:3.9-slim base image is used to build the Docker image. Beginning in the /app working directory, the curl package is installed. The COPY . . command copies the application code and dependencies into the environment, and the pip install -r requirements.txt command installs the Python dependencies.
A tmp directory is created to store the PDF document embeddings retrieved from the S3 bucket. Port 8501 is exposed to allow traffic to the Streamlit application, and a HEALTHCHECK is defined to check the health of the application.
The ENTRYPOINT instruction specifies the command that runs when the Docker container starts. The streamlit run Main.py command starts the Streamlit application.
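Based on this description, the Dockerfile plausibly resembles the sketch below; the health-check endpoint and the package cleanup step are assumptions:

```dockerfile
# Sketch of the described Dockerfile; the packaged image may differ slightly.
FROM python:3.9-slim

WORKDIR /app

# curl is required by the container health check below
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Copy the application code and install the Python dependencies
COPY . .
RUN pip install -r requirements.txt

# Scratch space for the index files retrieved from S3
RUN mkdir -p tmp

EXPOSE 8501

HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health || exit 1

ENTRYPOINT ["streamlit", "run", "Main.py"]
```

Next, you will configure the Amazon Elastic Container Service (ECS) resources in the template.yaml file.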
Launching the Streamlit Application Using AWS Fargate
In this step, you will deploy the Streamlit application to Amazon ECS using AWS SAM.
Instructions
In the template.yaml file, add the following parameters above the Resources section:

```yaml
Parameters:
  ImageURI:
    Type: String
    Description: The URI of the image to deploy
    Default: "824911993206.dkr.ecr.us-west-2.amazonaws.com/pdf-chatbot-lab:latest"
  VPC:
    Type: String
    Default: "vpc-0cb9d42cfe8a023fb"
  PublicSubnet:
    Type: String
    Default: "subnet-0269991e2b1402394"
```
The Parameters section should be aligned with the Resources section in the template.yaml file.
The ECS cluster will be deployed into the default VPC and public subnet. The VPC and related resources have been created for you at the start of the lab.
The ImageURI parameter specifies the URI of the Docker image to deploy. In the interest of time, you will not build the Docker image in this lab. Instead, you will use a pre-built image.
Add the following resources to the Resources section at the bottom of the template.yaml file:

```yaml
  ECSSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow streamlit app traffic
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: tcp
          FromPort: 8501
          ToPort: 8501
          Description: Streamlit app port
        - CidrIp: 0.0.0.0/0
          IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          Description: HTTP port
        - CidrIp: 0.0.0.0/0
          IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          Description: HTTPS port
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: "-1"
          Description: Allow all outbound
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: streamlit-cluster
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      DesiredCount: 1
      TaskDefinition: !Ref ECSTaskDef
      LaunchType: FARGATE
      ServiceName: streamlit-service
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref ECSSecurityGroup
          Subnets:
            - !Ref PublicSubnet
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 100
        DeploymentCircuitBreaker:
          Enable: true
          Rollback: true
      DeploymentController:
        Type: ECS
      ServiceConnectConfiguration:
        Enabled: false
  ECSTaskDef:
    Type: AWS::ECS::TaskDefinition
    Properties:
      RequiresCompatibilities:
        - FARGATE
      Cpu: 1024
      Memory: 2048
      NetworkMode: awsvpc
      Family: streamlit-task-definition
      ExecutionRoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/ECSExecutionRole"
      TaskRoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/ECSTaskRole"
      ContainerDefinitions:
        - Name: streamlit
          Image: !Ref ImageURI
          PortMappings:
            - ContainerPort: 8501
              Protocol: tcp
              HostPort: 8501
              AppProtocol: http
              Name: streamlit-8501-tcp
          Essential: true
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-create-group: true
              awslogs-group: "/ecs/streamlit-task-definition"
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
          Environment:
            - Name: BUCKET_NAME
              Value: !Ref DocumentBucket
```
Ensure the new resources are aligned with the existing resources in the template.yaml file:
The Amazon ECS resources serve the following purposes:
ECSSecurityGroup: This resource defines the security group for the ECS service. It allows traffic on ports 8501, 80, and 443. The security group is associated with the ECS service which will allow traffic to the Streamlit application.
ECSCluster: This resource defines the ECS cluster where the ECS service will be deployed.
ECSService: This resource defines the ECS service that serves the Streamlit application. It specifies the desired count of tasks, the task definition to use, the launch type, the network configuration, and the deployment configuration. It is set to use the FARGATE launch type with a desired count of 1. The NetworkConfiguration specifies the public subnet to deploy the service into, along with the AssignPublicIp value set to ENABLED to assign public IP addresses to the tasks.
ECSTaskDef: This resource defines the ECS task definition used by the ECS service to deploy tasks. The ExecutionRoleArn is an IAM role that grants the ECS service permissions to pull the Docker image from Amazon ECR. The TaskRoleArn is an IAM role that grants the task permissions to access the S3 bucket, along with any other permissions required by the chatbot application. The ContainerDefinitions section specifies the PortMappings and LogConfiguration for the task. The port required by Streamlit is 8501, and the ECS task logs will be stored in the /ecs/streamlit-task-definition CloudWatch Log Group. The Environment section specifies the BUCKET_NAME variable passed into the Streamlit application.
In the terminal, run the following command to deploy the application:
sam deploy --resolve-image-repos
Enter y when prompted to confirm the deployment:
The update to the stack will take 3 to 4 minutes to complete.
Interacting With the PDF RAG Chatbot
In this step, you will interact with the PDF chatbot by uploading a PDF file and asking the chatbot a question. The chatbot will use the embeddings generated from the PDF file to provide answers to your questions.
Instructions
In the AWS Console, navigate to the Amazon ECS cluster page:
Click the streamlit-cluster name to view the cluster details:
This will take you to the details page, where you can access the ECS service, tasks, and other resources associated with the cluster.
Below the Cluster overview, select the Tasks tab:
The Tasks table should contain one task with the Running status.
Click the Task ID of the running task to view the task details.
This will take you to the task details page, where you can access the task logs and other information.
Scroll down to the Configuration section and copy the Public IP address of the task:
In a new browser tab, paste in the Public IP and append :8501 to the end of the URL to access the Streamlit application.
Example: http://54.202.1.112:8501
Download the sample PDF file below:
This PDF file contains a list of random, distorted facts about various topics. You will use this file to test the chatbot's ability to answer questions based on the content of the PDF file.
In the Upload a PDF file section in the sidebar, drag the downloaded PDF file into the drop zone or click Browse files to select the file:
Click Upload to upload the PDF file.
The chatbot will process the PDF file, generate embeddings for the document, and display the following banner when the processing is complete:
Select the Sample1.pdf file from the Select a PDF file dropdown:
Below the Chat with a PDF file header, a chat input box will appear once a file is selected:
Enter a question into the chat input field and press Enter to submit the question to the chatbot.
To test whether the chatbot can answer questions based on the content of the PDF file, try asking one of the following questions related to the sample PDF file:
What bear is best?
Give me a random fact about the presidents of the United States.
Today is Monday, what color is the sky?
What does AWS stand for?
The chatbot should return a distorted fact, rather than an accurate answer generated by the Amazon Titan model, demonstrating the RAG technique in action:
Optional: Upload additional PDF files and ask the chatbot questions based on the content of the uploaded files.
Summary
By completing this tutorial, you have successfully:
Employed the Retrieval-Augmented Generation (RAG) technique to generate answers to questions based on embeddings from a PDF document
Deployed a PDF chatbot application to an Amazon ECS service
Note: Replace all ARNs with the ARNs for your own roles and resources.