Build an AI-powered pregnancy food safety web app with Amazon Bedrock and Amazon Textract


AI is on everyone’s mind these days as we see it penetrate and change our lives. As an IT professional, what doesn’t change is the constant need to learn new technologies to stay relevant and build innovative solutions. To help myself learn AI using AWS, I have built an AI-powered web app project that I would like to share.
AWS has a number of services that you can use to build AI-powered apps, and you don't need to be an AI expert to use them. In this blog post I will showcase the use of two AWS AI services, Amazon Bedrock and Amazon Textract, to build a pregnancy food safety web app. The web app infrastructure is hosted in AWS and is completely serverless. For those who want to try the web app themselves and take a deeper look into the solution, I have also provided Terraform code to deploy the infrastructure along with the application code.
Web app use case: pregnancy food safety
In my AI learning exercise, I needed to come up with a project use case where I could apply practical usage of AWS AI services. The use case I came up with is a web app that you can use to take photos of food products with the list of ingredients visible. The web app then returns an analysis of the safety of the ingredients for a pregnant person.
I chose this use case because it presented several problems which AI can be used to solve. They are:
The extraction of text from a photograph image.
The identification of which text is the ingredients list.
Analysis of the safety of the ingredients for pregnancy.
Below are screenshots of the web app.
The left image shows the initial web app landing page, where the user can choose "Take a Photo" to photograph a food product's ingredients with their device's camera.
The middle image shows a waiting screen after the photo image has been uploaded.
The right image shows the returned results containing the extracted ingredients from the image and the pregnancy food safety analysis of the ingredients.
AWS infrastructure
The diagram below shows the AWS infrastructure used for the AI-powered pregnancy food safety web app.
The numbered annotations in brown circles show the sequence flow for using the web app. These are further explained below:
A user’s web browser retrieves the web app static content served by an Amazon CloudFront distribution with an Amazon S3 bucket origin. The web app is a single page application that uses two REST API calls (further described in the next steps).
After the user takes a photo, a REST API call (HTTP POST) is made to an Amazon API Gateway. A Lambda function then handles the API request by generating a pre-signed URL to upload the photo image file to an S3 Bucket (a sketch of such a handler is shown after this list). The pre-signed URL is returned in the API response.
The pre-signed URL is used to upload the photo image file to an “images” S3 Bucket.
The S3 Bucket is configured to send an S3 Event Notification when an image file is uploaded (s3:ObjectCreated). The S3 Event Notification is used to invoke an AWS Lambda function to perform the inference task. The inference Lambda function first extracts the ingredients text from the image file (further covered in Ingredients text extraction from images). The solution provides two AWS AI service options to do this:
Amazon Textract service uses a machine learning approach to extract the text from the image.
Amazon Bedrock service uses a generative AI approach to extract the text from the image. The foundation model used is Amazon Nova Lite, which supports multimodal image and text input.
The inference Lambda function then uses Amazon Bedrock to invoke a model with a templated prompt to analyse the safety of the extracted ingredients (further covered in Analysing pregnancy food safety of ingredients with Amazon Bedrock). The foundation model used is Amazon Nova Micro.
The inference Lambda function stores the extracted ingredients and the food safety analysis result in a DynamoDB table.
The web app makes polled REST API calls (HTTP GET) to the Amazon API Gateway to obtain the pregnancy food safety analysis results. A polling approach is used because it can take some time (around 20 seconds) for the inference result (pregnancy food safety analysis of ingredients) to be available. A Lambda function handles the API request by querying the inference result from the DynamoDB table. Once the API response returns the inference result, the web app can cease polling.
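For step 2 above, below is a minimal sketch of how a pre-signed URL Lambda handler could look. It is illustrative rather than the project's exact code; the IMAGES_BUCKET environment variable and the key naming scheme are assumptions:

import json
import os
import uuid

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    # Assumed environment variable holding the "images" bucket name
    bucket = os.environ["IMAGES_BUCKET"]

    # Generate a unique object key for the uploaded photo
    key = f"uploads/{uuid.uuid4()}.jpg"

    # The pre-signed URL lets the browser PUT the image directly to S3
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "image/jpeg"},
        ExpiresIn=300,  # URL is valid for five minutes
    )

    return {
        "statusCode": 200,
        "body": json.dumps({"uploadUrl": upload_url, "imageKey": key}),
    }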
Deploying the AI-powered pregnancy food safety web app with Terraform
The AI-powered pregnancy food safety web app is available as a Terraform deployable project here:
https://github.com/Freddy-CLH-Blog/ai-powered-food-safety-app-demo
Prerequisites
To deploy the AWS infrastructure for the web app, you will require:
Administrator access to an AWS account.
Terraform installed and basic familiarity with using Terraform with AWS.
The web app frontend content and backend AWS Lambda function code are already pre-built and committed to the project. However, if you want to make changes and rebuild these components yourself, you will also need:
For the web app frontend content:
Node.js (v22) installed.
Basic familiarity with Vite projects and the Vue.js framework.
For the Lambda functions (API handler and inference):
Python 3.10+ installed.
Request for Amazon Bedrock model access
Before you can use foundation models in Amazon Bedrock, you must first request access to them. The AI-powered pregnancy food safety web app uses the following foundation models:
Amazon Nova Micro
Amazon Nova Lite
You can request model access through the Amazon Bedrock console under Configure and learn > Model access. Instructions on how to do this are available in the official Amazon Bedrock documentation here.
Ensure you request access to Amazon Nova Micro and Amazon Nova Lite. Feel free to request access to other models as well. Note that requesting access to some third-party models, such as those from Anthropic, will require you to provide additional use case details.
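Once access has been granted, you can sanity-check it with a minimal test invocation. The snippet below is an illustrative check (not part of the project) that attempts a tiny Amazon Nova Micro invocation and reports whether access is available:

import json

import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

# A minimal Nova request body, just to test that the model can be invoked
request = {"messages": [{"role": "user", "content": [{"text": "Hello"}]}]}

try:
    client.invoke_model(
        modelId="amazon.nova-micro-v1:0",
        contentType="application/json",
        body=json.dumps(request),
    )
    print("Model access OK")
except ClientError as e:
    if e.response["Error"]["Code"] == "AccessDeniedException":
        print("Model access has not been granted yet")
    else:
        raise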
Deploy with Terraform
Follow these steps to deploy the AWS Infrastructure for the AI-powered pregnancy food safety web app.
Clone the project:
git clone https://github.com/Freddy-CLH-Blog/ai-powered-food-safety-app-demo.git
Create a terraform.tfvars file in the project directory with contents like the following:
region = "ap-southeast-2"
text_extractor_mode = "nova-lite" # textract | nova-lite
enable_web_app_waf = true
enable_api_waf = true
cidr_allowlist = ["203.0.113.1/32"] # Change to your source IP range
The Terraform variables used in terraform.tfvars are explained as follows:
region - the AWS region to deploy to.
text_extractor_mode - controls which AWS AI service is used to extract the ingredients text. Set to "textract" to use Amazon Textract, or "nova-lite" to use Amazon Bedrock with the Amazon Nova Lite model.
enable_web_app_waf and enable_api_waf - boolean variables that control whether an AWS WAF is deployed to restrict access by IP to the web app and API respectively. It is recommended to set these to true because the web app is only for demonstration purposes and is not intended for production-level public access.
cidr_allowlist - a list of IP ranges in CIDR notation that are allowed to access the web app. Update this list with the source IP range you will be using to access the web app.
Deploy using Terraform:
# Assumed using credentials from AWS profile
export AWS_PROFILE="CHANGE-TO-YOUR-AWS-PROFILE"
# Initialise Terraform project
terraform init
# Checks
terraform validate
terraform plan
# Deploy
terraform apply
After a successful deployment, the following example Terraform outputs should be returned:
api_endpoint = "https://abcdefgh01.execute-api.ap-southeast-2.amazonaws.com/prod"
web_app_files_sync_to_s3_command = "aws s3 sync assets/web-app/dist/ s3://safe-ingredients-webapp12345 --delete"
web_app_invalidate_cache_command = "aws cloudfront create-invalidation --distribution-id ABCD1234EFG567 --paths '/*'"
web_app_s3_bucket_uri = "s3://safe-ingredients-webapp12345"
web_app_url = "https://abcdefgh01.cloudfront.net/index.html"
To deploy the web app static content to the S3 bucket, use the command from the Terraform output web_app_files_sync_to_s3_command, for example:
aws s3 sync assets/web-app/dist/ s3://safe-ingredients-webapp12345 --delete
To open the web app in your device's web browser, navigate to the URL provided by the Terraform output web_app_url.
Ingredients text extraction from images
The web app requires a solution to extract the ingredients text from photo images of food products. This can be broken down into two individual parts:
Extracting the text from the image.
Identifying which text is the ingredients list.
Photo images of food products will typically have the ingredients text grouped together alongside the word "ingredients". Food products also tend to contain other text such as the name, description, nutrition information, etc.
AWS has several AI services that could potentially be used to extract text from images. The web app that I have shared can switch between two of these: Amazon Textract, or Amazon Bedrock with the Amazon Nova Lite model.
Another AWS service that could have been used is Amazon Rekognition; however, I opted not to use it because it is targeted more towards detecting features in real-world scenarios. Amazon Rekognition text detection is limited to 100 words in an image and does not understand document structure. A food product image will typically have its ingredients text grouped together, and Amazon Rekognition by itself would not be able to understand and return this grouping of ingredients text.
After testing my web app with both Amazon Textract and Amazon Bedrock + Nova Lite, I noticed some differences in how they perform, which I describe below.
Using Amazon Textract for ingredients text extraction
Amazon Textract uses machine learning to perform text extraction. It can also understand and return the document structure and the relationship of words to lines and pages. In addition, a lesser-known feature of Amazon Textract is the ability to ask a question in natural language to obtain specific information from the extracted text. This feature is provided by using queries in the Amazon Textract - Analyze Document operation.
The screenshot below shows demo usage of the queries in Amazon Textract - Analyze Document through the AWS Console. An image of a food product is uploaded and the query "What are the ingredients?" is given.
The web app solution uses the inference Lambda function to call the Amazon Textract AnalyzeDocument API (using Python Boto3 at this python file location).
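For illustration, a minimal sketch of such an AnalyzeDocument call with a query is below (simplified rather than the project's exact code; the local file name is an assumption):

import boto3

textract = boto3.client("textract")

# Read the food product photo from a local file (name assumed for the example)
with open("food_product.jpg", "rb") as f:
    image_bytes = f.read()

response = textract.analyze_document(
    Document={"Bytes": image_bytes},
    FeatureTypes=["QUERIES"],
    QueriesConfig={"Queries": [{"Text": "What are the ingredients?"}]},
)

# The answer to the query is returned in QUERY_RESULT blocks
for block in response["Blocks"]:
    if block["BlockType"] == "QUERY_RESULT":
        print(block["Text"])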
There were some inaccuracies encountered using queries in Amazon Textract - Analyze Document to extract the ingredients text:
Food products with long ingredients text tended not to be fully extracted. The Amazon Textract query result would sometimes cut short the tail end of long ingredients text.
Food products can list ingredients in different languages (e.g. English and Spanish). In some trials the Amazon Textract query result would return the non-English list of ingredients.
Some suggested actions to improve the accuracy of Amazon Textract (not covered in this blog post) include:
Amazon Textract Adapters could be used to improve the query response, tailored to our use case of extracting the ingredients. We can train the adapter by providing food product images annotated with ingredients as the training data.
Amazon Textract could also be used just to extract the raw text without identifying which text are ingredients. Another service such as Amazon Bedrock or other machine learning model with Amazon SageMaker could then be used to identify the ingredients from the Amazon Textract raw text output.
The usage cost of Amazon Textract Analyze Document Queries for extracting ingredients from 1000 images (in the Asia Pacific Sydney region) is $15. This cost is significantly higher than the Amazon Bedrock with Amazon Nova Lite solution (estimated at $0.14 for 1000 images, as described in the next section). Amazon Textract pricing is available here.
Using Amazon Bedrock with Amazon Nova Lite model for ingredients text extraction
Amazon Bedrock is a service that allows you to use a wide range of foundation models from many industry-leading AI model providers. Amazon Bedrock makes it easy to invoke a model by making an InvokeModel API call with your prompt inputs. It requires minimal configuration and there is no infrastructure to manage. You simply need to request access to the foundation model you want to use beforehand (see Request for Amazon Bedrock model access).
To extract the ingredients text from the images, we need a multimodal model that supports text and image inputs and provides text output. The model I chose is Amazon Nova Lite, due to it being very low cost and fast.
The web app solution uses the inference Lambda function to call the Amazon Bedrock InvokeModel API (using Python Boto3 at this python file location). The text prompt used to invoke the model is below:
system_prompt = (
    "You are to help extract the ingredients from images of food products. "
    "When the user provides you with an image, only list the ingredients."
)
prompt = "What are the ingredients?"
Amazon Bedrock + Nova Lite tended to work slightly better than the Amazon Textract option especially for food products with longer lists of ingredients. However, there were some problematic responses that were observed:
Hallucination of ingredients that are not actually listed on the food product. This may be due to the model inferring close similarities to ingredients that were present, or from other text/images on the food product.
A repeating loop of the same ingredients. This occurs when the model predicts the next item in a sequence based on a strong pattern learnt from its training data.
It is interesting to observe the differences in the inaccurate/incorrect ingredients text extraction between Amazon Textract and Amazon Bedrock + Nova Lite, which I believe arise from the fundamentally different approaches these AI services take. Amazon Textract uses an "old-fashioned AI" optical character recognition approach, which doesn't hallucinate nor get into predictive repeating loops. Whereas the Amazon Nova Lite model is a large language model (LLM) which takes a reasoning-based approach to understanding the image and prompt.
Some suggested actions to improve the responses using Amazon Bedrock (not covered in this blog post) include:
Refining the prompt or tweaking the inference parameters (e.g. temperature, topP, topK).
Trying other larger, more intelligent foundation models available in Amazon Bedrock.
Fine-tuning the selected model by providing training data.
For estimating the cost of using Amazon Bedrock with the Amazon Nova Lite model, we assume a food product photo image input is approximately 1700 tokens and the model response output is 130 tokens. The cost for invoking the model (to extract ingredients text) from 1000 images (in the Asia Pacific Sydney region) would then be $0.14. This is obtained from the following calculation:
1000 images x (1700 input tokens x PricePerInputToken + 130 output tokens x PricePerOutputToken)
The Amazon Bedrock pricing page is available here.
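To make the arithmetic concrete, here is the calculation as a short snippet. The per-token prices are illustrative assumptions for Amazon Nova Lite; check the pricing page for current rates in your region:

# Illustrative prices (USD), assumed for this example:
# $0.06 per 1M input tokens, $0.24 per 1M output tokens
price_per_input_token = 0.06 / 1_000_000
price_per_output_token = 0.24 / 1_000_000

images = 1000
input_tokens_per_image = 1700
output_tokens_per_image = 130

cost = images * (
    input_tokens_per_image * price_per_input_token
    + output_tokens_per_image * price_per_output_token
)
print(f"${cost:.2f}")  # roughly $0.13 with these assumed prices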
For more details on using Amazon Bedrock and Amazon Nova models, continue reading.
Analysing pregnancy food safety of ingredients with Amazon Bedrock
The core feature of the web app is to analyse the safety of a food product's ingredients for pregnancy. This is a problem well suited to generative AI, and Amazon Bedrock is once again used to tackle it. For my use case of analysing the safety of the extracted ingredients text for pregnancy, I opted to use the text-only Amazon Nova Micro model because it is very low cost and has low latency.
We have covered using Amazon Bedrock and the Amazon Nova Lite model for ingredients text extraction previously (here). Now I would like to dive a bit deeper into how Amazon Bedrock and the Amazon Nova model family are used.
To invoke a model with Amazon Bedrock, we simply need to call the InvokeModel API, which only requires three input request parameters. An example usage implemented with the Python AWS SDK boto3 is below:
import boto3
import json

client = boto3.client("bedrock-runtime")

# native_request (the model-specific JSON request body) is defined below
response = client.invoke_model(
    modelId="amazon.nova-micro-v1:0",
    contentType="application/json",
    body=json.dumps(native_request),
)
print(json.loads(response["body"].read()))
The modelId is where we put the ID of the model being used, Amazon Nova Micro.
The contentType must be set to "application/json".
The body is where we provide the request to invoke our model as a JSON string. The body request schema depends on the model that is used.
An example request (native_request) using the Amazon Nova model for our food safety use case is below. You can find the complete Amazon Nova request schema here.
ingredients = "Green peas, Glutinous Rice Flour, Corn Starch, Sugar, Salt"
system_prompt = (
"You are a medical assistant. Your task is to evaluate food ingredients for safety during"
" pregnancy. Avoid conversational language. "
"Your response should group the ingredients in to one to three lists by their degree of "
"safety risks. "
"For ingredients that are well known for their risk during pregnancy, provide a medical "
"reason with moderate detail. "
"For ingredients that are well known to be low risk during pregnancy, provide a brief and "
"consise medical reason. "
"For ingredients that are not well known for their risk during pregnancy, provide a brief "
"and consise medical reason. "
"Provide the list with the highest risk ingredients first. "
"Conclude by stating the overall safety of the ingredients for pregnancy. "
"Provide a disclaimer that this AI generated response is not a substitute for medical "
"advice."
)
prompt = (
"Evaluate the following list of food ingredients for pregnancy safety."
"\n\n"
"Ingredients:\n"
f"{ingredients}\n"
"\n"
)
native_request = {
"system": [{"text": system_prompt}],
"messages": [{"role": "user", "content": [{"text": prompt}]}],
"inferenceConfig": {"temperature": 0.7, "topP": 0.9},
}
You can see the request contains the prompt inputs:
system_prompt - used to provide the role context and guidance on response style and formatting. This is also used to instruct the model to provide the disclaimer.
prompt - the user prompt that provides the specific task.
These prompts were formed over several iterations of trial and error. With each iteration I used a fixed set of ingredients and refined the prompt to reach the desired output. To help with prompt engineering for Amazon Nova, you can refer to the creating precise prompts documentation here.
The request also contains the inference parameters temperature and topP, which control the amount of randomness and the choice of next tokens in the response. For more information about the available inference parameters, see the complete Amazon Nova request schema here.
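After invoking the model, the generated analysis has to be pulled out of the response body. A minimal sketch of parsing the Nova response (assuming the invoke_model call shown earlier) is below:

# Parse the JSON response body returned by invoke_model
response_body = json.loads(response["body"].read())

# The generated analysis text sits inside the output message content
analysis = response_body["output"]["message"]["content"][0]["text"]
print(analysis)

# Token usage is also reported, which is handy for cost estimates
print(response_body["usage"])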
Using the Amazon Nova Micro model as-is with no additional fine-tuning, I found that the food safety analysis results generally tended to place ingredients in the "high risk" category. I found this result doubtful; however, I'm not a pregnancy medical expert and cannot verify it. Because these responses are questionable and unverified, I shall restate: the AI-generated analysis is for demonstration purposes only and is not a substitute for professional medical advice.
To improve the responses for analysing the pregnancy food safety of ingredients using Amazon Bedrock, the following could be done:
Retrieval-Augmented Generation (RAG) could be used to look up pregnancy food safety data from trusted medical sources rather than relying on the model's training knowledge (see the sketch after this list).
The model could be fine-tuned with curated and verified pregnancy food safety data.
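To illustrate the RAG idea, below is a hedged sketch using an Amazon Bedrock Knowledge Base lookup. It assumes a knowledge base has already been populated with trusted medical sources; the knowledge base ID and query are hypothetical and not part of the project:

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# The knowledge base ID below is hypothetical
result = agent_runtime.retrieve(
    knowledgeBaseId="KBEXAMPLE123",
    retrievalQuery={"text": "Is glutinous rice flour safe during pregnancy?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 3}
    },
)

# Retrieved passages could then be added to the analysis prompt as context
for item in result["retrievalResults"]:
    print(item["content"]["text"])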