Build an AI-powered pregnancy food safety web app with Amazon Bedrock and Amazon Textract

Freddy Ho
15 min read

AI is on everyone’s mind these days as it penetrates and changes our lives. As an IT professional, one thing that doesn’t change is the constant need to learn new technologies to stay relevant and build innovative solutions. To help myself learn AI on AWS, I have built an AI-powered web app project that I would like to share.

AWS has a number of services that you can use to build AI-powered apps, and you don’t need to be an AI expert to use them. In this blog post I will showcase the use of two AWS AI services, Amazon Bedrock and Amazon Textract, to build a pregnancy food safety web app. The web app infrastructure is hosted in AWS and is completely serverless. For those who want to try the web app themselves and have a deeper look into the solution, I have also provided Terraform code to deploy the infrastructure with the application code.

Web app use case: pregnancy food safety

For my AI learning exercise, I needed to come up with a project use case where I could apply practical usage of AWS AI services. The use case I came up with is a web app that lets you take photos of food products with the list of ingredients visible. The web app then returns an analysis of the safety of those ingredients for a pregnant person.

I chose this use case because it presents several problems that AI can be used to solve. They are:

  1. The extraction of text from a photograph image.

  2. The identification of the text that represents ingredients.

  3. Analysis of the safety of the ingredients for pregnancy.

Important disclaimer: The solution for a “pregnancy food safety” web app described in this article and the given code examples should not be used to provide valid medical advice. The solution is presented for the purposes of learning about AWS AI services only.

Below are screenshots of the web app.

  • The left image shows the initial web app landing page, where the user can choose “Take a Photo” to capture a food product’s ingredients with their device’s camera.

  • The middle image shows a waiting screen after the photo image has been uploaded.

  • The right image shows the returned results containing the extracted ingredients from the image and the pregnancy food safety analysis of the ingredients.

AWS infrastructure

The diagram below shows the AWS infrastructure used for the AI-powered pregnancy food safety web app.

The numbered annotations in brown circles show the sequence flow for using the web app. These are further explained below:

  1. A user’s web browser retrieves the web app static content served by an Amazon CloudFront distribution with an Amazon S3 bucket origin. The web app is a single page application that uses two REST API calls (further described in the next steps).

  2. After the user takes a photo, a REST API call (HTTP POST) is made to an Amazon API Gateway. A Lambda function handles the API request by generating a pre-signed URL to upload the photo image file to an S3 bucket. The pre-signed URL is returned in the API response.

  3. The pre-signed URL is used to upload the photo image file to an “images” S3 Bucket.

  4. The S3 bucket is configured to send an S3 Event Notification when an image file is uploaded (s3:ObjectCreated). The S3 Event Notification invokes an AWS Lambda function to perform the inference task.

  5. The inference Lambda function first extracts the ingredients text from the image file (further covered in Ingredients text extraction from images). The solution provides two AWS AI service options to do this:

    • Amazon Textract service uses a machine learning approach to extract the text from the image.

    • Amazon Bedrock service uses a generative AI approach to extract the text from the image. The foundation model used is Amazon Nova Lite, which supports multimodal image and text input.

  6. The inference Lambda function then uses Amazon Bedrock to invoke a model with a templated prompt to analyse the safety of the extracted ingredients (further covered in Analysing pregnancy food safety of ingredients with Amazon Bedrock). The foundation model used is Amazon Nova Micro.

  7. The inference Lambda function stores the extracted ingredients and the food safety analysis result into a DynamoDB table.

  8. The web app makes polled REST API calls (HTTP GET) to the Amazon API Gateway to obtain the pregnancy food safety analysis results. A polling approach is used because it can take some time (around 20 seconds) for the inference result (pregnancy food safety analysis of ingredients) to be available. A Lambda function handles the API request by querying the inference result from the DynamoDB table. Once the API response returns the inference result, the web app can cease polling.

Deploying the AI-powered pregnancy food safety web app with Terraform

The AI-powered pregnancy food safety web app is available as a Terraform deployable project here:

https://github.com/Freddy-CLH-Blog/ai-powered-food-safety-app-demo

Prerequisites

To deploy the AWS infrastructure for the web app, you will require:

The web app frontend content and backend AWS Lambda function code are already pre-built and committed to the project. However, if you want to make changes and rebuild these components yourself, you will also need:

For the Lambda functions (API handler and inference):

  • Python 3.10+ installed.

Request for Amazon Bedrock model access

Before you can use foundation models in Amazon Bedrock, you must first request access to them. The AI-powered pregnancy food safety web app uses the following foundation models:

  • Amazon Nova Micro

  • Amazon Nova Lite

You can request model access through the Amazon Bedrock console under Configure and learn > Model access. Instructions on how to do this are available in the Amazon Bedrock official documentation here.

Ensure you request access to Amazon Nova Micro and Amazon Nova Lite. Feel free to request other models as you wish. Note that requesting access to some third-party models, such as those from Anthropic, will require you to provide additional use case details.

Deploy with Terraform

Follow these steps to deploy the AWS Infrastructure for the AI-powered pregnancy food safety web app.

Clone the project:

git clone https://github.com/Freddy-CLH-Blog/ai-powered-food-safety-app-demo.git

Create a terraform.tfvars file in the project directory with contents like the following:

region                = "ap-southeast-2"
text_extractor_mode   = "nova-lite" # textract | nova-lite
enable_web_app_waf    = true
enable_api_waf        = true
cidr_allowlist        = ["203.0.113.1/32"] # Change to your source IP range

The Terraform variables used in terraform.tfvars are explained as follows:

  • region - the AWS region to deploy to.

  • text_extractor_mode - controls which AWS AI service is used to perform the extraction of the ingredients text. Set to "textract" to use Amazon Textract. Set to "nova-lite" to use Amazon Bedrock and the Amazon Nova Lite model.

  • enable_web_app_waf and enable_api_waf - boolean variables that control deployment of an AWS WAF to restrict access by IP to the deployed web app and API respectively. It is recommended to set these to true, because the web app is only for demonstration purposes and not intended for production-level public access.

  • cidr_allowlist - provides a list of IP ranges in CIDR notation that are allowed to access the web app. Update this list with the source IP range you will use to access the web app.

Deploy using Terraform:

# Assumed using credentials from AWS profile 
export AWS_PROFILE="CHANGE-TO-YOUR-AWS-PROFILE"

# Initialise Terraform project
terraform init

# Checks
terraform validate
terraform plan

# Deploy
terraform apply

After a successful deployment, the following example Terraform outputs should be returned:

api_endpoint = "https://abcdefgh01.execute-api.ap-southeast-2.amazonaws.com/prod"
web_app_files_sync_to_s3_command = "aws s3 sync assets/web-app/dist/ s3://safe-ingredients-webapp12345 --delete"
web_app_invalidate_cache_command = "aws cloudfront create-invalidation --distribution-id ABCD1234EFG567 --paths '/*'"
web_app_s3_bucket_uri = "s3://safe-ingredients-webapp12345"
web_app_url = "https://abcdefgh01.cloudfront.net/index.html"

To deploy the web app static content to the S3 bucket use the command from the Terraform output web_app_files_sync_to_s3_command, for example:

aws s3 sync assets/web-app/dist/ s3://safe-ingredients-webapp12345 --delete

To open the web app in your device’s web browser, navigate to the URL provided by the Terraform output web_app_url.

Ingredients text extraction from images

The web app requires a solution to extract the ingredients text from photo images of food products. This can be broken down into two individual parts:

  • Extracting the text from the image.

  • Identifying which of the extracted text is the ingredients list.

Photo images of food products typically have the ingredients text grouped together alongside the word “ingredients”. Food products also tend to contain other text such as the name, description, nutrition information, etc.

AWS has several AI services that could potentially be used to extract text from images. The web app that I have shared can switch between two AWS services: Amazon Textract, or Amazon Bedrock with the Amazon Nova Lite model.

Another AWS service that could have been used is Amazon Rekognition; however, I opted not to use it because it is targeted more towards detecting features in real-world scenarios. Amazon Rekognition text detection is limited to 100 words per image and does not understand document structure. Typically, a food product image will have the ingredients text grouped together, and Amazon Rekognition by itself would not be able to understand and return this grouping of ingredients text.

After testing my web app with both Amazon Textract and Amazon Bedrock + Nova Lite, I noticed some differences in how they perform, which I describe below.

Using Amazon Textract for ingredients text extraction

Amazon Textract uses machine learning to perform text extraction. It can also understand and return the document structure and the relationship of words to lines and pages. In addition, a lesser-known feature of Amazon Textract is the ability to ask a question in natural language to obtain specific information from the extracted text. This feature is provided by using queries in the Amazon Textract - Analyze Document operation.

The screenshot below shows demo usage of queries in Amazon Textract - Analyze Document through the AWS Console. An image of a food product is uploaded and the query “What are the ingredients?” is given.

The web app solution uses the inference Lambda function to call the Amazon Textract AnalyzeDocument API (using Python Boto3 at this python file location).

There were some inaccuracies encountered using queries in Amazon Textract - Analyze Document to extract the ingredients text:

  • Food products with long ingredients text tend not to be fully extracted. The Amazon Textract query result would sometimes cut the tail end of the long ingredients text short.

  • Food products can list ingredients in different languages (e.g. English and Spanish). In some trials the Amazon Textract query result would return the non-English list of ingredients.

Some suggested actions to improve the accuracy of Amazon Textract (not covered in this blog post) include:

  • Amazon Textract Adapters could be used to improve the query response, tailored to our use case of extracting the ingredients. We can train the adapter by providing food product images annotated with ingredients as the training data.

  • Amazon Textract could also be used just to extract the raw text without identifying which text are ingredients. Another service such as Amazon Bedrock or other machine learning model with Amazon SageMaker could then be used to identify the ingredients from the Amazon Textract raw text output.

The usage cost of Amazon Textract Analyze Document Queries for extracting ingredients from 1000 images (in the Asia Pacific Sydney region) is $15. This cost is significantly higher than the Amazon Bedrock with Amazon Nova Lite solution (estimated $0.14 for 1000 images and described in the next section). Amazon Textract pricing is available here.

Using Amazon Bedrock with Amazon Nova Lite model for ingredients text extraction

Amazon Bedrock is a service that allows you to use a wide range of foundation models from many industry-leading AI model providers. Amazon Bedrock makes it easy to invoke a model by making an InvokeModel API call with your prompt inputs. Amazon Bedrock requires minimal configuration and there is no infrastructure to manage. You simply need to request access to the foundation model (see Request for Amazon Bedrock model access) that you want to use beforehand.

To extract the ingredients text from the images we need to use a multimodal model that supports text and image inputs and provides text output. The model I chose to use is Amazon Nova Lite due to it being very low cost and fast.

The web app solution uses the inference Lambda function to call the Amazon Bedrock InvokeModel API (using Python Boto3 at this python file location). The text prompt used to invoke the model is below:

    system_prompt = (
        "You are to help extract the ingredients from images of food products. "
        "When the user provides you with an image, only list the ingredients."
    )
    prompt = "What are the ingredients?"

Amazon Bedrock + Nova Lite tended to work slightly better than the Amazon Textract option especially for food products with longer lists of ingredients. However, there were some problematic responses that were observed:

  • Hallucination of ingredients that are not actually listed on the food product. This may be due to the model inferring close similarities to ingredients that were present, or to other text/images on the food product.

  • A repeating loop of the same ingredients. This occurs when the model predicts the next item in a sequence based on a strong pattern learnt from its training data.

It is interesting to observe the differences in the inaccurate/incorrect ingredient text extraction between Amazon Textract and Amazon Bedrock + Nova Lite, which I believe arise from the fundamentally different approaches of these AI services. Amazon Textract uses “old-fashioned AI” optical character recognition, which doesn’t hallucinate nor get into predictive repeating loops. In contrast, the Amazon Nova Lite model is a large language model (LLM), which takes a reasoning-based approach to understanding the image and prompt.

Some suggested actions to improve the responses using Amazon Bedrock (not covered in this blog post) include:

  • Refining the prompt or tweaking the inference parameters (e.g. temperature, topP, topK).

  • Trying other, larger, more capable foundation models available in Amazon Bedrock.

  • Fine-tuning the selected model by providing training data.

For estimating the cost of using Amazon Bedrock with the Amazon Nova Lite model, we assume a food product photo image input is approximately 1700 tokens and the model response output is 130 tokens. The cost for invoking the model (to extract ingredients text) from 1000 images (in the Asia Pacific Sydney region) would then be $0.14. This is obtained from the following calculation:

1000 images × (1700 input tokens × PricePerInputToken + 130 output tokens × PricePerOutputToken)
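As a quick sanity check, the same calculation in a few lines of Python. The per-token prices below are assumed list prices for illustration only; regional price differences may account for the small gap between this figure and the $0.14 estimate above, so always check the current rates on the Amazon Bedrock pricing page.

```python
# Assumed Amazon Nova Lite prices (USD), for illustration only --
# check the Amazon Bedrock pricing page for current regional rates.
price_per_input_token = 0.00006 / 1000   # $ per input token
price_per_output_token = 0.00024 / 1000  # $ per output token

images = 1000
cost = images * (1700 * price_per_input_token + 130 * price_per_output_token)
print(f"${cost:.2f} for {images} images")
```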

The Amazon Bedrock pricing page is available here.

For more details on using Amazon Bedrock and Amazon Nova models, continue reading.

Analysing pregnancy food safety of ingredients with Amazon Bedrock

The core feature of the web app is to analyse the safety of a food product’s ingredients for pregnancy. This problem requires generative AI to solve, and Amazon Bedrock is once again used to tackle it. For my use case of analysing the safety of the extracted ingredients text for pregnancy, I opted to use the Amazon Nova Micro text-only model because it is very low cost and has low latency.

We previously covered using Amazon Bedrock and the Amazon Nova Lite model for ingredients text extraction (here). Now I would like to dive a bit deeper into how Amazon Bedrock and the Amazon Nova model family are used.

To invoke a model with Amazon Bedrock, we simply need to call the InvokeModel API which only requires three input request parameters. An example usage implemented with the Python AWS SDK boto3 is below:

import boto3
import json

client = boto3.client("bedrock-runtime")

# native_request (the model request payload) is defined further below
response = client.invoke_model(
    modelId="amazon.nova-micro-v1:0",
    contentType="application/json",
    body=json.dumps(native_request),
)

print(json.loads(response["body"].read()))

The modelId is where we specify the ID of the model to use; in this case, Amazon Nova Micro.

The contentType must be set to “application/json”.

The body is where we provide the request to invoke our model as a JSON string. The body request schema depends on the model being used.

An example request (native_request) using the Amazon Nova model for our food safety use case is below. You can find the complete Amazon Nova request schema here.

ingredients = "Green peas, Glutinous Rice Flour, Corn Starch, Sugar, Salt"
system_prompt = (
    "You are a medical assistant. Your task is to evaluate food ingredients for safety during"
    " pregnancy. Avoid conversational language. "
    "Your response should group the ingredients into one to three lists by their degree of "
    "safety risks. "
    "For ingredients that are well known for their risk during pregnancy, provide a medical "
    "reason with moderate detail. "
    "For ingredients that are well known to be low risk during pregnancy, provide a brief and "
    "concise medical reason. "
    "For ingredients that are not well known for their risk during pregnancy, provide a brief "
    "and concise medical reason. "
    "Provide the list with the highest risk ingredients first. "
    "Conclude by stating the overall safety of the ingredients for pregnancy. "
    "Provide a disclaimer that this AI generated response is not a substitute for medical "
    "advice."
)
prompt = (
    "Evaluate the following list of food ingredients for pregnancy safety."
    "\n\n"
    "Ingredients:\n"
    f"{ingredients}\n"
    "\n"
)
native_request = {
    "system": [{"text": system_prompt}],
    "messages": [{"role": "user", "content": [{"text": prompt}]}],
    "inferenceConfig": {"temperature": 0.7, "topP": 0.9},
}

You can see the request contains the prompt inputs:

  • system_prompt - Provides the role context and guidance on response style and formatting. It is also used to instruct the model to include the disclaimer.

  • prompt - The user prompt to provide the specific task.

These prompts were formed over several iterations of trial and error. With each iteration I used a fixed set of ingredients and refined the prompt to reach a desired output. To help with prompt engineering for Amazon Nova you can refer to creating precise prompts documentation here.

The request also contains the inference parameters temperature and topP, which control the amount of randomness and the choice of next tokens in the response. For more information about the available inference parameters, see the complete Amazon Nova request schema here.
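The InvokeModel response body is itself a JSON document; a small helper can pull the generated analysis text out of it. The response shape shown follows the Amazon Nova schema, and the helper name is my own rather than the project’s.

```python
import json


def parse_nova_text(body_bytes: bytes) -> str:
    """Extract the assistant's text from a Nova InvokeModel response body."""
    payload = json.loads(body_bytes)
    # Nova places the generated text under output.message.content
    return payload["output"]["message"]["content"][0]["text"]


# Usage with the earlier invoke_model call:
# analysis = parse_nova_text(response["body"].read())
```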

Using the Amazon Nova Micro model as-is with no additional fine-tuning, I found that the food safety analysis results generally tended to list ingredients in the “high risk” category. I found this result doubtful; however, I am not a medical expert and cannot verify it. Because these responses are questionable and unverified, I shall state it again:

Important disclaimer: The solution for a “pregnancy food safety” web app described in this article and the given code examples should not be used to provide valid medical advice. The solution is presented for the purposes of learning about AWS AI services only.

To improve the responses for analysing the pregnancy food safety of ingredients using Amazon Bedrock, the following could be done:

  • Retrieval-Augmented Generation (RAG) could be used to look up pregnancy food safety data from trusted medical sources rather than relying on the model’s training knowledge.

  • Fine-tune the model with curated and verified pregnancy food safety data.
