Automated Receipt 🧾Organiser - AWS☁️ and AI🤖

Savi

About the Project

This project helps users save receipts to their email using AWS tools. It saves time and keeps receipts organised, so users don't need to keep looking for physical copies when they need them. 💡

Services Used

  1. Amazon S3: Stores uploaded receipt images and PDFs. [Storage]

  2. Amazon Textract: Extracts text and structured data from scanned receipts. [AI/ML]

  3. Amazon DynamoDB: Stores extracted receipt data in a structured format. [Database]

  4. Amazon SES: Sends email notifications with extracted receipt details. [Messaging]

  5. AWS Lambda: Automates the processing workflow for real-time execution. [Compute]

  6. IAM Roles & Policies: Ensure secure access between services. [Security]

Architectural Diagram 📝

Automated AWS Receipt Processing System


The architecture consists of 5 layers:

  1. Storage Layer: Amazon S3 stores receipt images and PDFs.

  2. Processing Layer: Amazon Textract extracts text from receipts using AI-powered OCR.

  3. Database Layer: DynamoDB stores the extracted data in a structured format.

  4. Notification System: Amazon SES sends email alerts with receipt details.

  5. Compute Layer: AWS Lambda automates the workflow by processing the receipts in real-time.

Time ⌛ and Cost 💸

Approximately 2 hours, using AWS Free Tier services.


Step-by-Step Guide

  1. Sign in to AWS Console and Create an S3 Bucket

    • Navigate to Amazon S3 and click on Create Bucket.

    • Ensure the bucket name is unique, but keep other settings as default.

    • Click on the newly created S3 bucket and create a folder named "incoming" for uploading receipts.
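For readers who prefer a script to the console, the bucket and folder from this step can be sketched with boto3. This is only a minimal sketch under assumptions, not part of the original walkthrough: the function name create_receipt_bucket and the example bucket name are hypothetical, and s3_client is assumed to be a boto3.client('s3') created for the same region.

```python
def create_receipt_bucket(s3_client, bucket_name, region="us-east-1"):
    """Create the bucket and an empty 'incoming/' folder marker.

    s3_client is assumed to be a boto3 S3 client; the function name and
    the default region are illustrative choices, not from the article.
    """
    params = {"Bucket": bucket_name}
    # Outside us-east-1, S3 expects an explicit location constraint
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**params)
    # A zero-byte object whose key ends in '/' appears as a folder in the console
    s3_client.put_object(Bucket=bucket_name, Key="incoming/")
    return f"s3://{bucket_name}/incoming/"
```

It could be called as create_receipt_bucket(boto3.client("s3", region_name="eu-west-2"), "my-receipts-bucket-12345", "eu-west-2"), where the bucket name is a placeholder that must be globally unique.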

  2. Set Up DynamoDB

    • Navigate to DynamoDB and click on Create Table.

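    • Enter "Receipts" as the Table Name. Table names are case-sensitive, so this must match the DYNAMODB_TABLE value set on the Lambda function later.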

    • Set "receipt_id" as the partition key with type "string".

    • Set "date" as the sort key with type "string" to sort receipts by date.

    • Keep other settings default and click Create Table.
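The same table can be described in code. The sketch below only builds the arguments for DynamoDB's create_table call, with receipt_id as the partition key and date as the sort key as set up above. The helper name receipts_table_definition is hypothetical, and on-demand billing is an assumption, since the console default may differ.

```python
def receipts_table_definition(table_name="Receipts"):
    """Build keyword arguments for DynamoDB's create_table call."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "receipt_id", "AttributeType": "S"},
            {"AttributeName": "date", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "receipt_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "date", "KeyType": "RANGE"},       # sort key
        ],
        # Assumption: on-demand billing, so no capacity units need to be chosen
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With a boto3 client this would be used as boto3.client("dynamodb").create_table(**receipts_table_definition()). Note that date is a reserved word in DynamoDB expressions, so later queries that filter on it need an expression attribute name such as #d.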

  3. Configure Amazon SES

    • Navigate to Amazon SES and go to Configuration > Identities.

    • Click on Create Identity and choose the Email Address option.

    • Enter the email address to be used for sending receipts and click Create Identity.

    • Verify your email address through the verification email sent by AWS.
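If the verification email seems to have gone missing, the status can also be checked from code. The sketch below is an assumption-laden helper, not part of the original guide: ensure_verified_sender is a hypothetical name, and ses_client is assumed to be a boto3.client('ses'). It asks SES for the verification status of the address and requests a new verification email if the address is not yet verified.

```python
def ensure_verified_sender(ses_client, email):
    """Return True if the address is verified; otherwise request verification."""
    response = ses_client.get_identity_verification_attributes(Identities=[email])
    status = (
        response.get("VerificationAttributes", {})
        .get(email, {})
        .get("VerificationStatus")
    )
    if status == "Success":
        return True
    # Sends the verification email that must be confirmed from the inbox
    ses_client.verify_email_identity(EmailAddress=email)
    return False
```

While an SES account is still in the sandbox, the recipient address has to be verified as well, which is why this guide uses the same address for sender and recipient.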

  4. Create IAM Role

    • Navigate to IAM and click on Roles.

    • Create a role using AWS Service as the trusted entity type and Lambda as the use case.

    • On the permissions policies page, select the following policies:

      1. AmazonS3ReadOnlyAccess

      2. AmazonTextractFullAccess

      3. AmazonDynamoDBFullAccess

      4. AmazonSESFullAccess

      5. AWSLambdaBasicExecutionRole

    • Name the role "ReceiptProcessingLambdaRole" and click Create Role.
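Behind the console wizard, this step amounts to a trust policy that lets Lambda assume the role, plus five managed policies attached to it. The following sketch shows that shape with boto3. The helper name create_receipt_role is hypothetical and iam_client is assumed to be a boto3.client('iam'). The full-access policies are the ones chosen in this guide; a production setup would likely want narrower permissions.

```python
import json

# Trust policy: allows the Lambda service to assume the role
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies selected in the step above
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AmazonSESFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def create_receipt_role(iam_client, role_name="ReceiptProcessingLambdaRole"):
    """Create the role and attach each managed policy to it."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role_name
```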

  5. Create Lambda Function

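    • Navigate to AWS Lambda, click on Create function and select Author from scratch.

    • Name the function "ReceiptProcessor" (or any preferred name). This is the name you will select later when setting up the S3 event notification.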

    • Choose Python 3.9 as the runtime.

    • Select the existing role "ReceiptProcessingLambdaRole".

    • Change the timeout to 3 minutes in the configuration settings.

    • Add the following environment variables:

      Key                    Value
      DYNAMODB_TABLE         Receipts
      SES_SENDER_EMAIL       the email address you verified in SES
      SES_RECIPIENT_EMAIL    the same email address as the sender
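The Lambda code in this guide falls back to placeholder values when one of these variables is missing, which can hide a typo in a key name. As an optional alternative, here is a small fail-fast sketch. The names REQUIRED_SETTINGS and load_settings are hypothetical and do not appear in the function code.

```python
import os

# The three keys from the table above
REQUIRED_SETTINGS = ("DYNAMODB_TABLE", "SES_SENDER_EMAIL", "SES_RECIPIENT_EMAIL")


def load_settings(environ=None):
    """Return the required settings, or raise if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

Calling load_settings() at the top of the handler would surface a misconfigured variable in the logs straight away instead of sending mail to a placeholder address.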
  6. Add Lambda Code

    • Replace the default code with the code below for processing receipts.

Code👩‍💻


import json
import os
import boto3
import uuid
from datetime import datetime
import urllib.parse

# Initialize AWS clients
s3 = boto3.client('s3')
textract = boto3.client('textract')
dynamodb = boto3.resource('dynamodb')
ses = boto3.client('ses')

# Environment variables
DYNAMODB_TABLE = os.environ.get('DYNAMODB_TABLE', 'Receipts')
SES_SENDER_EMAIL = os.environ.get('SES_SENDER_EMAIL', 'your-email@example.com')
SES_RECIPIENT_EMAIL = os.environ.get('SES_RECIPIENT_EMAIL', 'recipient@example.com')

def lambda_handler(event, context):
    try:
        # Get the S3 bucket and key from the event
        bucket = event['Records'][0]['s3']['bucket']['name']
        # URL decode the key to handle spaces and special characters
        key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

        print(f"Processing receipt from {bucket}/{key}")

        # Verify the object exists before proceeding
        try:
            s3.head_object(Bucket=bucket, Key=key)
            print(f"Object verification successful: {bucket}/{key}")
        except Exception as e:
            print(f"Object verification failed: {str(e)}")
            raise Exception(f"Unable to access object {key} in bucket {bucket}: {str(e)}")

        # Step 1: Process receipt with Textract
        receipt_data = process_receipt_with_textract(bucket, key)

        # Step 2: Store results in DynamoDB
        store_receipt_in_dynamodb(receipt_data, bucket, key)

        # Step 3: Send email notification
        send_email_notification(receipt_data)

        return {
            'statusCode': 200,
            'body': json.dumps('Receipt processed successfully!')
        }
    except Exception as e:
        print(f"Error processing receipt: {str(e)}")
        return {
            'statusCode': 500,
            'body': json.dumps(f'Error: {str(e)}')
        }

def process_receipt_with_textract(bucket, key):
    """Process receipt using Textract's AnalyzeExpense operation"""
    try:
        print(f"Calling Textract analyze_expense for {bucket}/{key}")
        response = textract.analyze_expense(
            Document={
                'S3Object': {
                    'Bucket': bucket,
                    'Name': key
                }
            }
        )
        print("Textract analyze_expense call successful")
    except Exception as e:
        print(f"Textract analyze_expense call failed: {str(e)}")
        raise

    # Generate a unique ID for this receipt
    receipt_id = str(uuid.uuid4())

    # Initialize receipt data dictionary
    receipt_data = {
        'receipt_id': receipt_id,
        'date': datetime.now().strftime('%Y-%m-%d'),  # Default date
        'vendor': 'Unknown',
        'total': '0.00',
        'items': [],
        's3_path': f"s3://{bucket}/{key}"
    }

    # Extract data from Textract response
    if 'ExpenseDocuments' in response and response['ExpenseDocuments']:
        expense_doc = response['ExpenseDocuments'][0]

        # Process summary fields (TOTAL, DATE, VENDOR)
        if 'SummaryFields' in expense_doc:
            for field in expense_doc['SummaryFields']:
                field_type = field.get('Type', {}).get('Text', '')
                value = field.get('ValueDetection', {}).get('Text', '')

                if field_type == 'TOTAL':
                    receipt_data['total'] = value
                elif field_type == 'INVOICE_RECEIPT_DATE':
                    # Store the date text exactly as Textract detected it.
                    # Keep the default (today's date) when the value is empty,
                    # because DynamoDB rejects an empty string in a key attribute.
                    if value:
                        receipt_data['date'] = value
                elif field_type == 'VENDOR_NAME':
                    receipt_data['vendor'] = value

        # Process line items
        if 'LineItemGroups' in expense_doc:
            for group in expense_doc['LineItemGroups']:
                if 'LineItems' in group:
                    for line_item in group['LineItems']:
                        item = {}
                        for field in line_item.get('LineItemExpenseFields', []):
                            field_type = field.get('Type', {}).get('Text', '')
                            value = field.get('ValueDetection', {}).get('Text', '')

                            if field_type == 'ITEM':
                                item['name'] = value
                            elif field_type == 'PRICE':
                                item['price'] = value
                            elif field_type == 'QUANTITY':
                                item['quantity'] = value

                        # Add to items list if we have a name
                        if 'name' in item:
                            receipt_data['items'].append(item)

    print(f"Extracted receipt data: {json.dumps(receipt_data)}")
    return receipt_data

def store_receipt_in_dynamodb(receipt_data, bucket, key):
    """Store the extracted receipt data in DynamoDB"""
    try:
        table = dynamodb.Table(DYNAMODB_TABLE)

        # Convert items to a format DynamoDB can store
        items_for_db = []
        for item in receipt_data['items']:
            items_for_db.append({
                'name': item.get('name', 'Unknown Item'),
                'price': item.get('price', '0.00'),
                'quantity': item.get('quantity', '1')
            })

        # Create item to insert
        db_item = {
            'receipt_id': receipt_data['receipt_id'],
            'date': receipt_data['date'],
            'vendor': receipt_data['vendor'],
            'total': receipt_data['total'],
            'items': items_for_db,
            's3_path': receipt_data['s3_path'],
            'processed_timestamp': datetime.now().isoformat()
        }

        # Insert into DynamoDB
        table.put_item(Item=db_item)
        print(f"Receipt data stored in DynamoDB: {receipt_data['receipt_id']}")
    except Exception as e:
        print(f"Error storing data in DynamoDB: {str(e)}")
        raise

def send_email_notification(receipt_data):
    """Send an email notification with receipt details"""
    try:
        # Format items for email
        items_html = ""
        for item in receipt_data['items']:
            name = item.get('name', 'Unknown Item')
            price = item.get('price', 'N/A')
            quantity = item.get('quantity', '1')
            items_html += f"<li>{name} - ${price} x {quantity}</li>"

        if not items_html:
            items_html = "<li>No items detected</li>"

        # Create email body
        html_body = f"""
        <html>
        <body>
            <h2>Receipt Processing Notification</h2>
            <p><strong>Receipt ID:</strong> {receipt_data['receipt_id']}</p>
            <p><strong>Vendor:</strong> {receipt_data['vendor']}</p>
            <p><strong>Date:</strong> {receipt_data['date']}</p>
            <p><strong>Total Amount:</strong> ${receipt_data['total']}</p>
            <p><strong>S3 Location:</strong> {receipt_data['s3_path']}</p>

            <h3>Items:</h3>
            <ul>
                {items_html}
            </ul>

            <p>The receipt has been processed and stored in DynamoDB.</p>
        </body>
        </html>
        """

        # Send email using SES
        ses.send_email(
            Source=SES_SENDER_EMAIL,
            Destination={
                'ToAddresses': [SES_RECIPIENT_EMAIL]
            },
            Message={
                'Subject': {
                    'Data': f"Receipt Processed: {receipt_data['vendor']} - ${receipt_data['total']}"
                },
                'Body': {
                    'Html': {
                        'Data': html_body
                    }
                }
            }
        )

        print(f"Email notification sent to {SES_RECIPIENT_EMAIL}")
    except Exception as e:
        print(f"Error sending email notification: {str(e)}")
        # Continue execution even if email fails
        print("Continuing execution despite email error")
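The function stores the receipt date as the text Textract detected, so values such as 03/15/2024 and 2024-03-15 can end up side by side in the date sort key and will not sort chronologically. One possible refinement is to normalise the date before storing it. The sketch below is an assumption, not the article's code: normalize_receipt_date and DATE_FORMATS are hypothetical names, and the list of formats is a guess that would need adjusting to the receipts you actually scan.

```python
from datetime import datetime

# Assumption: candidate formats, tried in order. Ambiguous dates such as
# 03/04/2024 are read month-first because that format is listed first.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y", "%d %b %Y", "%b %d, %Y")


def normalize_receipt_date(text, default):
    """Return the date as YYYY-MM-DD, or the default if no format matches."""
    cleaned = (text or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return default
```

Inside process_receipt_with_textract, this could replace the direct assignment with receipt_data['date'] = normalize_receipt_date(value, receipt_data['date']).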

  7. Set Up S3 Event Notification

    • Go back to the S3 bucket and navigate to the Properties tab.

    • Scroll to Event Notifications and click Create Event Notification.

    • Enter "ReceiptUpload" as the name and "incoming/" as the prefix.

    • Select All object create events as the event type.

    • Choose Lambda function as the destination and select ReceiptProcessor from the dropdown.

    • Save the changes.
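For reference, the notification configured above corresponds to a small configuration document. The sketch below builds it in the shape expected by the S3 put_bucket_notification_configuration call. The helper name receipt_notification_config is hypothetical, and the Lambda ARN passed in is whatever ARN your own function has.

```python
def receipt_notification_config(lambda_arn, prefix="incoming/"):
    """Build the S3 notification configuration for new receipt uploads."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "ReceiptUpload",
                "LambdaFunctionArn": lambda_arn,
                # Matches "All object create events" in the console
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }
```

It would be applied with s3.put_bucket_notification_configuration(Bucket=bucket_name, NotificationConfiguration=receipt_notification_config(function_arn)), which replaces any notification configuration already on the bucket. The console also grants S3 permission to invoke the function automatically; a script has to do that itself, for example with the Lambda add_permission call using s3.amazonaws.com as the principal.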

  8. Testing

    • Upload a receipt to the incoming folder in the S3 bucket. Here is an example you can use ⬇️

    • Receipt examples

    • Check your email for a notification (it might be in the spam folder).
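The function can also be exercised without waiting for an upload by handing it a test event. The sketch below builds the minimal part of an S3 event that lambda_handler reads and shows why the handler decodes the key: object keys arrive URL-encoded, so a space shows up as a plus sign. Both helper names are hypothetical, and quote_plus is only an approximation of how S3 encodes keys.

```python
import urllib.parse


def make_s3_put_event(bucket, key):
    """Build the minimal S3 event structure that the handler reads."""
    return {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": bucket},
                    # Keys in S3 events are URL-encoded (approximated here)
                    "object": {"key": urllib.parse.quote_plus(key, safe="/")},
                }
            }
        ]
    }


def read_bucket_and_key(event):
    """Decode the bucket and key the same way lambda_handler does."""
    record = event["Records"][0]["s3"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return record["bucket"]["name"], key
```

The dictionary returned by make_s3_put_event can be pasted as JSON into the Test tab of the Lambda console. The object it names must already exist in the bucket, otherwise the handler's head_object check fails.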

By following these steps, you will have a fully functional automated receipt processing system using AWS services.

The email should look something like this ⬇️

You can also open the Lambda function and click on the Monitor tab to check that the function ran.

The dots in the graph below represent the activity.

Cleanup 🗑️

  1. Delete S3 Bucket: Remove all uploaded receipt files and then delete the bucket.

  2. Stop Textract Processing: Ensure no further API calls are made to prevent extra costs.

  3. Delete DynamoDB Table: Remove stored receipt data and then delete the table.

  4. Disable SES Notifications: If SES was configured, remove verified email addresses.

  5. Remove IAM Roles and Policies: Delete the IAM role created for the Lambda function.

  6. Delete Lambda Function: Delete the Lambda function so that it can no longer be triggered.
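A bucket has to be empty before S3 will delete it, which is the part of cleanup that is easiest to script. The sketch below is one possible way to do it with boto3, under the assumption that versioning was left disabled (the default). The helper name empty_and_delete_bucket is hypothetical and s3_client is assumed to be a boto3.client('s3').

```python
def empty_and_delete_bucket(s3_client, bucket_name):
    """Delete every object in the bucket, then the bucket itself.

    Returns the number of objects that were deleted.
    """
    deleted = 0
    list_args = {"Bucket": bucket_name}
    while True:
        page = s3_client.list_objects_v2(**list_args)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            break
        # More objects remain: continue listing from where this page ended
        list_args["ContinuationToken"] = page["NextContinuationToken"]
    s3_client.delete_bucket(Bucket=bucket_name)
    return deleted
```

The table can be removed in a similar way with boto3.client("dynamodb").delete_table(TableName="Receipts").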

✨Inspiration✨

This article is inspired by Tech with Lucy's Build With Me videos on YouTube.

Feel free to leave me a comment if you face any issues during the project. 🥰


Written by

Savi

Hey there! I’m Savi, a cloud enthusiast, tech builder, and all-around problem-solver based in London. 💻☁️ By day, I’m diving into AWS, Azure, and everything cloud-related, turning ideas into scalable, secure, and impactful projects. By night, I’m juggling French lessons (B2 goals 💬), hitting the gym, and whipping up kitchen experiments. 🏋️‍♀️🍳 I believe in learning by doing, so you’ll find this blog packed with cool projects, tech hacks, and lessons from my journey in the world of cloud computing. Stick around—let’s build, learn, and grow together. 🚀