AWS WorkMail: Extracting Email Attachments and Storing Them in an S3 Bucket

Pradeep AB

In today's digital landscape, exchanging information via email remains a common practice despite the rise of advanced methods like SFTP endpoints, APIs, and message queues. Many organizations still rely on email attachments to transfer critical data, making it essential to manage and automate the processing of these files efficiently.

In this article, I’ll walk you through the process of automatically extracting email attachments and storing them in an Amazon S3 bucket using key AWS services. This approach is particularly useful for organizations that need to automate file handling without manual intervention.

Here are the AWS services we'll leverage:

  1. Amazon WorkMail: A secure, managed email and calendar service that integrates seamlessly with existing desktop and mobile email clients, ensuring a familiar experience for users.

  2. Amazon SES (Simple Email Service): This service handles the sending of all outgoing email for Amazon WorkMail. It can also receive inbound email and act on it through receipt rules, which is the feature this setup uses to deliver raw messages to S3.

  3. Amazon S3 (Simple Storage Service): A versatile storage solution that allows you to store and retrieve any amount of data, anytime, from anywhere. We'll use it to securely store the extracted email attachments.

  4. AWS Lambda: A serverless compute service that allows you to run code in response to events. In this setup, Lambda will extract the attachments from incoming emails and save them directly to an S3 bucket.

Visual representation of the entire process with AWS Services:

Let us divide the implementation into two steps:

I. Receiving the Raw email in S3 bucket.

II. Processing the Raw email and Extracting the Attachment.

Step I. Receiving the Raw email in S3 bucket.

To receive incoming emails, we first need to configure a dedicated email domain and a dedicated email address using Amazon WorkMail. After that, we configure a receipt rule in Amazon SES. Because WorkMail is part of the AWS ecosystem, it integrates easily with other AWS services such as S3.

  • Configuring a dedicated email server:

In our case, we create the hosted zone for the email domain in Route 53:

  • In the Route 53 Dashboard, click on Hosted Zones on the left-hand panel.

  • Click the Create Hosted Zone button.

  • Enter your Domain Name (e.g., my-org-domain.com).

  • Choose the type of zone: Public Hosted Zone (for public-facing domains).

  • Optionally, add a comment.

  • Click Create Hosted Zone.
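The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. `build_hosted_zone_request` and `create_public_zone` are hypothetical helper names, and the domain is the placeholder used in this article.

```python
import uuid


def build_hosted_zone_request(domain_name, comment=""):
    """Build keyword arguments for Route 53 create_hosted_zone (public zone)."""
    return {
        "Name": domain_name,
        # CallerReference must be unique per request; a UUID is one simple choice
        "CallerReference": str(uuid.uuid4()),
        "HostedZoneConfig": {"Comment": comment, "PrivateZone": False},
    }


def create_public_zone(route53_client, domain_name, comment=""):
    """Create the zone and return its ID. Expects boto3.client('route53')."""
    response = route53_client.create_hosted_zone(
        **build_hosted_zone_request(domain_name, comment)
    )
    return response["HostedZone"]["Id"]


# Usage (makes a real AWS call, so it is left commented out):
# import boto3
# zone_id = create_public_zone(boto3.client("route53"), "my-org-domain.com")
```

Passing the client in as an argument keeps the request-building logic easy to check without touching AWS.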

Create an Organization and a User in WorkMail:

  1. Create a new organization in Amazon WorkMail.

  2. During the setup, select a domain for your email server. You can choose a domain you already own, whether it's a Route 53-managed domain (as in our case) or an externally hosted domain. You'll also specify an alias for the organization. For this example, I've used the free test domain and set the alias to "my-org-domain", which gives the test domain my-org-domain.awsapps.com.

Supported Regions for Amazon WorkMail:

  1. Europe (Ireland) - eu-west-1

  2. US East (N. Virginia) - us-east-1

  3. US West (Oregon) - us-west-2

Note:

  • You must choose one of these regions when creating your WorkMail organization. If your other AWS services (such as SES, S3, Lambda) are running in a different region, you may need to account for cross-region data handling.

  • For optimal performance and reduced latency, it's recommended to set up WorkMail in the same region as your other AWS services whenever possible.

Add a New User in Amazon WorkMail:

Steps:

  1. Access the WorkMail Console and choose your organization (e.g., my-org-domain).

  2. Go to Users and click Add User.

  3. Enter the email (e.g., mailautomation@my-org-domain.awsapps.com), display name, and optionally the first/last name.

  4. Set a password for the user.

  5. Review and confirm the details, then click Create.

  6. Share the login details with the user for access at https://my-org-domain.awsapps.com/mail.
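For repeatable setups, the same user can be created through the WorkMail API. This is a rough sketch rather than the article's method: `create_mailbox_user` is a hypothetical helper, the organization ID shown in the usage comment is a made-up placeholder, and the client is expected to be `boto3.client('workmail')`.

```python
def create_mailbox_user(workmail_client, organization_id, username,
                        display_name, password, domain):
    """Create a WorkMail user and enable its mailbox.

    Returns a (user_id, email_address) tuple.
    """
    email_address = f"{username}@{domain}"
    user = workmail_client.create_user(
        OrganizationId=organization_id,
        Name=username,
        DisplayName=display_name,
        Password=password,
    )
    # A newly created user has no mailbox until it is registered to WorkMail
    workmail_client.register_to_work_mail(
        OrganizationId=organization_id,
        EntityId=user["UserId"],
        Email=email_address,
    )
    return user["UserId"], email_address


# Usage (real AWS calls, left commented out; the organization ID is a placeholder):
# import boto3
# create_mailbox_user(
#     boto3.client("workmail", region_name="us-east-1"),
#     "m-0123456789abcdef0123456789abcdef",
#     "mailautomation", "Mail Automation", "<a strong password>",
#     "my-org-domain.awsapps.com",
# )
```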

  • Configure rule in Amazon SES:

The test domain ‘my-org-domain.awsapps.com’ is automatically verified in Amazon SES under Verified Identities.

An active rule set called ‘INBOUND_MAIL’ is automatically created in Amazon SES under Email receiving → Receipt Rule Sets.

Steps to Create a New Rule in the Active Rule Set (INBOUND_MAIL):

  1. Click Create Rule and give it a name.

  2. Set the recipient to mailautomation@my-org-domain.awsapps.com.

  3. Add an action for 'S3', selecting the destination S3 bucket where the emails will be stored.

  4. Review the settings and click Create Rule.

In this example, I created a rule named 'Email-Attachment-Extraction' with the recipient 'mailautomation@my-org-domain.awsapps.com' and added an action to deliver emails to the 'raw-email-ingestion' S3 bucket.
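As a sketch of what this rule looks like through the API, the helpers below build the `Rule` argument for the SES `create_receipt_rule` call and a bucket policy that lets SES write to the bucket. The helper names and the account ID are hypothetical. The bucket policy is included on the assumption that the bucket does not already grant SES write access; the console may add an equivalent policy for you when you create the bucket from the rule wizard.

```python
import json


def build_receipt_rule(rule_name, recipient, bucket_name, key_prefix=""):
    """Build the Rule argument for SES create_receipt_rule with one S3 action."""
    s3_action = {"BucketName": bucket_name}
    if key_prefix:
        s3_action["ObjectKeyPrefix"] = key_prefix
    return {
        "Name": rule_name,
        "Enabled": True,
        "Recipients": [recipient],
        "Actions": [{"S3Action": s3_action}],
        "ScanEnabled": True,  # spam and virus scanning
    }


def build_ses_bucket_policy(bucket_name, account_id):
    """Bucket policy that lets SES put objects, limited to one AWS account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowSESPuts",
            "Effect": "Allow",
            "Principal": {"Service": "ses.amazonaws.com"},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            "Condition": {"StringEquals": {"AWS:SourceAccount": account_id}},
        }],
    }


rule = build_receipt_rule(
    "Email-Attachment-Extraction",
    "mailautomation@my-org-domain.awsapps.com",
    "raw-email-ingestion",
)
print(json.dumps(rule, indent=2))

# Usage (real AWS calls, left commented out; the account ID is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="raw-email-ingestion",
#     Policy=json.dumps(build_ses_bucket_policy("raw-email-ingestion", "123456789012")),
# )
# boto3.client("ses").create_receipt_rule(RuleSetName="INBOUND_MAIL", Rule=rule)
```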

Step II. Processing the Raw email and Extracting its Attachment.

The raw email file is stored in MIME format, so its attachments are not directly usable. We therefore process the file with a Python script in AWS Lambda and extract its attachments. To invoke this Lambda function whenever a new file arrives, an S3 event notification is configured on the source S3 bucket, ‘raw-email-ingestion’.
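To see what extracting an attachment from a raw email involves, here is a small self-contained sketch that uses only Python's standard `email` package. It builds a sample message in memory (the sender address and file contents are made up) and then reads the attachment back out, with no AWS calls involved.

```python
from email import message_from_bytes
from email.message import EmailMessage


def build_sample_email():
    """Return the raw bytes of a small email with one PDF attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "mailautomation@my-org-domain.awsapps.com"
    msg["Subject"] = "Invoice"
    msg.set_content("Please find the invoice attached.")
    msg.add_attachment(
        b"%PDF-1.4 sample bytes",
        maintype="application",
        subtype="pdf",
        filename="invoice.pdf",
    )
    return msg.as_bytes()


def list_attachments(raw_bytes):
    """Return (filename, content_type, payload_bytes) for each attachment."""
    msg = message_from_bytes(raw_bytes)
    found = []
    # walk() visits every MIME part, including nested multipart containers
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            # decode=True reverses the base64 transfer encoding
            found.append((filename, part.get_content_type(),
                          part.get_payload(decode=True)))
    return found


attachments = list_attachments(build_sample_email())
print(attachments)
```

The raw file that SES writes to S3 has the same structure as the bytes returned by `build_sample_email`, only with more headers.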

  • AWS Lambda:

Create a new function in AWS Lambda with the function name ‘extract-attachment-lambda’, the runtime ‘Python 3.8’ (or a newer supported Python runtime), and an execution role that has permissions for Lambda logging and for the S3 buckets involved.
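As an illustration of what the execution role needs, the sketch below builds an IAM policy document scoped to the two buckets used in this article. `build_execution_policy` is a hypothetical helper, and the exact set of actions is an assumption based on what the function does (read and delete the raw email, write the attachment, write logs). `s3:ListBucket` is only needed if the function lists objects, for example for a prefix-based delete, and can be dropped otherwise.

```python
import json


def build_execution_policy(source_bucket, destination_bucket):
    """IAM policy document for the attachment-extraction Lambda role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadAndDeleteRawEmail",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {
                "Sid": "ListRawEmailBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{source_bucket}",
            },
            {
                "Sid": "WriteAttachments",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{destination_bucket}/*",
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


policy = build_execution_policy("raw-email-ingestion", "extracted-data")
print(json.dumps(policy, indent=2))
```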

The code below:

  1. Handles PNG, JPG, JPEG, and PDF attachments. You can customize the code for other file formats that you might receive as email attachments.

  2. Deletes the raw email file from the ‘raw-email-ingestion’ bucket once it has been processed and stores the extracted attachment in the destination bucket, ‘extracted-data’.

import email
import json
import os
import urllib.parse

import boto3

# Lambda sets AWS_REGION automatically; the fallback is an assumption for local runs
REGION = os.environ.get('AWS_REGION', 'us-east-1')
DESTINATION_BUCKET = 'extracted-data'
DESTINATION_PREFIX = ''  # optional key prefix, e.g. 'attachments/'
ALLOWED_EXTENSIONS = {'pdf', 'png', 'jpeg', 'jpg'}

s3 = boto3.client('s3', region_name=REGION)


def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    bucket_name = record['bucket']['name']
    # Object keys arrive URL-encoded in S3 event notifications
    object_key = urllib.parse.unquote_plus(record['object']['key'])

    try:
        # Parse the raw SES email from bytes so non-UTF-8 messages do not fail to decode
        data = s3.get_object(Bucket=bucket_name, Key=object_key)
        msg = email.message_from_bytes(data['Body'].read())

        saved = 0
        # walk() visits every MIME part, so single-part and nested messages are handled too
        for part in msg.walk():
            file_name = part.get_filename()
            # Parts without a file name are the message body or multipart containers
            if not file_name:
                continue

            # Keep only the base name so a crafted file name cannot add path segments
            file_name = os.path.basename(file_name)
            file_ext = os.path.splitext(file_name)[1].lstrip('.').lower()
            print("attachment >> ", part.get_content_type(), file_name)

            if file_ext not in ALLOWED_EXTENSIONS:
                print("file is of other type so cannot be processed!")
                continue

            # decode=True reverses the base64 or quoted-printable transfer encoding
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            s3.put_object(
                Bucket=DESTINATION_BUCKET,
                Key=DESTINATION_PREFIX + file_name,
                Body=payload,
            )
            saved += 1

        if saved == 0:
            print("no attachment found ")

        # Delete only the processed raw email, not every key that shares its prefix
        s3.delete_object(Bucket=bucket_name, Key=object_key)
        return {
            'statusCode': 200,
            'body': json.dumps('SES Email received and processed!')
        }
    except Exception as e:
        # Log the error so the failure is visible in CloudWatch Logs
        print("exception >> ", str(e))
        return {
            'statusCode': 500,
            'body': json.dumps('Error in SES Email processed!')
        }

  • Configure S3 event notification:

Create an S3 event notification named ‘raw-email-trigger’ on your source S3 bucket, ‘raw-email-ingestion’. Add the event types Put and Post, and configure it to invoke the Lambda function that you created, ‘extract-attachment-lambda’.

S3 event notification
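The same notification can be sketched in code. The helpers below are hypothetical names: `build_notification_config` produces the structure that the S3 `put_bucket_notification_configuration` call expects, and `attach_trigger` first grants S3 permission to invoke the function. Note that this call replaces any notification configuration already on the bucket, so treat it as a starting point rather than a drop-in script.

```python
def build_notification_config(lambda_arn, notification_id="raw-email-trigger"):
    """Notification configuration that invokes a Lambda on Put and Post uploads."""
    return {
        "LambdaFunctionConfigurations": [{
            "Id": notification_id,
            "LambdaFunctionArn": lambda_arn,
            "Events": ["s3:ObjectCreated:Put", "s3:ObjectCreated:Post"],
        }]
    }


def attach_trigger(s3_client, lambda_client, bucket_name, lambda_arn, account_id):
    """Allow S3 to invoke the function, then save the bucket notification."""
    # S3 must be allowed to invoke the function before the notification is saved
    lambda_client.add_permission(
        FunctionName=lambda_arn,
        StatementId="allow-s3-invoke",
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket_name}",
        SourceAccount=account_id,
    )
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )


# Usage (real AWS calls, left commented out; the ARN and account ID are placeholders):
# import boto3
# attach_trigger(
#     boto3.client("s3"), boto3.client("lambda"), "raw-email-ingestion",
#     "arn:aws:lambda:us-east-1:123456789012:function:extract-attachment-lambda",
#     "123456789012",
# )
```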

Note: We have now completed both steps.

To test the entire process:

  1. Send an email with an image attachment to ‘mailautomation@my-org-domain.awsapps.com’.

  2. Check the ‘raw-email-ingestion’ bucket; a raw email file should be present. Because the Lambda function deletes the file once it has been processed, it may be visible only briefly.

  3. Check the logs of the ‘extract-attachment-lambda’ function; it should have been triggered.

  4. Check the ‘extracted-data’ bucket; the attachment file should be there.
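Before sending a real email, it can help to check how the handler reads an S3 event. The sketch below builds a minimal event by hand and decodes the bucket name and object key the same way the Lambda code does. `make_s3_put_event` and `extract_bucket_and_key` are hypothetical helpers, and the event contains only the fields the handler reads; real S3 events carry many more.

```python
import urllib.parse


def make_s3_put_event(bucket_name, encoded_key):
    """Build a minimal S3 notification event with a URL-encoded object key."""
    return {
        "Records": [{
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": bucket_name},
                "object": {"key": encoded_key},
            },
        }]
    }


def extract_bucket_and_key(event):
    """Return the bucket name and the decoded object key of the first record."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], urllib.parse.unquote_plus(record["object"]["key"])


event = make_s3_put_event("raw-email-ingestion", "inbox/test+message%281%29")
print(extract_bucket_and_key(event))
# → ('raw-email-ingestion', 'inbox/test message(1)')
```

An event like this can also be pasted into the Lambda console as a test event, provided an object with the decoded key actually exists in the bucket.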
