Content-Type Validation During File Uploads to an AWS S3 Bucket

Uploading files to an AWS S3 bucket is a common requirement in modern applications, but it comes with a crucial responsibility: validating the content type of the uploaded files. Without proper validation, malicious or unwanted file types could slip through, potentially leading to security vulnerabilities or system issues.

In this blog, we’ll explore how to enforce content-type validation during file uploads to S3. We'll cover key techniques to block or notify when unknown content types are uploaded.


Why Content-Type Validation Matters

Content-Type validation ensures that files being uploaded match your application's expectations and security policies. Common use cases include:

  1. Preventing malicious uploads: For example, blocking executables or scripts.

  2. Maintaining application integrity: Allowing only image files for profile pictures or text files for logs.

  3. Improved error handling: Early detection of incorrect file types enhances user experience.


How Content-Type Validation Works in AWS S3

AWS S3 supports custom logic for file uploads via AWS Lambda, Pre-signed URLs, and S3 Event Notifications. Below are strategies for implementing validation:

Figure: Content-type validation workflow during file uploads to an AWS S3 bucket. (1) The user uploads a file to the S3 bucket. (2) An S3 Event Notification triggers an AWS Lambda function. (3) The Lambda function retrieves the file metadata, including the Content-Type. (4) The function compares the Content-Type against a whitelist of allowed types. (5) If the Content-Type is invalid, the file is deleted and an alert is sent via Amazon SNS. (6) If it is valid, the upload is accepted.


1. Validating Content-Type Using AWS Lambda

You can create an S3 event trigger for object creation. When a file is uploaded, an AWS Lambda function is triggered to inspect the file's metadata, including the Content-Type. If the content type is invalid, the function can take corrective actions, such as:

  • Deleting the file: Automatically remove invalid files.

  • Sending a notification: Use Amazon SNS or email to alert the user or admin.

Sample Code for Validation:

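from urllib.parse import unquote_plus

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    # Object keys are URL-encoded in S3 event payloads, so decode before use
    object_key = unquote_plus(event['Records'][0]['s3']['object']['key'])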

    # Get object metadata
    response = s3.head_object(Bucket=bucket_name, Key=object_key)
    content_type = response['ContentType']

    # Define allowed content types
    allowed_types = ['image/jpeg', 'image/png', 'application/pdf']

    if content_type not in allowed_types:
        # Delete the file
        s3.delete_object(Bucket=bucket_name, Key=object_key)

        # Notify via SNS or CloudWatch Logs
        print(f"Blocked file with invalid content type: {content_type}")
        return {
            'statusCode': 400,
            'body': 'Invalid content type. File removed.'
        }

    return {
        'statusCode': 200,
        'body': 'File uploaded successfully.'
    }
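
One caveat with the exact-match check in this sample: a Content-Type value can carry parameters (for example `text/plain; charset=utf-8`) or differ in letter case, and either would fail a plain `in` comparison. Below is a minimal sketch of a normalising helper, assuming the same allow-list as the sample. The function names are illustrative and not part of boto3.

```python
ALLOWED_TYPES = frozenset({"image/jpeg", "image/png", "application/pdf"})


def normalize_content_type(raw_value):
    """Return the bare, lower-cased media type, e.g. 'image/jpeg'."""
    if not raw_value:
        return ""
    # Drop parameters such as '; charset=utf-8' and surrounding whitespace.
    return raw_value.split(";", 1)[0].strip().lower()


def is_allowed_content_type(raw_value, allowed=ALLOWED_TYPES):
    """True when the normalised media type is on the allow-list."""
    return normalize_content_type(raw_value) in allowed
```

In the handler, `is_allowed_content_type(response['ContentType'])` could stand in for the direct list lookup.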

2. Validating Content-Type with Pre-Signed URLs

Pre-signed uploads provide a secure way to restrict uploads to specific file types. When generating a pre-signed POST (which returns a URL plus a set of form fields), you can include policy conditions to enforce content-type restrictions.

Example:

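import boto3

s3 = boto3.client('s3')
bucket_name = 'your-bucket-name'
object_key = 'uploads/test.jpg'

# Generate a pre-signed POST with a content-type restriction
response = s3.generate_presigned_post(
    Bucket=bucket_name,
    Key=object_key,
    # Include the field so it is returned to the client with the other form fields
    Fields={"Content-Type": "image/jpeg"},
    # The policy condition is what S3 enforces when the upload arrives
    Conditions=[
        {"Content-Type": "image/jpeg"}
    ],
    ExpiresIn=3600
)

# Contains the upload URL and the form fields the client must send
print(response)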

With this approach, only uploads that declare the specified content type (e.g., image/jpeg) are accepted. Note that S3 checks the declared Content-Type form field, not the file's actual bytes.
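
When several related types should be accepted, a POST policy can match a prefix rather than one exact value. The sketch below assumes any `image/*` type is acceptable. `build_image_upload_post` is a hypothetical helper, and the S3 client is passed in so it can be created however your application prefers (for example `boto3.client('s3')`).

```python
def build_image_upload_post(s3_client, bucket_name, object_key, expires_in=3600):
    """Create a pre-signed POST that accepts any Content-Type starting with 'image/'."""
    conditions = [
        # POST policy prefix match: the form's Content-Type field
        # must start with "image/".
        ["starts-with", "$Content-Type", "image/"],
    ]
    return s3_client.generate_presigned_post(
        Bucket=bucket_name,
        Key=object_key,
        Conditions=conditions,
        ExpiresIn=expires_in,
    )
```

The uploader then sends a `Content-Type` form field (for example `image/png`) alongside the returned fields, and S3 rejects the POST if the value does not satisfy the policy. As with an exact match, this checks only the declared type.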


3. Combining S3 Event Notifications and Content-Type Validation

Using S3 Event Notifications, you can trigger workflows whenever a new file is uploaded. For example:

  1. Set up an S3 bucket event to send notifications to an SNS topic.

  2. Configure an AWS Lambda function to process these notifications and validate content types.

  3. Notify users of invalid uploads or perform remediation actions.
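
When S3 publishes to an SNS topic and a Lambda function subscribes to that topic, the function receives the S3 event wrapped inside the SNS message as a JSON string. Here is a minimal sketch of unwrapping it using only the standard library. `iter_s3_objects` is a hypothetical helper name.

```python
import json
from urllib.parse import unquote_plus


def iter_s3_objects(sns_event):
    """Yield (bucket, key) pairs from an SNS-delivered S3 event notification."""
    for sns_record in sns_event.get("Records", []):
        # The original S3 notification is a JSON string in Sns.Message.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # Test notifications sent when the event is configured have no "Records".
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (spaces become '+').
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            yield bucket, key
```

Each `(bucket, key)` pair can then be passed to whatever validation logic the function applies.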


Test Case / POC

I used one S3 bucket and configured a Lambda function so that if anyone uploads a non-TXT file, it sends an SNS notification.

I uploaded a file named dr.pcap and, after the upload, received the notification at the email address subscribed to the SNS topic.
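
The function used in this test is not reproduced here, so the following is only a sketch of how such a check might look. It assumes "TXT file" means a `text/plain` Content-Type; the helper name, topic ARN, and message wording are placeholders. The S3 and SNS clients are passed in (for example `boto3.client('s3')` and `boto3.client('sns')`).

```python
def notify_if_not_text(s3_client, sns_client, topic_arn, bucket_name, object_key):
    """Publish an SNS alert when the uploaded object is not declared as text/plain."""
    head = s3_client.head_object(Bucket=bucket_name, Key=object_key)
    # Ignore parameters such as '; charset=utf-8' when comparing.
    content_type = head.get("ContentType", "").split(";", 1)[0].strip().lower()

    if content_type == "text/plain":
        return False  # Nothing to report

    sns_client.publish(
        TopicArn=topic_arn,
        Subject="Unexpected file type uploaded",
        Message=(
            f"Object s3://{bucket_name}/{object_key} was uploaded with "
            f"Content-Type '{content_type}', which is not text/plain."
        ),
    )
    return True
```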

Best Practices for Content-Type Validation

  1. Whitelist allowed types: Always use a whitelist to explicitly define allowed file types.

  2. Verify actual file content: Content-Type headers can be manipulated. Use libraries like python-magic to inspect the file's actual MIME type.

  3. Log invalid attempts: Keep track of invalid upload attempts for auditing and troubleshooting.

  4. User feedback: Provide clear error messages when uploads fail validation.


Validating MIME Types Using python-magic

Why It’s Important:
Attackers can upload malicious files disguised as valid file types by tampering with the Content-Type header. To mitigate this, inspect the file's actual content to determine its MIME type. Libraries like python-magic use file signatures (magic numbers) to identify the true MIME type, which is far more reliable than trusting user-provided headers.
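
To make the idea of file signatures concrete, here is a small standard-library sketch that recognises three formats by their leading bytes. It is an illustration only and covers far fewer formats than libmagic. `sniff_mime_type` is a hypothetical helper.

```python
# Leading-byte signatures ("magic numbers") for a few common formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
)


def sniff_mime_type(data):
    """Return a MIME type based on the first bytes of `data`, or None if unknown."""
    for signature, mime_type in SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None
```

A shell script renamed to `photo.png` and uploaded with a Content-Type of `image/png` would still yield `None` here, because its bytes do not begin with the PNG signature.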


Implementation Example

Here’s how you can incorporate python-magic into an AWS Lambda function for MIME type validation:

Steps:

  1. Install the python-magic library:

    • If you're developing locally, add it to your requirements file:

        python-magic-bin==0.4.14
      
    • Ensure the library, along with the libmagic shared library it depends on, is included in your Lambda deployment package or a Lambda layer.

  2. Use the library to inspect the file content during the upload process.

Lambda Function Code:

from urllib.parse import unquote_plus

import boto3
import magic

# S3 client
s3_client = boto3.client('s3')

# Allowed MIME types
ALLOWED_MIME_TYPES = ['image/png', 'image/jpeg', 'application/pdf']

def lambda_handler(event, context):
    # Extract bucket name and file key from the event
    # (S3 URL-encodes object keys in event payloads, so decode the key first)
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Download the file to a temporary directory
    local_file_path = f"/tmp/{file_key.split('/')[-1]}"
    s3_client.download_file(bucket_name, file_key, local_file_path)

    # Inspect the file's MIME type using python-magic
    mime = magic.Magic(mime=True)
    detected_mime_type = mime.from_file(local_file_path)
    print(f"Detected MIME type: {detected_mime_type}")

    # Validate MIME type
    if detected_mime_type not in ALLOWED_MIME_TYPES:
        # If invalid, delete the file and log the event
        s3_client.delete_object(Bucket=bucket_name, Key=file_key)
        print(f"Rejected file: {file_key} (MIME type: {detected_mime_type})")
        # Optionally, notify admin via SNS or SES
    else:
        print(f"Accepted file: {file_key} (MIME type: {detected_mime_type})")
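
Signature detection generally needs only the start of a file, so for large objects it may be cheaper to fetch the first couple of kilobytes than to download everything to `/tmp`. Here is a sketch under that assumption. `detect_mime_from_head` and the 2048-byte default are illustrative choices, and the detector is passed in so the helper does not depend on how libmagic is packaged.

```python
def detect_mime_from_head(s3_client, bucket_name, file_key, detect, num_bytes=2048):
    """Fetch only the first `num_bytes` of an object and run `detect` on them."""
    response = s3_client.get_object(
        Bucket=bucket_name,
        Key=file_key,
        Range=f"bytes=0-{num_bytes - 1}",  # HTTP byte ranges are inclusive
    )
    head = response["Body"].read()
    return detect(head)
```

In the Lambda function above, this could be called as `detect_mime_from_head(s3_client, bucket_name, file_key, lambda data: magic.from_buffer(data, mime=True))`. Some formats are identified by structures deeper in the file, so results from a truncated buffer can differ from a full-file check.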

Advantages of Using python-magic

  1. Reliable Validation:
    Validates the actual file content instead of relying on user-provided metadata, significantly reducing the risk of malicious file uploads.

  2. Easy Integration:
    Works seamlessly in Python applications, including AWS Lambda functions.

  3. Broad MIME Type Support:
    Detects a wide range of file types using the libmagic library under the hood.


Best Practices for File Type Validation

  1. Whitelist Allowed MIME Types:
    Maintain a list of acceptable MIME types (e.g., image/png, application/pdf) and reject anything outside this list.

  2. Combine MIME Validation with File Scanning:
    After MIME type validation, scan the file for malware using tools like Trend Micro AMAAS, ClamAV, or Amazon GuardDuty Malware Protection for S3.

  3. Monitor and Log Validation Results:
    Log all file validation results for auditing and troubleshooting. Use AWS services like CloudWatch for monitoring.

  4. Add Size Limits:
    Prevent excessively large files by setting size limits during the S3 upload process.
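
For the size limit in item 4, one option is the `content-length-range` condition of a pre-signed POST policy. Below is a minimal sketch, assuming a 5 MiB cap and a hypothetical helper name. The S3 client is passed in (for example `boto3.client('s3')`).

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # Assumed limit of 5 MiB; adjust to your needs


def build_size_limited_post(s3_client, bucket_name, object_key,
                            content_type, max_bytes=MAX_UPLOAD_BYTES):
    """Create a pre-signed POST restricted by both Content-Type and object size."""
    return s3_client.generate_presigned_post(
        Bucket=bucket_name,
        Key=object_key,
        # Pre-fill the form field so the client sends the expected value.
        Fields={"Content-Type": content_type},
        Conditions=[
            {"Content-Type": content_type},
            # S3 rejects the upload if its size (in bytes) is outside this range.
            ["content-length-range", 1, max_bytes],
        ],
        ExpiresIn=3600,
    )
```

Enforcing the limit in the policy means oversized uploads are refused by S3 itself, before any Lambda function runs.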


Enhanced Workflow

  1. User Uploads File → Upload triggers an S3 Event Notification.

  2. Lambda Function Executes:

    • Validates MIME type using python-magic.

    • Scans for malware using a security tool.

    • Rejects or accepts the file based on results.

  3. Log Events → Store validation and scanning results in CloudWatch for auditing.
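
The workflow above can be sketched as a small pipeline. The MIME detector and the malware scanner are passed in as callables because their implementations depend on the tools you choose. All names here are illustrative.

```python
import logging

logger = logging.getLogger("upload_validation")


def validate_upload(file_key, data, detect_mime, scan_for_malware, allowed_mime_types):
    """Run MIME validation, then a malware scan, and return 'accepted' or 'rejected'."""
    mime_type = detect_mime(data)
    if mime_type not in allowed_mime_types:
        logger.warning("Rejected %s: MIME type %s not allowed", file_key, mime_type)
        return "rejected"

    # `scan_for_malware` is assumed to return True when the content is clean.
    if not scan_for_malware(data):
        logger.warning("Rejected %s: malware scan failed", file_key)
        return "rejected"

    logger.info("Accepted %s (MIME type %s)", file_key, mime_type)
    return "accepted"
```

Inside a Lambda function, output from the `logging` module ends up in CloudWatch Logs, which covers the logging step without extra code.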

Conclusion

Content-type validation is a critical step in securing and maintaining your S3 file uploads. Whether through Lambda functions, pre-signed URLs, or event notifications, AWS provides robust tools to implement this feature effectively. By enforcing these practices, you can ensure that your application remains secure, compliant, and user-friendly.

Written by

Sourav Chakraborty