Content-Type Validation During File Uploads to an AWS S3 Bucket

Uploading files to an AWS S3 bucket is a common requirement in modern applications, but it comes with a crucial responsibility: validating the content type of the uploaded files. Without proper validation, malicious or unwanted file types can slip through, leading to security vulnerabilities or failures in downstream processing.
In this post, we’ll explore how to enforce content-type validation during file uploads to S3, covering key techniques to block unexpected content types or send a notification when they are uploaded.
Why Content-Type Validation Matters
Content-Type validation ensures that files being uploaded match your application's expectations and security policies. Common use cases include:
Preventing malicious uploads: For example, blocking executables or scripts.
Maintaining application integrity: Allowing only image files for profile pictures or text files for logs.
Improving error handling: Detecting incorrect file types early gives users faster, clearer feedback.
How Content-Type Validation Works in AWS S3
You can add content-type checks to S3 uploads using AWS Lambda, pre-signed URLs, and S3 Event Notifications. Below are strategies for implementing validation:
1. Validating Content-Type Using AWS Lambda
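You can configure an S3 event notification for object-creation events (s3:ObjectCreated:*) that invokes an AWS Lambda function. The function inspects the uploaded file's metadata, including its Content-Type. Because the function runs after the object has already been stored, this approach remediates invalid uploads rather than preventing them. If the content type is invalid, the function can take corrective actions, such as: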
Deleting the file: Automatically remove invalid files.
Sending a notification: Use Amazon SNS or email to alert the user or admin.
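For the function to run at all, the bucket needs an event notification that invokes it. The sketch below shows one way to wire this up with boto3; the rule ID, statement ID, function ARN, and optional key prefix are placeholders, and it assumes the caller is allowed to change the bucket's notification settings and the function's resource policy.

```python
def build_trigger_config(function_arn, prefix=""):
    """Return a NotificationConfiguration that invokes function_arn for every new object."""
    rule = {
        "Id": "validate-content-type",  # hypothetical rule name
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Only fire for keys under this prefix, e.g. "uploads/"
        rule["Filter"] = {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}}
    return {"LambdaFunctionConfigurations": [rule]}


def install_trigger(bucket, function_arn, prefix=""):
    import boto3  # imported here so build_trigger_config works without AWS access

    # S3 validates the destination, so it must be allowed to invoke the function first
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId="allow-s3-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket}",
    )
    # Caution: this call replaces the bucket's entire notification configuration
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_trigger_config(function_arn, prefix),
    )
```

Because put_bucket_notification_configuration overwrites any existing rules, a bucket that already has notifications would need its current configuration (from get_bucket_notification_configuration) merged in first.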
Sample Code for Validation:
```python
import urllib.parse

import boto3

s3 = boto3.client('s3')

# Define allowed content types
ALLOWED_TYPES = {'image/jpeg', 'image/png', 'application/pdf'}


def lambda_handler(event, context):
    removed = []
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        # Keys in S3 event payloads are URL-encoded (a space arrives as '+')
        object_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Get object metadata without downloading the body
        response = s3.head_object(Bucket=bucket_name, Key=object_key)
        content_type = response.get('ContentType', '')

        if content_type not in ALLOWED_TYPES:
            # Delete the file
            s3.delete_object(Bucket=bucket_name, Key=object_key)
            # print() output lands in CloudWatch Logs; publish to SNS here if needed
            print(f"Blocked file with invalid content type: {content_type}")
            removed.append(object_key)

    # S3 does not act on this return value; it is mainly useful when testing
    if removed:
        return {'statusCode': 400, 'body': f'Invalid content type. Removed: {removed}'}
    return {'statusCode': 200, 'body': 'File uploaded successfully.'}
```
2. Validating Content-Type with Pre-Signed URLs
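Pre-signed URLs let clients upload directly to S3 without holding AWS credentials. With a pre-signed POST (generate_presigned_post in boto3), you can attach policy conditions that S3 enforces at upload time, including an exact or prefix match on the Content-Type form field.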
Example:
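```python
import boto3

s3 = boto3.client('s3')
bucket_name = 'your-bucket-name'
object_key = 'uploads/test.jpg'

# Generate a pre-signed POST with content-type restrictions
response = s3.generate_presigned_post(
    Bucket=bucket_name,
    Key=object_key,
    # Pre-fill the form field so the client sends the expected value
    Fields={"Content-Type": "image/jpeg"},
    # Fields are not added to the policy automatically; this condition enforces the value
    Conditions=[
        {"Content-Type": "image/jpeg"}
    ],
    ExpiresIn=3600
)
print(response)  # {'url': ..., 'fields': {...}}
```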
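With this approach, S3 rejects any upload whose Content-Type field does not match the policy (here, image/jpeg). Keep in mind that the policy checks only the value the client declares, not the file's actual bytes, so pair it with content inspection such as the python-magic check described later.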
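A natural extension is to decide the allowed type on the server before signing anything. The sketch below guesses the type from the requested key with Python's mimetypes module (so it still reflects only the file name, not its bytes), refuses to sign disallowed types, and caps the upload size with a content-length-range condition. The allow-list and the 10 MiB limit are assumptions, and the S3 client is passed in (for example boto3.client('s3')) so the logic can be exercised without AWS credentials.

```python
import mimetypes

ALLOWED_TYPES = {"image/jpeg", "image/png", "application/pdf"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MiB cap


def presigned_upload_for(s3_client, bucket, object_key, expires_in=3600):
    """Sign a POST upload for object_key only if its extension maps to an allowed type."""
    content_type, _encoding = mimetypes.guess_type(object_key)
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"Refusing to sign upload for {object_key!r} (type: {content_type})")
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=object_key,
        Fields={"Content-Type": content_type},
        Conditions=[
            {"Content-Type": content_type},
            # Reject empty files and anything larger than the cap
            ["content-length-range", 1, MAX_UPLOAD_BYTES],
        ],
        ExpiresIn=expires_in,
    )
```

The client then sends a multipart/form-data POST to response['url'], including every entry of response['fields'] as form fields and the file itself as the final field, named file.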
3. Combining S3 Event Notifications and Content-Type Validation
Using S3 Event Notifications, you can trigger workflows whenever a new file is uploaded. For example:
Set up an S3 bucket event to send notifications to an SNS topic.
Subscribe an AWS Lambda function to the SNS topic to process these notifications and validate content types.
Notify users of invalid uploads or perform remediation actions.
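When the Lambda function is subscribed to the SNS topic rather than to the bucket, the S3 event arrives wrapped: each SNS record carries the original S3 notification as a JSON string in Sns.Message. A minimal sketch of unwrapping it, assuming the standard SNS and S3 event shapes:

```python
import json
import urllib.parse


def s3_objects_from_sns_event(event):
    """Return (bucket, key) pairs from S3 notifications delivered through SNS."""
    objects = []
    for sns_record in event.get("Records", []):
        # The S3 event is serialized as a JSON string inside the SNS message
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # S3 publishes an "s3:TestEvent" without "Records" when the notification is first set up
        for record in s3_event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects
```

Each (bucket, key) pair can then go through the same head_object check used in the first strategy.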
Test Case (Proof of Concept)
I used a single S3 bucket and configured a Lambda function so that whenever anyone uploads a non-TXT file, it sends an SNS notification.
To test it, I uploaded a file named dr.pcap; after the upload, the notification arrived by email through the SNS subscription.
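The POC function itself is not included in the post, so the following is only a hypothetical reconstruction of that behavior: it treats an object as a TXT file when its Content-Type is text/plain (an assumption; the POC may have checked differently) and publishes an SNS alert for anything else. The ALERT_TOPIC_ARN environment variable is a placeholder, and the clients are injectable so the logic can be tested without AWS.

```python
import os
import urllib.parse


def is_text_file(content_type):
    # Ignore parameters such as "; charset=utf-8"
    return content_type.split(";")[0].strip().lower() == "text/plain"


def alert_on_non_txt(event, s3_client, sns_client, topic_arn):
    """Publish one SNS message per uploaded object that is not text/plain."""
    alerted = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        content_type = s3_client.head_object(Bucket=bucket, Key=key).get("ContentType", "")
        if not is_text_file(content_type):
            sns_client.publish(
                TopicArn=topic_arn,
                Subject="Non-TXT file uploaded to S3",
                Message=f"s3://{bucket}/{key} was uploaded with Content-Type {content_type!r}",
            )
            alerted.append(key)
    return alerted


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    return alert_on_non_txt(
        event, boto3.client("s3"), boto3.client("sns"), os.environ["ALERT_TOPIC_ARN"]
    )
```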
Best Practices for Content-Type Validation
Whitelist allowed types: Always use a whitelist to explicitly define allowed file types.
Verify actual file content: Content-Type headers can be manipulated. Use libraries like python-magic to inspect the file's actual MIME type.
Log invalid attempts: Keep track of invalid upload attempts for auditing and troubleshooting.
User feedback: Provide clear error messages when uploads fail validation.
Validating MIME Types Using python-magic
Why It’s Important:
Attackers can upload malicious files disguised as valid file types by tampering with the Content-Type header. To mitigate this, inspect the file's actual content to determine its MIME type. Libraries like python-magic use file signatures (magic numbers) to identify the true MIME type, which is far more reliable than trusting user-provided headers.
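To make the idea of magic numbers concrete, here is a deliberately tiny sketch that recognizes a handful of well-known signatures from a file's first bytes. It only illustrates what python-magic does on a much larger scale with libmagic's signature database; the signature list and the fallback type are choices made for this example, not a replacement for the library.

```python
# A few well-known leading-byte signatures ("magic numbers")
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .docx, .xlsx, .jar
    (b"MZ", "application/x-dosexec"),  # Windows (DOS/PE) executables
]


def sniff_mime(header: bytes) -> str:
    """Guess a MIME type from the first bytes of a file."""
    for signature, mime_type in SIGNATURES:
        if header.startswith(signature):
            return mime_type
    return "application/octet-stream"  # unknown: treat as generic binary


def declared_matches_content(declared_type: str, header: bytes) -> bool:
    """True if the client-declared Content-Type agrees with the sniffed type."""
    return declared_type.split(";")[0].strip().lower() == sniff_mime(header)
```

For example, an executable uploaded with a forged image/png header starts with b"MZ", so declared_matches_content("image/png", header) returns False.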
Implementation Example
Here’s how you can incorporate python-magic into an AWS Lambda function for MIME type validation:
Steps:
Install the python-magic library: add it to your requirements file. This example pins python-magic-bin, a distribution of python-magic that bundles the libmagic binary it depends on:
```text
python-magic-bin==0.4.14
```
Ensure the library is included in your Lambda deployment package.
Use the library to inspect the file content during the upload process.
Lambda Function Code:
```python
import os
import urllib.parse

import boto3
import magic

# S3 client
s3_client = boto3.client('s3')

# Allowed MIME types
ALLOWED_MIME_TYPES = {'image/png', 'image/jpeg', 'application/pdf'}

# Reuse one detector across warm invocations
mime = magic.Magic(mime=True)


def lambda_handler(event, context):
    for record in event['Records']:
        # Extract bucket name and file key from the event (keys arrive URL-encoded)
        bucket_name = record['s3']['bucket']['name']
        file_key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Download the file to Lambda's writable /tmp directory
        local_file_path = f"/tmp/{os.path.basename(file_key)}"
        s3_client.download_file(bucket_name, file_key, local_file_path)

        try:
            # Inspect the file's MIME type from its bytes using python-magic
            detected_mime_type = mime.from_file(local_file_path)
        finally:
            os.remove(local_file_path)  # free /tmp space for later invocations
        print(f"Detected MIME type: {detected_mime_type}")

        # Validate MIME type
        if detected_mime_type not in ALLOWED_MIME_TYPES:
            # If invalid, delete the file and log the event
            s3_client.delete_object(Bucket=bucket_name, Key=file_key)
            print(f"Rejected file: {file_key} (MIME type: {detected_mime_type})")
            # Optionally, notify admin via SNS or SES
        else:
            print(f"Accepted file: {file_key} (MIME type: {detected_mime_type})")
```
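Downloading the whole object to /tmp works, but signature detection only looks at the start of a file, so for large uploads it can be enough to fetch the first few kilobytes. A sketch of that variant, assuming 2 KiB is sufficient for the formats you allow (libmagic may need more bytes to distinguish some container formats). It uses a ranged get_object call and python-magic's magic.from_buffer; the detector is a parameter so it can be replaced in tests.

```python
HEADER_BYTES = 2048  # assumption: enough for the signatures of the allowed formats


def read_object_header(s3_client, bucket, key, num_bytes=HEADER_BYTES):
    """Fetch only the first num_bytes of an object with an HTTP Range request."""
    # A range request against a zero-byte object may fail; handle empty uploads separately
    response = s3_client.get_object(Bucket=bucket, Key=key, Range=f"bytes=0-{num_bytes - 1}")
    return response["Body"].read()


def detect_mime_from_s3(s3_client, bucket, key, from_buffer=None):
    """Detect the MIME type of an S3 object from its leading bytes."""
    if from_buffer is None:
        import magic  # python-magic, as in the function above

        def from_buffer(data):
            return magic.from_buffer(data, mime=True)

    return from_buffer(read_object_header(s3_client, bucket, key))
```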
Advantages of Using python-magic
Reliable Validation: Validates the actual file content instead of relying on user-provided metadata, significantly reducing the risk of malicious file uploads.
Easy Integration: Works seamlessly in Python applications, including AWS Lambda functions.
Broad MIME Type Support: Detects a wide range of file types using the libmagic library under the hood.
Best Practices for File Type Validation
Whitelist Allowed MIME Types: Maintain a list of acceptable MIME types (e.g., image/png, application/pdf) and reject anything outside this list.
Combine MIME Validation with File Scanning: After MIME type validation, scan the file for malware using tools like Trend Micro AMAAS, ClamAV, or Amazon GuardDuty Malware Protection for S3.
Monitor and Log Validation Results: Log all file validation results for auditing and troubleshooting. Use AWS services like CloudWatch for monitoring.
Add Size Limits: Prevent excessively large files by setting size limits during the S3 upload process, for example with a content-length-range condition in a pre-signed POST policy.
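Upload-time limits only apply to uploads that go through the signed policy. As a backstop for other upload paths, the validation Lambda can check sizes afterwards, because ObjectCreated event records include the object's size in bytes. A small sketch with an assumed 10 MiB limit; flagged objects could then be deleted and logged the same way as invalid MIME types.

```python
import urllib.parse

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed 10 MiB limit


def oversized_uploads(event, max_bytes=MAX_UPLOAD_BYTES):
    """Return (bucket, key, size) for every created object larger than max_bytes."""
    flagged = []
    for record in event.get("Records", []):
        obj = record["s3"]["object"]
        size = obj.get("size", 0)  # ObjectCreated records report the size in bytes
        if size > max_bytes:
            key = urllib.parse.unquote_plus(obj["key"])
            flagged.append((record["s3"]["bucket"]["name"], key, size))
    return flagged
```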
Enhanced Workflow
User Uploads File → Upload triggers an S3 Event Notification.
Lambda Function Executes:
Validates MIME type using python-magic.
Scans for malware using a security tool.
Rejects or accepts the file based on results.
Log Events → Store validation and scanning results in CloudWatch for auditing.
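The workflow above can be sketched as a single function. The MIME detector, malware scanner, and delete operation are passed in as callables because the post does not prescribe a particular scanning tool; in a real function they might wrap python-magic, a scanner such as ClamAV, and s3_client.delete_object. These hooks, the allow-list, and the JSON log format are assumptions for illustration.

```python
import json

ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "application/pdf"}


def run_upload_pipeline(bucket, key, detect_mime, scan_is_clean, delete_object, log=print):
    """Validate, scan, then accept or reject one uploaded object.

    detect_mime(bucket, key) -> str, scan_is_clean(bucket, key) -> bool and
    delete_object(bucket, key) are hypothetical hooks supplied by the caller.
    """
    mime = detect_mime(bucket, key)
    if mime not in ALLOWED_MIME_TYPES:
        verdict, reason = "rejected", f"MIME type {mime} not allowed"
    elif not scan_is_clean(bucket, key):  # only scan files that passed the type check
        verdict, reason = "rejected", "malware scan flagged the file"
    else:
        verdict, reason = "accepted", "passed MIME and malware checks"

    if verdict == "rejected":
        delete_object(bucket, key)

    # One JSON line per object; CloudWatch Logs Insights can query these fields
    log(json.dumps({"bucket": bucket, "key": key, "mime": mime,
                    "verdict": verdict, "reason": reason}))
    return verdict
```

Running the cheap MIME check first means the (typically slower) malware scan only sees files whose type is already acceptable.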
Conclusion
Content-type validation is a critical step in securing and maintaining your S3 file uploads. Whether through Lambda functions, pre-signed URLs, or event notifications, AWS provides robust tools to implement this feature effectively. By enforcing these practices, you can ensure that your application remains secure, compliant, and user-friendly.
Written by Sourav Chakraborty
