AWS Cost Optimization: Automating EBS Snapshot Management with Lambda
Managing cloud costs is crucial for organizations of all sizes. This guide walks through one practical AWS cost optimization: a Lambda function that finds and deletes stale EBS snapshots.
Why Cloud Cost Optimization Matters
Organizations move to the cloud for two primary reasons:
Reducing infrastructure overhead
Optimizing costs
However, simply moving to the cloud doesn't automatically guarantee cost savings. Without proper management, cloud costs can escalate due to:
Forgotten or stale resources
Unused volumes and snapshots
Inefficient resource allocation
Lack of automated cleanup processes
The Problem: Stale EBS Snapshots
One common scenario that leads to unnecessary costs is the accumulation of stale EBS snapshots. Here's how it typically happens:
Developers create EC2 instances with attached EBS volumes
They take regular snapshots of these volumes for backup
Later, they delete the EC2 instances and volumes
But they forget to delete the associated snapshots
AWS continues charging for these orphaned snapshots
Solution: Automated Snapshot Management
We'll create a Lambda function that automatically identifies and removes stale EBS snapshots. The function will:
List all EBS snapshots
Check if they're associated with existing volumes
Verify if those volumes are attached to running EC2 instances
Delete snapshots that are no longer needed
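The four steps above come down to one decision per snapshot. Here is a minimal sketch of that decision as a pure function, assuming the dictionaries follow the shape boto3 returns from describe_snapshots and describe_volumes. The name is_stale and the volumes_by_id lookup are hypothetical helpers for illustration, not part of the Lambda function shown later:

```python
def is_stale(snapshot, volumes_by_id, active_instance_ids):
    """Return True when a snapshot is a candidate for deletion.

    snapshot: dict with 'SnapshotId' and, usually, 'VolumeId'.
    volumes_by_id: dict mapping volume ID -> volume dict with 'Attachments'.
    active_instance_ids: set of IDs of running EC2 instances.
    """
    volume_id = snapshot.get("VolumeId")
    if not volume_id:
        # No source volume recorded: leave the snapshot alone.
        return False

    volume = volumes_by_id.get(volume_id)
    if volume is None:
        # The source volume no longer exists.
        return True

    # Stale unless at least one attachment points at a running instance.
    return not any(
        attachment.get("InstanceId") in active_instance_ids
        for attachment in volume.get("Attachments", [])
    )


# Hypothetical sample data for illustration only.
running = {"i-111"}
volumes = {
    "vol-a": {"Attachments": [{"InstanceId": "i-111"}]},
    "vol-b": {"Attachments": []},
}
print(is_stale({"SnapshotId": "snap-1", "VolumeId": "vol-a"}, volumes, running))     # False
print(is_stale({"SnapshotId": "snap-2", "VolumeId": "vol-b"}, volumes, running))     # True
print(is_stale({"SnapshotId": "snap-3", "VolumeId": "vol-gone"}, volumes, running))  # True
```

Keeping the decision separate from the AWS calls makes it possible to test the rule with plain dictionaries before anything is deleted.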
Architecture Overview
CloudWatch Event (Trigger)
↓
Lambda Function (Python)
↓
AWS APIs (via boto3)
↓
EBS Snapshots Management
Implementation
Step 1: Create the Lambda Function
First, create a new Lambda function:
Go to AWS Lambda console
Click "Create function"
Select "Author from scratch"
Name: cost-optimization-ebs-snapshot
Runtime: Python 3.x
Architecture: x86_64
Step 2: Lambda Function Code
import boto3
from botocore.exceptions import ClientError


def lambda_handler(event, context):
    # Initialize EC2 client
    ec2 = boto3.client('ec2')

    # Get all running EC2 instances, following pagination
    active_instance_ids = set()
    instance_pages = ec2.get_paginator('describe_instances').paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    for page in instance_pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                active_instance_ids.add(instance['InstanceId'])

    # Get all EBS snapshots owned by this account, following pagination
    snapshot_pages = ec2.get_paginator('describe_snapshots').paginate(OwnerIds=['self'])
    for page in snapshot_pages:
        for snapshot in page['Snapshots']:
            snapshot_id = snapshot['SnapshotId']
            volume_id = snapshot.get('VolumeId')
            if not volume_id:
                # No source volume recorded; leave the snapshot alone
                continue
            try:
                # Check if the snapshot's volume still exists
                try:
                    volume_response = ec2.describe_volumes(VolumeIds=[volume_id])
                except ClientError as e:
                    if e.response['Error']['Code'] != 'InvalidVolume.NotFound':
                        # Unrelated API error: let the outer handler report it
                        raise
                    # Delete snapshot if volume doesn't exist
                    ec2.delete_snapshot(SnapshotId=snapshot_id)
                    print(f"Deleted EBS snapshot {snapshot_id} as its associated volume was not found")
                    continue

                # Check if the volume is attached to any running instance
                for volume in volume_response['Volumes']:
                    is_attached = any(
                        attachment['InstanceId'] in active_instance_ids
                        for attachment in volume['Attachments']
                    )
                    if not is_attached:
                        # Delete snapshot if volume exists but is not attached
                        ec2.delete_snapshot(SnapshotId=snapshot_id)
                        print(f"Deleted EBS snapshot {snapshot_id} as its volume is not attached to any running instance")
            except Exception as e:
                # Report the failure and move on to the next snapshot
                print(f"Error processing snapshot {snapshot_id}: {str(e)}")
Step 3: IAM Permissions
The Lambda function needs proper permissions to interact with EC2 resources. Create an IAM policy with these permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        }
    ]
}
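If you prefer to attach these permissions from a script rather than the console, the following is a rough sketch using the IAM put_role_policy call, which adds an inline policy to a role. The helper name attach_cleanup_policy, the default policy name, and the role name in the usage comment are assumptions made for illustration. The IAM client is passed in so the helper can be tried against a stub first.

```python
import json

# Same permissions as the policy document above.
SNAPSHOT_CLEANUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances",
            ],
            "Resource": "*",
        }
    ],
}


def attach_cleanup_policy(iam_client, role_name, policy_name="ebs-snapshot-cleanup"):
    """Attach the policy inline to the Lambda execution role.

    iam_client is expected to be boto3.client("iam"); policy_name is a
    hypothetical default, not a name required by AWS.
    """
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(SNAPSHOT_CLEANUP_POLICY),
    )


# Usage (requires AWS credentials; the role name is a placeholder):
#   import boto3
#   attach_cleanup_policy(boto3.client("iam"), "my-lambda-execution-role")
```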
Step 4: Lambda Configuration
Increase the default timeout:
Navigate to Configuration → General configuration → Edit
The default is 3 seconds
Set it to 10 seconds or more, depending on how many snapshots your account holds
Attach the IAM policy from Step 3 to the Lambda execution role
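The timeout can also be changed programmatically. This is a small sketch built on the Lambda update_function_configuration call. The wrapper name set_function_timeout is hypothetical, and the client is injected so the function does not need AWS access to be exercised.

```python
def set_function_timeout(lambda_client, function_name, timeout_seconds=10):
    """Raise the Lambda timeout from its 3-second default.

    lambda_client is expected to be boto3.client("lambda").
    Lambda accepts timeouts between 1 and 900 seconds.
    """
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    # The response echoes the updated configuration, including the timeout.
    return response.get("Timeout")


# Usage (requires AWS credentials):
#   import boto3
#   set_function_timeout(boto3.client("lambda"), "cost-optimization-ebs-snapshot", 30)
```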
Step 5: CloudWatch Event Trigger (Optional)
To automate the execution:
Go to CloudWatch → Rules (CloudWatch Events is now Amazon EventBridge, and the same rules appear under EventBridge → Rules)
Create a new rule
Set up a schedule (e.g., daily or weekly)
Add the Lambda function as the target
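The same schedule can be created from code. The sketch below uses three boto3 calls: put_rule and put_targets on the events client, and add_permission on the Lambda client so the rule is allowed to invoke the function. The function name schedule_cleanup, the rule name, the target Id, and the daily rate are assumptions chosen for illustration.

```python
def schedule_cleanup(events_client, lambda_client, function_arn,
                     rule_name="ebs-snapshot-cleanup-daily",
                     schedule="rate(1 day)"):
    """Create a scheduled rule that invokes the cleanup function.

    events_client: boto3.client("events"); lambda_client: boto3.client("lambda").
    rule_name and schedule are hypothetical defaults.
    """
    # 1. Create (or update) the scheduled rule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )

    # 2. Allow the rule to invoke the Lambda function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # 3. Register the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "snapshot-cleanup", "Arn": function_arn}],
    )
    return rule["RuleArn"]


# Usage (requires AWS credentials; the ARN is a placeholder):
#   import boto3
#   schedule_cleanup(boto3.client("events"), boto3.client("lambda"),
#                    "arn:aws:lambda:REGION:ACCOUNT_ID:function:cost-optimization-ebs-snapshot")
```

Note that add_permission rejects a StatementId that already exists on the function, so a second run needs either a different rule name or a prior remove_permission call.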
Best Practices
Testing: Always test in a non-production environment first
Logging: Implement comprehensive logging for tracking deletions
Notifications: Consider adding SNS notifications for deleted snapshots
Age Check: Add conditions to check snapshot age before deletion
Backup Strategy: Ensure this doesn't conflict with backup policies
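As one way to combine the logging and notification practices, here is a hedged sketch that logs a summary of a run and publishes it to an SNS topic with the publish call. The function name report_deletions, the subject line, and the idea of collecting deleted IDs in a list are assumptions. The Lambda function shown earlier only prints each deletion.

```python
import logging

logger = logging.getLogger(__name__)


def report_deletions(sns_client, topic_arn, deleted_snapshot_ids):
    """Log and publish a summary of the snapshots deleted in one run.

    sns_client is expected to be boto3.client("sns"); topic_arn must point
    at an existing topic. Returns the message sent, or None if nothing
    was deleted.
    """
    if not deleted_snapshot_ids:
        logger.info("No snapshots deleted; skipping notification")
        return None

    ids = sorted(deleted_snapshot_ids)
    message = "\n".join([f"Deleted {len(ids)} EBS snapshot(s):"] + ids)
    logger.info(message)

    sns_client.publish(
        TopicArn=topic_arn,
        Subject="EBS snapshot cleanup report",
        Message=message,
    )
    return message
```

Publishing also requires the sns:Publish permission on the execution role, which the policy in Step 3 does not include.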
Advanced Considerations
You can enhance this solution by:
Adding age-based filtering (e.g., only delete snapshots older than 30 days)
Implementing tag-based exclusions
Adding cost reporting functionality
Extending to other resources (e.g., AMIs, volumes)
Adding pre-deletion validation checks
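The first two enhancements can be expressed as a single guard that runs before any delete call. This is a minimal sketch, assuming snapshot dictionaries in the shape boto3 returns (a timezone-aware StartTime and an optional Tags list of Key/Value pairs). The function name is_eligible_for_deletion and the Retain tag key are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_eligible_for_deletion(snapshot, now=None, min_age_days=30,
                             protect_tag_key="Retain"):
    """Return True only for old, unprotected snapshots.

    snapshot: dict with a timezone-aware 'StartTime' and optional 'Tags'.
    protect_tag_key: any snapshot carrying this tag key is never deleted
    (hypothetical convention).
    """
    now = now or datetime.now(timezone.utc)

    # Tag-based exclusion: skip anything explicitly marked for retention.
    tag_keys = {tag["Key"] for tag in snapshot.get("Tags", [])}
    if protect_tag_key in tag_keys:
        return False

    # Age-based filter: only snapshots older than the threshold qualify.
    age = now - snapshot["StartTime"]
    return age > timedelta(days=min_age_days)


# Hypothetical sample data for illustration only.
reference = datetime(2024, 6, 30, tzinfo=timezone.utc)
old = {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)}
recent = {"SnapshotId": "snap-new", "StartTime": datetime(2024, 6, 20, tzinfo=timezone.utc)}
kept = {**old, "Tags": [{"Key": "Retain", "Value": "true"}]}

print(is_eligible_for_deletion(old, now=reference))     # True
print(is_eligible_for_deletion(recent, now=reference))  # False
print(is_eligible_for_deletion(kept, now=reference))    # False
```

Calling a guard like this before each delete_snapshot call would leave recent backups and explicitly tagged snapshots untouched.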
Conclusion
This automated solution helps maintain a clean AWS environment and reduce costs by removing unnecessary EBS snapshots. While this example focuses on snapshots, the same principles can be applied to other AWS resources like unattached EBS volumes, unused EIPs, or obsolete AMIs.
Remember to regularly review and adjust the cleanup criteria based on your organization's needs and backup requirements.
Next Steps
Implement this solution in your AWS environment
Monitor the cost savings
Extend the solution to other resource types
Set up alerting for deleted resources
Document the process for your team
Happy cost optimizing! 🚀
Written by Amulya