AWS Cost Optimization: Automating EBS Snapshot Management with Lambda

Amulya

Managing cloud costs effectively is crucial for organizations of all sizes. This guide walks through one practical AWS cost optimization: a Lambda function that finds and deletes stale EBS snapshots.

Why Cloud Cost Optimization Matters

Organizations move to the cloud for two primary reasons:

  1. Reducing infrastructure overhead

  2. Optimizing costs

However, simply moving to the cloud doesn't automatically guarantee cost savings. Without proper management, cloud costs can escalate due to:

  • Forgotten or stale resources

  • Unused volumes and snapshots

  • Inefficient resource allocation

  • Lack of automated cleanup processes

The Problem: Stale EBS Snapshots

One common scenario that leads to unnecessary costs is the accumulation of stale EBS snapshots. Here's how it typically happens:

  1. Developers create EC2 instances with attached EBS volumes

  2. They take regular snapshots of these volumes for backup

  3. Later, they delete the EC2 instances and volumes

  4. But they forget to delete the associated snapshots

  5. AWS continues charging for these orphaned snapshots

Solution: Automated Snapshot Management

We'll create a Lambda function that automatically identifies and removes stale EBS snapshots. The function will:

  • List all EBS snapshots

  • Check if they're associated with existing volumes

  • Verify if those volumes are attached to running EC2 instances

  • Delete snapshots whose volume no longer exists or is not attached to a running instance
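
The decision logic in the list above can be sketched as a pure function that works on data shaped like the describe_snapshots and describe_volumes responses, so it can be tested without touching AWS. The name find_stale_snapshots and its parameters are hypothetical, introduced here only for illustration.

```python
def find_stale_snapshots(snapshots, volumes_by_id, running_instance_ids):
    """Return (snapshot_id, reason) pairs for snapshots that look stale.

    snapshots: list of dicts with 'SnapshotId' and optionally 'VolumeId'
    volumes_by_id: dict mapping volume ID -> volume dict with 'Attachments'
    running_instance_ids: set of IDs of running EC2 instances
    """
    stale = []
    for snapshot in snapshots:
        volume_id = snapshot.get('VolumeId')
        if not volume_id:
            continue  # nothing to check against; leave the snapshot alone

        volume = volumes_by_id.get(volume_id)
        if volume is None:
            # The source volume has been deleted
            stale.append((snapshot['SnapshotId'], 'volume not found'))
            continue

        attached = any(
            attachment.get('InstanceId') in running_instance_ids
            for attachment in volume.get('Attachments', [])
        )
        if not attached:
            stale.append(
                (snapshot['SnapshotId'], 'volume not attached to a running instance')
            )
    return stale
```

Keeping the decision separate from the delete call also makes it straightforward to run the function in a report-only mode before letting it remove anything.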

Architecture Overview

EventBridge Scheduled Rule, formerly CloudWatch Events (Trigger)
        ↓
Lambda Function (Python)
        ↓
AWS APIs (via boto3)
        ↓
EBS Snapshots Management

Implementation

Step 1: Create the Lambda Function

First, create a new Lambda function:

  1. Go to AWS Lambda console

  2. Click "Create function"

  3. Select "Author from scratch"

  4. Name: cost-optimization-ebs-snapshot

  5. Runtime: Python 3.x

  6. Architecture: x86_64

Step 2: Lambda Function Code

import boto3
from botocore.exceptions import ClientError


def lambda_handler(event, context):
    # Initialize EC2 client
    ec2 = boto3.client('ec2')

    # Collect the IDs of all running EC2 instances. A paginator is used so
    # that accounts with many instances are covered in full.
    active_instance_ids = set()
    instance_pages = ec2.get_paginator('describe_instances').paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )
    for page in instance_pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                active_instance_ids.add(instance['InstanceId'])

    # Walk every EBS snapshot owned by this account (also paginated)
    snapshot_pages = ec2.get_paginator('describe_snapshots').paginate(OwnerIds=['self'])
    for page in snapshot_pages:
        for snapshot in page['Snapshots']:
            snapshot_id = snapshot['SnapshotId']
            volume_id = snapshot.get('VolumeId')
            if not volume_id:
                continue

            try:
                reason = None
                try:
                    volumes = ec2.describe_volumes(VolumeIds=[volume_id])['Volumes']
                    # Stale if the volume is not attached to any running instance
                    attached = any(
                        attachment.get('InstanceId') in active_instance_ids
                        for volume in volumes
                        for attachment in volume['Attachments']
                    )
                    if not attached:
                        reason = "its volume is not attached to any running instance"
                except ClientError as e:
                    # A missing volume means the snapshot is orphaned;
                    # any other API error is re-raised and logged below
                    if e.response['Error']['Code'] != 'InvalidVolume.NotFound':
                        raise
                    reason = "its associated volume was not found"

                if reason:
                    ec2.delete_snapshot(SnapshotId=snapshot_id)
                    print(f"Deleted EBS snapshot {snapshot_id} as {reason}")
            except Exception as e:
                print(f"Error processing snapshot {snapshot_id}: {e}")
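
One case the handler does not check for: a snapshot that backs a registered AMI cannot be deleted while the AMI exists, so delete_snapshot fails for it. A minimal sketch of a pre-check follows, assuming you pass in the 'Images' list returned by ec2.describe_images(Owners=['self']). The helper name snapshots_used_by_amis is hypothetical.

```python
def snapshots_used_by_amis(images):
    """Collect the snapshot IDs referenced by AMI block device mappings.

    images: the 'Images' list from ec2.describe_images(Owners=['self'])
    """
    used = set()
    for image in images:
        for mapping in image.get('BlockDeviceMappings', []):
            # Ephemeral (instance store) mappings have no 'Ebs' key
            snapshot_id = mapping.get('Ebs', {}).get('SnapshotId')
            if snapshot_id:
                used.add(snapshot_id)
    return used
```

Skipping any snapshot whose ID is in the returned set avoids a failed delete call per AMI-backed snapshot. Calling describe_images would also require adding ec2:DescribeImages to the function's IAM policy.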

Step 3: IAM Permissions

The Lambda function needs proper permissions to interact with EC2 resources. Create an IAM policy with these permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSnapshots",
                "ec2:DeleteSnapshot",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances"
            ],
            "Resource": "*"
        }
    ]
}

Step 4: Lambda Configuration

  1. Increase the default timeout:

    • Default is 3 seconds

    • Set to 10 seconds or more depending on your environment

    • Navigate to Configuration → General configuration → Edit

  2. Attach the IAM policy to the Lambda execution role
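
If you prefer to script the timeout change instead of using the console, a minimal sketch with boto3's Lambda client might look like this. The helper name set_function_timeout is hypothetical, and the client is passed in as a parameter so the helper can be exercised without AWS credentials.

```python
def set_function_timeout(lambda_client, function_name, timeout_seconds=10):
    """Raise the timeout (in seconds) of an existing Lambda function."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   set_function_timeout(boto3.client('lambda'), 'cost-optimization-ebs-snapshot', 10)
```

Accounts with thousands of snapshots will likely need a much higher value than 10 seconds, since the function makes one describe_volumes call per snapshot.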

Step 5: EventBridge Schedule Trigger (Optional)

To automate the execution:

  1. Go to Amazon EventBridge → Rules (scheduled rules formerly lived under CloudWatch Events)

  2. Create a new rule

  3. Set up a schedule (e.g., daily or weekly)

  4. Add the Lambda function as the target
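
The same schedule can be set up from code. The sketch below assumes boto3's 'events' and 'lambda' clients are passed in; the function name schedule_daily_cleanup, the rule name, and the target ID are hypothetical placeholders.

```python
def schedule_daily_cleanup(events_client, lambda_client, function_arn,
                           rule_name='ebs-snapshot-cleanup-daily'):
    """Create a daily schedule rule and point it at the Lambda function."""
    # 1. Create (or update) the scheduled rule
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression='rate(1 day)',
        State='ENABLED',
    )['RuleArn']

    # 2. Allow the rule to invoke the function
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f'{rule_name}-invoke',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule_arn,
    )

    # 3. Register the function as the rule's target
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{'Id': 'snapshot-cleanup', 'Arn': function_arn}],
    )
    return rule_arn
```

Note that add_permission raises an error if a statement with the same StatementId already exists, so a rerun of this sketch would need to handle or remove the earlier statement. When the target is added through the console, the invoke permission is normally created for you.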

Best Practices

  1. Testing: Always test in a non-production environment first

  2. Logging: Implement comprehensive logging for tracking deletions

  3. Notifications: Consider adding SNS notifications for deleted snapshots

  4. Age Check: Add conditions to check snapshot age before deletion

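  5. Backup Strategy: Ensure this doesn't conflict with backup policies. As written, the function also deletes snapshots whose volume is unattached or attached only to a stopped instance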
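
As one way to cover the logging and notification points, the sketch below logs each deletion and publishes a single summary message to an SNS topic. The function name notify_deletions and the message layout are assumptions, not part of the original function; the SNS client and topic ARN are passed in by the caller.

```python
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def notify_deletions(sns_client, topic_arn, deleted):
    """Log each deletion and publish one summary message to SNS.

    deleted: list of (snapshot_id, reason) tuples collected during the run
    """
    for snapshot_id, reason in deleted:
        logger.info("Deleted snapshot %s: %s", snapshot_id, reason)

    if not deleted:
        return None  # nothing to report, so no message is sent

    message = json.dumps(
        [{'SnapshotId': s, 'Reason': r} for s, r in deleted],
        indent=2,
    )
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=f'EBS snapshot cleanup: {len(deleted)} deleted',
        Message=message,
    )
```

To use this from the handler, you would collect the (snapshot_id, reason) pairs as snapshots are deleted and call notify_deletions once at the end. The execution role would also need sns:Publish on the topic.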

Advanced Considerations

You can enhance this solution by:

  1. Adding age-based filtering (e.g., only delete snapshots older than 30 days)

  2. Implementing tag-based exclusions

  3. Adding cost reporting functionality

  4. Extending to other resources (e.g., AMIs, volumes)

  5. Adding pre-deletion validation checks
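
The first two enhancements can be sketched as small predicates applied to each snapshot dict before deletion. This assumes the snapshot's StartTime is a timezone-aware datetime, which is how boto3 returns it. The tag key DoNotDelete and the function names are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_older_than(snapshot, min_age_days=30, now=None):
    """True if the snapshot was started more than min_age_days ago."""
    now = now or datetime.now(timezone.utc)
    return now - snapshot['StartTime'] > timedelta(days=min_age_days)


def is_protected(snapshot, tag_key='DoNotDelete'):
    """True if the snapshot carries the exclusion tag (any value)."""
    return any(tag['Key'] == tag_key for tag in snapshot.get('Tags', []))


def eligible_for_deletion(snapshot, min_age_days=30, now=None):
    """Old enough and not explicitly protected by a tag."""
    return is_older_than(snapshot, min_age_days, now) and not is_protected(snapshot)
```

In the handler, a check such as `if not eligible_for_deletion(snapshot): continue` placed before the volume lookup would leave recent or tagged snapshots untouched.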

Conclusion

This automated solution helps maintain a clean AWS environment and reduce costs by removing unnecessary EBS snapshots. While this example focuses on snapshots, the same principles can be applied to other AWS resources like unattached EBS volumes, unused EIPs, or obsolete AMIs.

Remember to regularly review and adjust the cleanup criteria based on your organization's needs and backup requirements.

Next Steps

  1. Implement this solution in your AWS environment

  2. Monitor the cost savings

  3. Extend the solution to other resource types

  4. Set up alerting for deleted resources

  5. Document the process for your team


Happy cost optimizing! 🚀
