AWS Cloud Cost Optimization Using Lambda Function Project Boto3

This project automates one piece of cloud cost optimization: finding and deleting stale EBS snapshots with an AWS Lambda function written in Python and Boto3.

Cloud cost optimization is a critical responsibility for DevOps engineers. Organizations migrate to the cloud primarily for two reasons:

  1. Reducing the overhead of managing physical infrastructure – Setting up an entire data center involves purchasing and configuring servers, requiring a dedicated team, which can be costly and time-consuming.

  2. Optimizing infrastructure costs – While cloud services offer flexibility and scalability, inefficient resource management can lead to unexpectedly high expenses.

Startups are often told to simply use the AWS cloud, but that advice only pays off if they monitor their spending and use resources efficiently.

However, simply moving to the cloud does not guarantee cost savings. Organizations often face challenges such as stale resources—cloud assets that are no longer in use but still incurring costs.

Understanding the Problem

Example: When a developer creates an EC2 instance, an EBS volume is attached to it automatically. The developer starts filling the volume with sensitive organizational data, so it needs a backup.

The developer starts taking a backup every day. In AWS, these backups are snapshots.

Case 1: The developer deletes the EC2 instance and the volume.

Case 2: The developer moves the data to an external source and forgets to delete the volume.

In both cases the snapshots are left behind, and AWS continues to charge for them.

More examples:
  1. An S3 bucket was created and is no longer used, but AWS keeps charging for it.
  2. An EKS cluster was created but never deleted.

Stale resources refer to unused or forgotten cloud assets that increase operational costs unnecessarily. Common examples include:

  • Unused EBS snapshots that persist even after the associated EC2 instance or volume is deleted.

  • Orphaned S3 buckets containing outdated data with no active use.

  • Idle EKS clusters created for testing but never removed.

Manually identifying and deleting stale resources across multiple AWS services is complex and time-consuming. This is where automation using AWS Lambda and Python becomes valuable.

Solution: Automating Cost Optimization

To address this issue, we can automate stale resource identification and cleanup using AWS Lambda. The process involves:

  1. Creating an AWS Lambda Function – A function is developed to scan AWS resources and identify unused assets.

  2. Using Python and Boto3 – The function is written in Python using the Boto3 library to interact with AWS APIs.

  3. Implementing Event-Driven Execution – The Lambda function is scheduled using Amazon CloudWatch Events to run periodically.

  4. Fetching EBS Snapshots – The script retrieves all available snapshots and checks their associations.

  5. Deleting Unused Snapshots – If a snapshot is not linked to an active EC2 instance or volume, it is automatically deleted.

  6. Sending Notifications (Optional) – Before deleting resources, alerts can be sent using Amazon SNS.
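
Step 6 can be sketched as follows. This is only a minimal illustration, assuming an SNS topic already exists. The helper names `build_message` and `notify_before_delete`, the message wording, and the topic ARN you pass in are hypothetical and not part of the original project.

```python
def build_message(snapshot_ids):
    """Format a plain-text alert listing the snapshots about to be deleted."""
    lines = ["The following EBS snapshots look stale and will be deleted:"]
    lines += [f"  - {sid}" for sid in snapshot_ids]
    return "\n".join(lines)


def notify_before_delete(sns, topic_arn, snapshot_ids):
    """Publish one alert for the whole batch; skip the call if nothing is stale.

    `sns` is expected to be a Boto3 SNS client, e.g. boto3.client("sns").
    """
    if not snapshot_ids:
        return None
    return sns.publish(
        TopicArn=topic_arn,
        Subject="Stale EBS snapshots scheduled for deletion",
        Message=build_message(snapshot_ids),
    )
```

Sending one message per run, rather than one per snapshot, keeps the inbox readable when many snapshots are stale at once.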

Implementation Steps

Create an EC2 Instance and Snapshot

Let’s begin by creating an EC2 instance and taking a snapshot of the volume attached to it. Here's the basic process:

  • Launch an EC2 instance: We'll create a test instance (Ubuntu, t2.micro).

  • When we launch the instance, a root volume is attached to it by default. We can also attach an additional volume with the “Add new volume” option.

  • Create a snapshot of the attached volume: Once the instance is running, we'll create a snapshot of its attached volume. Think of this snapshot as a backup or image of the volume.

Our instance is created. Now check the volume: in the left navigation pane, go to Elastic Block Store > Volumes.

The attached volume was created along with the EC2 instance.


  • Create a Snapshot

Go to your EC2 dashboard > Snapshots and create a snapshot of the volume.

The snapshot test-snap has been created.
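
The same snapshot can also be created from code instead of the console. Here is a minimal sketch, assuming a Boto3 EC2 client is passed in. The helper name `create_named_snapshot` is hypothetical, and the volume ID in the usage note is a placeholder.

```python
def create_named_snapshot(ec2, volume_id, name):
    """Snapshot one EBS volume and tag it with a Name so it is easy to find.

    `ec2` is expected to be a Boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Backup of {volume_id}",
        TagSpecifications=[
            {
                "ResourceType": "snapshot",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    )
    return response["SnapshotId"]
```

For example, `create_named_snapshot(boto3.client("ec2"), "vol-0123456789abcdef0", "test-snap")` would create a snapshot named test-snap for that (placeholder) volume.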

Now suppose our developer wants to delete everything that was created: the EC2 instance, the volume, and the snapshot.

The developer had been taking snapshots every day and forgot to delete them when deleting the EC2 instance. The root volume is deleted automatically when the instance is terminated, but the snapshots are not.


Lambda Function for Cost Optimization

Now, let's set up a Lambda function that will help us automate the process of cleaning up old snapshots.

  1. Log in to AWS Console
  • Go to the AWS Management Console.

  • Navigate to Lambda in the Services menu.

  2. Create a Lambda Function
  • Click on Create function.

  • Select Author from Scratch.

  • Name your function: cost-optimization-ebs-snapshot.

  • Choose Python as the runtime.

  • Leave the default execution role and update it later for permissions.

  3. Deploy Lambda Function
  • Once the function is created, scroll down to the Function code section.

  • Paste your code in the code editor (we’ll get the code in the next steps).

  • Click Deploy to deploy the Lambda function.

  4. Change the Timeout
  • Change the execution timeout from the default of 3 seconds to 10 seconds.

Keep the timeout as low as practical. Lambda bills for execution duration as well as for the number of invocations, so a low timeout caps how long a stuck invocation can keep running and costing money.
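
The timeout can also be changed from code. Below is a minimal sketch, assuming a Boto3 Lambda client is passed in. The helper name `set_timeout` is hypothetical; the 1 to 900 second range is the limit Lambda enforces.

```python
def set_timeout(lambda_client, function_name, seconds=10):
    """Set the function timeout; Lambda accepts values from 1 to 900 seconds.

    `lambda_client` is expected to be a Boto3 client, e.g. boto3.client("lambda").
    """
    if not 1 <= seconds <= 900:
        raise ValueError("Lambda timeout must be between 1 and 900 seconds")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )
    # The response echoes the updated configuration, including the new timeout.
    return response["Timeout"]
```

Calling `set_timeout(boto3.client("lambda"), "cost-optimization-ebs-snapshot")` would apply the 10 second value used in this walkthrough.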

Permissions Issue: Initially, the Lambda function will fail due to permission issues. This is because the role attached to the Lambda function doesn't have the necessary permissions to describe snapshots, volumes, or EC2 instances.

Grant Permissions to Lambda Role

  • Navigate to the Lambda permissions

    • In the Lambda function's Configuration tab, open Permissions and find the Execution role. This is the role Lambda uses to interact with other AWS services.

    • Click the role name to open it in IAM.

Permissions Granting:

We’ll need to grant the Lambda function permission to describe snapshots, volumes, and EC2 instances, and to delete snapshots.

You can add these permissions through IAM policies. In our case, we'll create a custom policy with exactly those permissions.

Permissions > Add permissions > Attach policies

  • You need to give the Lambda role permissions to interact with EC2 and EBS services.

    • Go to Permissions > Add permissions > Attach policies (Inline).

    • Go to IAM > Policies > Create Inline Policy.

      For Service, choose EC2.

    • Create a custom policy that allows actions on EC2 snapshots and volumes. Use the following actions:

      • Describe snapshots (for listing snapshots)

      • Delete snapshots (for deleting snapshots)

      • Describe instances (to find if an EC2 instance is associated with a volume)

      • Describe volumes (for volume information)

Save the policy and attach it to the Lambda execution role.
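
The four actions listed above correspond to the policy document below. This is a sketch under assumptions rather than the exact policy used in this project: the policy name `ebs-snapshot-cleanup` and the helper names are hypothetical, and `Resource` is set to `*` for simplicity because the Describe actions do not support resource-level restrictions.

```python
import json

POLICY_NAME = "ebs-snapshot-cleanup"  # hypothetical name, pick your own


def build_policy():
    """Return an IAM policy document granting only what the cleanup needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeSnapshots",
                    "ec2:DeleteSnapshot",
                    "ec2:DescribeInstances",
                    "ec2:DescribeVolumes",
                ],
                "Resource": "*",
            }
        ],
    }


def attach_inline_policy(iam, role_name):
    """Attach the policy inline to the Lambda execution role.

    `iam` is expected to be a Boto3 IAM client, e.g. boto3.client("iam").
    """
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(build_policy()),
    )
```

Granting only these four actions, instead of a broad managed policy such as full EC2 access, limits what the function can do if its code ever misbehaves.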

Test Lambda Function

  1. Create Test Event:

    • In the Lambda console, go to the Test tab.

    • Create a Test Event (you can leave it empty since you’re manually invoking).

    • Save the test event.

  2. Invoke Lambda:

    • Click the Test button to manually invoke the Lambda function.

    • You should see that if a snapshot is associated with a deleted volume or a volume not attached to any running EC2 instance, it will be deleted.

  3. Error Handling:

    • If you see permission issues or execution timeouts:

      • Increase the timeout in the Lambda configuration from the default of 3 seconds to 10 seconds or more.

      • Check the IAM Role to ensure it has the correct permissions.

Timeout Error: The Lambda function may also fail because the default timeout for Lambda functions is set to 3 seconds, which is not enough for our case. We’ll increase the timeout to 10 seconds to give the function more time to execute.

Note: We increased the Lambda function's execution timeout and granted its role permission to describe EC2 instances and volumes and to describe and delete snapshots. Then we clicked Test to run it.
We created a test event because we are running the Lambda function manually rather than through a CloudWatch or S3 trigger. A test event is how you invoke a function by hand from the console.
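
A manual invocation like the console Test button can also be done from a script. Here is a minimal sketch, assuming a Boto3 Lambda client is passed in; the helper name `invoke_once` and the shape of the returned dictionary are hypothetical.

```python
import json


def invoke_once(lambda_client, function_name):
    """Synchronously invoke the function with an empty event.

    `lambda_client` is expected to be a Boto3 client, e.g. boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps({}).encode("utf-8"),
    )
    return {
        "status": response["StatusCode"],
        # FunctionError is only present when the function itself failed.
        "error": response.get("FunctionError"),
        "body": response["Payload"].read().decode("utf-8"),
    }
```

If the role is missing a permission or the function times out, the `error` field is set and `body` contains the error details.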


Lambda Code Walkthrough

Here’s a simple breakdown of the code logic:

  1. Imports and Setup:

     import boto3
     import datetime
    
    • Import Boto3 to interact with AWS services like EC2 and EBS.
  2. Initialize Clients:

     ec2 = boto3.client('ec2')
    
    • Create a connection to EC2 service using the Boto3 client.
  3. List EC2 Instances:

     # Only running instances count as "active".
     instances = ec2.describe_instances(
         Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
     active_instance_ids = set()
     for reservation in instances['Reservations']:
         for instance in reservation['Instances']:
             active_instance_ids.add(instance['InstanceId'])
    
    • Get all running EC2 instances and save their IDs for later use.
  4. List EBS Snapshots and Volumes:

     snapshots = ec2.describe_snapshots(OwnerIds=['self'])
     volumes = ec2.describe_volumes()
    
    • Get a list of snapshots and volumes. This will allow you to check if a snapshot belongs to a volume and if the volume is attached to a running EC2 instance.
  5. Process Snapshots:

     for snapshot in snapshots['Snapshots']:
         snapshot_id = snapshot['SnapshotId']
         volume_id = snapshot.get('VolumeId')
         # Look up the source volume; None means it no longer exists.
         volume = next((vol for vol in volumes['Volumes'] if vol['VolumeId'] == volume_id), None)
         # Keep the snapshot only if its volume is attached to an active instance.
         attached = volume is not None and any(
             att['InstanceId'] in active_instance_ids for att in volume['Attachments'])
         if not attached:
             ec2.delete_snapshot(SnapshotId=snapshot_id)
    
    • Loop through all snapshots.

    • Look up the volume each snapshot was taken from.

    • If that volume no longer exists, or is not attached to a running EC2 instance, delete the snapshot. Otherwise, keep it.
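
The steps above can be assembled into a single handler. The sketch below is one possible arrangement, not the exact code deployed in this project. It makes two assumptions that go beyond the walkthrough: it uses Boto3 paginators, because the describe calls return results in pages, and it reads a hypothetical `DRY_RUN` environment variable so you can log what would be deleted before deleting anything. The helper names `collect` and `find_stale_snapshots` are also hypothetical.

```python
import os


def collect(ec2, operation, result_key, **kwargs):
    """Gather every item from a paginated EC2 describe call."""
    items = []
    for page in ec2.get_paginator(operation).paginate(**kwargs):
        items.extend(page[result_key])
    return items


def find_stale_snapshots(ec2):
    """Return IDs of owned snapshots whose source volume is gone
    or is not attached to a running instance."""
    running = {"Name": "instance-state-name", "Values": ["running"]}
    reservations = collect(ec2, "describe_instances", "Reservations", Filters=[running])
    running_ids = {i["InstanceId"] for r in reservations for i in r["Instances"]}

    volumes = {v["VolumeId"]: v for v in collect(ec2, "describe_volumes", "Volumes")}

    stale = []
    for snap in collect(ec2, "describe_snapshots", "Snapshots", OwnerIds=["self"]):
        volume = volumes.get(snap.get("VolumeId"))
        attachments = volume["Attachments"] if volume else []
        if not any(a["InstanceId"] in running_ids for a in attachments):
            stale.append(snap["SnapshotId"])
    return stale


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    ec2 = boto3.client("ec2")
    # DRY_RUN is a hypothetical setting: "true" means log only, delete nothing.
    dry_run = os.environ.get("DRY_RUN", "false").lower() == "true"

    stale = find_stale_snapshots(ec2)
    for snapshot_id in stale:
        print(f"{'Would delete' if dry_run else 'Deleting'} {snapshot_id}")
        if not dry_run:
            ec2.delete_snapshot(SnapshotId=snapshot_id)
    return {"stale_snapshots": stale, "dry_run": dry_run}
```

Running once with `DRY_RUN=true` and checking the log output is a cheap safeguard, since a deleted snapshot cannot be recovered. Note also that AWS refuses to delete a snapshot that backs a registered AMI, so such a call would raise an error that this sketch does not handle.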


Testing Practically

Case 1: Deleting the EC2 Instance and Volume

  1. Terminate the EC2 instance

  2. Run the Lambda function test

As expected, our Lambda function deleted the EBS snapshot, because we terminated the EC2 instance and its volume was deleted along with it.

The EBS snapshot had become a stale resource, so it was deleted.

In the same way, we could create 100 snapshots and delete all the stale ones in one go. We can write similar Lambda functions for S3 buckets, RDS instances, and EKS clusters.

Case 2: Creating Only a Volume and a Snapshot

We created a single 1 GiB volume without any EC2 instance.

Creating a snapshot of the volume

This snapshot is of the volume we created without an EC2 instance.

We now have one snapshot of our stale volume.

We invoked our Lambda function, and the EBS snapshot was deleted because its volume is not attached to a running instance. Our function deletes a snapshot whenever the EC2 instance is absent or terminated.

Snapshot deleted successfully


EXTRA

Automate Lambda Execution with CloudWatch

  1. Create CloudWatch Rule:

    • Go to CloudWatch > Rules > Create Rule.

    • The rule acts as a bridge between the schedule and the Lambda function: CloudWatch Events (now Amazon EventBridge) invokes the function each time the schedule fires.

    • Set the schedule for the rule. For example, to run the Lambda function once a day at 00:00 UTC, use the cron expression cron(0 0 * * ? *).

    • Select Lambda function as the target, and choose the Lambda function you just created.

    • Click Create.

  2. Configure Lambda to Automatically Trigger:

    • This will allow the Lambda function to run on the defined schedule. Each time it runs, it will check snapshots and delete those that are stale (unassociated with volumes or EC2 instances).
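
The same schedule can be set up from code. Below is a minimal sketch, assuming Boto3 `events` and `lambda` clients are passed in. The rule name, target ID, statement ID, and the helper name `schedule_daily` are hypothetical. Unlike the console, which grants the invoke permission for you, the API route needs an explicit `add_permission` call.

```python
def schedule_daily(events, lambda_client, function_name, function_arn,
                   rule_name="daily-ebs-snapshot-cleanup"):
    """Create a daily schedule rule and point it at the Lambda function.

    `events` and `lambda_client` are expected to be Boto3 clients, e.g.
    boto3.client("events") and boto3.client("lambda").
    """
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="cron(0 0 * * ? *)",  # every day at 00:00 UTC
        State="ENABLED",
    )["RuleArn"]

    # Allow this rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Register the function as the rule's target.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "cleanup-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

A daily run is usually enough for snapshot cleanup, since snapshot storage is billed over time and a few hours of delay costs very little.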

Monitor and Clean Up

  1. Monitor the Execution:

    • Go to Lambda > Monitoring to view logs and ensure the function is working as expected.

    • You can review CloudWatch logs if needed to troubleshoot.

  2. Clean Up Resources:

    • Delete Snapshots: Ensure that snapshots and volumes created during testing are deleted to avoid unnecessary costs.

    • Delete Lambda Function: If not needed, delete the Lambda function to avoid any unwanted executions.

Conclusion

  • You now have a fully functional Lambda function to automate EBS snapshot cost optimization. It deletes snapshots that are no longer associated with an active volume or EC2 instance, helping to reduce unnecessary storage costs.

  • You can expand this by adding more conditions, such as checking the age of snapshots or triggering alerts before deletion.
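
The age check mentioned above could look something like this. It is a sketch under assumptions: the helper names and the 30 day threshold in the usage note are hypothetical. It relies on the `StartTime` field that `describe_snapshots` returns for each snapshot as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def is_older_than(snapshot, days, now=None):
    """True if the snapshot was started more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    return now - snapshot["StartTime"] > timedelta(days=days)


def filter_by_age(snapshots, days, now=None):
    """Keep only the snapshots old enough to be considered for deletion."""
    return [s for s in snapshots if is_older_than(s, days, now)]
```

For example, `filter_by_age(snapshots['Snapshots'], 30)` would narrow the candidates to snapshots older than 30 days, so that a fresh backup is never deleted just because its instance was stopped for a moment.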

Written by

Amit singh deora

DevOps | Cloud Practitioner | AWS | GIT | Kubernetes | Terraform | ArgoCD | Gitlab