AWS cost optimization practices

One of the main reasons for people shifting from on-premises to the cloud is Cost Optimization. While you don't have to handle maintenance, failing to optimize the cloud can lead to exorbitant costs. AWS Lambda is one of the most popular services that help manage costs by only charging for the compute time you consume. It can be triggered by events from various services like S3 buckets, DynamoDB, CloudWatch, and many more, making it a versatile tool in your cost optimization toolkit. This blog will demonstrate two examples.

  1. Deleting stale snapshots automatically using Lambda functions

  2. Configuring CloudWatch and SNS to send email notifications to the user of a predefined metric of EC2.

We will also see how to optimize both examples further.

Pre-requisites:

  1. AWS account

    You need an AWS account and that's it. You can follow this blog.

Cost-optimization techniques :

The techniques to optimize costs are endless. In this first example, we will see how to delete stale snapshots. stale resources are the resources that are no longer required.

  1. Delete stale snapshots using Lambda and CloudWatch:

    Step 1:

    Launch an EC2 instance with the setting by default while keeping the AMI as Ubuntu. Go with default volumes as they are sufficient to demonstrate this tutorial.

    Now it's time to create snapshots. But at first, we need to know about the volume of the current running instance.

    To the storage section click on the instance and you will see something like this...

    Here, note that vol-0d8721289e4b5b5b9 this is the volume we want to take a snapshot of.

    Step 2:

    On the left hand, click on the snapshots option.

    Now, click on create snapshot and as we have only one volume, select the volume section and choose the volume of the instance that was created while launching our instance and then click on create snapshot.

    Now, the snapshot will be created and the dashboard will look like this:

    Step 3: Create an IAM role

    Set up policy:

    There are two things, IAM role and IAM users. just like IAM users get selected permissions to perform tasks, IAM roles define permission for services. So we need to assign a role to the Lambda service to do the task such as DescribeSnapshots, DescribeInstances, DeleteVolumes across all regions or the region you want to set.

    Go to the IAM section and create a policy first and then we will add that policy to the role.

    Click on create a policy

    Select service as EC2 as we are permitting deleting snapshots of EC2 instances.

    You can give all permissions of EC2 across all regions for ease. But that is not a good practice. So we will specify the permissions we need.

    Go to the list section and allow the DescribeInstances permission.

    also, search for DescribeVolumes and allow the permission from there,

    Also, allow DescribeSnapshots from that list.

    Next, we have to allow the DeleteSnapshot permission which you can find in write section.

    All the permissions are set. Set the ARN as all, if you want to be specific, you can.

    next, give the policy a name, give a description if you want, and create the policy.

    Set up Role:

    Go to the roles section, above the Policy section.

    Select AWS service and select service or use case as Lambda ( cause we are permitting the Lambda service )

    Search for the policy name you created and add that to the role.

    Next, give your role a name and click on create role.

    The dashboard will show your newly created role for Lambda service.

    Step 4:

    It's time to set up the Lambda function to start monitoring and deleting stale resources.

    Go to the services section and search for lambda.

    Let's create a function.

    Select Author from scratch

    Give a name

    Choose runtime as Python (Python 3.9)

    Keep the rest as default.

    Go to Change default execution role settings and choose the second option and select the role we just created now and create the function.

    After creating the function, go to the code section and replace the existing code with this -

     import boto3
    
     def lambda_handler(event, context):
         ec2 = boto3.client('ec2')
    
         # Get all EBS snapshots
         response = ec2.describe_snapshots(OwnerIds=['self'])
    
         # Get all active EC2 instance IDs
         instances_response = ec2.describe_instances(Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
         active_instance_ids = set()
    
         for reservation in instances_response['Reservations']:
             for instance in reservation['Instances']:
                 active_instance_ids.add(instance['InstanceId'])
    
         # Iterate through each snapshot and delete if it's not attached to any volume or the volume is not attached to a running instance
         for snapshot in response['Snapshots']:
             snapshot_id = snapshot['SnapshotId']
             volume_id = snapshot.get('VolumeId')
    
             if not volume_id:
                 # Delete the snapshot if it's not attached to any volume
                 ec2.delete_snapshot(SnapshotId=snapshot_id)
                 print(f"Deleted EBS snapshot {snapshot_id} as it was not attached to any volume.")
             else:
                 # Check if the volume still exists
                 try:
                     volume_response = ec2.describe_volumes(VolumeIds=[volume_id])
                     if not volume_response['Volumes'][0]['Attachments']:
                         ec2.delete_snapshot(SnapshotId=snapshot_id)
                         print(f"Deleted EBS snapshot {snapshot_id} as it was taken from a volume not attached to any running instance.")
                 except ec2.exceptions.ClientError as e:
                     if e.response['Error']['Code'] == 'InvalidVolume.NotFound':
                         # The volume associated with the snapshot is not found (it might have been deleted)
                         ec2.delete_snapshot(SnapshotId=snapshot_id)
                         print(f"Deleted EBS snapshot {snapshot_id} as its associated volume was not found.")
    

    click on the deploy and the code will be deployed and saved.

    Step 5: Error handling

    You might see this error after clicking on Test. We need to increase the Task time to resolve this issue. To do that, go to configuration -> general config -> edit

    set timeout as 10 seconds, and save. By default, the timeout is 3 sec for Lambda function.

    Now if you try to test the code, it will show a success.

    Step 6: Terminate the instance

    Before:

    Check the snapshot for the running instance. If you see the EC2 dashboard, one instance is running, volume is attached to the snapshot, so the lambda function will not delete the snapshot.

    After:

    Delete the instance and wait for the volume to get deleted.

    As you can see, we still have one snapshot after deleting the volumes. So it's a stale resource, right?

    Step 7: Test lambda function

    Now test the lambda function and you will see the snapshot has been deleted automatically.

    Tips: Here we have triggered the function manually. But we can use AWS cloudwatch service to trigger the lambda function on the scheduling bases ( cronjobs )

  2. CPU utilization with EC2, SNS and CloudWatch :

    Step 1: Create an Instance first:-

    Create an instance with Ubuntu AMI and make sure that auto assign public IP option is enabled.

    Step 2: Select metrics

    Go to the Cloudwatch service and select the all metrics section.

    From there, select EC2 and then select accross all instances. If that option is not available for you, choose per-instance metrics which will specify the action for that instance only.

    From there, search for the metric CPUUtilization and select 1h from the top which will show the CPU usage for last 1hr.

    These metrics can also be monitored from the monitoring section of an EC2 instance.

    By default EC2 instances sends information about their state to CloudWatch every 5 minutes. But for this demo, we will be customising it to the 1 minute.

    For that, enable detailed monitoring from the left side and enable the option.

    Okay. Now our metric is set.

    Step 3: Set the alarm

    Go to the alarms section and select in alarm and create an alarm with below config.

    Go to select metric -> EC2 ( as we want to monitor metric for EC2 instance ) -> Accross all instances -> choose CPUUtilization metric.

    If you are going for per-instance-metrics, make sure to choose the id matching with the running instance's id.

    I have set it as maximum of 1 minute to watch before alarming. You can do average as well.

    I am going to configure the alarm if the EC2 is using less than 40 percent of the CPU.

    Step 4: Set the SNS topic

    We will send email notification. For that, we will use SNS (Simple Notification Service).

    Click on create new topic

    Enter name and an email where you want to be notified and create the topic.

    Select the custom message and save it in the next stage.

    Step 4: Subs confirmation

    In your mail box, AWS will send a notification to confirm this subscription. Click on confirm subscription.

    Step 5: Get notified

    If you run top command on the instance, you will see that CPU consumption is near about zero as we are not performing anything to that instance.

    if you check the alarm now, it will be triggered.

    and that mail will receive a notification regarding this...

    Tips: We can automate the creation and deletion of EC2 instances and run it on a scheduled basis and look for metrics to send notifications via mail or Slack.

    To automate this, the below will be used for Lambda function.

    Lambda code reference to stop EC2:

     import boto3
     region = '<enter_your_region>'
     instances = ['<instanceid_1', 'instanceid_2']
     ec2 = boto3.client('ec2', region_name=region)
    
     def lambda_handler(event, context):
         ec2.stop_instances(InstanceIds=instances)
         print('stopped your instances: ' + str(instances))
    

    Lambda code reference to start EC2:

     import boto3
     region = '<enter_your_region>'
     instances = ['<instanceid_1', 'instanceid_2']
     ec2 = boto3.client('ec2', region_name=region)
    
     def lambda_handler(event, context):
         ec2.start_instances(InstanceIds=instances)
         print('started your instances: ' + str(instances))
    

    replace the region and instance ID with your region and EC2's ID.

Yay, we have seen two ways to optimize the cost for AWS services ๐ŸŽ‰!

Follow me on LinkedIn and Twitter for more..

0
Subscribe to my newsletter

Read articles from Bandhan Majumder directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Bandhan Majumder
Bandhan Majumder

I am Bandhan from West Bengal, India. I write blogs on DevOps, Cloud, AWS, Security and many more. I am open to opportunities and looking for collaboration. Contact me with: bandhan.majumder4@gmail.com