AWS CloudWatch Guide and EC2 CPU Alerting Through SNS


Introduction to AWS CloudWatch
What is AWS CloudWatch?
AWS CloudWatch is a monitoring and observability service that helps you track your AWS resources' performance and operational health.
You can think of CloudWatch as a "watchdog" for activity in your AWS cloud. It collects and tracks metrics and logs, and raises alarms, to help keep your infrastructure running smoothly.
Problems CloudWatch Solves
Real-time Monitoring – Helps track AWS resource usage.
Automatic Alarming – Notifies you when thresholds are exceeded.
Log Management – Stores and analyzes logs for troubleshooting.
Custom Metrics – Enables tracking of metrics AWS doesn't collect by default.
Cost Optimization – Identifies underutilized resources, helping you optimize usage and reduce spending.
Scaling Integration – Works with AWS Auto Scaling to adjust resources dynamically.
Lack of Visibility – Tracks AWS resource performance in one place.
Constant Manual Watching – Alerts you automatically when resources exceed thresholds, so you don't need to watch them in real time.
Key Features of AWS CloudWatch
Feature 1: Monitoring
Tracks AWS services & applications.
Examples:
CPU utilization of an EC2 instance.
Number of API calls in AWS Lambda.
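Memory consumption of a running application (collected with the CloudWatch agent or as a custom metric, since EC2 does not report memory by default).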
Feature 2: Metrics (Default & Custom)
A metric is a time-ordered set of data points that represents resource performance, such as CPU usage, memory consumption, or API requests.
Default Metrics (Auto-generated by AWS):
EC2 CPU utilization
S3 bucket size and object count (daily storage metrics)
Lambda function invocation count
Custom Metrics (User-defined):
Memory consumption (not auto-tracked)
API response time
Application error rates
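Custom metrics such as memory are sent with CloudWatch's PutMetricData API. Below is a minimal sketch, assuming a Linux instance (it reads `/proc/meminfo`) and a boto3 CloudWatch client passed in as `cloudwatch_client`. The namespace `Custom/EC2`, the metric name `MemoryUtilization`, and the helper function names are hypothetical choices, not AWS defaults.

```python
# Hypothetical helpers: compute memory usage from /proc/meminfo and publish it
# as a custom metric. "Custom/EC2" and "MemoryUtilization" are illustrative names.

def memory_used_percent(meminfo_text: str) -> float:
    """Return used memory as a percentage, based on MemTotal and MemAvailable (kB)."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return round(100.0 * (total - available) / total, 2)


def build_memory_metric(instance_id: str, percent: float) -> list:
    """Build the MetricData list that PutMetricData expects."""
    return [
        {
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": percent,
            "Unit": "Percent",
        }
    ]


def publish_memory_metric(cloudwatch_client, instance_id: str) -> float:
    """Read /proc/meminfo (Linux only) and send one datapoint to CloudWatch."""
    with open("/proc/meminfo") as f:
        percent = memory_used_percent(f.read())
    cloudwatch_client.put_metric_data(
        Namespace="Custom/EC2",
        MetricData=build_memory_metric(instance_id, percent),
    )
    return percent
```

For example, you could call `publish_memory_metric(boto3.client("cloudwatch"), "i-0123456789abcdef0")` from a cron job every minute (the instance ID is a placeholder). In practice, the CloudWatch agent can collect memory metrics for you without custom code.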
Feature 3: Alarms
An alarm watches a metric and notifies users (or takes an automated action) when the metric crosses a predefined threshold.
Example:
If CPU utilization exceeds 80%, send an email notification.
If free memory drops below 500MB, trigger autoscaling.
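To make "crosses a threshold" concrete, here is a simplified model of how an alarm decides its state: it looks at the most recent evaluation periods and goes into ALARM when enough of them breach. The comparison operator names match CloudWatch's `ComparisonOperator` values, but the function itself is a hypothetical sketch that ignores missing data and the INSUFFICIENT_DATA state.

```python
# Simplified model of CloudWatch alarm evaluation ("M out of N" datapoints).
# Assumption: missing data and INSUFFICIENT_DATA handling are ignored.
import operator

COMPARISONS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def alarm_state(datapoints, threshold, comparison, evaluation_periods, datapoints_to_alarm):
    """Return "ALARM" if at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints breach the threshold, otherwise "OK"."""
    breaches = COMPARISONS[comparison]
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if breaches(value, threshold))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"
```

For the first example above, CPU readings of `[50, 85, 90, 60, 88]` with `GreaterThanThreshold` 80 and "3 out of 5" give ALARM, while `[50, 85, 40, 60, 88]` stays OK. Requiring several breaching datapoints is what keeps a single short spike from paging you.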
Feature 4: Logs and Logs Insights
CloudWatch Logs collects and stores logs from AWS services and your applications for debugging and auditing, and Logs Insights lets you query and analyze them.
Example:
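Records failed SSH login attempts on EC2 (when the instance's auth log is shipped to CloudWatch Logs, for example by the CloudWatch agent).
Tracks API calls made to S3 buckets (via AWS CloudTrail logs delivered to CloudWatch Logs).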
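As an illustration of the first example, here is a sketch that runs a Logs Insights query for failed SSH logins through the CloudWatch Logs `StartQuery` and `GetQueryResults` APIs. It assumes the auth log already reaches a log group; the name `/ec2/secure` used in the usage note is hypothetical. `logs_client` is assumed to be `boto3.client("logs")`.

```python
# Sketch: run a Logs Insights query and wait for the results.
# Assumptions: the log group exists and receives the instance's auth log;
# logs_client is a boto3 CloudWatch Logs client.
import time

FAILED_LOGIN_QUERY = (
    "fields @timestamp, @message"
    " | filter @message like /Failed password/"
    " | sort @timestamp desc"
    " | limit 20"
)

TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout", "Unknown"}


def run_insights_query(logs_client, log_group, query, lookback_seconds=3600, poll_seconds=1.0):
    """Start a query over the last `lookback_seconds` and poll until it finishes."""
    end = int(time.time())
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=end - lookback_seconds,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

Usage: `run_insights_query(boto3.client("logs"), "/ec2/secure", FAILED_LOGIN_QUERY)` returns up to 20 recent matching log lines as dictionaries.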
Feature 5: Cost Optimization
Helps reduce AWS costs by identifying underused resources.
Example:
Detects idle EC2 instances (for example, consistently low CPU) so they can be stopped, either manually or with an alarm's EC2 stop action.
Identifies over-provisioned RDS databases.
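One way to find idle instances is to pull a daily average of `CPUUtilization` with the `GetMetricStatistics` API and check whether every day stayed under a low threshold. This is a minimal sketch: the 5% threshold and 14-day window are illustrative assumptions, not AWS guidance, and `cloudwatch_client` is assumed to be `boto3.client("cloudwatch")`.

```python
# Sketch: flag an EC2 instance as idle when its daily average CPU stayed
# below a threshold for the whole lookback window (threshold/window are assumptions).
from datetime import datetime, timedelta, timezone


def is_idle(datapoints, threshold=5.0):
    """True if there is data and every daily Average is below `threshold`."""
    return bool(datapoints) and all(dp["Average"] < threshold for dp in datapoints)


def daily_cpu_averages(cloudwatch_client, instance_id, days=14):
    """Fetch one Average CPUUtilization datapoint per day for `instance_id`."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Average"],
    )
    return response["Datapoints"]
```

Stopping the instances this flags is a separate decision. Reviewing the list first, or attaching an EC2 stop action to an alarm, keeps a quiet but important server from being shut down by accident.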
Feature 6: Scaling & Automation
Integrates with AWS Auto Scaling to adjust resources automatically.
Example:
If CPU utilization exceeds 70%, add more EC2 instances.
If API request rate drops below 100 per second, reduce resources.
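The two rules above can be written as a small decision function. This is a simplified sketch rather than how Auto Scaling is configured: in practice these rules live in Auto Scaling policies triggered by CloudWatch alarms. The one-instance step and the min/max bounds are illustrative assumptions.

```python
# Simplified sketch of the two scaling rules as a pure function.
# Assumptions: step of one instance, illustrative min/max group size.

def desired_capacity(current, cpu_percent, requests_per_second,
                     min_size=1, max_size=4,
                     cpu_scale_out=70.0, rps_scale_in=100.0):
    """Return the new instance count after one evaluation of the rules."""
    if cpu_percent > cpu_scale_out:
        target = current + 1  # scale out: CPU above 70%
    elif requests_per_second < rps_scale_in:
        target = current - 1  # scale in: fewer than 100 requests per second
    else:
        target = current      # no change
    # Never go outside the group's size limits
    return max(min_size, min(max_size, target))
```

Checking CPU first means a busy fleet is never scaled in just because traffic looks low. The min/max clamp plays the same role as an Auto Scaling group's size limits.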
Feature 7: Dashboards
Customizable visual representations of monitored metrics.
Feature 8: Events
Automated responses to changes in your AWS environment (CloudWatch Events, now part of Amazon EventBridge).
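Event-based automation starts with a rule whose event pattern matches the changes you care about. Below is a sketch of a rule that matches EC2 instances entering the `stopped` state, created with EventBridge's `PutRule` API. The rule name is hypothetical, and `events_client` is assumed to be `boto3.client("events")`. The `matches` helper is only a simplified local check for this flat pattern. Real EventBridge matching supports many more operators.

```python
# Sketch: an event rule for EC2 instances entering the "stopped" state.
# Assumptions: the rule name is hypothetical; events_client is boto3.client("events").
import json

EC2_STOPPED_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def create_stopped_rule(events_client, rule_name="ec2-stopped-rule"):
    """Create (or update) the rule and return its ARN."""
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EC2_STOPPED_PATTERN),
        State="ENABLED",
    )
    return response["RuleArn"]


def matches(event, pattern=EC2_STOPPED_PATTERN):
    """Simplified local check: every pattern key must equal one of its listed values."""
    for key, expected in pattern.items():
        if isinstance(expected, dict):
            if not isinstance(event.get(key), dict) or not matches(event[key], expected):
                return False
        elif event.get(key) not in expected:
            return False
    return True
```

A rule does nothing on its own. You attach a target (for example, an SNS topic or a Lambda function) with `put_targets` to define the automated response.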
Understanding Metrics and Alarms
What Are Metrics?
Metrics provide data points on AWS resource performance. Some common CloudWatch metrics include:
EC2 Instance CPU Utilization (Percentage of CPU usage)
Memory Utilization (Custom metric, as AWS does not track this by default)
API Requests (Number of API calls made to a service)
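A metric is identified by its namespace, name, and dimensions. The sketch below reads recent EC2 CPU datapoints with the `GetMetricData` API. It assumes `cloudwatch_client` is `boto3.client("cloudwatch")`. It uses a 60-second period, which only returns 1-minute data when detailed monitoring is enabled (use 300 with basic monitoring). It also ignores pagination (`NextToken`) for brevity.

```python
# Sketch: read recent EC2 CPUUtilization datapoints with GetMetricData.
# Assumptions: cloudwatch_client is boto3.client("cloudwatch"); pagination ignored.
from datetime import datetime, timedelta, timezone


def cpu_query(instance_id, period=60, stat="Average"):
    """One MetricDataQuery: the metric is identified by namespace, name, and dimensions."""
    return {
        "Id": "cpu",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


def recent_cpu(cloudwatch_client, instance_id, minutes=30, period=60):
    """Return (timestamp, value) pairs for the last `minutes`, oldest first."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_data(
        MetricDataQueries=[cpu_query(instance_id, period=period)],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        ScanBy="TimestampAscending",
    )
    result = response["MetricDataResults"][0]
    return list(zip(result["Timestamps"], result["Values"]))
```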
What Are Alarms?
Alarms help automate responses to metric changes. For example, if CPU usage exceeds 80%, CloudWatch can:
Send an email notification.
Trigger an Auto Scaling event to add instances.
Restart the EC2 instance.
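Each of these responses is expressed as an ARN in the alarm's `AlarmActions` list: an SNS topic ARN for email, an Auto Scaling policy ARN for scaling, and an `arn:aws:automate:<region>:ec2:<action>` ARN for EC2 actions such as reboot. This hypothetical helper only assembles that list. The region, topic ARN, and policy ARN you pass in are placeholders. EC2 actions apply only to an alarm on a single instance's metric, so in practice you would pick the actions that fit the alarm.

```python
# Sketch: assemble AlarmActions for the responses listed above.
# Assumptions: region and ARNs are placeholders supplied by the caller.

EC2_ACTIONS = {"reboot", "stop", "terminate", "recover"}


def alarm_actions(region, sns_topic_arn=None, scaling_policy_arn=None, ec2_action=None):
    """Return the list of action ARNs to pass as AlarmActions."""
    actions = []
    if sns_topic_arn:
        actions.append(sns_topic_arn)       # email (or SMS) through an SNS topic
    if scaling_policy_arn:
        actions.append(scaling_policy_arn)  # run an Auto Scaling policy
    if ec2_action:
        if ec2_action not in EC2_ACTIONS:
            raise ValueError(f"unsupported EC2 action: {ec2_action}")
        actions.append(f"arn:aws:automate:{region}:ec2:{ec2_action}")
    return actions
```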
Hands-on Demonstration #1: Configuring a CloudWatch Alarm for EC2 CPU Utilization
First, create a t2.micro EC2 instance. For the demo, we will spike its CPU manually so we can watch the spikes in real time.
By default, an EC2 instance sends metrics to CloudWatch every 5 minutes (basic monitoring). We will change this to every 1 minute by enabling detailed monitoring, which is billed separately. Here's how:
- Select the EC2 instance > Monitoring > Manage Detailed Monitoring
- Select Enable and confirm.
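The same switch is available through the EC2 `MonitorInstances` API (`monitor_instances` in boto3; `unmonitor_instances` turns it off). This is a minimal sketch, assuming `ec2_client` is `boto3.client("ec2")`. The helper name is hypothetical.

```python
# Sketch: API equivalent of "Manage detailed monitoring > Enable".
# Assumption: ec2_client is boto3.client("ec2"); instance IDs are supplied by you.

def enable_detailed_monitoring(ec2_client, instance_ids):
    """Switch instances to 1-minute metrics; return {instance_id: monitoring state}."""
    response = ec2_client.monitor_instances(InstanceIds=list(instance_ids))
    return {
        item["InstanceId"]: item["Monitoring"]["State"]
        for item in response["InstanceMonitorings"]
    }
```

The state usually comes back as `pending` and changes to `enabled` shortly afterwards.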
- On the instance, create a script that spikes the CPU manually:
vim cpu_spike.py

import time


def simulate_cpu_spike(duration=30, cpu_percent=80):
    # Note: cpu_percent does not set a utilization level; it only scales how much
    # busy work runs. While the loop runs, one CPU core stays close to 100%.
    print(f"Simulating CPU spike at {cpu_percent}%...")
    start_time = time.time()
    # Scale the amount of work by cpu_percent (adjust the constant as needed)
    target_percent = cpu_percent / 100
    total_iterations = int(target_percent * 5_000_000)
    # Perform simple arithmetic in a tight loop to keep the CPU busy
    for _ in range(total_iterations):
        result = 0
        for i in range(1, 1001):
            result += i
    # If the work finished before `duration`, sleep for the remainder;
    # if it took longer (depends on instance speed), no sleep happens
    elapsed_time = time.time() - start_time
    remaining_time = max(0, duration - elapsed_time)
    time.sleep(remaining_time)
    print("CPU spike simulation completed.")


if __name__ == '__main__':
    # Busy work sized for cpu_percent=80, lasting at least 30 seconds
    simulate_cpu_spike(duration=30, cpu_percent=80)
- Run the script
python3 cpu_spike.py
- As you can see, the CPU spike shows up under EC2 Instance > Monitoring.
- It appears in CloudWatch too.
The Python script successfully spiked the EC2 instance's CPU to 100%.
Note: You can also switch the graph to other view types (for example, line, stacked area, or number).
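The script above sizes its work by `cpu_percent` but actually runs one core flat out, which is why the graph reached 100%. If you want load closer to a chosen percentage (for example, 60% to sit just above the 50% alarm threshold used below), a duty-cycle loop is one option. This is a different technique from the original script. It is a minimal sketch, and `duty_cycle_load` and `slice_seconds` are hypothetical names. The result is approximate. EC2 reports `CPUUtilization` across all vCPUs, so one process like this only reaches the target on a single-vCPU instance such as t2.micro.

```python
# Alternative sketch: approximate a target CPU percentage on one core.
# In each time slice the loop spins for cpu_percent% of the slice and sleeps the rest.
import time


def duty_cycle_load(duration=30.0, cpu_percent=80, slice_seconds=0.1):
    """Run a duty-cycle load for `duration` seconds; return the number of slices run."""
    if not 0 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 0 and 100")
    busy = slice_seconds * cpu_percent / 100
    idle = slice_seconds - busy
    end = time.monotonic() + duration
    slices = 0
    while time.monotonic() < end:
        slice_start = time.monotonic()
        while time.monotonic() - slice_start < busy:
            pass  # spin: keeps the core busy for the "on" part of the slice
        time.sleep(idle)  # rest for the "off" part of the slice
        slices += 1
    return slices
```

For example, `duty_cycle_load(duration=120, cpu_percent=60)` keeps the instance around 60% for two minutes, long enough for a couple of 1-minute datapoints.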
Let’s Note Some Points
If, in an organization, CPU utilization is constantly hitting 70%, 80%, or 90% for an average of 5-10 minutes, set an alarm, because that is not a good sign for a running application. A CPU hit of 100% for 1 second is not a major issue, although it is still worth noticing.
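The difference between a brief spike and sustained load is exactly what the alarm's statistic captures. Here is a worked example with made-up 1-minute CPU datapoints over a 5-minute period. Average smooths out a single burst, while Maximum keeps it. The `breaches` helper is hypothetical.

```python
# Worked example: the same kind of 5-minute window summarized with Average vs Maximum.
# The datapoint values are invented for illustration.
from statistics import mean

burst = [12.0, 15.0, 95.0, 14.0, 14.0]      # one short spike in minute 3
sustained = [82.0, 88.0, 91.0, 85.0, 79.0]  # high load the whole time

STATS = {"Average": mean, "Maximum": max}


def breaches(values, threshold, stat):
    """True if the period's statistic reaches the threshold."""
    return STATS[stat](values) >= threshold
```

Here `mean(burst)` is 30.0 while `max(burst)` is 95.0, and `mean(sustained)` is 85.0. With a 70% threshold, an Average alarm fires only for the sustained load, while a Maximum alarm fires for both.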
Let's Define Alarm Conditions
This alarm will notify you when something goes wrong in your EC2 instances or other services.
If an issue is critical, you will get the notification on your mobile phone and can fix the problem right away.
- Go to CloudWatch > Alarms > In alarm > Create alarm.
- Select the metric for your alarm (for example, EC2 > Per-Instance Metrics > CPUUtilization for your instance).
- Set the statistic to Average (ideal for production).
Note: for this demo we set the statistic to Maximum instead, because we can't wait for a longer period of time.
- Use Maximum with a 1-minute period for the demo.
- Set the condition to Greater/Equal 50, so the alarm fires when CPU reaches 50% or more.
- Configure the notification with an SNS topic and your email address, then enter the alarm name and description.
- Check that the alarm was created.
- The email notifications aren't active yet: open the subscription confirmation email from AWS and confirm it. The alarm uses Amazon SNS (Simple Notification Service) to send these emails.
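For reference, the console steps above map roughly onto three API calls: SNS `CreateTopic` and `Subscribe`, then CloudWatch `PutMetricAlarm`. This is a sketch, assuming `sns_client` is `boto3.client("sns")` and `cloudwatch_client` is `boto3.client("cloudwatch")`. The topic name, alarm name, email, and instance ID are placeholders, and `EvaluationPeriods=1` is an assumption since the steps don't state it.

```python
# Sketch: the demo alarm (Maximum, 1-minute period, >= 50%) created via the API.
# Assumptions: placeholder names; EvaluationPeriods=1 is not stated in the steps above.

def create_cpu_alarm(sns_client, cloudwatch_client, instance_id, email,
                     topic_name="cpu-alerts", alarm_name="ec2-cpu-demo"):
    """Create the SNS topic and email subscription, then the CPU alarm; return the topic ARN."""
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # AWS emails a confirmation link; notifications start only after it is confirmed
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    cloudwatch_client.put_metric_alarm(
        AlarmName=alarm_name,
        AlarmDescription="EC2 CPU >= 50% (demo)",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Maximum",  # demo setting; Average suits production
        Period=60,            # 1-minute periods need detailed monitoring
        EvaluationPeriods=1,
        Threshold=50.0,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[topic_arn],
    )
    return topic_arn
```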
The alarm is now active and ready to send a notification when CPU usage reaches 50% or more.
Now that the alarm is active, trigger the CPU spike with the Python script.
On the EC2 instance, run:
python3 cpu_spike.py
In the alarm's graph, the red line marks the threshold that triggers the notification.
So far we have explored just one metric, CPU utilization, but this account had 1036 metrics available, which gives a sense of how powerful CloudWatch is.
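To see what is available in your own account, you can page through the `ListMetrics` API and count metrics per namespace. This is a sketch, assuming `cloudwatch_client` is `boto3.client("cloudwatch")`. The total depends entirely on your account and region, and by default `ListMetrics` only shows metrics that have received data recently (roughly the past two weeks).

```python
# Sketch: count the metrics CloudWatch lists in one region, grouped by namespace.
# Assumption: cloudwatch_client is boto3.client("cloudwatch").
from collections import Counter


def count_metrics(cloudwatch_client):
    """Return a Counter of {namespace: number of metrics}."""
    per_namespace = Counter()
    paginator = cloudwatch_client.get_paginator("list_metrics")
    for page in paginator.paginate():
        for metric in page["Metrics"]:
            per_namespace[metric["Namespace"]] += 1
    return per_namespace
```

`sum(count_metrics(client).values())` gives the overall total for the region.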
Note: CloudWatch dashboards serve the same tracking purpose. They let you build custom views of your metrics and calculate values across groups of metrics in one place.
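A dashboard is defined as JSON and created with the `PutDashboard` API. The sketch below graphs CPU for several instances and adds a metric math expression, `AVG(METRICS())`, to show a single group average line. The dashboard layout, region default, and helper names are assumptions, and `cloudwatch_client` is assumed to be `boto3.client("cloudwatch")`. Check the dashboard body and metric math documentation before relying on the exact fields.

```python
# Sketch: a one-widget dashboard with per-instance CPU lines plus a group average.
# Assumptions: layout, region default, and names are illustrative.
import json


def cpu_dashboard_body(instance_ids, region="us-east-1"):
    """Build dashboard JSON: one line per instance plus a metric math average."""
    metrics = [
        ["AWS/EC2", "CPUUtilization", "InstanceId", iid, {"id": f"m{n}"}]
        for n, iid in enumerate(instance_ids)
    ]
    # Metric math: average all per-instance series into one "group" line
    metrics.append([{"expression": "AVG(METRICS())", "label": "Group average", "id": "e1"}])
    return {
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "metrics": metrics,
                    "stat": "Average",
                    "period": 300,
                    "region": region,
                    "title": "EC2 CPU utilization",
                },
            }
        ]
    }


def publish_dashboard(cloudwatch_client, name, instance_ids, region="us-east-1"):
    """Create or replace the dashboard; the response lists any validation messages."""
    return cloudwatch_client.put_dashboard(
        DashboardName=name,
        DashboardBody=json.dumps(cpu_dashboard_body(instance_ids, region)),
    )
```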
Written by

Amit singh deora
DevOps | Cloud Practitioner | AWS | GIT | Kubernetes | Terraform | ArgoCD | Gitlab