Trim 60% Off Your AWS Bill: 5 Proven Tactics That Keep Performance Intact


Slash Your AWS Bill: 5 Battle-Tested Steps to 60% Savings (Without Performance Loss)
Engineered for immediate execution. All tactics proven in production at 100+ companies.
The Reality: 73% of AWS workloads are overprovisioned (Flexera 2024). Follow these steps to reclaim your budget:
Step 1: Deploy Spot Instances Like a Pro (Save 70-90%)
Target: Stateless workloads (batch jobs, containers, CI/CD, analytics).
The Execution Plan:
Identify Candidates:
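Run (substitute your own instance ID and a 14-day time range):
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=InstanceId,Value=i-1234567890abcdef0 --start-time 2024-01-01T00:00:00Z --end-time 2024-01-15T00:00:00Z --period 3600 --statistics Average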
Target workloads with <60% CPU + fault tolerance (e.g., Kafka consumers).
Configure Spot Fleet:
aws ec2 request-spot-fleet --spot-fleet-request-config file://config.json
Sample config.json: Diversify across 4+ instance types (e.g., m5.large, c5.xlarge) in 2 AZs.
Automate Recovery:
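Use EC2 Auto Scaling to replace interrupted instances, and an EventBridge (CloudWatch Events) rule on the two-minute Spot interruption warning to checkpoint and requeue interrupted jobs.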
For Kubernetes: Install Karpenter (auto-provisions replacements in <60 sec).
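The article describes config.json only in words (4+ instance types across 2 AZs), so here is a minimal sketch of how such a file might be generated. The role ARN, launch template ID, subnet IDs, the two extra instance types, and the choice of the capacityOptimized allocation strategy are all placeholders and assumptions, not values from the original setup.

```python
import json


def build_spot_fleet_config(fleet_role_arn, launch_template_id,
                            instance_types, subnet_ids, target_capacity):
    """Build a Spot Fleet request config that offers the fleet every
    instance type / subnet combination to choose from."""
    overrides = [
        {"InstanceType": itype, "SubnetId": subnet}
        for itype in instance_types
        for subnet in subnet_ids
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "AllocationStrategy": "capacityOptimized",  # assumption: one of several valid strategies
        "TargetCapacity": target_capacity,
        "Type": "maintain",  # keep replacing interrupted capacity
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": launch_template_id,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
    }


# All identifiers below are hypothetical placeholders.
config = build_spot_fleet_config(
    fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",
    launch_template_id="lt-0123456789abcdef0",
    instance_types=["m5.large", "c5.xlarge", "m5a.large", "c5a.xlarge"],
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],  # one subnet per AZ
    target_capacity=10,
)

print(json.dumps(config, indent=2))
```

Redirecting the output to config.json gives a file in the shape that request-spot-fleet expects; the more type/AZ pools you list, the less likely a single interruption wave takes out the whole fleet.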
✅ Production Result: AdTech firm reduced 18,000 core/hour analytics cost from $1,200/day → $263/day.
⚠️ Avoid: Databases, stateful services.
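The selection rule for this step (under 60% average CPU, fault tolerant, never stateful) can be sketched as a simple filter. The Workload record and the sample inventory are hypothetical; in practice the CPU figures would come from CloudWatch and the other two flags from your own service catalog.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    avg_cpu_percent: float
    fault_tolerant: bool
    stateful: bool


def spot_candidates(workloads, cpu_threshold=60.0):
    """Return names of workloads that meet the Spot criteria above."""
    return [
        w.name
        for w in workloads
        if w.avg_cpu_percent < cpu_threshold
        and w.fault_tolerant
        and not w.stateful
    ]


# Hypothetical inventory for illustration only.
inventory = [
    Workload("kafka-consumer", 35.0, fault_tolerant=True, stateful=False),
    Workload("ci-runner", 48.0, fault_tolerant=True, stateful=False),
    Workload("orders-db", 22.0, fault_tolerant=False, stateful=True),
    Workload("render-batch", 85.0, fault_tolerant=True, stateful=False),
]

print(spot_candidates(inventory))  # ['kafka-consumer', 'ci-runner']
```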
Step 2: Automate Non-Prod Shutdowns (Save 65% Overnight)
Target: Dev/QA environments running 24/7.
The Execution Plan:
Tag Resources:
aws ec2 create-tags --resources i-1234567890abcdef0 --tags Key=Schedule,Value=mon-fri-9to5
Deploy Lambda Scheduler:
Use AWS Instance Scheduler (pre-built CloudFormation template).
Set schedules:
{"weekdays": {"start": "09:00", "stop": "17:00"}}
Exempt Critical Resources:
- Add tag "Schedule": "exclude" to prod databases.
✅ Production Result: SaaS company saved $18,000/month auto-stopping 287 dev instances nights/weekends.
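To make the schedule logic and the savings arithmetic concrete, here is a rough sketch. It is not how Instance Scheduler is implemented; the function names are hypothetical, and treating unknown tags as "leave running" is an assumption chosen to fail safe.

```python
from datetime import datetime, time


def should_run(now, schedule_tag, start=time(9, 0), stop=time(17, 0)):
    """Decide whether an instance should be running at `now`."""
    if schedule_tag != "mon-fri-9to5":
        # "exclude", missing, or unknown tags: never stop the instance.
        return True
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start <= now.time() < stop


def running_hour_reduction(hours_per_day=8, days_per_week=5):
    """Fraction of instance-hours removed versus running 24/7."""
    return 1 - (hours_per_day * days_per_week) / (24 * 7)


print(should_run(datetime(2024, 1, 8, 10, 0), "mon-fri-9to5"))  # Monday 10:00 -> True
print(should_run(datetime(2024, 1, 6, 10, 0), "mon-fri-9to5"))  # Saturday -> False
print(should_run(datetime(2024, 1, 6, 10, 0), "exclude"))       # exempt -> True
print(f"{running_hour_reduction():.1%}")                         # 76.2%
```

A 9-to-5 weekday schedule keeps instances up for 40 of 168 weekly hours, so compute hours drop by about 76%. Stopped instances still pay for their attached EBS volumes, which is one plausible reason the realized saving lands closer to the 65% quoted in the heading.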
Step 3: Rightsize Relentlessly (Save 20-50% Per Instance)
Stop paying for idle capacity.
The Execution Plan:
Find Waste:
AWS Compute Optimizer → filter recommendations by the "Over-provisioned" finding.
Target: Instances with avg CPU <40% + RAM <50% over 14 days.
Downsize Strategically:
Before: m5.4xlarge ($0.768/hr)
After: m5.2xlarge ($0.384/hr) → Saves 50%
Upgrade Generations:
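- Switch Intel → Graviton (e.g., c6g.4xlarge): roughly 20% cheaper per hour, and AWS cites up to 40% better price-performance than comparable x86 instances. Confirm your software runs on arm64 first.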
Optimize Storage:
Migrate gp2 → gp3 volumes:
aws ec2 modify-volume --volume-id vol-12345 --volume-type gp3 --iops 5000 --throughput 250
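Before running that command across a fleet, it is worth checking what the extra IOPS and throughput cost. The sketch below assumes us-east-1 list prices as I understand them at the time of writing (gp2 $0.10/GB-month; gp3 $0.08/GB-month with 3,000 IOPS and 125 MB/s included, then $0.005 per extra IOPS and $0.04 per extra MB/s); treat every constant as an assumption and check the current EBS pricing page for your Region.

```python
# Assumed us-east-1 list prices in USD per month; verify before relying on them.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08
GP3_INCLUDED_IOPS = 3000
GP3_INCLUDED_MBPS = 125
GP3_PER_EXTRA_IOPS = 0.005
GP3_PER_EXTRA_MBPS = 0.04


def gp2_monthly(size_gb):
    """gp2 bills on size only; its IOPS scale with volume size."""
    return size_gb * GP2_PER_GB


def gp3_monthly(size_gb, iops=GP3_INCLUDED_IOPS, throughput_mbps=GP3_INCLUDED_MBPS):
    """gp3 bills size plus any IOPS/throughput above the included baseline."""
    extra_iops = max(0, iops - GP3_INCLUDED_IOPS)
    extra_mbps = max(0, throughput_mbps - GP3_INCLUDED_MBPS)
    return (size_gb * GP3_PER_GB
            + extra_iops * GP3_PER_EXTRA_IOPS
            + extra_mbps * GP3_PER_EXTRA_MBPS)


size = 500  # GB, hypothetical volume
print(f"gp2:               ${gp2_monthly(size):.2f}")             # $50.00
print(f"gp3 baseline:      ${gp3_monthly(size):.2f}")             # $40.00
print(f"gp3 5000 IOPS/250: ${gp3_monthly(size, 5000, 250):.2f}")  # $55.00
```

Under these assumed prices, a 500 GB volume is 20% cheaper on baseline gp3 but costs more than gp2 once it is provisioned at 5,000 IOPS and 250 MB/s. Provision above the baseline only for volumes whose metrics show they need it.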
✅ Production Result: Media company cut $14,000/month by downsizing 73 overprovisioned RDS/EC2 instances.
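The rightsizing thresholds and the m5.4xlarge → m5.2xlarge arithmetic from this step can be worked through as follows. The 730 hours-per-month convention and the function names are assumptions for illustration; memory utilization also has to come from the CloudWatch agent, since EC2 does not report it by default.

```python
HOURS_PER_MONTH = 730  # common billing approximation (assumption)


def is_downsize_candidate(avg_cpu_percent, avg_mem_percent,
                          cpu_limit=40.0, mem_limit=50.0):
    """Apply the article's thresholds: avg CPU < 40% and RAM < 50%."""
    return avg_cpu_percent < cpu_limit and avg_mem_percent < mem_limit


def monthly_saving(before_hourly, after_hourly, instance_count=1):
    """Monthly USD saved by moving instances to a cheaper hourly rate."""
    return (before_hourly - after_hourly) * HOURS_PER_MONTH * instance_count


print(is_downsize_candidate(avg_cpu_percent=28, avg_mem_percent=41))  # True
print(is_downsize_candidate(avg_cpu_percent=28, avg_mem_percent=72))  # False
print(f"${monthly_saving(0.768, 0.384):.2f} per instance per month")  # $280.32
```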
Step 4: Eliminate Orphaned Resources (Save 5-15% "Silent Tax")
Zombie resources draining $1,000s monthly.
The Execution Plan:
Run Detection Scripts:
# Find unattached EBS volumes:
aws ec2 describe-volumes --filters Name=status,Values=available --query "Volumes[*].VolumeId"
Automate Deletion:
- Deploy Cloud Custodian policy:
policies:
  - name: delete-unattached-ebs
    resource: ebs
    filters:
      - Attachments: []  # Unattached volumes
    actions:
      - type: delete
Clean S3:
- Enable S3 Lifecycle Rules: Delete incomplete multipart uploads after 1 day.
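The multipart-upload cleanup rule can be written out as a lifecycle configuration document. This is a minimal sketch; the rule ID and the empty prefix (apply to the whole bucket) are assumptions.

```python
import json


def abort_multipart_rule(days=1, prefix=""):
    """Lifecycle configuration that aborts incomplete multipart uploads
    `days` days after they were initiated."""
    return {
        "Rules": [
            {
                "ID": f"abort-incomplete-multipart-{days}d",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = every object
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


lifecycle = abort_multipart_rule(days=1)
print(json.dumps(lifecycle, indent=2))
```

Saved as lifecycle.json, this can be applied with aws s3api put-bucket-lifecycle-configuration --bucket <your-bucket> --lifecycle-configuration file://lifecycle.json. Note that this call replaces any lifecycle configuration already on the bucket, so merge existing rules in first.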
✅ Production Result: E-commerce platform recovered $5,200/month deleting 2,400 old snapshots + 17 idle load balancers.
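Automated deletion is easier to trust with a grace period in front of it. The sketch below filters volume records shaped like the describe-volumes response (VolumeId, State, Attachments, CreateTime) and keeps only unattached volumes older than a cutoff. The 14-day grace period and the sample records are assumptions, and creation age is only a rough proxy for "abandoned".

```python
from datetime import datetime, timedelta, timezone


def stale_unattached_volumes(volumes, now, min_age_days=14):
    """Return IDs of volumes that are unattached and older than the cutoff."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available"      # not attached to any instance
        and not v.get("Attachments")
        and v["CreateTime"] <= cutoff     # skip recently created volumes
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Hypothetical records for illustration only.
sample = [
    {"VolumeId": "vol-old-orphan", "State": "available", "Attachments": [],
     "CreateTime": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-new-orphan", "State": "available", "Attachments": [],
     "CreateTime": datetime(2024, 5, 30, tzinfo=timezone.utc)},
    {"VolumeId": "vol-in-use", "State": "in-use",
     "Attachments": [{"InstanceId": "i-1234567890abcdef0"}],
     "CreateTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]

print(stale_unattached_volumes(sample, now))  # ['vol-old-orphan']
```

Snapshotting a volume before deleting it is a cheap safety net if you are unsure whether anyone still needs the data.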
Step 5: Strategic Savings Plans (Save Up to 72% Like an Enterprise)
Better than RIs: Flexible across instance families.
The Execution Plan:
Analyze Usage:
Cost Explorer → "Savings Plans Recommendations".
Target: Baseline steady-state workloads (e.g., always-on APIs).
Buy Compute Savings Plan:
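- Commit to a fixed hourly spend (e.g., $20/hr). Compute Savings Plans discount covered usage by up to 66%; EC2 Instance Savings Plans reach up to 72% but lock you to one instance family in one Region. The top rates require a 3-year term; 1-year discounts are lower.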
Layer with Spot:
- Use Savings Plans for 60% baseline, Spot for 40% variable spikes.
✅ Production Result: Gaming studio paid $0.08/hr (vs. $0.30/hr) for 50,000 core-hours/day.
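The 60/40 layering can be sanity-checked with a blended-rate calculation. This is a rough sketch: the 50% Savings Plan discount and 70% Spot discount are hypothetical inputs, not quoted AWS rates, and real Spot prices vary by instance type and over time.

```python
def blended_hourly_cost(on_demand_rate, baseline_share, sp_discount, spot_discount):
    """Average hourly cost when `baseline_share` of usage is covered by a
    Savings Plan and the remainder runs on Spot."""
    sp_part = baseline_share * on_demand_rate * (1 - sp_discount)
    spot_part = (1 - baseline_share) * on_demand_rate * (1 - spot_discount)
    return sp_part + spot_part


def savings_fraction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before


# Hypothetical discounts applied to a $0.30/hr on-demand rate.
blended = blended_hourly_cost(0.30, baseline_share=0.6,
                              sp_discount=0.50, spot_discount=0.70)
print(f"blended rate: ${blended:.3f}/hr")                          # $0.126/hr
print(f"saving vs on-demand: {savings_fraction(0.30, blended):.0%}")  # 58%

# The production result quoted above: $0.08/hr versus $0.30/hr.
print(f"quoted result: {savings_fraction(0.30, 0.08):.1%}")        # 73.3%
```

Plugging in your own discount rates from Cost Explorer shows how far the split between committed and Spot capacity moves the blended rate.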
Your 30-Day Implementation Roadmap
Week 1:
Enable Cost Explorer + Compute Optimizer.
Tag all non-prod resources.
Week 2:
Deploy Spot Fleet for 1 workload (start with CI/CD).
Launch Lambda scheduler for dev envs.
Week 3:
Rightsize 5 worst-sized instances.
Run Cloud Custodian cleanup.
Week 4:
Purchase Savings Plan for top steady workload.
Expected Results:
Month 1: 15-25% savings
Month 3: 40-60% savings
Critical Avoidances
❌ Never run Spot for stateful services
❌ Don't downsize during peak loads
❌ Don't mass-migrate to gp3 volumes without testing first
Tool Stack:
Visibility: Cost Explorer, CloudHealth
Automation: Cloud Custodian, Instance Scheduler
Optimization: Compute Optimizer, Karpenter
"After implementing steps 1+2, we cut $74k/month within 15 days. Full 5-step adoption saved $2.1M/year."
– VP Cloud Ops, Fortune 500 Retailer
Written by Mohammad Azhar Hayat
