SpendSense: From Concept to CI/CD - Building a Full-Stack Cloud Optimizer
Series: Energy-Efficient DevOps - Auto-Suspending Idle AWS Resources using ML

Welcome to the final post in my series on building SpendSense, an AI-powered cloud cost optimization platform. In the last few weeks, the project has evolved from a functional prototype into a polished, automated, and secure application. This post covers the final sprint of development, the real-world challenges I overcame, and the final architecture that brings it all together.
The Final Sprint: From MVP to Polished Product
Since the last update, the focus has been on moving beyond the basic features and implementing a professional, secure, and automated workflow. Here’s what’s new:
Secure AWS Integration with IAM Roles: The initial approach of storing user AWS keys was a known security risk, so I completely refactored the authentication mechanism around the industry best practice of Cross-Account IAM Roles. Users now grant secure, temporary, permission-scoped access to the application without ever sharing their secret keys.
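To make this concrete, here is a minimal sketch of how a backend like this can exchange a stored role ARN for temporary credentials via STS (the role ARN, session name, and external ID below are illustrative, not SpendSense's exact values):

```python
import boto3

def get_user_session(role_arn: str, external_id: str) -> boto3.Session:
    """Exchange the app's own credentials for short-lived credentials in the user's account."""
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn=role_arn,               # e.g. arn:aws:iam::123456789012:role/SpendSenseReadOnly
        RoleSessionName="spendsense-scan",
        ExternalId=external_id,         # guards against the confused-deputy problem
        DurationSeconds=3600,           # credentials self-expire after one hour
    )
    creds = resp["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

Because the credentials expire on their own, even a compromise of the application never exposes long-lived user keys.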
Live Cost Dashboard: What's a cost optimizer without cost data? I integrated the AWS Cost Explorer API to fetch and display the last seven days of spending, visualized as an animated bar chart built with Chart.js.
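Under the hood this is a single Cost Explorer query. A rough sketch of the backend call, reusing the assumed-role session from above (the metric choice is an assumption):

```python
import boto3
from datetime import date, timedelta

def last_seven_days_costs(session: boto3.Session) -> list[dict]:
    """Return one {date, amount} data point per day for the bar chart."""
    # Cost Explorer is served from us-east-1 regardless of where resources live.
    ce = session.client("ce", region_name="us-east-1")
    end = date.today()
    start = end - timedelta(days=7)
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
    )
    return [
        {"date": day["TimePeriod"]["Start"],
         "amount": float(day["Total"]["UnblendedCost"]["Amount"])}
        for day in resp["ResultsByTime"]
    ]
```

The resulting list maps one-to-one onto Chart.js's labels and data arrays.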
Proactive Email Alerting: The AI engine is now connected to Amazon Simple Email Service (SES). When an idle instance is detected, the system automatically sends a detailed email alert to the user. To prevent spam, I implemented a one-hour cooldown period per user, managed in the DynamoDB backend.
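A condensed sketch of that cooldown logic, assuming a DynamoDB table named spendsense_users with a last_alert_at timestamp attribute (the table name, attributes, and sender address are all illustrative):

```python
import time
import boto3

dynamodb = boto3.resource("dynamodb")
ses = boto3.client("ses", region_name="us-east-1")
users = dynamodb.Table("spendsense_users")  # hypothetical table name

COOLDOWN_SECONDS = 3600  # at most one alert per user per hour

def maybe_send_alert(user_id: str, email: str, instance_id: str) -> bool:
    """Send an idle-instance alert unless one went out within the last hour."""
    item = users.get_item(Key={"user_id": user_id}).get("Item", {})
    now = int(time.time())
    if now - int(item.get("last_alert_at", 0)) < COOLDOWN_SECONDS:
        return False  # still cooling down; skip to avoid spamming the user
    ses.send_email(
        Source="alerts@spendsense.example",  # hypothetical verified SES sender
        Destination={"ToAddresses": [email]},
        Message={
            "Subject": {"Data": f"SpendSense: idle instance {instance_id} detected"},
            "Body": {"Text": {"Data": f"Instance {instance_id} looks idle. Consider stopping it."}},
        },
    )
    users.update_item(
        Key={"user_id": user_id},
        UpdateExpression="SET last_alert_at = :t",
        ExpressionAttributeValues={":t": now},
    )
    return True
```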
Interactive Instance Management: To turn insights into action, I added an "Auto-Stop" feature. A "Stop" button now appears next to any running EC2 instance, allowing users to immediately act on the AI's suggestions.
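Behind the button, the Flask route reduces to a single EC2 API call made through the assumed-role session. A minimal sketch (the route path and session helper are illustrative):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.post("/instances/<instance_id>/stop")
def stop_instance(instance_id: str):
    """Stop an EC2 instance in the user's account and report its new state."""
    session = get_user_session_for_current_user()  # hypothetical helper wrapping sts:AssumeRole
    ec2 = session.client("ec2")
    resp = ec2.stop_instances(InstanceIds=[instance_id])
    state = resp["StoppingInstances"][0]["CurrentState"]["Name"]
    return jsonify({"instance_id": instance_id, "state": state})
```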
Overcoming Real-World Challenges: A DevOps Journey
Building a complex application always comes with hurdles. Overcoming them is where the real learning happens.
The Challenge: How to securely access user AWS accounts without handling their secret keys.
- The Solution: This was the most critical challenge. I implemented a robust IAM Role-based access system: a Trust Policy on the user's role, paired with an sts:AssumeRole permission on the application's IAM user, following the Principle of Least Privilege in a secure multi-tenant architecture.
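For anyone wiring up something similar, here is roughly what the two halves of that handshake look like, written out as Python dicts (account IDs, the external-ID condition, and the role-name convention are illustrative):

```python
import json

# Half 1 - Trust Policy on the role in the *user's* account: only the
# SpendSense application principal may assume it, and only with the right ExternalId.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<SPENDSENSE_ACCOUNT_ID>:user/spendsense-app"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "<PER-USER-EXTERNAL-ID>"}},
    }],
}

# Half 2 - policy on the application's own IAM user: it may assume
# customer roles and nothing else (Principle of Least Privilege).
app_user_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::*:role/SpendSenseReadOnly",  # hypothetical naming convention
    }],
}

print(json.dumps(trust_policy, indent=2))  # paste into the role's trust relationship
```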
The Challenge: Grafana dashboards and settings were lost every time the EC2 instance restarted.
- The Solution: I diagnosed this as a state-persistence issue with the Docker container. I solved it by implementing Docker Volumes, mapping a directory on the EC2 host to the Grafana container's data directory. This ensures that all dashboards, data sources, and user settings are preserved permanently.
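A docker-compose excerpt showing the idea, assuming Grafana's default data directory of /var/lib/grafana (service and host-path names are illustrative):

```yaml
# docker-compose.yml (excerpt) - persist Grafana state across container restarts
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-data:/var/lib/grafana  # host directory mapped to Grafana's data dir
```

One practical wrinkle with bind mounts: the Grafana container runs as UID 472 by default, so the host directory needs matching ownership or the container will fail to write.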
The Challenge: The monitoring stack (Prometheus & Grafana) was difficult to deploy and update manually.
- The Solution: This led to the implementation of the project's crowning achievement: a full CI/CD pipeline.
The Pinnacle of Automation: The CI/CD Pipeline
Manual deployments are slow, risky, and outdated. To solve this, I built a complete Continuous Integration/Continuous Deployment (CI/CD) pipeline using GitHub Actions.
Here’s how it works:
Trigger: Every git push to the main branch of the private GitHub repository automatically triggers the workflow.
Setup: A GitHub-hosted runner checks out the code, sets up Python, and installs Ansible.
Authentication: The runner securely accesses the EC2 instance's SSH key, which is stored as an encrypted GitHub Secret.
Deployment: The workflow executes the Ansible playbook, which automatically configures the server and deploys the entire Dockerized monitoring stack (Prometheus, Grafana, and Node-Exporter).
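Put together, the workflow file stays short. A minimal sketch of such a pipeline (file paths, secret names, and action versions are illustrative):

```yaml
# .github/workflows/deploy.yml - sketch of the pipeline described above
name: Deploy monitoring stack
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Ansible
        run: pip install ansible
      - name: Load SSH key from encrypted secret
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.EC2_SSH_KEY }}" > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa
      - name: Run Ansible playbook
        run: ansible-playbook -i inventory.ini playbook.yml
```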
This pipeline embodies the principles of Infrastructure as Code (IaC) and automation, ensuring that every deployment is fast, reliable, and consistent.
Tech Stack
This project integrates a wide range of modern technologies, each chosen for a specific purpose:
Frontend: HTML, Tailwind CSS, JavaScript, and Chart.js for a responsive, modern, and interactive user interface.
Backend: Python and Flask provide a robust and scalable foundation for the API and business logic.
Database: Amazon DynamoDB was chosen as the user database for its serverless nature, scalability, and seamless integration with IAM.
Cloud & Services: The application is built on a foundation of core AWS services. AWS EC2 provides the scalable compute capacity for hosting the monitoring stack. Security is paramount, managed through IAM Roles which grant secure, temporary, cross-account access based on the Principle of Least Privilege. For financial insights, AWS Cost Explorer is leveraged to provide FinOps data, turning raw billing information into actionable daily cost trends. Finally, Amazon SES provides a robust, scalable service for sending automated email alerts.
Observability & Monitoring: A complete, open-source observability stack provides deep insights into the application's host environment. Prometheus acts as the time-series database, pulling (scraping) metrics from Node-Exporter. These metrics are then visualized in Grafana, which is configured with persistent storage using Docker Volumes to create rich, interactive, and permanent dashboards for metrics like CPU usage (a minimal scrape config appears below).
DevOps & Automation: Automation is at the heart of this project. Terraform was used in the initial stages for Infrastructure as Code (IaC) to provision the foundational EC2 instance and security groups. Ansible is used for Configuration Management, ensuring the server is consistently set up with necessary software like Docker and Nginx. The entire monitoring stack is containerized using Docker and orchestrated with Docker Compose for portability and easy management. A CI/CD pipeline built with GitHub Actions fully automates the deployment process, running the Ansible playbook on every push to the main branch.
Screenshots
Here are some screenshots showcasing the final application:
Login/Signup Page: A view of the login/signup page.
Profile and AWS Settings Page: The page for connecting your AWS account via a role ARN and setting the email address used for alerts.
Sample Email: A view of what the alert email looks like.
Main Dashboard: A view of the main dashboard with the stat cards, cost chart, and EC2 instance list.
Monitoring Page: A screenshot of the dedicated monitoring page showing the live Prometheus and Grafana dashboards.
GitHub Actions CI/CD Pipeline: A screenshot of a successful workflow run in the GitHub Actions tab.
Architecture Diagram
[ User ] <--> [ Browser (Flask Frontend on WSL) ]
|
+---------------------> [ GitHub ] --(git push)--> [ GitHub Actions (CI/CD) ]
| | (Ansible)
| v
+---------------------> [ AWS EC2 Instance ] <----------------+
| |
| +--> [ Docker Engine ]
| | |-- [ Prometheus ]
| | |-- [ Grafana ]
| | |-- [ Node-Exporter ]
| +--> [ Nginx Reverse Proxy ]
|
+---------------------> [ AWS Services ]
|-- [ DynamoDB (User Data) ]
|-- [ SES (Email Alerts) ]
|-- [ IAM (AssumeRole) ] ---> [ User's AWS Account ]
|-- [ EC2, S3 Data ]
|-- [ Cost Explorer Data ]
GitHub: https://github.com/Sahil0114/SpendSense
Conclusion
This project was an incredible journey from a simple idea to a fully automated, secure, and professional cloud application. It demonstrates a holistic understanding of the modern cloud and DevOps landscape, from initial infrastructure provisioning with Terraform to final application deployment via a CI/CD pipeline. By integrating AI, cost management, and real-time monitoring, SpendSense serves as a powerful proof-of-concept for building intelligent, efficient, and automated cloud solutions.
Thank you for following along! Follow for more real-life project implementation workflows.
