Week 9: Kubernetes CPU-Based Autoscaling: From Metrics to YAML in One Guide

Table of contents
- Introduction
- What Is Horizontal Pod Autoscaler (HPA)?
- What Are cAdvisor and Metrics Server?
- Steps to Implement HPA on Any Cloud Provider (AWS, GCP, Azure, etc.)
- Example HPA YAML File Explained
- What Are Requests and Limits?
- How HPA Works Internally (CPU Example)
- Useful Commands for Monitoring
- What Is Cluster Autoscaler?
- How Cluster Autoscaler Works
- Advanced Tips
- Conclusion
- Bonus: Quick Reference Commands
Introduction
As your app traffic increases, how can you ensure that Kubernetes automatically scales your workload? Enter Horizontal Pod Autoscaler (HPA). HPA automatically increases or decreases the number of pod replicas in a deployment based on observed resource usage like CPU or memory.
What Is Horizontal Pod Autoscaler (HPA)?
The Horizontal Pod Autoscaler (HPA) adjusts the number of replicas of a pod based on metrics such as CPU or custom metrics. It helps maintain optimal performance and cost-efficiency.
Why Use HPA?
- Automatic scaling: no manual intervention.
- Handles traffic spikes efficiently.
- Improves resource utilization.
- Optimizes cost by reducing over-provisioning.
What Are cAdvisor and Metrics Server?
cAdvisor
cAdvisor (Container Advisor) runs inside the kubelet on every node and collects per-container resource usage. HPA does not read from it directly; the component HPA depends on is the Metrics Server, which aggregates this data cluster-wide.
Metrics Server
Metrics Server is a lightweight, cluster-wide aggregator of resource usage data (CPU, memory, etc.). It is essential for HPA to function, as it provides the data HPA uses to make scaling decisions.
Install it with:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Steps to Implement HPA on Any Cloud Provider (AWS, GCP, Azure, etc.)
1. Ensure Metrics Server is running:
kubectl get deployment metrics-server -n kube-system
2. Set resource requests and limits in your Deployment.
3. Enable HPA:
kubectl autoscale deployment your-deployment-name --cpu-percent=50 --min=1 --max=5
4. Monitor HPA:
kubectl get hpa
5. (Optional) Enable Cluster Autoscaler for node-level scaling.
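Step 2 matters because HPA computes CPU utilization as a percentage of the container's request; without a request set, the HPA has no baseline and cannot scale. A minimal Deployment sketch (the name and image are placeholders for your own workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.25   # placeholder image
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"
```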
Example HPA YAML File Explained
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
YAML Explanation
- apiVersion: use autoscaling/v2 for advanced metrics.
- kind: defines the resource as an HPA.
- metadata.name: the name of the HPA resource.
- scaleTargetRef: the target workload (Deployment).
- minReplicas, maxReplicas: scaling bounds.
- metrics: the metric type (cpu) and its target value.
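Since autoscaling/v2 supports multiple metrics in one HPA, you can combine CPU and memory targets; Kubernetes then scales to the largest replica count any single metric demands. A hypothetical sketch (targets are illustrative, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```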
What Are Requests and Limits?
These are part of resource management in Kubernetes:
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
Differences
| Term | Meaning |
| --- | --- |
| Requests | Guaranteed minimum resources the container gets. |
| Limits | Maximum resources the container can use. |
HPA uses the CPU request value, not the limit, to calculate utilization:
CPU Utilization (%) = actual usage / requested × 100
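To make the formula concrete, here is a small shell sketch with hypothetical numbers: a pod using 150m of CPU against a 250m request.

```shell
# Hypothetical pod: 150m of actual CPU usage against a 250m request.
usage_m=150
request_m=250

# Utilization is usage divided by the request, as a percentage.
utilization=$(( usage_m * 100 / request_m ))
echo "${utilization}%"   # prints 60%
```

At 60% utilization, an HPA targeting averageUtilization: 60 would consider this pod exactly on target.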
How HPA Works Internally (CPU Example)
1. Metrics Server collects CPU usage per pod.
2. HPA compares it to the target average utilization (e.g. 60%).
3. If usage is above target, it raises the pod count using:
DesiredReplicas = ceil(CurrentReplicas × (CurrentUsage / TargetUsage))
4. Kubernetes adjusts the replicas accordingly.
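A quick shell sketch of that formula with hypothetical numbers (3 replicas averaging 90% CPU against a 60% target); the ceiling is taken via integer arithmetic:

```shell
current_replicas=3   # hypothetical current replica count
current_usage=90     # hypothetical average CPU utilization (%)
target_usage=60      # target utilization from the HPA spec

# DesiredReplicas = ceil(CurrentReplicas * CurrentUsage / TargetUsage)
desired=$(( (current_replicas * current_usage + target_usage - 1) / target_usage ))
echo "$desired"   # prints 5
```

ceil(3 × 90 / 60) = ceil(4.5) = 5, so HPA would scale the Deployment from 3 to 5 replicas.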
Useful Commands for Monitoring
View HPA Details
kubectl describe hpa my-app-hpa
Check CPU Usage
kubectl top pod
kubectl top node
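One way to act on that output is to filter it with awk. The sample text below imitates kubectl top pod output and is purely illustrative; against a live cluster you would pipe kubectl top pod --no-headers into the same awk program.

```shell
# Illustrative fake output in the shape of `kubectl top pod --no-headers`:
# NAME            CPU(cores)   MEMORY(bytes)
sample='web-7d4b9c    250m   120Mi
worker-5f6a2d   90m    300Mi'

# Print the names of pods whose CPU usage exceeds 200m.
echo "$sample" | awk '{ cpu = $2; sub(/m$/, "", cpu); if (cpu + 0 > 200) print $1 }'
# prints: web-7d4b9c
```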
Simulate Load
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
# Then, inside the container, run:
while true; do wget -q -O- http://<your-app-service>; done
What Is Cluster Autoscaler?
HPA increases pods, but what if the cluster has no more resources to schedule them?
That's where Cluster Autoscaler comes in:
- Increases or decreases the number of nodes in your cluster.
- Works with cloud providers (GKE, EKS, AKS).
Works in sync with HPA:
HPA → adds pods → if no room to schedule → Cluster Autoscaler → adds nodes
How Cluster Autoscaler Works
1. The scheduler can't place a pod due to a resource shortage.
2. Cluster Autoscaler checks whether adding a node would let the pod schedule.
3. If yes, a new node is provisioned.
4. When nodes are underutilized for a sustained period, they are removed.
Advanced Tips
- Always set proper requests and limits.
- Avoid setting CPU/memory requests too low; HPA measures utilization against the request, so undersized requests inflate the ratio and can trigger scaling too early.
- Monitor via Grafana/Prometheus for better observability.
Conclusion
Horizontal Pod Autoscaler is a critical tool for dynamic, efficient, and cost-effective scaling of Kubernetes workloads. With the combination of HPA and Cluster Autoscaler, your applications can be both resilient and scalable, meeting any level of traffic without manual overhead.
Bonus: Quick Reference Commands
# Check pod usage
kubectl top pods
# Check node CPU and memory
kubectl top nodes
# View HPA status
kubectl get hpa
# Describe detailed HPA metrics
kubectl describe hpa my-app-hpa
# Create HPA with CLI
kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10
Written by Lav kushwaha