Week 9 – Kubernetes CPU-Based Autoscaling: From Metrics to YAML in One Guide

Lav kushwaha
4 min read

📌 Introduction

As your app traffic increases, how can you ensure that Kubernetes automatically scales your workload? Enter the Horizontal Pod Autoscaler (HPA). HPA automatically increases or decreases the number of pod replicas in a Deployment based on observed resource usage such as CPU or memory.


โš™๏ธ What Is Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas in a Deployment (or other scalable workload) based on metrics such as CPU utilization, memory, or custom metrics. It helps maintain optimal performance and cost-efficiency.

🧠 Why Use HPA?

  • Automatic scaling: No manual intervention.

  • Handles traffic spikes efficiently.

  • Improves resource utilization.

  • Cost optimization by reducing over-provisioning.


📦 What Are cAdvisor and Metrics Server?

🔍 cAdvisor

You may see this written as "Codevisor" – the correct name is cAdvisor (Container Advisor). It is built into the kubelet on every node and collects per-container resource usage (CPU, memory). For HPA, however, the component you interact with directly is the Metrics Server, which aggregates this data.

📈 Metrics Server

Metrics Server is a lightweight, cluster-wide aggregator of resource usage data (CPU, memory, etc.). It's essential for HPA to function, as it provides the data HPA uses to decide scaling actions.

📦 Install it with:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

โ˜๏ธ Steps to Implement HPA on Any Cloud Provider (AWS, GCP, Azure, etc.)

  1. Ensure Metrics Server is Running

     kubectl get deployment metrics-server -n kube-system
    
  2. Set Resource Requests and Limits in Your Deployment

  3. Enable HPA

     kubectl autoscale deployment your-deployment-name --cpu-percent=50 --min=1 --max=5
    
  4. Monitor HPA

     kubectl get hpa
    
  5. (Optional) Enable Cluster Autoscaler for Node-level scaling
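
For step 2, requests and limits go under each container in the Deployment spec. A minimal sketch (the deployment name, labels, image, and values below are illustrative placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-deployment-name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: your-app
  template:
    metadata:
      labels:
        app: your-app
    spec:
      containers:
      - name: app
        image: nginx:stable      # placeholder image
        resources:
          requests:
            cpu: "250m"          # HPA computes utilization against this value
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"
```

Without a CPU request set, the HPA cannot compute a utilization percentage and will report the target as `<unknown>`.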


📘 Example HPA YAML File Explained

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

📖 YAML Explanation

  • apiVersion: Use autoscaling/v2 for advanced metrics.

  • kind: Defines it as an HPA.

  • metadata.name: Name of the HPA resource.

  • scaleTargetRef: The target workload (Deployment).

  • minReplicas, maxReplicas: Scaling limits.

  • metrics: Defines the metric type (CPU) and the target average utilization.


🧮 What Are Requests and Limits?

These are part of resource management in Kubernetes:

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"

โš™๏ธ Differences

| Term | Meaning |
| --- | --- |
| Requests | Guaranteed minimum resources the container gets. |
| Limits | Maximum resources the container can use. |

HPA uses the CPU request value to calculate utilization:
CPU Utilization (%) = (actual usage / requested CPU) × 100
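
As a worked example (the numbers are illustrative): a pod that requests 250m of CPU and is currently using 150m is at 60% utilization. A minimal sketch in Python:

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as HPA computes it: actual usage divided by the
    container's CPU request, expressed as a percentage."""
    return usage_millicores / request_millicores * 100

# A pod requesting 250m CPU and currently using 150m:
print(cpu_utilization_percent(150, 250))  # 60.0
```

This is why setting a realistic CPU request matters: the same absolute usage looks like a higher percentage against a smaller request.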


🧠 How HPA Works Internally (CPU Example)

  1. Metrics server collects CPU usage per pod.

  2. HPA compares it to the target average (e.g. 60%).

  3. If usage is high, it increases pod count using:

    DesiredReplicas = ceil(CurrentReplicas × (CurrentUsage / TargetUsage))

  4. Kubernetes adjusts replicas accordingly.
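
The loop above can be sketched as a calculation. This is a simplified model – the real controller also applies a tolerance band and stabilization windows before acting:

```python
import math

def desired_replicas(current_replicas: int, current_usage: float, target_usage: float) -> int:
    """Simplified HPA scaling rule: scale the replica count in proportion to
    how far current utilization is from the target, rounding up."""
    return math.ceil(current_replicas * (current_usage / target_usage))

# 4 replicas averaging 90% CPU against a 60% target -> scale out to 6
print(desired_replicas(4, 90, 60))  # 6

# 4 replicas averaging 30% against a 60% target -> scale in to 2
print(desired_replicas(4, 30, 60))  # 2
```

Rounding up (the `ceil`) means HPA errs on the side of slightly more capacity rather than slightly less.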


๐Ÿ› ๏ธ Useful Commands for Monitoring

๐Ÿ” View HPA Details

kubectl describe hpa my-app-hpa

🧪 Check CPU Usage

kubectl top pod
kubectl top node

โš’๏ธ Simulate Load

kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh
# Then, inside the pod, run:
while true; do wget -q -O- http://<your-app-service>; done

⚡ What Is Cluster Autoscaler?

HPA increases pods, but what if the cluster has no more resources to schedule them?

That's where Cluster Autoscaler comes in:

  • Increases or decreases the number of nodes in your cluster.

  • Works with cloud providers (GKE, EKS, AKS).

๐Ÿ“ Works in sync with HPA:

HPA โ†’ Adds Pods โ†’ If no space โ†’ Cluster Autoscaler โ†’ Adds Nodes


🔄 How Cluster Autoscaler Works

  1. Scheduler can't place a pod due to resource shortage.

  2. Cluster Autoscaler checks if adding a node can solve it.

  3. If yes, a new node is provisioned.

  4. When nodes are underutilized for a long time, they are removed.


🧠 Advanced Tips

  • Always set proper requests and limits.

  • Avoid setting CPU/memory requests too low; a low request inflates the utilization percentage and can trigger premature scaling.

  • Monitor via Grafana/Prometheus for better observability.


✅ Conclusion

Horizontal Pod Autoscaler is a critical tool for dynamic, efficient, and cost-effective scaling of Kubernetes workloads. With the combination of HPA and Cluster Autoscaler, your applications can be both resilient and scalable, meeting any level of traffic without manual overhead.


📎 Bonus: Quick Reference Commands

# Check pod usage
kubectl top pods

# Check node CPU and memory
kubectl top nodes

# View HPA status
kubectl get hpa

# Describe detailed HPA metrics
kubectl describe hpa my-app-hpa

# Create HPA with CLI
kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10