Week 9 – Kubernetes CPU-Based Autoscaling: From Metrics to YAML in One Guide

Lav kushwaha
4 min read

📌 Introduction

As your app traffic increases, how can you ensure that Kubernetes automatically scales your workload? Enter the Horizontal Pod Autoscaler (HPA). HPA automatically increases or decreases the number of pod replicas in a Deployment based on observed resource usage such as CPU or memory.


โš™๏ธ What Is Horizontal Pod Autoscaler (HPA)?

The Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas in a Deployment (or other scalable workload) based on metrics such as CPU utilization, memory, or custom metrics. It helps maintain optimal performance and cost-efficiency.

🧠 Why Use HPA?

  • Automatic scaling: No manual intervention.

  • Handles traffic spikes efficiently.

  • Improves resource utilization.

  • Cost optimization by reducing over-provisioning.


📦 What Are cAdvisor and Metrics Server?

🔍 cAdvisor

You may see this written as "Codevisor" – the correct name is cAdvisor (Container Advisor). It is built into the kubelet on every node and collects per-container resource usage (CPU, memory). For HPA, however, the component you interact with directly is the Metrics Server, which aggregates this data.

📈 Metrics Server

Metrics Server is a lightweight, cluster-wide aggregator of resource usage data (CPU, memory, etc.). It's essential for HPA to function, as it provides the data HPA uses to decide scaling actions.

📦 Install it with:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

โ˜๏ธ Steps to Implement HPA on Any Cloud Provider (AWS, GCP, Azure, etc.)

  1. Ensure Metrics Server is Running

     kubectl get deployment metrics-server -n kube-system
    
  2. Set Resource Requests and Limits in Your Deployment

  3. Enable HPA

     kubectl autoscale deployment your-deployment-name --cpu-percent=50 --min=1 --max=5
    
  4. Monitor HPA

     kubectl get hpa
    
  5. (Optional) Enable Cluster Autoscaler for Node-level scaling
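
For step 2, requests and limits go under each container in the Deployment spec. A minimal sketch (the deployment name, labels, image, and values below are illustrative placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-deployment-name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: your-app
  template:
    metadata:
      labels:
        app: your-app
    spec:
      containers:
      - name: app
        image: nginx:stable      # placeholder image
        resources:
          requests:
            cpu: "250m"          # HPA computes utilization against this value
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"
```

Without a CPU request set, the HPA cannot compute a utilization percentage and will report the target as `<unknown>`.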


📘 Example HPA YAML File Explained

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

📖 YAML Explanation

  • apiVersion: Use autoscaling/v2 for advanced metrics.

  • kind: Defines it as an HPA.

  • metadata.name: Name of the HPA resource.

  • scaleTargetRef: The target workload (Deployment).

  • minReplicas, maxReplicas: Scaling limits.

  • metrics: Defines the metric type (CPU) and the target average utilization.


🧮 What Are Requests and Limits?

These are part of resource management in Kubernetes:

resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"

โš™๏ธ Differences

| Term | Meaning |
| --- | --- |
| Requests | Guaranteed minimum resources the container gets. |
| Limits | Maximum resources the container can use. |

HPA uses the CPU request value to calculate utilization:
CPU Utilization (%) = (actual usage / requested CPU) × 100
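
As a worked example (the numbers are illustrative): a pod that requests 250m of CPU and is currently using 150m is at 60% utilization. A minimal sketch in Python:

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization as HPA computes it: actual usage divided by the
    container's CPU request, expressed as a percentage."""
    return usage_millicores / request_millicores * 100

# A pod requesting 250m CPU and currently using 150m:
print(cpu_utilization_percent(150, 250))  # 60.0
```

This is why setting a realistic CPU request matters: the same absolute usage looks like a higher percentage against a smaller request.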


🧠 How HPA Works Internally (CPU Example)

  1. Metrics server collects CPU usage per pod.

  2. HPA compares it to the target average (e.g. 60%).

  3. If usage is high, it increases pod count using:

    DesiredReplicas = ceil(CurrentReplicas × (CurrentUsage / TargetUsage))

  4. Kubernetes adjusts replicas accordingly.
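
The loop above can be sketched as a calculation. This is a simplified model – the real controller also applies a tolerance band and stabilization windows before acting:

```python
import math

def desired_replicas(current_replicas: int, current_usage: float, target_usage: float) -> int:
    """Simplified HPA scaling rule: scale the replica count in proportion to
    how far current utilization is from the target, rounding up."""
    return math.ceil(current_replicas * (current_usage / target_usage))

# 4 replicas averaging 90% CPU against a 60% target -> scale out to 6
print(desired_replicas(4, 90, 60))  # 6

# 4 replicas averaging 30% against a 60% target -> scale in to 2
print(desired_replicas(4, 30, 60))  # 2
```

Rounding up (the `ceil`) means HPA errs on the side of slightly more capacity rather than slightly less.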


๐Ÿ› ๏ธ Useful Commands for Monitoring

๐Ÿ” View HPA Details

kubectl describe hpa my-app-hpa

🧪 Check CPU Usage

kubectl top pod
kubectl top node

โš’๏ธ Simulate Load

kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh
# Then, inside the pod, run:
while true; do wget -q -O- http://<your-app-service>; done

⚡ What Is Cluster Autoscaler?

HPA increases pods, but what if the cluster has no more resources to schedule them?

That's where Cluster Autoscaler comes in:

  • Increases or decreases the number of nodes in your cluster.

  • Works with cloud providers (GKE, EKS, AKS).

๐Ÿ“ Works in sync with HPA:

HPA โ†’ Adds Pods โ†’ If no space โ†’ Cluster Autoscaler โ†’ Adds Nodes


🔄 How Cluster Autoscaler Works

  1. Scheduler can't place a pod due to resource shortage.

  2. Cluster Autoscaler checks if adding a node can solve it.

  3. If yes, a new node is provisioned.

  4. When nodes are underutilized for a long time, they are removed.


🧠 Advanced Tips

  • Always set proper requests and limits.

  • Avoid setting CPU/memory requests too low; a low request inflates the utilization percentage and can trigger premature scaling.

  • Monitor via Grafana/Prometheus for better observability.


✅ Conclusion

Horizontal Pod Autoscaler is a critical tool for dynamic, efficient, and cost-effective scaling of Kubernetes workloads. With the combination of HPA and Cluster Autoscaler, your applications can be both resilient and scalable, meeting any level of traffic without manual overhead.


📎 Bonus: Quick Reference Commands

# Check pod usage
kubectl top pods

# Check node CPU and memory
kubectl top nodes

# View HPA status
kubectl get hpa

# Describe detailed HPA metrics
kubectl describe hpa my-app-hpa

# Create HPA with CLI
kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=10