Horizontal Pod Autoscaling in Kubernetes

Reference: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
Horizontal scaling means that the response to increased load is to deploy more Pods.
This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.
If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet or other similar resource) to scale back down.
Horizontal pod autoscaling does not apply to objects that can't be scaled (for example, a DaemonSet).
The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller.
The horizontal pod autoscaling controller, running within the Kubernetes control plane, periodically adjusts the desired scale of its target (for example, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization or any other custom metric you specify.
The Horizontal Pod Autoscaler is an API resource in the Kubernetes autoscaling API group.
What is HPA?
Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization (or custom metrics).
HPA helps apps scale out when under high load and scale in when under low usage, improving performance and resource utilization.
How It Works
The HPA controller runs in the Kubernetes control plane.
It periodically checks the metrics (CPU or others) from the metrics server.
Based on the usage, it increases or decreases the number of pods between a configured min and max limit.
Default Metric Used
By default, HPA uses:
- CPU utilization (as a percentage of the Pod's CPU request)
But it can also use:
- Memory usage
- Custom or external metrics
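With the autoscaling/v2 API (used in the lab YAML later in this article), a memory target is expressed the same way as a CPU target. A minimal sketch of such a metrics entry — the 70% figure is an illustrative value, not a recommendation:

```yaml
# Fragment of an HPA spec (autoscaling/v2): scale on average memory utilization
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization        # percent of the Pod's memory request
        averageUtilization: 70   # illustrative threshold
```

Note that Utilization targets require the container to declare a memory request, just as CPU utilization targets require a CPU request.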
Prerequisite
Make sure the Metrics Server is installed in the cluster:
kubectl get deployment metrics-server -n kube-system
Create HPA Example
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
This creates an HPA that:
- Targets the my-app Deployment
- Maintains average CPU utilization at 50%
- Scales between 2 and 10 Pods
YAML Example
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
Create with:
kubectl apply -f hpa.yaml
List of HPA Commands
| Task | Command |
| --- | --- |
| Get all HPAs | kubectl get hpa |
| Get HPA with details | kubectl describe hpa <hpa-name> |
| Edit HPA | kubectl edit hpa <hpa-name> |
| Delete HPA | kubectl delete hpa <hpa-name> |
| View metrics | kubectl top pods and kubectl top nodes |
| Create HPA | kubectl autoscale deployment <name> --cpu-percent=<value> --min=<min> --max=<max> |
How Scaling Happens
Imagine average CPU utilization goes beyond the 50% threshold:
- The HPA controller detects this
- It increases Pods from, say, 2 to 4
- When CPU drops below 50%, it scales back down
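The replica count in the example above follows the scaling formula from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal shell sketch of that arithmetic — the variable names and sample values are illustrative, not part of any kubectl output:

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=2
current_cpu=100   # observed average CPU, as a percent of the Pod's request
target_cpu=50     # HPA target, e.g. from --cpu-percent=50

# integer ceiling division: ceil(a/b) == (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # 2 Pods at 100% against a 50% target -> 4
```

The ceiling means the controller always rounds up, so even a slight overshoot of the target adds a Pod rather than leaving the workload under-provisioned.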
Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Get Deployment
kubectl get deployment metrics-server -n kube-system
Edit the Metrics Server Deployment
kubectl edit deployment metrics-server -n kube-system
Add the following flags to the Deployment under the container's args (this skips kubelet TLS verification — acceptable for a lab cluster, not recommended for production):
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
Restart the deployment
kubectl rollout restart deployment metrics-server -n kube-system
Lab Practice:
Sample YAML
- Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
- Service.yaml
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
- hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
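The autoscaling/v2 API also supports an optional spec.behavior section for tuning how aggressively the HPA scales. A hedged fragment that could be appended under spec in the hpa.yaml above — the 300-second window is an illustrative value, not part of the lab:

```yaml
# Optional: slows scale-in so brief dips in load don't immediately remove Pods
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low load before scaling in
```

Without this, the controller uses its built-in defaults, which is fine for the lab exercise.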
Sample commands used:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
kubectl get hpa php-apache --watch
Written by Sanket Nankar