Horizontal Pod Autoscaling in Kubernetes

Sanket Nankar

🔄 Horizontal Pod Autoscaling (HPA) in Kubernetes:

URL: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

  • In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.

  • Horizontal scaling means that the response to increased load is to deploy more Pods.

  • This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.

  • If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet or other similar resource) to scale back down.

  • Horizontal pod autoscaling does not apply to objects that can't be scaled (for example: a DaemonSet.)

  • The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller.

  • The horizontal pod autoscaling controller, running within the Kubernetes control plane, periodically adjusts the desired scale of its target (for example, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization or any other custom metric you specify.

  • The Horizontal Pod Autoscaler is an API resource in the Kubernetes autoscaling API group.

✅ What is HPA?

Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization (or custom metrics).

HPA helps apps scale out when under high load and scale in when under low usage, improving performance and resource utilization.

🧠 How It Works

  • The HPA controller runs in the Kubernetes control plane.

  • It periodically (every 15 seconds by default) checks metrics such as CPU from the Metrics Server or custom metrics APIs.

  • Based on the usage, it increases or decreases the number of pods between a configured min and max limit.

📊 Default Metric Used

By default, HPA uses:

  • CPU utilization (percentage)

But it can also use:

  • Memory usage

  • Custom or external metrics

โš™๏ธ Pre-requisite

Make sure the Metrics Server is installed in the cluster:

kubectl get deployment metrics-server -n kube-system

๐Ÿ› ๏ธ Create HPA Example

kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

📌 This creates an HPA that:

  • Targets my-app deployment

  • Maintains average CPU at 50%

  • Scales between 2 and 10 pods

📋 YAML Example

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

Create with:

kubectl apply -f hpa.yaml
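The autoscaling/v1 API above supports only a target CPU percentage. Current clusters generally use autoscaling/v2, which expresses the same target as an entry in a metrics list. A sketch of the equivalent v2 manifest for the same hypothetical my-app Deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource          # built-in resource metric
    resource:
      name: cpu
      target:
        type: Utilization   # percentage of the Pods' CPU request
        averageUtilization: 50
```

The v2 form is what `kubectl autoscale` creates on recent clusters, and it is the one used in the lab section below.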

📜 List of HPA Commands

  • 🔍 Get all HPAs: kubectl get hpa

  • 🔍 Get HPA with details: kubectl describe hpa <hpa-name>

  • ✏️ Edit HPA: kubectl edit hpa <hpa-name>

  • ❌ Delete HPA: kubectl delete hpa <hpa-name>

  • 📊 View metrics: kubectl top pods and kubectl top nodes

  • ✅ Create HPA: kubectl autoscale deployment <name> --cpu-percent=<value> --min=<min> --max=<max>

🔄 How Scaling Happens

Imagine average CPU utilization rises above the 50% target:

  • HPA controller detects this

  • It increases pods from, say, 2 to 4

  • When CPU drops below 50%, it scales back down
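The steps above follow the core calculation given in the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch in Python (the function name here is ours, not a Kubernetes API):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    # HPA formula from the Kubernetes docs:
    # desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    return math.ceil(current_replicas * (current_metric / target_metric))

print(desired_replicas(2, 100, 50))  # average CPU at 100% vs a 50% target -> 4
print(desired_replicas(4, 50, 50))   # back at the target -> stays at 4
```

In the real controller, a tolerance around the target (10% by default) suppresses scaling on small fluctuations, and the result is always clamped between minReplicas and maxReplicas.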

Install Metrics Server

  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Get Deployment

  kubectl get deployment metrics-server -n kube-system

Edit the Metrics Server Deployment

  kubectl edit deployment metrics-server -n kube-system

Add the TLS bypass flags to the deployment under the container's args (commonly needed in lab clusters such as minikube or kind, where the kubelet serves a self-signed certificate):

  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
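For context, these flags sit under the container's args in the metrics-server Deployment spec. A trimmed sketch of the relevant portion (other args and fields in the real manifest are omitted here):

```yaml
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        # ...existing args from components.yaml stay as-is...
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
```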

Restart the deployment

  kubectl rollout restart deployment metrics-server -n kube-system

Lab Practice:

Sample YAML

  1. Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
  2. Service.yaml
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
  3. hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
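As noted earlier, HPA can also watch memory or several metrics at once; with autoscaling/v2, each metric is simply another entry in the metrics list. A sketch extending the list above (the 70% memory target is illustrative):

```yaml
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

When several metrics are specified, the controller computes a proposed replica count for each and uses the largest.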

Sample commands used:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

kubectl get hpa php-apache --watch
