Horizontal Pod Autoscaling in Kubernetes

Reference: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
Horizontal scaling means that the response to increased load is to deploy more Pods.
This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.
If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet or other similar resource) to scale back down.
Horizontal pod autoscaling does not apply to objects that can't be scaled (for example, a DaemonSet).
The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller.
The horizontal pod autoscaling controller, running within the Kubernetes control plane, periodically adjusts the desired scale of its target (for example, a Deployment) to match observed metrics such as average CPU utilization, average memory utilization or any other custom metric you specify.
The Horizontal Pod Autoscaler is an API resource in the Kubernetes autoscaling API group.
What is HPA?
Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization (or custom metrics).
HPA helps apps scale out when under high load and scale in when under low usage, improving performance and resource utilization.
How It Works
The HPA controller runs in the Kubernetes control plane.
It periodically checks the metrics (CPU or others) from the metrics server.
Based on the usage, it increases or decreases the number of pods between a configured min and max limit.
Default Metric Used
By default, HPA uses:
- CPU utilization (as a percentage of the Pod's CPU request)
But it can also use:
- Memory usage
- Custom or external metrics
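With the autoscaling/v2 API (used in the lab YAML later in this article), a memory target is expressed the same way as a CPU target. A minimal sketch of such a metrics entry — the 70% figure is an illustrative value, not a recommendation:

```yaml
# Fragment of an HPA spec (autoscaling/v2): scale on average memory utilization
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization        # percent of the Pod's memory request
        averageUtilization: 70   # illustrative threshold
```

Note that Utilization targets require the container to declare a memory request, just as CPU utilization targets require a CPU request.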
Prerequisite
Make sure the Metrics Server is installed in the cluster:
kubectl get deployment metrics-server -n kube-system
Create HPA Example
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
This creates an HPA that:
- Targets the my-app Deployment
- Maintains average CPU utilization at 50%
- Scales between 2 and 10 Pods
YAML Example
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
Create with:
kubectl apply -f hpa.yaml
List of HPA Commands
| Task | Command |
| --- | --- |
| Get all HPAs | kubectl get hpa |
| Get HPA with details | kubectl describe hpa <hpa-name> |
| Edit HPA | kubectl edit hpa <hpa-name> |
| Delete HPA | kubectl delete hpa <hpa-name> |
| View metrics | kubectl top pods and kubectl top nodes |
| Create HPA | kubectl autoscale deployment <name> --cpu-percent=<value> --min=<min> --max=<max> |
How Scaling Happens
Imagine average CPU utilization goes beyond the 50% threshold:
- The HPA controller detects this
- It increases Pods from, say, 2 to 4
- When CPU drops below 50%, it scales back down
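The replica count in the example above follows the scaling formula from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal shell sketch of that arithmetic — the variable names and sample values are illustrative, not part of any kubectl output:

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=2
current_cpu=100   # observed average CPU, as a percent of the Pod's request
target_cpu=50     # HPA target, e.g. from --cpu-percent=50

# integer ceiling division: ceil(a/b) == (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # 2 Pods at 100% against a 50% target -> 4
```

The ceiling means the controller always rounds up, so even a slight overshoot of the target adds a Pod rather than leaving the workload under-provisioned.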
Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Get Deployment
kubectl get deployment metrics-server -n kube-system
Edit the Metrics Server Deployment
kubectl edit deployment metrics-server -n kube-system
Add the following flags to the Deployment under the container's args (this skips kubelet TLS verification — acceptable for a lab cluster, not recommended for production):
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
Restart the deployment
kubectl rollout restart deployment metrics-server -n kube-system
Lab Practice:
Sample YAML
- Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
- Service.yaml
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
- hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
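The autoscaling/v2 API also supports an optional spec.behavior section for tuning how aggressively the HPA scales. A hedged fragment that could be appended under spec in the hpa.yaml above — the 300-second window is an illustrative value, not part of the lab:

```yaml
# Optional: slows scale-in so brief dips in load don't immediately remove Pods
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low load before scaling in
```

Without this, the controller uses its built-in defaults, which is fine for the lab exercise.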
Sample commands used:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
kubectl get hpa php-apache --watch
Written by Sanket Nankar