Day 36 of 90 Days of DevOps Challenge: Horizontal Pod Autoscaler with Metrics Server


In my previous blog, I explored core Kubernetes resources like Deployments, Services, and Namespaces, along with a deep dive into Metrics Server fundamentals. That also drove home how important monitoring resource usage is for efficient scaling. Today, I’m moving a step ahead by implementing Horizontal Pod Autoscaling (HPA): a mechanism that automatically scales the number of pods in a deployment based on CPU usage. I’ll walk through installing the Metrics Server, deploying a sample app, exposing it via a service, setting up HPA, and testing it with simulated load.
Step 1: Install Metrics Server
To enable HPA, I first needed the Metrics API running via Metrics Server:
- Clone the repo
git clone https://github.com/vaishnaviid/k8s_metrics-server
- Navigate into the repo
cd k8s_metrics-server
ls deploy/1.8+/
- Apply the manifests
kubectl apply -f deploy/1.8+/
- Verify Metrics Server is running
kubectl get all -n kube-system
- Test Metrics Server
kubectl top nodes
kubectl top pods
This creates all the required resources under the kube-system namespace.
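As an aside: if you’d rather not clone a mirror repo, the Metrics Server can also be installed straight from the official kubernetes-sigs release manifest. (On local clusters such as minikube or kind, you may additionally need the `--kubelet-insecure-tls` flag on the Metrics Server container if the kubelet uses self-signed certificates.)

```shell
# Install Metrics Server from the upstream release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm the Metrics API is registered and serving
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl top nodes
```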
Step 2: Deploy Sample Application
I created a deployment with a container that has CPU resource limits and requests set:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      run: hpa-demo-deployment
  template:
    metadata:
      labels:
        run: hpa-demo-deployment
    spec:
      containers:
      - name: hpa-demo-deployment
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
Apply using:
kubectl apply -f deployment.yaml
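Before wiring up the Service, it’s worth confirming the rollout finished and that the CPU requests were actually picked up (these are standard kubectl commands; the `run=hpa-demo-deployment` label comes from the manifest above):

```shell
# Wait for the Deployment to finish rolling out
kubectl rollout status deployment/hpa-demo-deployment

# List the pod(s) created by the Deployment, selected by label
kubectl get pods -l run=hpa-demo-deployment

# Confirm the container's CPU requests/limits landed in the spec
kubectl get deployment hpa-demo-deployment \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```

The CPU request matters: HPA computes utilization as a percentage of the requested CPU, so without `requests.cpu` the autoscaler has nothing to compare against.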
Step 3: Expose Deployment with a Service
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo-deployment
  labels:
    run: hpa-demo-deployment
spec:
  ports:
  - port: 80
  selector:
    run: hpa-demo-deployment
Apply using:
kubectl apply -f service.yaml
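A quick sanity check that the Service actually selects the pod (the Endpoints object should list the pod’s IP; an empty list means the selector doesn’t match):

```shell
# The Service should exist and expose port 80
kubectl get svc hpa-demo-deployment

# Endpoints must show the pod IP backing the Service
kubectl get endpoints hpa-demo-deployment

# One-off in-cluster request to confirm the app answers
kubectl run curl-test --rm -i --restart=Never --image=busybox \
  -- wget -q -O- http://hpa-demo-deployment
```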
Step 4: Create Horizontal Pod Autoscaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
Apply using:
kubectl apply -f hpa.yaml
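The same autoscaler can also be created imperatively with `kubectl autoscale`, which is handy for quick experiments and is equivalent to the manifest above:

```shell
# Imperative equivalent of the HPA manifest
kubectl autoscale deployment hpa-demo-deployment --cpu-percent=50 --min=1 --max=10

# Inspect the result (TARGETS shows current vs. target CPU utilization)
kubectl get hpa hpa-demo-deployment
```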
Step 5: Generate Load
To simulate traffic, I used a busybox pod to constantly hit the app:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo-deployment; done"
Then in another terminal:
kubectl get hpa -w
kubectl describe deploy hpa-demo-deployment
kubectl get hpa
kubectl get events
kubectl top pods
kubectl get all
This shows the app scaling up and down based on the CPU threshold of 50%.
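Under the hood, the HPA controller scales with roughly desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Here’s a tiny sketch of that arithmetic using integer ceiling division; the 2 replicas and 100% utilization are made-up illustrative numbers, not from my actual run:

```shell
current_replicas=2
current_cpu=100   # observed average CPU utilization, in percent
target_cpu=50     # targetCPUUtilizationPercentage from the HPA spec

# Integer ceiling division: ceil(current_replicas * current_cpu / target_cpu)
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # desired replicas: 4
```

So at double the target utilization, the HPA doubles the replica count (capped by maxReplicas), and it scales back down once utilization drops below target.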
Final Thoughts
Today’s session was a solid hands-on with Kubernetes’ native autoscaling using HPA and Metrics Server. It was fascinating to watch pods automatically scale in response to simulated traffic! Setting this up gave me deeper insight into resource management and dynamic scaling — two key elements in building efficient cloud-native applications. Next up, I plan to dive into Cluster Autoscaling and how it works in combination with HPA!