Day 36 of 90 Days of DevOps Challenge: Horizontal Pod Autoscaler with Metrics Server


In my previous blog, I explored core Kubernetes resources like Deployments, Services, and Namespaces, along with a deep dive into Metrics Server fundamentals. That also drove home how important monitoring resource usage is for efficient scaling. Today, I’m moving a step ahead by implementing Horizontal Pod Autoscaling (HPA): a mechanism that automatically scales the number of pods in a deployment based on CPU usage. I’ll walk through installing the Metrics Server, deploying a sample app, exposing it via a service, setting up HPA, and testing it with simulated load.
Step 1: Install Metrics Server
To enable HPA, I first needed the Metrics API running via Metrics Server:
- Clone the repo
git clone https://github.com/vaishnaviid/k8s_metrics-server
- Navigate into the repo
cd k8s_metrics-server
ls deploy/1.8+/
- Apply the manifests
kubectl apply -f deploy/1.8+/
- Verify Metrics Server is running
kubectl get all -n kube-system
- Test Metrics Server
kubectl top nodes
kubectl top pods
This creates all the required resources under the kube-system namespace.
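As an aside: if you’d rather not clone a mirror repo, the Metrics Server can also be installed straight from the official kubernetes-sigs release manifest. (On local clusters such as minikube or kind, you may additionally need the `--kubelet-insecure-tls` flag on the Metrics Server container if the kubelet uses self-signed certificates.)

```shell
# Install Metrics Server from the upstream release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm the Metrics API is registered and serving
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl top nodes
```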
Step 2: Deploy Sample Application
I created a deployment with a container that has CPU resource limits and requests set:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      run: hpa-demo-deployment
  template:
    metadata:
      labels:
        run: hpa-demo-deployment
    spec:
      containers:
      - name: hpa-demo-deployment
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
Apply using:
kubectl apply -f deployment.yaml
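Before wiring up the Service, it’s worth confirming the rollout finished and that the CPU requests were actually picked up (these are standard kubectl commands; the `run=hpa-demo-deployment` label comes from the manifest above):

```shell
# Wait for the Deployment to finish rolling out
kubectl rollout status deployment/hpa-demo-deployment

# List the pod(s) created by the Deployment, selected by label
kubectl get pods -l run=hpa-demo-deployment

# Confirm the container's CPU requests/limits landed in the spec
kubectl get deployment hpa-demo-deployment \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```

The CPU request matters: HPA computes utilization as a percentage of the requested CPU, so without `requests.cpu` the autoscaler has nothing to compare against.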
Step 3: Expose Deployment with a Service
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo-deployment
  labels:
    run: hpa-demo-deployment
spec:
  ports:
  - port: 80
  selector:
    run: hpa-demo-deployment
Apply using:
kubectl apply -f service.yaml
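A quick sanity check that the Service actually selects the pod (the Endpoints object should list the pod’s IP; an empty list means the selector doesn’t match):

```shell
# The Service should exist and expose port 80
kubectl get svc hpa-demo-deployment

# Endpoints must show the pod IP backing the Service
kubectl get endpoints hpa-demo-deployment

# One-off in-cluster request to confirm the app answers
kubectl run curl-test --rm -i --restart=Never --image=busybox \
  -- wget -q -O- http://hpa-demo-deployment
```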
Step 4: Create Horizontal Pod Autoscaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
Apply using:
kubectl apply -f hpa.yaml
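The same autoscaler can also be created imperatively with `kubectl autoscale`, which is handy for quick experiments and is equivalent to the manifest above:

```shell
# Imperative equivalent of the HPA manifest
kubectl autoscale deployment hpa-demo-deployment --cpu-percent=50 --min=1 --max=10

# Inspect the result (TARGETS shows current vs. target CPU utilization)
kubectl get hpa hpa-demo-deployment
```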
Step 5: Generate Load
To simulate traffic, I used a busybox pod to constantly hit the app:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo-deployment; done"
Then in another terminal:
kubectl get hpa -w
kubectl describe deploy hpa-demo-deployment
kubectl get hpa
kubectl get events
kubectl top pods
kubectl get all
This shows the app scaling up and down based on the CPU threshold of 50%.
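Under the hood, the HPA controller scales with roughly desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). Here’s a tiny sketch of that arithmetic using integer ceiling division; the 2 replicas and 100% utilization are made-up illustrative numbers, not from my actual run:

```shell
current_replicas=2
current_cpu=100   # observed average CPU utilization, in percent
target_cpu=50     # targetCPUUtilizationPercentage from the HPA spec

# Integer ceiling division: ceil(current_replicas * current_cpu / target_cpu)
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # desired replicas: 4
```

So at double the target utilization, the HPA doubles the replica count (capped by maxReplicas), and it scales back down once utilization drops below target.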
Final Thoughts
Today’s session was a solid hands-on with Kubernetes’ native autoscaling using HPA and Metrics Server. It was fascinating to watch pods automatically scale in response to simulated traffic! Setting this up gave me deeper insight into resource management and dynamic scaling — two key elements in building efficient cloud-native applications. Next up, I plan to dive into Cluster Autoscaling and how it works in combination with HPA!