Kubernetes Auto-scaling

Pratik Raundale
6 min read

A Deep Dive into HPA and VPA

Auto-scaling is one of the most powerful features of Kubernetes, enabling your applications to automatically adjust resources based on demand. In this comprehensive guide, we'll explore two essential auto-scaling mechanisms: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).

What is Auto-scaling in Kubernetes?

Auto-scaling in Kubernetes refers to the automatic adjustment of resources allocated to your applications based on current demand. This ensures optimal resource utilization while maintaining application performance and availability.

Horizontal Pod Autoscaler (HPA)

Overview

The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replica set, or stateful set based on observed CPU utilization, memory usage, or custom metrics.

How HPA Works

HPA follows a simple control loop:

  1. Metrics Collection: Gathers metrics from pods every 15 seconds (configurable)

  2. Decision Making: Compares current metrics against target values (using the formula shown after this list)

  3. Scaling Action: Increases or decreases the number of pod replicas

  4. Cooldown: Waits for a stabilization period before making further changes
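
The decision in step 2 comes down to a simple ratio, as described in the Kubernetes HPA documentation. Here is a quick worked example using the 70% CPU target from the configuration further below:

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

# Example: 4 replicas averaging 90% CPU against a 70% target
# ceil(4 * 90 / 70) = ceil(5.14) = 6 replicas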

Key Features

  • CPU-based scaling: Default metric for scaling decisions

  • Memory-based scaling: Scale based on memory utilization

  • Custom metrics: Use application-specific metrics

  • External metrics: Scale based on external systems (queue length, database connections)

HPA Configuration Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
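
Assuming the manifest above is saved as web-app-hpa.yaml (an arbitrary filename used here for illustration), you can apply it and watch the autoscaler work; kubectl autoscale is a quicker, CPU-only alternative:

# Apply the HPA and watch its status
kubectl apply -f web-app-hpa.yaml
kubectl get hpa web-app-hpa --watch

# Imperative, CPU-only equivalent of the manifest above
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10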

Prerequisites for HPA

  1. Metrics Server: Must be installed in the cluster (an install command is shown after this list)

  2. Resource Requests: Pods must have CPU/memory requests defined

  3. RBAC: Proper permissions for HPA controller
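
For the first prerequisite, a common way to install Metrics Server is to apply the manifest published by the metrics-server project (URL correct at the time of writing) and then confirm the metrics API responds; the deployment below illustrates the second prerequisite:

# Install Metrics Server from the upstream release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are being reported
kubectl top nodes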

# Example deployment with resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:1.21
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi

Vertical Pod Autoscaler (VPA)

Overview

The Vertical Pod Autoscaler automatically adjusts the CPU and memory requests and limits of containers based on their actual resource usage patterns.

How VPA Works

VPA is built from three components:

  1. Recommender: Analyzes resource usage and provides recommendations

  2. Updater: Applies recommendations by evicting pods that need updates

  3. Admission Controller: Sets resource requests on new/updated pods

VPA Modes

  • "Off": Only provides recommendations, no automatic updates

  • "Initial": Assigns resources when pods are created, no updates to running pods

  • "Auto": Assigns resources at creation time and updates running pods

  • "Recreate": Assigns resources at creation time and evicts pods when updates are needed

VPA Configuration Example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi
      controlledResources: ["cpu", "memory"]
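
Assuming the manifest is saved as web-app-vpa.yaml (an arbitrary filename), applying it and reading back the recommendations looks roughly like this; the recommender needs some time to collect usage data before the numbers settle:

kubectl apply -f web-app-vpa.yaml

# Recommendations show up under the Status section once enough usage data exists
kubectl describe vpa web-app-vpa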

VPA Installation

VPA is not installed by default. You need to install it manually:

# Clone the VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/

# Install VPA
./hack/vpa-up.sh
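
A quick sanity check after installation is to confirm that the three VPA components are running in kube-system and that the CRDs were registered:

# Recommender, updater, and admission controller pods should be Running
kubectl get pods -n kube-system | grep vpa

# The VerticalPodAutoscaler CRDs should be listed
kubectl get crds | grep verticalpodautoscaler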

HPA vs VPA: Key Differences

Aspect              | HPA                        | VPA
--------------------|----------------------------|-------------------------------
Scaling Direction   | Horizontal (more pods)     | Vertical (bigger pods)
Resource Adjustment | Number of replicas         | CPU/memory per pod
Use Case            | Handle traffic spikes      | Optimize resource allocation
Pod Disruption      | No pod restart needed      | May require pod restart
Maturity            | Stable and widely used     | Beta, less mature
Compatibility       | Works with stateless apps  | Works with stateless and stateful apps

Best Practices

For HPA

  1. Set Appropriate Targets: Don't set CPU targets too low (recommend 70-80%)

  2. Configure Stabilization: Prevent flapping with proper stabilization windows

  3. Monitor Metrics: Ensure metrics server is healthy and collecting data

  4. Test Scaling: Regularly test scaling behavior under load

  5. Use Multiple Metrics: Combine CPU, memory, and custom metrics for better decisions

For VPA

  1. Start with Recommendations: Begin with "Off" mode to understand recommendations (see the example after this list)

  2. Set Resource Limits: Define min/max boundaries to prevent over-allocation

  3. Consider Pod Disruption: Plan for potential pod restarts in "Auto" mode

  4. Monitor Resource Waste: Use VPA to identify over-provisioned resources

  5. Gradual Rollout: Test VPA on non-critical workloads first
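
For the first practice, a recommendation-only VPA is simply the earlier manifest with updateMode set to "Off"; here is a minimal sketch targeting the same web-app Deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-recommend-only
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"  # recommendations only; no evictions, no request changes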

Common Pitfalls and Solutions

HPA Issues

Problem: HPA not scaling

Solutions:

  • Verify metrics server is running

  • Check resource requests are defined

  • Ensure target metrics are being collected

Problem: Frequent scaling (flapping)

Solutions:

  • Increase stabilization window

  • Adjust target utilization thresholds

  • Use behavior policies to control scaling rate

VPA Issues

Problem: Pods constantly restarting

Solutions:

  • Use "Initial" mode instead of "Auto"

  • Set appropriate min/max resource limits

  • Check if resource recommendations are realistic

Problem: VPA recommendations seem incorrect

Solutions:

  • Allow more time for data collection

  • Verify workload patterns are representative

  • Check if resource usage spikes are outliers

Monitoring and Observability

Key Metrics to Monitor

  1. HPA Metrics:

    • Current/target replica count

    • Scaling events and frequency

    • Target vs actual resource utilization

  2. VPA Metrics:

    • Resource recommendations vs actual requests

    • Pod eviction frequency

    • Resource utilization efficiency

Monitoring Tools

# Example ServiceMonitor so Prometheus scrapes metrics-server
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hpa-metrics
spec:
  selector:
    matchLabels:
      app: metrics-server
  endpoints:
  - port: https

Advanced Scenarios

Combining HPA and VPA

While HPA and VPA can work together, there are important considerations:

  • Resource Conflicts: Both controllers modify resource specifications (one way to separate their responsibilities is sketched after this list)

  • Recommendation: Use HPA for scaling out, VPA for right-sizing during off-peak

  • Alternative: Use HPA with well-tuned initial resource requests
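
One way to keep the two controllers out of each other's way is to give them different resources, for example letting HPA scale on CPU while VPA manages only memory requests. The sketch below assumes the web-app Deployment and CPU-based HPA from earlier:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-memory-only
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Initial"  # only set requests when pods are created
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      controlledResources: ["memory"]  # leave CPU to the HPA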

Custom Metrics with HPA

# Example: Scaling based on queue length
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: "work-queue"
      target:
        type: AverageValue
        averageValue: "10"

Troubleshooting Guide

HPA Troubleshooting Commands

# Check HPA status
kubectl get hpa

# Describe HPA for detailed information
kubectl describe hpa <hpa-name>

# Check metrics-server logs (HPA reads resource metrics from it; the HPA controller itself runs inside kube-controller-manager)
kubectl logs -n kube-system deployment/metrics-server

# Test metrics availability
kubectl top pods

VPA Troubleshooting Commands

# Check VPA status
kubectl get vpa

# Get VPA recommendations
kubectl describe vpa <vpa-name>

# Check VPA controller logs
kubectl logs -n kube-system deployment/vpa-recommender

Conclusion

HPA and VPA are powerful tools for optimizing resource utilization in Kubernetes clusters. HPA excels at handling traffic variations by scaling the number of pods, while VPA helps right-size individual pods for optimal resource efficiency.

Key Takeaways:

  • Use HPA for handling variable loads and traffic spikes

  • Use VPA for optimizing resource allocation and reducing waste

  • Start with monitoring and recommendations before enabling automatic scaling

  • Test thoroughly in non-production environments

  • Monitor scaling behavior and adjust configurations based on observed patterns

By implementing these auto-scaling mechanisms thoughtfully, you can achieve better resource utilization, improved application performance, and reduced operational costs in your Kubernetes clusters.


Remember: Auto-scaling is not a silver bullet. Always monitor your applications and fine-tune your scaling policies based on real-world usage patterns and business requirements.
