Kubernetes Auto-scaling


A Deep Dive into HPA and VPA
Auto-scaling is one of the most powerful features of Kubernetes, enabling your applications to automatically adjust resources based on demand. In this comprehensive guide, we'll explore two essential auto-scaling mechanisms: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).
What is Auto-scaling in Kubernetes?
Auto-scaling in Kubernetes refers to the automatic adjustment of resources allocated to your applications based on current demand. This ensures optimal resource utilization while maintaining application performance and availability.
Horizontal Pod Autoscaler (HPA)
Overview
The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replica set, or stateful set based on observed CPU utilization, memory usage, or custom metrics.
How HPA Works
HPA follows a simple control loop:
Metrics Collection: Gathers metrics from pods every 15 seconds (configurable)
Decision Making: Compares current metrics against target values
Scaling Action: Increases or decreases the number of pod replicas
Cooldown: Waits for a stabilization period before making further changes
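Each pass through this loop sizes the workload with the documented HPA formula:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, 4 replicas averaging 90% CPU against a 70% target scale to ceil(4 * 90 / 70) = 6 replicas.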
Key Features
CPU-based scaling: Default metric for scaling decisions
Memory-based scaling: Scale based on memory utilization
Custom metrics: Use application-specific metrics
External metrics: Scale based on external systems (queue length, database connections)
HPA Configuration Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100              # allow doubling the replica count every 15s
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
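For quick experiments, a basic CPU-only HPA can also be created imperatively, without the behavior tuning shown above:

kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10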
Prerequisites for HPA
Metrics Server: Must be installed in the cluster (install command below)
Resource Requests: Pods must have CPU/memory requests defined
RBAC: Proper permissions for HPA controller
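If the Metrics Server is missing, the official manifest published by the metrics-server project is the usual way to add it:

# Install the Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Confirm it is serving metrics
kubectl top nodes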
# Example deployment with resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:1.21
        resources:
          requests:            # HPA utilization targets are calculated against these values
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
Vertical Pod Autoscaler (VPA)
Overview
The Vertical Pod Autoscaler automatically adjusts the CPU and memory requests and limits of containers based on their actual resource usage patterns.
How VPA Works
VPA is built from three cooperating components:
Recommender: Analyzes resource usage and provides recommendations
Updater: Applies recommendations by evicting pods that need updates
Admission Controller: Sets resource requests on new/updated pods
VPA Modes
"Off": Only provides recommendations, no automatic updates
"Initial": Assigns resources when pods are created, no updates to running pods
"Auto": Assigns resources at creation time and updates running pods
"Recreate": Assigns resources at creation time and evicts pods when updates are needed
VPA Configuration Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi
      controlledResources: ["cpu", "memory"]
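Once the recommender has watched the workload for a while, its suggested requests appear under the object's status; one way to pull them out is with jsonpath:

# Print target, lower-bound, and upper-bound recommendations
kubectl get vpa web-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'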
VPA Installation
VPA is not installed by default. You need to install it manually:
# Clone the VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
# Install VPA
./hack/vpa-up.sh
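A successful install runs one deployment per component in kube-system, which you can verify with:

kubectl get pods -n kube-system | grep vpa
# Expect vpa-recommender, vpa-updater, and vpa-admission-controller pods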
HPA vs VPA: Key Differences
Aspect | HPA | VPA
Scaling Direction | Horizontal (more pods) | Vertical (bigger pods)
Resource Adjustment | Number of replicas | CPU/memory per pod
Use Case | Handle traffic spikes | Optimize resource allocation
Pod Disruption | No pod restart needed | May require pod restart
Maturity | Stable, part of core Kubernetes | Less mature, installed as an add-on
Compatibility | Works with stateless apps | Works with both stateless and stateful
Best Practices
For HPA
Set Appropriate Targets: Don't set CPU targets too low (recommend 70-80%)
Configure Stabilization: Prevent flapping with proper stabilization windows
Monitor Metrics: Ensure metrics server is healthy and collecting data
Test Scaling: Regularly test scaling behavior under load
Use Multiple Metrics: Combine CPU, memory, and custom metrics for better decisions
For VPA
Start with Recommendations: Begin with "Off" mode to understand recommendations (see the snippet after this list)
Set Resource Limits: Define min/max boundaries to prevent over-allocation
Consider Pod Disruption: Plan for potential pod restarts in "Auto" mode
Monitor Resource Waste: Use VPA to identify over-provisioned resources
Gradual Rollout: Test VPA on non-critical workloads first
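Following the first practice above, a recommendation-only VPA just switches the update mode; the name web-app-vpa-observe is illustrative:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-observe   # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only, never evict pods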
Common Pitfalls and Solutions
HPA Issues
Problem: HPA not scaling
Solutions:
Verify metrics server is running
Check resource requests are defined
Ensure target metrics are being collected
Problem: Frequent scaling (flapping)
Solutions:
Increase stabilization window
Adjust target utilization thresholds
Use behavior policies to control scaling rate
VPA Issues
Problem: Pods constantly restarting
Solutions:
Use "Initial" mode instead of "Auto"
Set appropriate min/max resource limits
Check if resource recommendations are realistic
Problem: VPA recommendations seem incorrect
Solutions:
Allow more time for data collection
Verify workload patterns are representative
Check if resource usage spikes are outliers
Monitoring and Observability
Key Metrics to Monitor
HPA Metrics:
Current/target replica count
Scaling events and frequency
Target vs actual resource utilization
VPA Metrics:
Resource recommendations vs actual requests
Pod eviction frequency
Resource utilization efficiency
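Both sets of numbers are easy to watch live from the command line:

# Watch replica counts and recommendations update in real time
kubectl get hpa -w
kubectl get vpa -w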
Monitoring Tools
# Example ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hpa-metrics
spec:
  selector:
    matchLabels:
      app: metrics-server
  endpoints:
  - port: https
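If kube-state-metrics is also scraped (v2 metric naming assumed here), its HPA series make flapping easy to spot, for example by graphing desired against current replicas:

# PromQL: desired vs. current replicas per HPA
kube_horizontalpodautoscaler_status_desired_replicas
kube_horizontalpodautoscaler_status_current_replicas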
Advanced Scenarios
Combining HPA and VPA
While HPA and VPA can work together, there are important considerations:
Resource Conflicts: Never let both controllers react to the same metric; if VPA manages CPU and memory, drive HPA with custom or external metrics instead
Recommendation: Use HPA for scaling out, VPA for right-sizing during off-peak
Alternative: Use HPA with well-tuned initial resource requests
Custom Metrics with HPA
# Example: Scaling based on queue length
# Note: External metrics require a metrics adapter (e.g. prometheus-adapter or KEDA);
# the Metrics Server alone only serves CPU/memory resource metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: "work-queue"
      target:
        type: AverageValue
        averageValue: "10"   # target roughly 10 messages per replica
Troubleshooting Guide
HPA Troubleshooting Commands
# Check HPA status
kubectl get hpa
# Describe HPA for detailed information
kubectl describe hpa <hpa-name>
# Check Metrics Server logs (the HPA controller itself runs inside kube-controller-manager)
kubectl logs -n kube-system deployment/metrics-server
# Test metrics availability
kubectl top pods
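If kubectl top fails, querying the resource metrics API directly shows whether the Metrics Server is registered at all:

# Raw query against the metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes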
VPA Troubleshooting Commands
# Check VPA status
kubectl get vpa
# Get VPA recommendations
kubectl describe vpa <vpa-name>
# Check VPA controller logs
kubectl logs -n kube-system deployment/vpa-recommender
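The updater and admission controller keep their own logs, which often explain missing or unapplied recommendations:

kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller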
Conclusion
HPA and VPA are powerful tools for optimizing resource utilization in Kubernetes clusters. HPA excels at handling traffic variations by scaling the number of pods, while VPA helps right-size individual pods for optimal resource efficiency.
Key Takeaways:
Use HPA for handling variable loads and traffic spikes
Use VPA for optimizing resource allocation and reducing waste
Start with monitoring and recommendations before enabling automatic scaling
Test thoroughly in non-production environments
Monitor scaling behavior and adjust configurations based on observed patterns
By implementing these auto-scaling mechanisms thoughtfully, you can achieve better resource utilization, improved application performance, and reduced operational costs in your Kubernetes clusters.
Remember: Auto-scaling is not a silver bullet. Always monitor your applications and fine-tune your scaling policies based on real-world usage patterns and business requirements.