Kubernetes Auto-scaling


A Deep Dive into HPA and VPA
Auto-scaling is one of the most powerful features of Kubernetes, enabling your applications to automatically adjust resources based on demand. In this comprehensive guide, we'll explore two essential auto-scaling mechanisms: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).
What is Auto-scaling in Kubernetes?
Auto-scaling in Kubernetes refers to the automatic adjustment of resources allocated to your applications based on current demand. This ensures optimal resource utilization while maintaining application performance and availability.
Horizontal Pod Autoscaler (HPA)
Overview
The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replica set, or stateful set based on observed CPU utilization, memory usage, or custom metrics.
How HPA Works
HPA follows a simple control loop:
Metrics Collection: Gathers metrics from pods every 15 seconds (configurable)
Decision Making: Compares current metrics against target values
Scaling Action: Increases or decreases the number of pod replicas
Cooldown: Waits for a stabilization period before making further changes
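Each pass through this loop sizes the workload with the documented HPA formula:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, 4 replicas averaging 90% CPU against a 70% target scale to ceil(4 * 90 / 70) = 6 replicas.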
Key Features
CPU-based scaling: Default metric for scaling decisions
Memory-based scaling: Scale based on memory utilization
Custom metrics: Use application-specific metrics
External metrics: Scale based on external systems (queue length, database connections)
HPA Configuration Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100              # allow doubling the replica count every 15s
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
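For quick experiments, a basic CPU-only HPA can also be created imperatively, without the behavior tuning shown above:

kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10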
Prerequisites for HPA
Metrics Server: Must be installed in the cluster (install command below)
Resource Requests: Pods must have CPU/memory requests defined
RBAC: Proper permissions for HPA controller
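If the Metrics Server is missing, the official manifest published by the metrics-server project is the usual way to add it:

# Install the Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Confirm it is serving metrics
kubectl top nodes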
# Example deployment with resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx:1.21
        resources:
          requests:            # HPA utilization targets are calculated against these values
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
Vertical Pod Autoscaler (VPA)
Overview
The Vertical Pod Autoscaler automatically adjusts the CPU and memory requests and limits of containers based on their actual resource usage patterns.
How VPA Works
VPA is built from three cooperating components:
Recommender: Analyzes resource usage and provides recommendations
Updater: Applies recommendations by evicting pods that need updates
Admission Controller: Sets resource requests on new/updated pods
VPA Modes
"Off": Only provides recommendations, no automatic updates
"Initial": Assigns resources when pods are created, no updates to running pods
"Auto": Assigns resources at creation time and updates running pods
"Recreate": Assigns resources at creation time and evicts pods when updates are needed
VPA Configuration Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi
      controlledResources: ["cpu", "memory"]
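Once the recommender has watched the workload for a while, its suggested requests appear under the object's status; one way to pull them out is with jsonpath:

# Print target, lower-bound, and upper-bound recommendations
kubectl get vpa web-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'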
VPA Installation
VPA is not installed by default. You need to install it manually:
# Clone the VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
# Install VPA
./hack/vpa-up.sh
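A successful install runs one deployment per component in kube-system, which you can verify with:

kubectl get pods -n kube-system | grep vpa
# Expect vpa-recommender, vpa-updater, and vpa-admission-controller pods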
HPA vs VPA: Key Differences
Aspect | HPA | VPA
Scaling Direction | Horizontal (more pods) | Vertical (bigger pods)
Resource Adjustment | Number of replicas | CPU/memory per pod
Use Case | Handle traffic spikes | Optimize resource allocation
Pod Disruption | No pod restart needed | May require pod restart
Maturity | Stable, part of core Kubernetes | Less mature, installed as an add-on
Compatibility | Works with stateless apps | Works with both stateless and stateful
Best Practices
For HPA
Set Appropriate Targets: Don't set CPU targets too low (recommend 70-80%)
Configure Stabilization: Prevent flapping with proper stabilization windows
Monitor Metrics: Ensure metrics server is healthy and collecting data
Test Scaling: Regularly test scaling behavior under load
Use Multiple Metrics: Combine CPU, memory, and custom metrics for better decisions
For VPA
Start with Recommendations: Begin with "Off" mode to understand recommendations (see the snippet after this list)
Set Resource Limits: Define min/max boundaries to prevent over-allocation
Consider Pod Disruption: Plan for potential pod restarts in "Auto" mode
Monitor Resource Waste: Use VPA to identify over-provisioned resources
Gradual Rollout: Test VPA on non-critical workloads first
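Following the first practice above, a recommendation-only VPA just switches the update mode; the name web-app-vpa-observe is illustrative:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-observe   # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only, never evict pods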
Common Pitfalls and Solutions
HPA Issues
Problem: HPA not scaling
Solutions:
Verify metrics server is running
Check resource requests are defined
Ensure target metrics are being collected
Problem: Frequent scaling (flapping)
Solutions:
Increase stabilization window
Adjust target utilization thresholds
Use behavior policies to control scaling rate
VPA Issues
Problem: Pods constantly restarting
Solutions:
Use "Initial" mode instead of "Auto"
Set appropriate min/max resource limits
Check if resource recommendations are realistic
Problem: VPA recommendations seem incorrect
Solutions:
Allow more time for data collection
Verify workload patterns are representative
Check if resource usage spikes are outliers
Monitoring and Observability
Key Metrics to Monitor
HPA Metrics:
Current/target replica count
Scaling events and frequency
Target vs actual resource utilization
VPA Metrics:
Resource recommendations vs actual requests
Pod eviction frequency
Resource utilization efficiency
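Both sets of numbers are easy to watch live from the command line:

# Watch replica counts and recommendations update in real time
kubectl get hpa -w
kubectl get vpa -w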
Monitoring Tools
# Example ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hpa-metrics
spec:
  selector:
    matchLabels:
      app: metrics-server
  endpoints:
  - port: https
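If kube-state-metrics is also scraped (v2 metric naming assumed here), its HPA series make flapping easy to spot, for example by graphing desired against current replicas:

# PromQL: desired vs. current replicas per HPA
kube_horizontalpodautoscaler_status_desired_replicas
kube_horizontalpodautoscaler_status_current_replicas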
Advanced Scenarios
Combining HPA and VPA
While HPA and VPA can work together, there are important considerations:
Resource Conflicts: Never let both controllers react to the same metric; if VPA manages CPU and memory, drive HPA with custom or external metrics instead
Recommendation: Use HPA for scaling out, VPA for right-sizing during off-peak
Alternative: Use HPA with well-tuned initial resource requests
Custom Metrics with HPA
# Example: Scaling based on queue length
# Note: External metrics require a metrics adapter (e.g. prometheus-adapter or KEDA);
# the Metrics Server alone only serves CPU/memory resource metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: "work-queue"
      target:
        type: AverageValue
        averageValue: "10"   # target roughly 10 messages per replica
Troubleshooting Guide
HPA Troubleshooting Commands
# Check HPA status
kubectl get hpa
# Describe HPA for detailed information
kubectl describe hpa <hpa-name>
# Check Metrics Server logs (the HPA controller itself runs inside kube-controller-manager)
kubectl logs -n kube-system deployment/metrics-server
# Test metrics availability
kubectl top pods
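If kubectl top fails, querying the resource metrics API directly shows whether the Metrics Server is registered at all:

# Raw query against the metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes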
VPA Troubleshooting Commands
# Check VPA status
kubectl get vpa
# Get VPA recommendations
kubectl describe vpa <vpa-name>
# Check VPA controller logs
kubectl logs -n kube-system deployment/vpa-recommender
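The updater and admission controller keep their own logs, which often explain missing or unapplied recommendations:

kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller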
Conclusion
HPA and VPA are powerful tools for optimizing resource utilization in Kubernetes clusters. HPA excels at handling traffic variations by scaling the number of pods, while VPA helps right-size individual pods for optimal resource efficiency.
Key Takeaways:
Use HPA for handling variable loads and traffic spikes
Use VPA for optimizing resource allocation and reducing waste
Start with monitoring and recommendations before enabling automatic scaling
Test thoroughly in non-production environments
Monitor scaling behavior and adjust configurations based on observed patterns
By implementing these auto-scaling mechanisms thoughtfully, you can achieve better resource utilization, improved application performance, and reduced operational costs in your Kubernetes clusters.
Remember: Auto-scaling is not a silver bullet. Always monitor your applications and fine-tune your scaling policies based on real-world usage patterns and business requirements.