Kubernetes Vertical Pod Autoscaler: A Deep Dive into Right-Sizing Your Applications


Hello All,
Like many of you, I was incredibly excited when in-place pod resizing reached Beta in Kubernetes 1.33 (which you can read about here). I've since taken the time to understand the current VPA release (as of July 2025) in more detail.
Let's dive into VPA in this blog post.
The Critical Question Every DevOps Engineer Asks
I have critical workloads in production that cannot be scaled horizontally, and the replica count is always set to 1. Shall I directly use VPA now to scale vertically?
The answer is NO.
Why? Let's first understand VPA in detail, and then I'll share my experience on why you should be cautious.
What is Vertical Pod Autoscaler?
Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory reservations for your pods to help "right-size" your applications. Unlike Horizontal Pod Autoscaler (HPA), which scales the number of replicas, VPA scales the resources allocated to existing pods.
Think of it this way: HPA says, "I need more workers," while VPA says, "I need stronger workers."
Installation
Installation is straightforward. Refer to the official guide here.
Bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh
VPA Operating Modes
VPA can operate in four different modes:
Mode | Description
--- | ---
Auto | Currently equivalent to Recreate. This may change to in-place updates in the future.
Recreate | VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.
Initial | VPA only assigns resource requests on pod creation and never changes them later.
Off | VPA does not automatically change the resource requirements of the pods. Recommendations are still calculated and can be inspected in the VPA object.
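If eviction worries you for low-replica workloads, the VPA update policy can be given a replica floor below which the updater will not evict. A hedged sketch (the `minReplicas` field in `updatePolicy` is taken from the `autoscaling.k8s.io/v1` API; the target name is hypothetical):

```yaml
# Sketch: a VPA that only evicts pods while the target Deployment
# runs at least 2 replicas. minReplicas here is the VPA updatePolicy
# field, not the Deployment's replica count.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cautious-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app          # hypothetical target
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2        # do not evict if fewer than 2 replicas are running
```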
Container Resize Policies
With Kubernetes 1.33, you can now define how containers should handle resource changes:
YAML
spec:
  containers:
  - name: my-app
    image: my-app:latest
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired      # apply directly to the running container
    - resourceName: memory
      restartPolicy: RestartContainer # restart the container to take effect
This is a game-changer! Finally, we can control whether a container needs to restart when resources are updated.
Real-World Example
Let me share a practical example. Here's how I set up VPA for a monitoring application:
YAML
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: monitoring-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: monitoring-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: monitoring-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
Where Can VPA Be Used?
1. Analyzing Current Workload Patterns (Off Mode)
Start with Off mode to understand your application's resource consumption patterns:
YAML
spec:
  updatePolicy:
    updateMode: "Off"
This mode is perfect for:
Understanding resource usage patterns
Getting recommendations without any changes
Planning capacity for new applications
Backing the recommender with Prometheus-compatible metrics as its history source
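In Off mode, the recommendations surface in the VPA object's status. The fragment below is a hedged sketch of what `kubectl get vpa -o yaml` typically returns; the numbers are purely illustrative:

```yaml
# Illustrative VPA status fragment (all values are made up)
status:
  recommendation:
    containerRecommendations:
    - containerName: monitoring-app
      lowerBound:            # requests below this risk starving the app
        cpu: 150m
        memory: 200Mi
      target:                # what VPA would set as requests in Auto mode
        cpu: 250m
        memory: 300Mi
      upperBound:            # requests above this are likely wasted
        cpu: 600m
        memory: 1Gi
```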
Limitations and Gotchas
Important: As of Kubernetes 1.33, VPA still has some limitations you need to be aware of:
1. Pod Disruption
In Recreate and Auto modes, VPA will terminate and recreate pods. This means downtime for single-replica applications. This is why I said NO to using it directly on critical production workloads!
2. Compatibility Issues
VPA and HPA cannot target the same metrics (CPU/Memory). You can use them together, but HPA should target custom metrics.
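To make that concrete, here is a hedged sketch of an HPA driven by a custom pods metric, so it does not fight a VPA that manages CPU and memory requests. The metric name `http_requests_per_second` is hypothetical and would have to be exposed by your metrics adapter:

```yaml
# Sketch: HPA scaling on a custom metric instead of CPU/memory,
# leaving resource requests to VPA. Metric name is an assumption.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"
```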
3. Resource Limits
VPA doesn't set resource limits, only requests. You need to set limits manually or use LimitRanges.
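Since VPA manages requests, a namespace-level LimitRange is one way to keep limits bounded. A minimal sketch, with illustrative values:

```yaml
# Sketch: a LimitRange that caps container limits in the namespace,
# complementing VPA-managed requests. All values are illustrative.
apiVersion: v1
kind: LimitRange
metadata:
  name: app-limits
spec:
  limits:
  - type: Container
    max:                # hard ceiling for any container's limits
      cpu: "2"
      memory: 4Gi
    default:            # default limits applied when none are set
      cpu: "1"
      memory: 1Gi
    defaultRequest:     # default requests applied when none are set
      cpu: 200m
      memory: 256Mi
```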
My Production Strategy
Here's how I approach VPA in production environments:
Observation (Off Mode)
YAML
updateMode: "Off"
Run it for 2-4 weeks to gather data and understand patterns before acting on the recommendations.
Best Practices
Always set resource bounds:
YAML
minAllowed:
  cpu: 100m
  memory: 128Mi
maxAllowed:
  cpu: 2
  memory: 4Gi
Use PodDisruptionBudgets:
YAML
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
Monitor VPA recommendations:
Bash
kubectl describe vpa my-app-vpa
The Future is Bright
With in-place resource updates coming to Kubernetes, VPA will become much more production-ready. The ability to update resources without pod restarts will be a game-changer for critical workloads.
Until then, use VPA wisely:
Start with Off mode for analysis.
Use Initial mode for new deployments.
Be cautious with Auto mode on critical applications.
Conclusion
VPA is a powerful tool, but like any powerful tool, it needs to be used with care and understanding. Don't rush into production with Auto mode on critical workloads. Take time to understand your application's behavior, set proper bounds, and gradually roll it out.
Remember: With great power comes great responsibility!
Have you tried VPA in your environment? Share your experiences in the comments below.
Happy scaling!
Written by Jothimani Radhakrishnan
A Software Product Engineer and cloud enthusiast | Blogger | DevOps | SRE | Python Developer. I usually automate my day-to-day tasks and blog about my experience with challenging problems.