Kubernetes Vertical Pod Autoscaler: A Deep Dive into Right-Sizing Your Applications


Hello All,

Like many of you, I was incredibly excited when in-place pod resizing reached beta in Kubernetes 1.33 (which you can read about here). I've since taken the time to understand the current VPA release, as of July 2025, in more detail.

Let's dive into VPA in this blog post.


The Critical Question Every DevOps Engineer Asks

I have critical workloads in production that cannot be scaled horizontally; the replica count is always set to 1. Should I use VPA now to scale them vertically?

The answer is NO.

Why? Let's first understand VPA in detail, and then I'll share my experience on why you should be cautious.


What is Vertical Pod Autoscaler?

Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory reservations for your pods to help "right-size" your applications. Unlike Horizontal Pod Autoscaler (HPA), which scales the number of replicas, VPA scales the resources allocated to existing pods.

Think of it this way: HPA says, "I need more workers," while VPA says, "I need stronger workers."


Installation

Installation is straightforward. Refer to the official guide here.

Bash

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh

VPA Operating Modes

VPA can operate in four different modes:

  • Auto: Currently equivalent to Recreate. This might change to in-place updates in the future.

  • Recreate: The VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.

  • Initial: The VPA only assigns resource requests on pod creation and never changes them later.

  • Off: The VPA does not automatically change the resource requirements of the pods. The recommendations are still calculated and can be inspected in the VPA object.
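
To show where the mode is set, here is a minimal sketch of a VPA object targeting a hypothetical Deployment named my-app (the name and mode choice are assumptions; any of the four modes can go in updateMode):

YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial" # one of: Auto, Recreate, Initial, Off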

Container Resize Policies

With Kubernetes 1.33, you can now define how containers should handle resource changes:

YAML

spec:
  containers:
    - name: my-app
      image: my-app:latest
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired # apply directly to running container
        - resourceName: memory
          restartPolicy: RestartContainer # apply and restart to take effect

This is a game-changer! Finally, we can control whether a container needs to restart when resources are updated.


Real-World Example

Let me share a practical example. Here's how I set up VPA for a monitoring application:

YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: monitoring-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: monitoring-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: monitoring-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
        controlledResources: ["cpu", "memory"]

Where Can VPA Be Used?

1. Analyzing Current Workload Patterns (Off Mode)

Start with Off mode to understand your application's resource consumption patterns:

YAML

spec:
  updatePolicy:
    updateMode: "Off"

This mode is perfect for:

  • Understanding resource usage patterns

  • Getting recommendations without any changes

  • Planning capacity for new applications

  • Feeding the recommender from Prometheus-compatible metrics as a history source
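
In Off mode, the recommendations appear in the VPA object's status. The shape looks roughly like the sketch below (the values here are purely illustrative, not real measurements):

YAML

status:
  recommendation:
    containerRecommendations:
      - containerName: monitoring-app
        lowerBound:
          cpu: 150m
          memory: 200Mi
        target:
          cpu: 250m
          memory: 300Mi
        upperBound:
          cpu: 600m
          memory: 700Mi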


Limitations and Gotchas

Important: As of Kubernetes 1.33, VPA still has some limitations you need to be aware of:

1. Pod Disruption

In Recreate and Auto modes, VPA will terminate and recreate pods. This means downtime for single-replica applications. This is why I said NO to using it directly on critical production workloads!

2. Compatibility Issues

VPA and HPA cannot target the same metrics (CPU/Memory). You can use them together, but HPA should target custom metrics.
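
As a sketch of what that pairing can look like, here is an HPA driven by a custom Pods metric instead of CPU/memory, so it doesn't collide with VPA. The metric name http_requests_per_second is an assumption; it must be exposed through a metrics adapter in your cluster:

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: monitoring-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: monitoring-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second # assumed custom metric via an adapter
        target:
          type: AverageValue
          averageValue: "100"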

3. Resource Limits

VPA doesn't set resource limits, only requests. You need to set limits manually or use LimitRanges.
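
A minimal LimitRange sketch that supplies default limits and requests for containers in a namespace (the values are assumptions to adapt to your workloads):

YAML

apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits
spec:
  limits:
    - type: Container
      default: # default limits applied when a container sets none
        cpu: "1"
        memory: 1Gi
      defaultRequest: # default requests applied when none are set
        cpu: 200m
        memory: 256Mi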


My Production Strategy

Here's how I approach VPA in production environments:

Observation (Off Mode)

YAML

updateMode: "Off"

Run it for 2-4 weeks to gather data and understand usage patterns, then act on the recommendations.


Best Practices

  1. Always set resource bounds:

    YAML

     minAllowed:
       cpu: 100m
       memory: 128Mi
     maxAllowed:
       cpu: 2
       memory: 4Gi
    
  2. Use PodDisruptionBudgets:

    YAML

     apiVersion: policy/v1
     kind: PodDisruptionBudget
     metadata:
       name: my-app-pdb
     spec:
       minAvailable: 1
       selector:
         matchLabels:
           app: my-app
    
  3. Monitor VPA recommendations:

    Bash

     kubectl describe vpa my-app-vpa
    

The Future is Bright

With in-place resource updates coming to Kubernetes, VPA will become much more production-ready. The ability to update resources without pod restarts will be a game-changer for critical workloads.

Until then, use VPA wisely:

  • Start with Off mode for analysis.

  • Use Initial mode for new deployments.

  • Be cautious with Auto mode on critical applications.


Conclusion

VPA is a powerful tool, but like any powerful tool, it needs to be used with care and understanding. Don't rush into production with Auto mode on critical workloads. Take time to understand your application's behavior, set proper bounds, and gradually roll it out.

Remember: With great power comes great responsibility!

Have you tried VPA in your environment? Share your experiences in the comments below.

Happy scaling! 🚀


Written by

Jothimani Radhakrishnan

A Software Product Engineer, Cloud enthusiast | Blogger | DevOps | SRE | Python Developer. I usually automate my day-to-day work and blog about my experience with challenging problems.