Kubernetes Vertical Pod Autoscaler: A Deep Dive into Right-Sizing Your Applications


Hello All,

Like many of you, I was incredibly excited when in-place pod resizing reached beta in Kubernetes 1.33 (which you can read about here). I've since taken the time to understand the current VPA release, as of July 2025, in more detail.

Let's dive into VPA in this blog post.


The Critical Question Every DevOps Engineer Asks

I have critical workloads in production that cannot be scaled horizontally; the replica count is always set to 1. Should I use VPA now to scale them vertically?

The answer is NO.

Why? Let's first understand VPA in detail, and then I'll share my experience on why you should be cautious.


What is Vertical Pod Autoscaler?

Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory reservations for your pods to help "right-size" your applications. Unlike Horizontal Pod Autoscaler (HPA), which scales the number of replicas, VPA scales the resources allocated to existing pods.

Think of it this way: HPA says, "I need more workers," while VPA says, "I need stronger workers."


Installation

Installation is straightforward. Refer to the official guide here.

Bash

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh

VPA Operating Modes

VPA can operate in four different modes:

  • Auto: Currently equivalent to Recreate. This might change to in-place updates in the future.

  • Recreate: The VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.

  • Initial: The VPA only assigns resource requests on pod creation and never changes them later.

  • Off: The VPA does not automatically change the resource requirements of the pods. The recommendations are still calculated and can be inspected in the VPA object.
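
To show where the mode is set, here is a minimal sketch of a VPA object targeting a hypothetical Deployment named my-app (the name and mode choice are assumptions; any of the four modes can go in updateMode):

YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial" # one of: Auto, Recreate, Initial, Off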

Container Resize Policies

With Kubernetes 1.33, you can now define how containers should handle resource changes:

YAML

spec:
  containers:
    - name: my-app
      image: my-app:latest
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired # apply directly to running container
        - resourceName: memory
          restartPolicy: RestartContainer # apply and restart to take effect

This is a game-changer! Finally, we can control whether a container needs to restart when resources are updated.


Real-World Example

Let me share a practical example. Here's how I set up VPA for a monitoring application:

YAML

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: monitoring-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: monitoring-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: monitoring-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
        controlledResources: ["cpu", "memory"]

Where Can VPA Be Used?

1. Analyzing Current Workload Patterns (Off Mode)

Start with Off mode to understand your application's resource consumption patterns:

YAML

spec:
  updatePolicy:
    updateMode: "Off"

This mode is perfect for:

  • Understanding resource usage patterns

  • Getting recommendations without any changes

  • Planning capacity for new applications

  • Feeding the recommender from Prometheus-compatible metrics as a history source
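
In Off mode, the recommendations appear in the VPA object's status. The shape looks roughly like the sketch below (the values here are purely illustrative, not real measurements):

YAML

status:
  recommendation:
    containerRecommendations:
      - containerName: monitoring-app
        lowerBound:
          cpu: 150m
          memory: 200Mi
        target:
          cpu: 250m
          memory: 300Mi
        upperBound:
          cpu: 600m
          memory: 700Mi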


Limitations and Gotchas

Important: As of Kubernetes 1.33, VPA still has some limitations you need to be aware of:

1. Pod Disruption

In Recreate and Auto modes, VPA will terminate and recreate pods. This means downtime for single-replica applications. This is why I said NO to using it directly on critical production workloads!

2. Compatibility Issues

VPA and HPA cannot target the same metrics (CPU/Memory). You can use them together, but HPA should target custom metrics.
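
As a sketch of what that pairing can look like, here is an HPA driven by a custom Pods metric instead of CPU/memory, so it doesn't collide with VPA. The metric name http_requests_per_second is an assumption; it must be exposed through a metrics adapter in your cluster:

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: monitoring-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: monitoring-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second # assumed custom metric via an adapter
        target:
          type: AverageValue
          averageValue: "100"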

3. Resource Limits

VPA doesn't set resource limits, only requests. You need to set limits manually or use LimitRanges.
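
A minimal LimitRange sketch that supplies default limits and requests for containers in a namespace (the values are assumptions to adapt to your workloads):

YAML

apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits
spec:
  limits:
    - type: Container
      default: # default limits applied when a container sets none
        cpu: "1"
        memory: 1Gi
      defaultRequest: # default requests applied when none are set
        cpu: 200m
        memory: 256Mi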


My Production Strategy

Here's how I approach VPA in production environments:

Observation (Off Mode)

YAML

updateMode: "Off"

Run it for 2-4 weeks to gather data and understand usage patterns, then act on the recommendations.


Best Practices

  1. Always set resource bounds:

    YAML

     minAllowed:
       cpu: 100m
       memory: 128Mi
     maxAllowed:
       cpu: 2
       memory: 4Gi
    
  2. Use PodDisruptionBudgets:

    YAML

     apiVersion: policy/v1
     kind: PodDisruptionBudget
     metadata:
       name: my-app-pdb
     spec:
       minAvailable: 1
       selector:
         matchLabels:
           app: my-app
    
  3. Monitor VPA recommendations:

    Bash

     kubectl describe vpa my-app-vpa
    

The Future is Bright

With in-place resource updates coming to Kubernetes, VPA will become much more production-ready. The ability to update resources without pod restarts will be a game-changer for critical workloads.

Until then, use VPA wisely:

  • Start with Off mode for analysis.

  • Use Initial mode for new deployments.

  • Be cautious with Auto mode on critical applications.


Conclusion

VPA is a powerful tool, but like any powerful tool, it needs to be used with care and understanding. Don't rush into production with Auto mode on critical workloads. Take time to understand your application's behavior, set proper bounds, and gradually roll it out.

Remember: With great power comes great responsibility!

Have you tried VPA in your environment? Share your experiences in the comments below.

Happy scaling! 🚀


Written by

Jothimani Radhakrishnan

A Software Product Engineer, Cloud enthusiast | Blogger | DevOps | SRE | Python Developer. I usually automate my day-to-day work and blog about my experience with challenging problems.