Resource Contention in vSphere: Identification and Solutions

VMware vSphere remains the platform of choice for many organizations seeking flexibility, scalability, and performance. However, as VM density rises and workloads become more varied, performance bottlenecks can surface. To maintain a healthy and responsive environment, administrators must understand the key performance metrics and the story they tell.
In this blog, we'll dive into some of the critical performance metrics that affect VM performance in a vSphere environment and explore the symptoms of contention.
CPU Ready (%RDY)
CPU Wait (%WAIT and %VMWAIT)
CPU Co-Stop (%CSTP)
Memory Ballooning
CPU Contention
Rethinking the vCPU to pCPU Ratio
Before we dive into specific metrics like %RDY or %CSTP, we must address one of the most fundamental questions in virtualization: what is the right vCPU-to-pCPU ratio?
For years, administrators relied on general rules of thumb like 4:1 or even 10:1. These static ratios, however, were born in an era when many virtual workloads were largely idle. In such an environment, over-committing physical CPUs made sense. In today's world of resource-intensive applications and dynamic workloads, a fixed ratio can lead to performance bottlenecks and unhappy users.
Drive by Contention
Instead of focusing on a static ratio, the modern approach is to "drive by contention." This means:
Actively monitor your environment for signs of CPU stress (like high Ready and Co-Stop times, which we'll cover next).
Expand your resource pools or adjust VM sizing based on real-world data.
This approach ensures that your applications have the resources they need, when they need them, without being constrained by an arbitrary ratio.
A conservative, safe starting point is a 1:1 vCPU-to-pCPU ratio (not counting hyper-threading), which is the most predictable configuration.
This eliminates the risk of contention by dedicating a physical core to every virtual CPU. As your monitoring and operational processes mature, you can cautiously oversubscribe based on observed performance.
Ultimately, the optimal ratio is unique to your environment's specific workloads, hardware, and your own observations over time.
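As a quick illustration, the ratio itself is simple arithmetic. Below is a minimal Python sketch that computes a host's effective vCPU-to-pCPU ratio; the core count and per-VM vCPU figures are hypothetical, and in practice the inventory would come from your monitoring tooling.

```python
# Sketch: compute a host's effective vCPU-to-pCPU ratio from an
# inventory snapshot. All figures below are hypothetical examples.

def vcpu_to_pcpu_ratio(vm_vcpus, physical_cores):
    """Return the oversubscription ratio for one host."""
    total_vcpus = sum(vm_vcpus)
    return total_vcpus / physical_cores

# Example host: 32 physical cores (hyper-threading not counted), five VMs.
vms = [8, 8, 4, 4, 2]          # vCPUs assigned per VM
ratio = vcpu_to_pcpu_ratio(vms, physical_cores=32)
print(f"{ratio:.2f}:1")        # 26 vCPUs on 32 cores -> 0.81:1, under 1:1
```

If contention metrics stay clean as you add VMs, the ratio can cautiously climb past 1:1.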
CPU Metrics
CPU Ready (%RDY)
What Is CPU Ready?
CPU Ready measures the percentage of time a virtual CPU (vCPU) is ready to execute instructions but must wait in a queue for a physical CPU core to become available. In a heavily contended environment, where the vCPU-to-pCPU ratio exceeds what the host can comfortably schedule, vCPUs may wait longer before being scheduled, resulting in application slowdowns.
Impact on Performance
Increased Latency: Applications experience higher response times due to these micro-pauses in scheduling.
Reduced Throughput: The overall work processed per unit of time drops, affecting both batch and transactional workloads.
How to Identify CPU Ready
vSphere Client (vCenter)
Monitor the "Ready" and "Readiness" metrics under each VM's CPU performance chart.
A good rule of thumb is to investigate when the ready time consistently exceeds 5% per vCPU. For example, a 4-vCPU VM could tolerate up to 20% total ready time before showing significant degradation.
esxtop: In the ESXi shell, run esxtop and press c for the CPU view. Observe the %RDY column for each VM.
VMware Aria Operations: Use its dashboards and alerts to track CPU Ready trends over time.
Best Practices to Reduce CPU Ready
Right-Size vCPU Count: This is the most effective solution. Avoid over-provisioning vCPUs. Assign the minimum number required by the workload inside the guest OS.
Use Affinity Rules Sparingly: CPU affinity rules restrict the scheduler's flexibility, which can increase ready time. Use them only for specific, well-understood licensing or application requirements.
Resource Pools and Shares: Allocate CPU shares, reservations, and limits thoughtfully to prioritize critical VMs and prevent "noisy neighbors" from consuming all available resources.
Cluster Sizing: Ensure your cluster has enough physical cores to support the peak requirements of its running VMs.
CPU Wait Time (%WAIT and %VMWAIT)
What Is CPU Wait Time?
This is one of the most misunderstood metrics. CPU Wait (%WAIT in esxtop) measures the time a vCPU is in a stopped state, waiting for an event. A high %WAIT value is not always a problem. It is composed of two key metrics:
Idle Time (%IDLE): Time the guest OS intentionally put the vCPU in a halt state because it had no work to do. This is normal and expected for a non-busy VM.
VMWait Time (%VMWAIT): Time the vCPU was forced to wait for a hypervisor event to complete, most commonly a storage I/O or network I/O operation. This is the metric that indicates a potential problem.
The formula is simple: %WAIT = %IDLE + %VMWAIT.
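Since esxtop reports %WAIT and %IDLE directly, %VMWAIT can also be sanity-checked by subtraction. A tiny sketch with hypothetical readings:

```python
# Sketch: apply the %WAIT = %IDLE + %VMWAIT relationship to separate
# harmless idle time from real hypervisor waits. Readings are hypothetical.

def vmwait_from_wait(wait_pct, idle_pct):
    """Derive %VMWAIT when only %WAIT and %IDLE are at hand."""
    return wait_pct - idle_pct

# A VM showing 95% WAIT looks alarming, but if 93% of that is idle time,
# only 2% is genuine waiting on storage or network events:
print(vmwait_from_wait(wait_pct=95.0, idle_pct=93.0))  # 2.0 -> no bottleneck
```

This is exactly why a high %WAIT on its own proves nothing, as the next section spells out.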
Impact on Performance
A high %WAIT driven by high %IDLE has no negative performance impact; it simply means the VM is idle.
A high %WAIT driven by high %VMWAIT indicates a genuine infrastructure bottleneck, causing application stalls, slow data access, and poor user experience.
How to Identify Real CPU Wait Issues
In the ESXi shell, run esxtop and press c for the CPU view.
Observe the %WAIT column. If it's high, proceed to the next step.
Press f to change fields, navigate to the VMWAIT metric, and press the spacebar to add it to the view.
Analyze the results: if %VMWAIT is high, you have confirmed a bottleneck, likely related to storage or network latency. If %VMWAIT is low, the VM is simply idle, and no action is needed.
Best Practices to Reduce High %VMWAIT
Optimize Storage Paths: Ensure multipathing is correctly configured and all paths are active.
Upgrade Storage Tiers: Move latency-sensitive workloads to faster storage (e.g., NVMe, SSD-backed datastores).
Check Network Latency: Investigate network device performance if storage appears healthy.
Adjust Queue Depths: Tune HBA and storage array queue depths to handle your workload's I/O profile.
CPU Co-Stop (%CSTP)
What Is CPU Co-Stop?
CPU Co-Stop (%CSTP in esxtop) measures the time a vCPU is forcibly stopped by the hypervisor to allow its sibling vCPUs within the same VM to catch up. This occurs in Symmetric Multi-Processing (SMP) VMs when the hypervisor cannot schedule all of the VM's vCPUs on physical cores simultaneously. It is a direct symptom of CPU over-contention, especially for "wide" VMs (those with many vCPUs).
Impact on Performance
Synchronization Overhead: Multi-threaded applications suffer from added latency as some threads are paused, waiting for others.
Unpredictable Performance: Co-stop spikes lead to performance "jitter" in CPU-intensive workloads.
How to Identify CPU Co-Stop
esxtop: In the CPU view (c), press f to change fields and add the %CSTP column. Any value consistently above 3% is a cause for concern.
vRealize Operations: Advanced analytics can track and alert on %CSTP anomalies over time.
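To make the 3% rule of thumb actionable, here is a short sketch that flags offending VMs from a set of sampled readings. The VM names and values are hypothetical; in practice you would feed this from esxtop batch output or an Aria Operations export.

```python
# Sketch: flag VMs whose %CSTP exceeds the ~3% rule of thumb.
# Sample readings below are hypothetical.

CSTP_THRESHOLD = 3.0  # percent; tune for your environment

def costop_offenders(readings):
    """Return names of VMs whose co-stop percentage is above threshold."""
    return [vm for vm, cstp in readings.items() if cstp > CSTP_THRESHOLD]

samples = {"db01": 7.2, "web01": 0.4, "app01": 3.1, "dns01": 0.0}
print(costop_offenders(samples))  # ['db01', 'app01'] -> right-size these first
```

The flagged VMs are the natural candidates for the vCPU-count reduction described below.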
Best Practices to Mitigate CPU Co-Stop
Minimize vCPU Count: The primary solution is to right-size VMs with the fewest vCPUs they truly need. A 2-vCPU VM is far less likely to experience co-stop than an 8-vCPU VM.
NUMA Awareness: Align VM vCPU and memory sizing with the host's physical NUMA topology to avoid performance penalties from cross-node memory access.
Avoid Heavy Oversubscription: Keep the overall vCPU-to-physical-core ratio on the host within reasonable bounds (e.g., a 4:1 ratio is a common starting point, but this depends heavily on the workload).
Memory Metrics
Memory Ballooning
What Is Memory Ballooning?
Memory ballooning is a memory reclamation technique used by the ESXi hypervisor when a host is under memory pressure. A balloon driver (vmmemctl) inside the guest OS "inflates" by requesting memory from the guest. This forces the guest OS to use its own memory management (e.g., its page/swap file) to free up pages, which the hypervisor can then reclaim and allocate to another VM.
Impact on Performance
Guest-Level Paging: When ballooning is active, the guest OS is forced to swap memory to its own virtual disk. This disk I/O is thousands of times slower than accessing RAM, severely degrading application performance.
Increased Disk I/O: Guest OS swap activity generates additional storage load, which can compound existing I/O bottlenecks.
How to Identify Ballooning
vSphere Client: In the VM’s "Memory" performance chart, monitor the “Ballooned memory” metric. Any sustained non-zero value indicates the host is or was recently under memory pressure.
esxtop: In esxtop, press m for the memory view. Check the MCTLSZ column for the amount of memory being reclaimed by the balloon driver.
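Since any sustained non-zero ballooned value is the signal, a simple check over sampled values might look like the sketch below (the numbers, in MB, are hypothetical):

```python
# Sketch: flag sustained ballooning from sampled MCTLSZ-style values
# (MB held by the balloon driver). Samples below are hypothetical.

def ballooning_sustained(samples_mb, min_samples=3):
    """True if the balloon driver held memory in at least min_samples readings."""
    return sum(1 for mb in samples_mb if mb > 0) >= min_samples

print(ballooning_sustained([0, 0, 0, 0]))          # False: no memory pressure
print(ballooning_sustained([512, 480, 512, 256]))  # True: host is/was under pressure
```

A single transient non-zero reading may just reflect a vMotion or a brief spike; a run of them means the host has been reclaiming memory.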
Best Practices to Minimize Ballooning
Right-Size VM Memory: Allocate only the memory the application truly needs. Over-allocating RAM to idle VMs "traps" that memory, making it unavailable to other VMs.
Monitor Host Memory Usage: Ensure hosts have sufficient free memory to avoid contention. Use vSphere DRS to balance memory load across a cluster.
Use Reservations for Critical VMs: If a VM must never have its memory reclaimed, set a memory reservation.
Leverage vSphere Host Cache: Configure swap-to-host-cache on a fast SSD to mitigate the performance impact when host-level swapping is unavoidable.
Conclusion
Effective VMware vSphere performance tuning depends on a deep understanding of these metrics.
By correctly interpreting CPU Ready, Wait, and Co-Stop, we can distinguish between an idle VM and one genuinely struggling with contention.
Right-sizing resources, optimizing infrastructure, and continuous monitoring are key to a high-performing virtual environment.
Implement continuous performance monitoring using vRealize Operations or native vCenter dashboards.
Written by Sumit Sur