Backup-Induced CPU Spike in Guest VMs: Proxmox


Proxmox VE is a powerful virtualization platform that allows you to easily create and manage virtual machines (VMs). One of the most important aspects of running VMs is backing them up regularly. This ensures that you can restore your VMs in case of a hardware failure or other disaster.
However, backing up VMs can sometimes be a performance-intensive task. This can lead to high CPU usage on the Proxmox VE server, as well as slow VM performance.
High CPU usage on the Proxmox VE server can have many causes, such as the number of I/O workers running, the type of compression used, and the speed of the underlying storage disks.
In this post, we focus on high CPU usage inside the guest VMs during backups.
Cause
When a backup job for a VM starts, QEMU installs a "copy-before-write" filter in its block layer. This filter ensures that old data still needed for the backup is sent to the backup target before the guest VM overwrites it. To achieve this, a guest write to a region that has not been backed up yet is blocked until the old data has been transferred. If that transfer is slow, the guest spends more time waiting for its I/O to complete, which shows up as a significant rise in CPU utilization.
In simple terms: during the backup, guest writes to not-yet-backed-up data are held back to keep the backup consistent. The slower the transfer, the longer these I/O operations wait, which increases the I/O wait time and, with it, the reported CPU usage.
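To see this effect from inside a Linux guest, you can compare the share of CPU time spent in iowait while a backup is running and while it is not. The snippet below is a minimal sketch that samples the aggregate cpu line of /proc/stat twice. The function name iowait_percent and its 5-second default interval are illustrative choices, not part of any Proxmox tool.

```shell
#!/usr/bin/env bash
# Print the percentage of CPU time spent in iowait over a sampling interval.
# Reads the aggregate "cpu" line of /proc/stat, so it only works on Linux.
iowait_percent() {
  local interval="${1:-5}"   # seconds between the two samples (illustrative default)
  local user nice system idle iowait irq softirq steal
  local total1 wait1 total2 wait2 delta waited

  # Fields: cpu user nice system idle iowait irq softirq steal guest guest_nice
  read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
  total1=$((user + nice + system + idle + iowait + irq + softirq + steal))
  wait1=$iowait

  sleep "$interval"

  read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
  total2=$((user + nice + system + idle + iowait + irq + softirq + steal))
  wait2=$iowait

  delta=$((total2 - total1))
  if (( delta <= 0 )); then
    echo 0   # no ticks elapsed; avoid dividing by zero
    return
  fi

  waited=$((wait2 - wait1))
  # proc(5) notes the iowait counter is unreliable and can even decrease; clamp at 0.
  if (( waited < 0 )); then
    waited=0
  fi
  echo $(( 100 * waited / delta ))
}
```

Run iowait_percent 5 in the guest once during a backup and once outside the backup window. A much higher value during the backup is consistent with guest writes waiting on copy-before-write.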
If you are using Proxmox Backup Server, you can identify the bottleneck using this command.
proxmox-backup-client benchmark --repository <Your repository>
Example Output:
Uploaded 10 chunks in 17 seconds.
Time per request: 1742814 microseconds.
TLS speed: 2.41 MB/s
SHA256 speed: 1831.82 MB/s
Compression speed: 555.06 MB/s
Decompress speed: 664.12 MB/s
AES256/GCM speed: 1487.40 MB/s
Verify speed: 483.77 MB/s
In the example above, TLS speed is the main bottleneck: the other stages run at hundreds of MB/s or more, while uploads to the server reach only 2.41 MB/s. As a result, backups transfer at roughly that slow rate.
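When the output is longer than this, a small filter can pick out the slowest stage for you. This is a heuristic sketch under stated assumptions: it assumes the benchmark prints lines in the "<name> speed: <value> MB/s" form shown above (some versions may additionally print a summary table, which this filter ignores), and the function name slowest_stage is hypothetical.

```shell
#!/usr/bin/env bash
# Read benchmark output on stdin and print the stage with the lowest
# "<name> speed: <value> MB/s" figure, i.e. the likely bottleneck.
slowest_stage() {
  awk '
    /speed: [0-9.]+ MB\/s/ {
      # Everything before " speed:" is the stage name (e.g. "TLS", "AES256/GCM").
      name = $0
      sub(/ speed:.*/, "", name)
      # The token right after "speed:" is the throughput in MB/s.
      for (i = 1; i <= NF; i++) if ($i == "speed:") { val = $(i + 1) + 0; break }
      if (!seen || val < min) { min = val; best = name; seen = 1 }
    }
    END { if (seen) printf "%s %.2f MB/s\n", best, min }
  '
}
```

Usage: proxmox-backup-client benchmark --repository <Your repository> 2>&1 | slowest_stage. Redirecting stderr as well covers the case where the speed lines are written there. For the example output above, this prints "TLS 2.41 MB/s".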
Backup Fleecing
Proxmox VE 8.2 added backup fleecing to its backup process. This feature aims to reduce the performance impact of backups on guest VMs, in particular the guest I/O stalls caused by the "copy-before-write" mechanism and the CPU spikes that come with them.
How Backup Fleecing Works
Traditional Backup: In a traditional backup, QEMU ensures data consistency by transferring the old contents of blocks to the backup target before the guest VM overwrites them. This can lead to significant I/O latency for the guest VM, resulting in increased CPU usage and potential performance degradation.
Backup Fleecing in Action: With fleecing enabled, the old data is not sent directly to the backup target. Instead, Proxmox VE temporarily caches it in a "fleecing image" on fast local storage, and the backup reads it from there. Guest writes only have to wait for this local copy rather than for the slower transfer to the backup target, which significantly reduces I/O latency for the guest VM.
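A back-of-the-envelope calculation shows why the destination of that copy matters so much. Assume, purely for illustration, that a guest write has to wait until 1 MB of old data has been copied out. Without fleecing, that copy goes to the backup target at the 2.41 MB/s measured in the benchmark example; with fleecing, it goes to local storage, here a hypothetical disk writing at 500 MB/s. The helper stall_ms is hypothetical and ignores latency, queueing, and the fact that only writes to not-yet-backed-up regions are affected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: milliseconds a write stalls while <mb> megabytes of old
# data are copied to a destination that writes at <speed> MB/s.
stall_ms() {
  local mb="$1" speed="$2"
  awk -v mb="$mb" -v speed="$speed" 'BEGIN { printf "%.0f\n", mb / speed * 1000 }'
}

stall_ms 1 2.41   # copy-before-write to the backup target: prints 415
stall_ms 1 500    # fleecing to a hypothetical 500 MB/s local disk: prints 2
```

The backup itself still has to move the data to the backup target at 2.41 MB/s; fleecing only takes that wait out of the guest's write path.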
Benefits of Backup Fleecing:
Improved Guest VM Performance: Reduced I/O latency translates to lower CPU usage within the guest VM, minimizing performance impacts and preventing potential freezes.
Enhanced Backup Reliability: By minimizing I/O contention, fleecing can contribute to more stable and reliable backup operations.
Considerations for Fleecing Storage:
Performance: Choose a fast local storage option with high I/O throughput (e.g., LVM-thin, RBD, or ZFS with "sparse 1" set in its storage configuration).
Thin Provisioning: Utilizing thin provisioning techniques (like LVM-thin or ZFS with sparse 1) optimizes storage utilization by only allocating space as needed for the fleecing images.
Discard Support: If your storage supports discard operations, enable them to reclaim unused space within the fleecing images.
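To find candidate fleecing storages on a node, you can scan the storage configuration for the thin-provisioned types listed above. This is a rough sketch: it assumes the usual /etc/pve/storage.cfg layout (a "<type>: <storage-id>" header followed by indented option lines) and treats a zfspool entry as sparse when it has a sparse line with no value or with value 1. It ignores all other storage types, including file-based ones, and does not check speed or discard support. The function name thin_storages is hypothetical.

```shell
#!/usr/bin/env bash
# List storages that look thin-provisioned: lvmthin and rbd entries,
# plus zfspool entries with sparse enabled.
thin_storages() {
  local cfg="${1:-/etc/pve/storage.cfg}"
  awk '
    # A section header looks like "<type>: <storage-id>" at column 0.
    /^[a-z]+: / {
      report()
      type = $1
      sub(/:$/, "", type)
      id = $2
      sparse = 0
      next
    }
    # Option lines are indented "<key> [value]"; a bare "sparse" means enabled.
    $1 == "sparse" && ($2 == "" || $2 == "1") { sparse = 1 }
    END { report() }

    # Print the previous section if it qualifies, then reset.
    function report() {
      if (type == "lvmthin" || type == "rbd" || (type == "zfspool" && sparse))
        print id " (" type ")"
      type = ""
    }
  ' "$cfg"
}
```

Run thin_storages on a Proxmox VE node, or pass another path to check a copy of the file.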
Enabling Backup Fleecing:
Node-wide Configuration: Edit /etc/vzdump.conf and add:
fleecing: enabled=true,storage=local-lvm
(Replace local-lvm with the name of your chosen fleecing storage.)
Job-specific Configuration using the Proxmox VE UI: Configure fleecing in the "Advanced" tab when editing a specific backup job.
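For a one-off manual backup, the same property string can be passed to vzdump on the command line, which is a convenient way to try fleecing on a single VM before changing a job or vzdump.conf. The VM ID 123 and the storage name local-lvm below are placeholders.

```shell
# Back up VM 123 (placeholder ID) with its fleecing image on local-lvm
vzdump 123 --fleecing enabled=1,storage=local-lvm
```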
Conclusion
Backup Fleecing is a valuable feature in Proxmox VE that significantly improves the performance and reliability of backup operations by minimizing the impact on guest VM performance. By strategically implementing fleecing and selecting appropriate storage, you can ensure that your backups complete efficiently while maintaining optimal performance for your virtualized workloads.
Written by Aadarsha Dhakal
