Achieving Zero-Downtime Live Migration for Stateful Workloads with Drafter: An In-Depth Guide

The Challenge with Stateful Workloads in Cloud
In today's cloud-native world, managing stateful workloads like databases presents unique challenges. While solutions like Kubernetes excel at handling stateless applications, managing stateful services—especially when it comes to live migration between nodes or even different cloud providers—remains complex.
One particular pain point is running databases on spot instances. While spot instances offer significant cost savings (often 60-90% cheaper than regular instances), their temporary nature poses a major challenge for stateful applications. When a cloud provider reclaims a spot instance, you typically have just minutes to migrate your stateful workload without losing data or experiencing downtime.
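As a concrete illustration of that time pressure: on GCP, a spot or preemptible VM can poll the metadata server for its preemption status and use that as the trigger to start a migration. The snippet below is only a sketch of the detection side; what you actually run in response (for example, kicking off a Drafter migration) depends on your setup.

```bash
#!/usr/bin/env bash
# Sketch: watch GCP's metadata server for this spot/preemptible VM's preemption notice.
# The reaction here is just an echo; in practice this is where you would trigger
# the live migration to another node.
while true; do
  preempted=$(curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted")
  if [ "$preempted" = "TRUE" ]; then
    echo "Preemption notice received -- start live migration now"
    break
  fi
  sleep 5
done
```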
Enter Drafter: A New Approach to VM Migration
Drafter, an open-source project by Loophole Labs, offers a novel solution to this challenge. It's a compute primitive that enables live migration of virtual machines between heterogeneous nodes with effectively zero downtime. Think of it as a way to "lift and shift" running applications—including their entire state—between different machines or even different cloud providers.
Key Features That Make Drafter Special
Fast Migration: Drafter achieves migrations in under 100ms within the same data center and around 500ms for cross-continental migrations
Heterogeneous Support: Works across different cloud providers and CPU models (with PVM support)
No Hardware Virtualization Requirement: Using PVM (Pagetable-based Virtual Machine), Drafter can run on instances without hardware virtualization support
OCI Image Support: Can run container images directly as VMs without nested virtualization overhead
A Practical Example: Migrating Redis Between GCP Instances
Let's walk through a real-world example of using Drafter to migrate a Redis instance between two Google Cloud Platform (GCP) virtual machines with zero downtime.
The Setup
Creating GCP Instances in Different Regions
First, let's create two instances in different regions: one in us-east1 (South Carolina) and another in europe-west2 (London).
Through GCP Console
Source Instance (US East):
Go to Google Cloud Console → Compute Engine → VM Instances
Click "CREATE INSTANCE"
Configure:
Name: redis-source
Region: us-east1 (South Carolina)
Zone: us-east1-b
Machine configuration: c2-standard-4 (4 vCPU, 16 GB memory)
Boot disk:
- Operating System: Rocky Linux
- Version: Rocky Linux 9
- Size: 20 GB
Destination Instance (London):
Same process but with:
Name: redis-destination
Region: europe-west2 (London)
Zone: europe-west2-a
Keep all other settings identical to ensure compatibility
Networking:
Create a Firewall rule for migration:
Name: allow-drafter-migration
Network: default
Targets: Specified target tags
Target tags: drafter-migration
Source IP ranges: [Source instance IP]/32
Protocols and ports: tcp:1337
Add network tags to both instances:
Edit each instance and add the network tag: drafter-migration
Using gcloud CLI
For those who prefer the command line:
# Create source instance in US East
gcloud compute instances create redis-source \
--project=your-project-id \
--zone=us-east1-b \
--machine-type=c2-standard-4 \
--network-interface=network-tier=PREMIUM,subnet=default \
--maintenance-policy=MIGRATE \
--provisioning-model=STANDARD \
--service-account=your-service-account \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--tags=drafter-migration \
--create-disk=auto-delete=yes,boot=yes,device-name=redis-source,image=rocky-linux-9-optimized-gcp,mode=rw,size=20,type=pd-balanced \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--reservation-affinity=any
# Create destination instance in London
gcloud compute instances create redis-destination \
--project=your-project-id \
--zone=europe-west2-a \
--machine-type=c2-standard-4 \
--network-interface=network-tier=PREMIUM,subnet=default \
--maintenance-policy=MIGRATE \
--provisioning-model=STANDARD \
--service-account=your-service-account \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--tags=drafter-migration \
--create-disk=auto-delete=yes,boot=yes,device-name=redis-destination,image=rocky-linux-9-optimized-gcp,mode=rw,size=20,type=pd-balanced \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--reservation-affinity=any
# Create firewall rule for migration
gcloud compute firewall-rules create allow-drafter-migration \
--direction=INGRESS \
--priority=1000 \
--network=default \
--action=ALLOW \
--rules=tcp:1337 \
--source-tags=drafter-migration \
--target-tags=drafter-migration
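To double-check that the rule is in place before moving on:

```bash
# Show the firewall rule that allows the Drafter migration traffic
gcloud compute firewall-rules describe allow-drafter-migration
```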
Verifying Connectivity
After creating the instances:
SSH into source instance:
gcloud compute ssh redis-source --zone=us-east1-b
Test connectivity to destination:
# From source instance
ping $(gcloud compute instances describe redis-destination \
  --zone=europe-west2-a \
  --format='get(networkInterfaces[0].networkIP)')
Check latency:
# This will help you estimate migration time
time nc -zv $(gcloud compute instances describe redis-destination \
  --zone=europe-west2-a \
  --format='get(networkInterfaces[0].networkIP)') 1337
We now have two GCP instances running Rocky Linux 9. Both will use PVM for cross-machine compatibility and Drafter to orchestrate the migration.
Step 1: Setting Up PVM Support
First, we need to prepare our instances with PVM support. On both instances:
# Update system packages
sudo dnf update -y
# Add the PVM repository for Rocky Linux 9 on GCP
sudo dnf config-manager --add-repo 'https://loopholelabs.github.io/linux-pvm-ci/rocky/gcp/repodata/linux-pvm-ci.repo'
# Install the PVM kernel package
sudo dnf install -y kernel-6.7.12_pvm_host_rocky_gcp-1.x86_64
# Set the PVM kernel as default and configure kernel parameters
sudo grubby --set-default /boot/vmlinuz-6.7.12-pvm-host-rocky-gcp
sudo grubby --copy-default --args="pti=off nokaslr lapic=notscdeadline" --update-kernel /boot/vmlinuz-6.7.12-pvm-host-rocky-gcp
# Regenerate initramfs
sudo dracut --force --kver 6.7.12-pvm-host-rocky-gcp
# Configure kernel module blacklisting
sudo tee /etc/modprobe.d/kvm-intel-amd-blacklist.conf <<EOF
blacklist kvm-intel
blacklist kvm-amd
EOF
# Enable PVM module
echo "kvm-pvm" | sudo tee /etc/modules-load.d/kvm-pvm.conf
# Reboot the instance
sudo reboot
# After reboot, verify the PVM installation:
# Check if PVM module is loaded
lsmod | grep pvm
# Expected output should show:
# kvm_pvm 57344 0
# kvm 1388544 1 kvm_pvm
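It is also worth confirming that the instance actually booted into the PVM kernel and that a KVM device node is available:

```bash
# Kernel version should match the PVM kernel installed above
uname -r
# Expected: 6.7.12-pvm-host-rocky-gcp

# The KVM device (now backed by kvm-pvm) should be present
ls -l /dev/kvm
```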
Step 2: Installing Drafter Components
Drafter consists of several components that work together:
mkdir -p /tmp/drafter-install
# Download and install Drafter binaries
for BINARY in drafter-nat drafter-forwarder drafter-snapshotter drafter-packager drafter-runner drafter-registry drafter-mounter drafter-peer drafter-terminator; do
echo "Downloading $BINARY..."
curl -L -o "/tmp/drafter-install/${BINARY}" "https://github.com/loopholelabs/drafter/releases/latest/download/${BINARY}.linux-$(uname -m)"
echo "Installing $BINARY..."
sudo install -v "/tmp/drafter-install/${BINARY}" /usr/local/bin
done
# Download and install Firecracker with PVM support
for BINARY in firecracker jailer; do
echo "Downloading $BINARY..."
curl -L -o "/tmp/drafter-install/${BINARY}" "https://github.com/loopholelabs/firecracker/releases/download/release-main-live-migration-pvm/${BINARY}.linux-$(uname -m)"
echo "Installing $BINARY..."
sudo install -v "/tmp/drafter-install/${BINARY}" /usr/local/bin
done
# Configure sudo path to include /usr/local/bin
sudo tee /etc/sudoers.d/preserve_path << EOF
Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin
EOF
# Load NBD module with increased device count
sudo modprobe nbd nbds_max=4096
# Clean up temporary files
rm -rf /tmp/drafter-install
# Verify installations
for CMD in drafter-nat drafter-forwarder drafter-snapshotter drafter-packager drafter-runner drafter-registry drafter-mounter drafter-peer drafter-terminator firecracker jailer; do
echo "Checking $CMD version..."
$CMD --version || echo "$CMD version check not supported"
done
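One thing to be aware of: the modprobe command above does not survive a reboot. If you want the NBD module loaded automatically with the higher device count, a common approach is to persist it via modules-load.d and modprobe.d:

```bash
# Load the nbd module at boot...
echo "nbd" | sudo tee /etc/modules-load.d/nbd.conf
# ...and keep the increased device count
echo "options nbd nbds_max=4096" | sudo tee /etc/modprobe.d/nbd.conf
```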
Step 3: Preparing the Redis VM
On the source instance:
# Install the Redis CLI (used later to connect to the running instance)
sudo dnf install -y redis
Preparing the Environment
# Create working directories
mkdir -p out/blueprint out/package out/instance-0/{overlay,state}
# Download DrafterOS with PVM support and the Valkey (Redis-compatible) OCI image to run in the VM
curl -Lo out/drafteros-oci.tar.zst "https://github.com/loopholelabs/drafter/releases/latest/download/drafteros-oci-$(uname -m)_pvm.tar.zst"
curl -Lo out/oci-valkey.tar.zst "https://github.com/loopholelabs/drafter/releases/latest/download/oci-valkey-$(uname -m).tar.zst"
# Extract DrafterOS blueprint
drafter-packager --package-path out/drafteros-oci.tar.zst --extract --devices '[
{
"name": "kernel",
"path": "out/blueprint/vmlinux"
},
{
"name": "disk",
"path": "out/blueprint/rootfs.ext4"
}
]'
sudo drafter-packager --package-path out/oci-valkey.tar.zst --extract --devices '[
{
"name": "oci",
"path": "out/blueprint/oci.ext4"
}
]'
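At this point the blueprint directory should contain the guest kernel, the root filesystem, and the OCI image:

```bash
ls -lh out/blueprint/
# Expect to see: vmlinux, rootfs.ext4, oci.ext4
```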
Step 4: The Migration Process
The actual migration is remarkably simple:
On source instance:
Network Setup and Snapshot Creation
# Open a new terminal. This starts NAT for the VMs' network connectivity
sudo drafter-nat --host-interface eth0 # Replace eth0 with the host interface that should carry the VMs' outgoing traffic
# Open a new terminal and run the snapshotter. This creates the initial snapshot package
sudo drafter-snapshotter --netns ark0 --cpu-template T2 --devices '[
{
"name": "state",
"output": "out/package/state.bin"
},
{
"name": "memory",
"output": "out/package/memory.bin"
},
{
"name": "kernel",
"input": "out/blueprint/vmlinux",
"output": "out/package/vmlinux"
},
{
"name": "disk",
"input": "out/blueprint/rootfs.ext4",
"output": "out/package/rootfs.ext4"
},
{
"name": "config",
"output": "out/package/config.json"
},
{
"name": "oci",
"input": "out/blueprint/oci.ext4",
"output": "out/package/oci.ext4"
}
]'
Starting Redis with Migration Support
# Open a new terminal and run the peer. This starts the Redis VM with migration enabled
sudo drafter-peer --netns ark0 --raddr '' --laddr ':1337' --devices '[
{
"name": "state",
"base": "out/package/state.bin",
"overlay": "out/instance-0/overlay/state.bin",
"state": "out/instance-0/state/state.bin",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "memory",
"base": "out/package/memory.bin",
"overlay": "out/instance-0/overlay/memory.bin",
"state": "out/instance-0/state/memory.bin",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "kernel",
"base": "out/package/vmlinux",
"overlay": "out/instance-0/overlay/vmlinux",
"state": "out/instance-0/state/vmlinux",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "disk",
"base": "out/package/rootfs.ext4",
"overlay": "out/instance-0/overlay/rootfs.ext4",
"state": "out/instance-0/state/rootfs.ext4",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "config",
"base": "out/package/config.json",
"overlay": "out/instance-0/overlay/config.json",
"state": "out/instance-0/state/config.json",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "oci",
"base": "out/package/oci.ext4",
"overlay": "out/instance-0/overlay/oci.ext4",
"state": "out/instance-0/state/oci.ext4",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
}
]'
Start port forwarding for Redis on the source VM
# Open a new terminal and run the forwarder. This sets up port forwarding for Redis
sudo drafter-forwarder --port-forwards '[
{
"netns": "ark0",
"internalPort": "6379",
"protocol": "tcp",
"externalAddr": "127.0.0.1:3333"
}
]'
Connect to Redis and add some dummy data
redis-cli -p 3333 ping
redis-cli -p 3333
# Once connected, you can run these commands:
SET message "Hello from Rocky Linux!"
SET user "rockyuser"
SET counter 42
# Check the data
KEYS *
GET message
GET user
GET counter
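To make the zero-downtime claim observable, you can leave a simple probe running against the forwarded port while the migration happens. This is just an ad-hoc helper, not part of Drafter; once the VM resumes on the destination, the same keys should be visible through the destination's forwarder.

```bash
# Write a timestamped heartbeat key once per second; any failed write during
# the migration window would be printed here
while true; do
  redis-cli -p 3333 SET heartbeat "$(date +%s)" || echo "write failed at $(date +%s)"
  sleep 1
done
```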
Destination VM
# Open a new terminal. This starts NAT for the VMs' network connectivity
sudo drafter-nat --host-interface eth0 # Replace eth0 with the host interface that should carry the VMs' outgoing traffic
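Before starting the receiver, make sure the working directories exist on the destination as well. This mirrors the source layout with instance-1 paths, which is an assumption based on the device configuration below; adjust if your layout differs.

```bash
# Working directories for the received package and the instance-1 overlay/state files
mkdir -p out/package out/instance-1/{overlay,state}
```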
Migration Process
# Open a new terminal and start the migration receiver (replace SOURCE_VM_IP with the source instance's IP)
sudo drafter-peer --netns ark1 --raddr 'SOURCE_VM_IP:1337' --laddr '' --devices '[
{
"name": "state",
"base": "out/package/state.bin",
"overlay": "out/instance-1/overlay/state.bin",
"state": "out/instance-1/state/state.bin",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "memory",
"base": "out/package/memory.bin",
"overlay": "out/instance-1/overlay/memory.bin",
"state": "out/instance-1/state/memory.bin",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "kernel",
"base": "out/package/vmlinux",
"overlay": "out/instance-1/overlay/vmlinux",
"state": "out/instance-1/state/vmlinux",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "disk",
"base": "out/package/rootfs.ext4",
"overlay": "out/instance-1/overlay/rootfs.ext4",
"state": "out/instance-1/state/rootfs.ext4",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "config",
"base": "out/package/config.json",
"overlay": "out/instance-1/overlay/config.json",
"state": "out/instance-1/state/config.json",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
},
{
"name": "oci",
"base": "out/package/oci.ext4",
"overlay": "out/instance-1/overlay/oci.ext4",
"state": "out/instance-1/state/oci.ext4",
"blockSize": 65536,
"expiry": 1000000000,
"maxDirtyBlocks": 200,
"minCycles": 5,
"maxCycles": 20,
"cycleThrottle": 500000000,
"makeMigratable": true,
"shared": false
}
]'
After migration completes
# Open a new terminal and run the forwarder. This sets up port forwarding for Redis on the destination instance
sudo drafter-forwarder --port-forwards '[
{
"netns": "ark1",
"internalPort": "6379",
"protocol": "tcp",
"externalAddr": "127.0.0.1:3333"
}
]'
Once the receiver connects, the drafter-peer output on the source VM will show that the migration has started.
Verify the Data on the Destination VM
Open a new terminal, connect to Redis, and check that the data written on the source instance is present:
```bash
redis-cli -p 3333 ping
redis-cli -p 3333
# Check the data
KEYS *
GET message
GET user
GET counter
```
During migration, Drafter:
Creates a snapshot of the running VM
Transfers the VM state between instances
Resumes the VM on the destination
All while keeping Redis running and accepting connections
The Technical Magic Behind It
Drafter achieves this feat through several innovative approaches:
Custom Firecracker Fork: Uses an optimized version of AWS's Firecracker VMM
PVM Integration: Enables migration between different CPU models
Hybrid Migration Strategy: Combines pre-copy and post-copy phases for optimal performance (a conceptual sketch follows this list)
Block-Level Synchronization: Efficiently transfers only changed memory blocks
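The device parameters passed to drafter-peer earlier (minCycles, maxCycles, maxDirtyBlocks, cycleThrottle) map naturally onto such a pre-copy loop. The sketch below is conceptual only, not Drafter's actual implementation, and the dirty-tracking helpers are stand-ins:

```bash
#!/usr/bin/env bash
# Conceptual pre-copy loop (illustration only, NOT Drafter's code).
MIN_CYCLES=5; MAX_CYCLES=20; MAX_DIRTY_BLOCKS=200; CYCLE_THROTTLE=0.5  # seconds

count_dirty_blocks()    { echo $((RANDOM % 400)); }  # stand-in for real dirty-block tracking
transfer_dirty_blocks() { :; }                       # stand-in for sending changed blocks

cycle=0
while :; do
  transfer_dirty_blocks
  dirty=$(count_dirty_blocks)
  cycle=$((cycle + 1))
  # Switch over once the dirty set is small enough (after a minimum number of
  # cycles) or the cycle cap is reached; remaining state can then be pulled
  # post-copy while the VM already runs on the destination.
  if { [ "$cycle" -ge "$MIN_CYCLES" ] && [ "$dirty" -le "$MAX_DIRTY_BLOCKS" ]; } || [ "$cycle" -ge "$MAX_CYCLES" ]; then
    break
  fi
  sleep "$CYCLE_THROTTLE"
done
echo "pre-copy converged after $cycle cycles; handing off to the destination"
```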
Real-World Applications
This technology opens up several interesting possibilities:
Spot Instance Optimization: Run databases on spot instances, migrating before termination
Cross-Cloud Migration: Move workloads between cloud providers without downtime
Maintenance Windows: Eliminate downtime during hardware maintenance
Geographic Optimization: Move workloads closer to users during peak times
Benchmarks and Performance
In our Redis migration example (a simple way to observe continuity yourself follows this list):
Local data center migration: ~100ms
Cross-regional migration: ~500ms
Zero data loss during migration
Continuous availability of Redis service
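If you want to sanity-check continuity yourself, redis-cli's built-in latency mode is a convenient probe to leave running against the forwarded port during a migration:

```bash
# Continuously samples PING round-trip times; a brief blip (rather than a
# disconnect) is what you'd expect to see around the handoff
redis-cli -p 3333 --latency
```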
Security Considerations
When implementing Drafter in production:
Network Security:
Use TLS or an encrypted tunnel for migration traffic (see the SSH tunnel sketch after this list)
Implement proper firewall rules
Consider VPC peering for cross-region migrations
Access Control:
Use proper file permissions for VM images
Implement authentication for migration endpoints
Follow principle of least privilege
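Drafter's own transport-security options are not covered in this walkthrough. As one generic way to avoid exposing the migration port directly, you could tunnel it over SSH from the destination to the source; the sketch below assumes SSH access between the instances (your-user and SOURCE_VM_IP are placeholders) and is not Drafter-specific.

```bash
# On the destination instance: forward local port 1337 to the source's drafter-peer
ssh -N -L 1337:127.0.0.1:1337 your-user@SOURCE_VM_IP &

# Then point the receiver at the local end of the tunnel
# (i.e., use --raddr '127.0.0.1:1337' instead of 'SOURCE_VM_IP:1337')
```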
Future Possibilities
The implications of this technology are significant:
Multi-Cloud Orchestration: Seamless workload movement between clouds
Cost Optimization: Dynamic workload placement based on spot pricing
Geographic Redundancy: Easy movement of stateful services between regions
Development Workflows: Instant environment replication with state
Conclusion
Drafter represents a significant advancement in managing stateful workloads in cloud environments. By enabling true zero-downtime migration of entire VMs, it solves one of the most challenging aspects of cloud computing: managing state across heterogeneous infrastructure.
Whether you're looking to optimize costs with spot instances, implement cross-cloud strategies, or simply need a better way to manage stateful workloads, Drafter provides a powerful new tool in your cloud architecture toolkit.
Resources
My GitHub. Feel free to comment or connect with me about any issues.
Remember: While the setup might seem complex, the benefits of zero-downtime migration and the ability to run stateful workloads on spot instances can lead to significant cost savings and operational improvements in your cloud infrastructure.
Written by Dhairya Patel
Dhairya Patel is a DevOps Engineer with 2 years of experience in infrastructure support (AWS, Linux, Azure) and DevOps (build, CI/CD, and release management).