Building a 3-Node Proxmox HA Cluster with IPMI Fencing

Table of contents
- Infrastructure Overview
- Prerequisites and Hardware Requirements
- Step 1: Initial Node Installation and Configuration
- Step 2: Configure IPMI Access
- Step 3: Configure Shared Storage
- Step 4: Create Proxmox Cluster
- Step 5: Configure Shared Storage in Proxmox
- Step 6: Install and Configure Fence Agents
- Step 7: Configure High Availability
- Step 8: Monitoring and Alerting Configuration
- Step 9: Testing and Validation
- Step 10: Backup and Documentation
- Maintenance and Monitoring
- Troubleshooting Common Issues
- Conclusion
This guide provides step-by-step instructions for creating a production-ready 3-node Proxmox HA cluster with IPMI-based fencing. Follow these procedures to build a resilient virtualization infrastructure that can withstand hardware failures.
Infrastructure Overview
Node Specifications
pve-node01: Primary cluster node (192.168.1.101)
pve-node02: Secondary cluster node (192.168.1.102)
pve-node03: Tertiary cluster node (192.168.1.103)
Network Configuration
Management Network: 192.168.1.0/24 (VM traffic, web UI access)
Cluster Network: 192.168.10.0/24 (Corosync communication)
Storage Network: 192.168.20.0/24 (Shared storage access)
IPMI Network: 192.168.100.0/24 (Out-of-band management)
IP Address Assignments
| Node | Management IP | Cluster IP | Storage IP | IPMI IP |
| --- | --- | --- | --- | --- |
| pve-node01 | 192.168.1.101 | 192.168.10.101 | 192.168.20.101 | 192.168.100.101 |
| pve-node02 | 192.168.1.102 | 192.168.10.102 | 192.168.20.102 | 192.168.100.102 |
| pve-node03 | 192.168.1.103 | 192.168.10.103 | 192.168.20.103 | 192.168.100.103 |
Prerequisites and Hardware Requirements
Hardware Specifications (Per Node)
CPU: Minimum 4 cores, VT-x/AMD-V support
RAM: Minimum 16GB; 32GB+ recommended for production
Storage:
120GB SSD for Proxmox OS
Additional storage for VMs (local or shared)
Network: 4x Gigabit Ethernet ports minimum
IPMI: Dedicated IPMI/BMC interface
Network Infrastructure
Managed switches with VLAN support
Redundant network paths for each network segment
NTP server for time synchronization
DNS server for name resolution (static /etc/hosts entries also work; see the example below)
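Whether or not DNS is in place, it is good practice to give every node static /etc/hosts entries for the other cluster members so names still resolve if DNS goes down. A minimal sketch for this layout, using the management addresses (the example.local domain is an assumption):
# /etc/hosts additions on every node
192.168.1.101 pve-node01.example.local pve-node01
192.168.1.102 pve-node02.example.local pve-node02
192.168.1.103 pve-node03.example.local pve-node03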
Step 1: Initial Node Installation and Configuration
1.1 Install Proxmox VE on Each Node
Download the latest Proxmox VE ISO and perform a standard installation on each node. During installation:
Disk Configuration: Use the entire SSD for the Proxmox installation
Network Configuration: Configure the primary management interface
Root Password: Set a strong root password (document securely)
1.2 Post-Installation Network Configuration
Configure network interfaces on each node. Edit /etc/network/interfaces:
For pve-node01:
# Management interface (existing)
auto vmbr0
iface vmbr0 inet static
address 192.168.1.101/24
gateway 192.168.1.1
bridge-ports ens18
bridge-stp off
bridge-fd 0
# Cluster communication interface
auto vmbr1
iface vmbr1 inet static
address 192.168.10.101/24
bridge-ports ens19
bridge-stp off
bridge-fd 0
# Storage network interface
auto vmbr2
iface vmbr2 inet static
address 192.168.20.101/24
bridge-ports ens20
bridge-stp off
bridge-fd 0
For pve-node02:
# Management interface (existing)
auto vmbr0
iface vmbr0 inet static
address 192.168.1.102/24
gateway 192.168.1.1
bridge-ports ens18
bridge-stp off
bridge-fd 0
# Cluster communication interface
auto vmbr1
iface vmbr1 inet static
address 192.168.10.102/24
bridge-ports ens19
bridge-stp off
bridge-fd 0
# Storage network interface
auto vmbr2
iface vmbr2 inet static
address 192.168.20.102/24
bridge-ports ens20
bridge-stp off
bridge-fd 0
For pve-node03:
# Management interface (existing)
auto vmbr0
iface vmbr0 inet static
address 192.168.1.103/24
gateway 192.168.1.1
bridge-ports ens18
bridge-stp off
bridge-fd 0
# Cluster communication interface
auto vmbr1
iface vmbr1 inet static
address 192.168.10.103/24
bridge-ports ens19
bridge-stp off
bridge-fd 0
# Storage network interface
auto vmbr2
iface vmbr2 inet static
address 192.168.20.103/24
bridge-ports ens20
bridge-stp off
bridge-fd 0
1.3 Apply Network Configuration
On each node, restart networking:
systemctl restart networking
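If ifupdown2 is installed (the default on recent Proxmox VE ISO installs), the changes can also be applied live without restarting the whole networking stack:
ifreload -a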
1.4 Configure NTP Synchronization
Edit /etc/systemd/timesyncd.conf on each node:
[Time]
NTP=pool.ntp.org
FallbackNTP=time.cloudflare.com
Restart time synchronization:
systemctl restart systemd-timesyncd
systemctl enable systemd-timesyncd
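Note that recent Proxmox VE releases (7.x and later) ship chrony as the default NTP client; if chronyd is active on your nodes, set the pool in /etc/chrony/chrony.conf instead of timesyncd.conf. Either way, verify that the clock is synchronized:
# Check which time service is running and whether the clock is in sync
systemctl is-active systemd-timesyncd chronyd
timedatectl status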
1.5 Update System Packages
On each node:
apt update && apt upgrade -y
reboot
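If apt update fails against the enterprise repository (nodes without a subscription), a common workaround is to disable the enterprise list and enable the no-subscription repository before retrying. This sketch assumes Proxmox VE 8 on Debian bookworm; adjust the release name to match your installation:
# Disable the enterprise repository
sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list
# Add the no-subscription repository
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
apt update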
Step 2: Configure IPMI Access
2.1 Configure IPMI on Each Node
Access the IPMI interface through the server's BIOS/UEFI or dedicated management port:
For pve-node01:
IPMI IP: 192.168.100.101
Subnet Mask: 255.255.255.0
Gateway: 192.168.100.1
Username: admin
Password: ProxmoxIPMI01!
For pve-node02:
IPMI IP: 192.168.100.102
Subnet Mask: 255.255.255.0
Gateway: 192.168.100.1
Username: admin
Password: ProxmoxIPMI02!
For pve-node03:
IPMI IP: 192.168.100.103
Subnet Mask: 255.255.255.0
Gateway: 192.168.100.1
Username: admin
Password: ProxmoxIPMI03!
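If the BMCs are easier to reach from the operating system than from the BIOS, most of these settings can also be pushed with ipmitool from each node itself. Install ipmitool first; channel 1 and user ID 2 in this sketch are assumptions that vary by vendor:
apt install ipmitool -y
# Example: set the LAN configuration of the local BMC (run on pve-node01)
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 192.168.100.101
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 192.168.100.1
# Example: set the password for user ID 2 (often the admin account)
ipmitool user set password 2 'ProxmoxIPMI01!'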
2.2 Test IPMI Connectivity
From each node, test IPMI connectivity to other nodes:
From pve-node01:
ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power status
ipmitool -H 192.168.100.103 -U admin -P ProxmoxIPMI03! power status
From pve-node02:
ipmitool -H 192.168.100.101 -U admin -P ProxmoxIPMI01! power status
ipmitool -H 192.168.100.103 -U admin -P ProxmoxIPMI03! power status
From pve-node03:
ipmitool -H 192.168.100.101 -U admin -P ProxmoxIPMI01! power status
ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power status
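If a BMC rejects these requests, it may only accept IPMI v2.0 (lanplus) sessions; in that case add the lanplus interface explicitly, for example:
ipmitool -I lanplus -H 192.168.100.102 -U admin -P 'ProxmoxIPMI02!' power status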
Step 3: Configure Shared Storage
3.1 NFS Storage Configuration
This example uses an NFS server at 192.168.20.10 with the following exports defined in /etc/exports:
/srv/proxmox/templates 192.168.20.0/24(rw,sync,no_root_squash,no_subtree_check)
/srv/proxmox/backups 192.168.20.0/24(rw,sync,no_root_squash,no_subtree_check)
/srv/proxmox/vms 192.168.20.0/24(rw,sync,no_root_squash,no_subtree_check)
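After editing /etc/exports on the NFS server, reload the export table and confirm the exports are visible from the storage network:
# On the NFS server
exportfs -ra
# On any Proxmox node
showmount -e 192.168.20.10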
3.2 Mount NFS Storage on Each Node
On each Proxmox node, create mount points and test NFS connectivity:
# Create mount points
mkdir -p /mnt/nfs-templates
mkdir -p /mnt/nfs-backups
mkdir -p /mnt/nfs-vms
# Test NFS mounts
mount -t nfs 192.168.20.10:/srv/proxmox/templates /mnt/nfs-templates
mount -t nfs 192.168.20.10:/srv/proxmox/backups /mnt/nfs-backups
mount -t nfs 192.168.20.10:/srv/proxmox/vms /mnt/nfs-vms
# Verify mounts
df -h | grep nfs
# Unmount for now (will be configured through Proxmox GUI)
umount /mnt/nfs-*
Step 4: Create Proxmox Cluster
4.1 Initialize Cluster on Primary Node
On pve-node01, create the cluster:
pvecm create proxmox-cluster --link0 192.168.10.101
Note: Proxmox VE 5.x and older used --bindnet0_addr and --ring0_addr instead of --link0.
4.2 Verify Cluster Status
Check cluster status on pve-node01:
pvecm status
pvecm nodes
4.3 Gather Join Information
Joining is initiated from the node being added and points at an existing cluster member, so all that is needed from pve-node01 is its cluster address (192.168.10.101) and root password. The web UI shows the same details under Datacenter → Cluster → Join Information.
4.4 Join Additional Nodes
On pve-node02, join the cluster:
pvecm add 192.168.10.101 --link0 192.168.10.102
On pve-node03, join the cluster:
pvecm add 192.168.10.101 --link0 192.168.10.103
You will be prompted for the root password of pve-node01 and asked to confirm its certificate fingerprint.
4.5 Verify Cluster Formation
On any node, verify all nodes are part of the cluster:
pvecm status
pvecm nodes
Expected output should show all three nodes as online.
Step 5: Configure Shared Storage in Proxmox
5.1 Add NFS Storage via GUI
Access Proxmox web interface:
https://192.168.1.101:8006
Navigate to Datacenter → Storage → Add → NFS
Templates Storage:
ID: nfs-templates
Server: 192.168.20.10
Export: /srv/proxmox/templates
Content: ISO image, Container template
Backups Storage:
ID: nfs-backups
Server: 192.168.20.10
Export: /srv/proxmox/backups
Content: Backup file
VMs Storage:
ID: nfs-vms
Server: 192.168.20.10
Export: /srv/proxmox/vms
Content: Disk image, Container
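The same three storages can also be added from the command line on any cluster node; a sketch that mirrors the GUI values above:
pvesm add nfs nfs-templates --server 192.168.20.10 --export /srv/proxmox/templates --path /mnt/pve/nfs-templates --content iso,vztmpl
pvesm add nfs nfs-backups --server 192.168.20.10 --export /srv/proxmox/backups --path /mnt/pve/nfs-backups --content backup
pvesm add nfs nfs-vms --server 192.168.20.10 --export /srv/proxmox/vms --path /mnt/pve/nfs-vms --content images,rootdir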
5.2 Verify Storage Access
On each node, verify storage is accessible:
pvesm status
pvesm list nfs-vms
Step 6: Install and Configure Fence Agents
6.1 Install Fence Agents on All Nodes
On each node, install the fence agents package:
apt update
apt install fence-agents -y
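Confirm that the IPMI fence agent shipped with the package is present and responds (printing its metadata does not touch any BMC):
which fence_ipmilan
fence_ipmilan -o metadata | head -n 5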
6.2 Configure Fencing Devices
Method 1: Using Proxmox Web Interface
Navigate to Datacenter → HA → Fencing
Click Add
Configure each fence device:
For pve-node01:
ID: fence-node01
Type: fence_ipmilan
IP: 192.168.100.101
Username: admin
Password: ProxmoxIPMI01!
Options: lanplus=1,power_wait=10
For pve-node02:
ID: fence-node02
Type: fence_ipmilan
IP: 192.168.100.102
Username: admin
Password: ProxmoxIPMI02!
Options: lanplus=1,power_wait=10
For pve-node03:
ID: fence-node03
Type: fence_ipmilan
IP: 192.168.100.103
Username: admin
Password: ProxmoxIPMI03!
Options: lanplus=1,power_wait=10
Method 2: Using Command Line
On any cluster node, add fence devices:
# Add fence device for pve-node01
pvesh create /nodes/pve-node01/fence \
--device agent=fence_ipmilan,lanplus=1,ipaddr=192.168.100.101,login=admin,passwd=ProxmoxIPMI01!,power_wait=10
# Add fence device for pve-node02
pvesh create /nodes/pve-node02/fence \
--device agent=fence_ipmilan,lanplus=1,ipaddr=192.168.100.102,login=admin,passwd=ProxmoxIPMI02!,power_wait=10
# Add fence device for pve-node03
pvesh create /nodes/pve-node03/fence \
--device agent=fence_ipmilan,lanplus=1,ipaddr=192.168.100.103,login=admin,passwd=ProxmoxIPMI03!,power_wait=10
6.3 Configure Fencing Mode
Edit /etc/pve/datacenter.cfg to switch from the default watchdog-based fencing to hardware fencing (valid values are watchdog, hardware, and both):
fencing: hardware
Or use the GUI: Datacenter → HA → Options → Fencing → hardware
6.4 Test Fencing Operations
Test fence device configuration:
# Query the power status of pve-node02's fence device from pve-node01
fence_ipmilan -a 192.168.100.102 -l admin -p ProxmoxIPMI02! --lanplus -o status
# Query the power status of pve-node03's fence device from pve-node01
fence_ipmilan -a 192.168.100.103 -l admin -p ProxmoxIPMI03! --lanplus -o status
Step 7: Configure High Availability
7.1 Create HA Groups
Navigate to Datacenter → HA → Groups → Create
Critical Services Group:
Group ID: critical-services
Nodes: pve-node01:100,pve-node02:90,pve-node03:80
Restrict to Group: Yes
Standard Services Group:
Group ID: standard-services
Nodes: pve-node01:80,pve-node02:90,pve-node03:100
Restrict to Group: No
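The same groups can be created from the shell with ha-manager; a sketch mirroring the GUI values above:
ha-manager groupadd critical-services --nodes "pve-node01:100,pve-node02:90,pve-node03:80" --restricted 1
ha-manager groupadd standard-services --nodes "pve-node01:80,pve-node02:90,pve-node03:100" --restricted 0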
7.2 Configure HA Resources
For each VM that needs HA protection:
Navigate to Datacenter → HA → Resources → Add
Configure the resource:
Resource ID: vm:100 (example VM ID)
Group: critical-services
Max Restart: 3
Max Relocate: 1
State: started
7.3 Test HA Functionality
Create a test VM and enable HA:
# Create a test VM
qm create 999 --name test-ha-vm --memory 512 --net0 virtio,bridge=vmbr0 --bootdisk scsi0 --scsi0 nfs-vms:2,format=qcow2
# Add VM to HA
ha-manager add vm:999 --state started --group critical-services --max_restart 3 --max_relocate 1
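Confirm the VM is now under HA control and note which node currently runs it:
ha-manager status
ha-manager config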
Step 8: Monitoring and Alerting Configuration
8.1 Configure Email Notifications
Set the e-mail address that receives notifications for the root user (Proxmox stores it in /etc/pve/user.cfg):
pveum user modify root@pam --email admin@example.com
8.2 Configure Notification Settings
Navigate to Datacenter → Options and set the e-mail "from" address used for alerts; newer Proxmox VE releases also provide a dedicated Datacenter → Notifications panel for notification targets and matchers.
8.3 Set Up Log Monitoring
Create a log monitoring script /usr/local/bin/ha-monitor.sh:
#!/bin/bash
# Watch HA manager events and e-mail alerts
ALERT_EMAIL="admin@example.com"
# pve-ha-crm and pve-ha-lrm log to the systemd journal rather than a dedicated file
journalctl -f -n 0 -u pve-ha-crm -u pve-ha-lrm | while read -r line; do
    if echo "$line" | grep -qiE "error|fail|fence"; then
        echo "$line" | mail -s "Proxmox HA Alert" "$ALERT_EMAIL"
    fi
done
Make it executable and create a systemd service:
chmod +x /usr/local/bin/ha-monitor.sh
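A minimal systemd unit for this script could look like the following (the unit name /etc/systemd/system/ha-monitor.service is an assumption, and mail delivery requires a configured MTA plus a mail client such as bsd-mailx or mailutils):
# /etc/systemd/system/ha-monitor.service
[Unit]
Description=Proxmox HA event monitor
After=pve-ha-crm.service pve-ha-lrm.service
[Service]
ExecStart=/usr/local/bin/ha-monitor.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload systemd and start it:
systemctl daemon-reload
systemctl enable --now ha-monitor.service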
Step 9: Testing and Validation
9.1 Test Cluster Connectivity
Verify corosync communication:
corosync-cfgtool -s
corosync-cmapctl | grep ring
9.2 Test Fence Operations
Controlled fence test:
# From pve-node01, power-cycle pve-node02 through its fence device (WARNING: this takes the node down hard)
fence_ipmilan -a 192.168.100.102 -l admin -p ProxmoxIPMI02! --lanplus -o reboot
Monitor the operation:
journalctl -f -u pve-ha-crm -u pve-ha-lrm
9.3 Test VM Failover
Create a test VM on pve-node01
Enable HA for the VM
Simulate node failure by shutting down pve-node01
Verify VM starts on another node
Check HA logs for proper operation
9.4 Validate Storage Access
Ensure all nodes can access shared storage:
# On each node
pvesm status
touch /mnt/pve/nfs-vms/test-file-$(hostname)
ls -la /mnt/pve/nfs-vms/test-file-*
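Remove the test files once all three nodes have been verified:
rm /mnt/pve/nfs-vms/test-file-*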
Step 10: Backup and Documentation
10.1 Backup Cluster Configuration
Create configuration backup:
# Backup cluster configuration
tar -czf /root/cluster-backup-$(date +%Y%m%d).tar.gz /etc/pve/
# Backup to external location
scp /root/cluster-backup-$(date +%Y%m%d).tar.gz user@backupserver:/backups/proxmox/
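To keep these archives current, a simple scheduled job can be dropped on each node. An illustrative cron entry (the schedule and retention are assumptions; note that % must be escaped inside crontabs, and the off-host copy should follow as above):
# /etc/cron.d/pve-config-backup
0 2 * * * root tar -czf /root/cluster-backup-$(date +\%Y\%m\%d).tar.gz /etc/pve/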
Maintenance and Monitoring
Daily Tasks
Check cluster status:
pvecm status
Monitor HA resources:
ha-manager status
Review system logs:
journalctl -u pve-ha-lrm
Weekly Tasks
Test backup procedures
Review performance metrics
Update documentation
Monthly Tasks
Test fencing operations
Perform planned failover tests
Review and update monitoring thresholds
Troubleshooting Common Issues
Cluster Communication Problems
# Check corosync status
systemctl status corosync
# View corosync logs
journalctl -u corosync
# Check network connectivity
ping 192.168.10.102 # From pve-node01
Storage Access Issues
# Check NFS mounts
df -h | grep nfs
mount | grep nfs
# Test storage connectivity
pvesm status
Fencing Problems
# Test IPMI connectivity
ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power status
# Check fence agent logs
journalctl -u pve-ha-lrm | grep fence
Conclusion
This comprehensive guide provides a complete implementation of a 3-node Proxmox HA cluster with IPMI fencing. The configuration includes redundant networking, shared storage, and proper fencing mechanisms to ensure high availability and data protection.
Regular testing and maintenance of the cluster components are essential for maintaining reliability. Always document changes and maintain current configuration backups to ensure rapid recovery in case of issues.