Building a 3-Node Proxmox HA Cluster with IPMI Fencing


This guide provides step-by-step instructions for creating a production-ready 3-node Proxmox HA cluster with IPMI-based fencing. Follow these procedures to build a resilient virtualization infrastructure that can withstand hardware failures.

Infrastructure Overview

Node Specifications

  • pve-node01: Primary cluster node (192.168.1.101)

  • pve-node02: Secondary cluster node (192.168.1.102)

  • pve-node03: Tertiary cluster node (192.168.1.103)

Network Configuration

  • Management Network: 192.168.1.0/24 (VM traffic, web UI access)

  • Cluster Network: 192.168.10.0/24 (Corosync communication)

  • Storage Network: 192.168.20.0/24 (Shared storage access)

  • IPMI Network: 192.168.100.0/24 (Out-of-band management)

IP Address Assignments

Node       | Management IP | Cluster IP     | Storage IP     | IPMI IP
pve-node01 | 192.168.1.101 | 192.168.10.101 | 192.168.20.101 | 192.168.100.101
pve-node02 | 192.168.1.102 | 192.168.10.102 | 192.168.20.102 | 192.168.100.102
pve-node03 | 192.168.1.103 | 192.168.10.103 | 192.168.20.103 | 192.168.100.103
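
Although not strictly required, it helps to make the node names resolvable on every host even when DNS is unavailable. A minimal sketch of the matching /etc/hosts entries (the example.local domain is a placeholder):

192.168.1.101  pve-node01.example.local  pve-node01
192.168.1.102  pve-node02.example.local  pve-node02
192.168.1.103  pve-node03.example.local  pve-node03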

Prerequisites and Hardware Requirements

Hardware Specifications (Per Node)

  • CPU: Minimum 4 cores, VT-x/AMD-V support

  • RAM: Minimum 16GB, recommend 32GB+ for production

  • Storage:

    • 120GB SSD for Proxmox OS

    • Additional storage for VMs (local or shared)

  • Network: 4x Gigabit Ethernet ports minimum

  • IPMI: Dedicated IPMI/BMC interface

Network Infrastructure

  • Managed switches with VLAN support

  • Redundant network paths for each network segment

  • NTP server for time synchronization

  • DNS server for name resolution

Step 1: Initial Node Installation and Configuration

1.1 Install Proxmox VE on Each Node

Download the latest Proxmox VE ISO and perform a standard installation on each node. During installation:

  1. Disk Configuration: Use the entire SSD for the Proxmox installation

  2. Network Configuration: Configure the primary management interface

  3. Root Password: Set a strong root password (document securely)

1.2 Post-Installation Network Configuration

Configure network interfaces on each node. Edit /etc/network/interfaces:

For pve-node01:

# Management interface (existing)
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.101/24
    gateway 192.168.1.1
    bridge-ports ens18
    bridge-stp off
    bridge-fd 0

# Cluster communication interface
auto vmbr1
iface vmbr1 inet static
    address 192.168.10.101/24
    bridge-ports ens19
    bridge-stp off
    bridge-fd 0

# Storage network interface
auto vmbr2
iface vmbr2 inet static
    address 192.168.20.101/24
    bridge-ports ens20
    bridge-stp off
    bridge-fd 0

For pve-node02:

# Management interface (existing)
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.102/24
    gateway 192.168.1.1
    bridge-ports ens18
    bridge-stp off
    bridge-fd 0

# Cluster communication interface
auto vmbr1
iface vmbr1 inet static
    address 192.168.10.102/24
    bridge-ports ens19
    bridge-stp off
    bridge-fd 0

# Storage network interface
auto vmbr2
iface vmbr2 inet static
    address 192.168.20.102/24
    bridge-ports ens20
    bridge-stp off
    bridge-fd 0

For pve-node03:

# Management interface (existing)
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.103/24
    gateway 192.168.1.1
    bridge-ports ens18
    bridge-stp off
    bridge-fd 0

# Cluster communication interface
auto vmbr1
iface vmbr1 inet static
    address 192.168.10.103/24
    bridge-ports ens19
    bridge-stp off
    bridge-fd 0

# Storage network interface
auto vmbr2
iface vmbr2 inet static
    address 192.168.20.103/24
    bridge-ports ens20
    bridge-stp off
    bridge-fd 0

1.3 Apply Network Configuration

On each node, apply the new network configuration. Recent Proxmox VE releases include ifupdown2, so the changes can be applied in place:

ifreload -a

If ifupdown2 is not installed, restart networking instead:

systemctl restart networking
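
A quick check then confirms that each bridge came up with the intended address and that the peers answer on the cluster and storage networks (shown from pve-node01; swap the target IPs on the other nodes):

# Show the bridge addresses
ip -br addr show vmbr0 vmbr1 vmbr2

# Reach the peers on the cluster and storage networks
ping -c 2 192.168.10.102
ping -c 2 192.168.10.103
ping -c 2 192.168.20.102
ping -c 2 192.168.20.103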

1.4 Configure NTP Synchronization

Edit /etc/systemd/timesyncd.conf on each node:

[Time]
NTP=pool.ntp.org
FallbackNTP=time.cloudflare.com

Restart time synchronization:

systemctl restart systemd-timesyncd
systemctl enable systemd-timesyncd
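
Then confirm that the clock is actually synchronizing:

timedatectl status
timedatectl timesync-status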

1.5 Update System Packages

On each node:

apt update && apt upgrade -y
reboot

Step 2: Configure IPMI Access

2.1 Configure IPMI on Each Node

Access the IPMI interface through the server's BIOS/UEFI or dedicated management port:

For pve-node01:

  • IPMI IP: 192.168.100.101

  • Subnet Mask: 255.255.255.0

  • Gateway: 192.168.100.1

  • Username: admin

  • Password: ProxmoxIPMI01!

For pve-node02:

  • IPMI IP: 192.168.100.102

  • Subnet Mask: 255.255.255.0

  • Gateway: 192.168.100.1

  • Username: admin

  • Password: ProxmoxIPMI02!

For pve-node03:

  • IPMI IP: 192.168.100.103

  • Subnet Mask: 255.255.255.0

  • Gateway: 192.168.100.1

  • Username: admin

  • Password: ProxmoxIPMI03!

2.2 Test IPMI Connectivity

From each node, test IPMI connectivity to other nodes:

From pve-node01:

ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power status
ipmitool -H 192.168.100.103 -U admin -P ProxmoxIPMI03! power status

From pve-node02:

ipmitool -H 192.168.100.101 -U admin -P ProxmoxIPMI01! power status
ipmitool -H 192.168.100.103 -U admin -P ProxmoxIPMI03! power status

From pve-node03:

ipmitool -H 192.168.100.101 -U admin -P ProxmoxIPMI01! power status
ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power status
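
If you would rather run a single check, a small loop like the following sketch (the credential map simply mirrors the passwords above) queries all three BMCs at once:

#!/bin/bash
# Query the power status of every BMC in the cluster
declare -A BMC_PASS=(
    [192.168.100.101]="ProxmoxIPMI01!"
    [192.168.100.102]="ProxmoxIPMI02!"
    [192.168.100.103]="ProxmoxIPMI03!"
)

for bmc in "${!BMC_PASS[@]}"; do
    echo "== BMC $bmc =="
    ipmitool -H "$bmc" -U admin -P "${BMC_PASS[$bmc]}" power status
done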

Step 3: Configure Shared Storage

3.1 NFS Storage Configuration

This example uses an external NFS server at 192.168.20.10 with the following exports defined in /etc/exports:

/srv/proxmox/templates 192.168.20.0/24(rw,sync,no_root_squash,no_subtree_check)
/srv/proxmox/backups 192.168.20.0/24(rw,sync,no_root_squash,no_subtree_check)
/srv/proxmox/vms 192.168.20.0/24(rw,sync,no_root_squash,no_subtree_check)
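
After editing /etc/exports, reload the export table on the NFS server and confirm the shares are visible from a Proxmox node:

# On the NFS server
exportfs -ra
exportfs -v

# From any Proxmox node
showmount -e 192.168.20.10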

3.2 Mount NFS Storage on Each Node

On each Proxmox node, create mount points and test NFS connectivity:

# Create mount points
mkdir -p /mnt/nfs-templates
mkdir -p /mnt/nfs-backups
mkdir -p /mnt/nfs-vms

# Test NFS mounts
mount -t nfs 192.168.20.10:/srv/proxmox/templates /mnt/nfs-templates
mount -t nfs 192.168.20.10:/srv/proxmox/backups /mnt/nfs-backups
mount -t nfs 192.168.20.10:/srv/proxmox/vms /mnt/nfs-vms

# Verify mounts
df -h | grep nfs

# Unmount for now (will be configured through Proxmox GUI)
umount /mnt/nfs-*

Step 4: Create Proxmox Cluster

4.1 Initialize Cluster on Primary Node

On pve-node01, create the cluster:

pvecm create proxmox-cluster --link0 192.168.10.101

(On Proxmox VE 6 and later the Corosync network is selected with --link0; the older --bindnet0_addr/--ring0_addr options apply only to 5.x.)

4.2 Verify Cluster Status

Check cluster status on pve-node01:

pvecm status
pvecm nodes

4.3 Gather Join Information

The remaining nodes join by contacting an existing cluster member. All that is needed from pve-node01 is its cluster-network address (192.168.10.101) and the root password; alternatively, copy the join information from Datacenter → Cluster → Join Information in the web UI and use it in the join dialog on the other nodes.

4.4 Join Additional Nodes

On pve-node02, join the cluster:

pvecm add 192.168.10.101 --link0 192.168.10.102

On pve-node03, join the cluster:

pvecm add 192.168.10.101 --link0 192.168.10.103

4.5 Verify Cluster Formation

On any node, verify all nodes are part of the cluster:

pvecm status
pvecm nodes

Expected output should show all three nodes as online.

Step 5: Configure Shared Storage in Proxmox

5.1 Add NFS Storage via GUI

  1. Access Proxmox web interface: https://192.168.1.101:8006

  2. Navigate to Datacenter → Storage → Add → NFS and create the three entries below (a CLI alternative is sketched after the list)

Templates Storage:

  • ID: nfs-templates

  • Server: 192.168.20.10

  • Export: /srv/proxmox/templates

  • Content: ISO image, Container template

Backups Storage:

  • ID: nfs-backups

  • Server: 192.168.20.10

  • Export: /srv/proxmox/backups

  • Content: Backup file

VMs Storage:

  • ID: nfs-vms

  • Server: 192.168.20.10

  • Export: /srv/proxmox/vms

  • Content: Disk image, Container
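
The same three storage entries can also be created from the command line with pvesm; a sketch, where the --content values are the CLI names of the GUI labels above:

pvesm add nfs nfs-templates --server 192.168.20.10 --export /srv/proxmox/templates --content iso,vztmpl
pvesm add nfs nfs-backups --server 192.168.20.10 --export /srv/proxmox/backups --content backup
pvesm add nfs nfs-vms --server 192.168.20.10 --export /srv/proxmox/vms --content images,rootdir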

5.2 Verify Storage Access

On each node, verify storage is accessible:

pvesm status
pvesm list nfs-vms

Step 6: Install and Configure Fence Agents

6.1 Install Fence Agents on All Nodes

On each node, install the fence agents package:

apt update
apt install fence-agents -y

6.2 Configure Fencing Devices

Method 1: Using Proxmox Web Interface

  1. Navigate to Datacenter → HA → Fencing

  2. Click Add

  3. Configure each fence device:

For pve-node01:

  • ID: fence-node01

  • Type: fence_ipmilan

  • IP: 192.168.100.101

  • Username: admin

  • Password: ProxmoxIPMI01!

  • Options: lanplus=1,power_wait=10

For pve-node02:

  • ID: fence-node02

  • Type: fence_ipmilan

  • IP: 192.168.100.102

  • Username: admin

  • Password: ProxmoxIPMI02!

  • Options: lanplus=1,power_wait=10

For pve-node03:

  • ID: fence-node03

  • Type: fence_ipmilan

  • IP: 192.168.100.103

  • Username: admin

  • Password: ProxmoxIPMI03!

  • Options: lanplus=1,power_wait=10

Method 2: Using Command Line

On any cluster node, add fence devices:

# Add fence device for pve-node01
pvesh create /nodes/pve-node01/fence \
  --device agent=fence_ipmilan,lanplus=1,ipaddr=192.168.100.101,login=admin,passwd=ProxmoxIPMI01!,power_wait=10

# Add fence device for pve-node02
pvesh create /nodes/pve-node02/fence \
  --device agent=fence_ipmilan,lanplus=1,ipaddr=192.168.100.102,login=admin,passwd=ProxmoxIPMI02!,power_wait=10

# Add fence device for pve-node03
pvesh create /nodes/pve-node03/fence \
  --device agent=fence_ipmilan,lanplus=1,ipaddr=192.168.100.103,login=admin,passwd=ProxmoxIPMI03!,power_wait=10

6.3 Configure Fencing Mode

Edit /etc/pve/datacenter.cfg to enable hardware fencing:

fencing: hardware

Or use the GUI: Datacenter → HA → Options → Fencing → hardware

6.4 Test Fencing Operations

Test fence device configuration:

# Test fencing pve-node02 from pve-node01
fence_ipmilan --lanplus -a 192.168.100.102 -l admin -p ProxmoxIPMI02! -o status

# Test fencing pve-node03 from pve-node01
fence_ipmilan --lanplus -a 192.168.100.103 -l admin -p ProxmoxIPMI03! -o status
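
Once the status queries succeed, a full power-cycle can be tested with the same agent; only do this against a node that is not running any guests:

# Power-cycle pve-node03 through its BMC (disruptive!)
fence_ipmilan --lanplus -a 192.168.100.103 -l admin -p ProxmoxIPMI03! -o reboot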

Step 7: Configure High Availability

7.1 Create HA Groups

Navigate to Datacenter → HA → Groups → Create and define the two groups below (a CLI sketch follows the list).

Critical Services Group:

  • Group ID: critical-services

  • Nodes: pve-node01:100,pve-node02:90,pve-node03:80

  • Restrict to Group: Yes

Standard Services Group:

  • Group ID: standard-services

  • Nodes: pve-node01:80,pve-node02:90,pve-node03:100

  • Restrict to Group: No
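
As referenced above, the same two groups can also be created from the shell with ha-manager; a sketch, with the priorities mirroring the GUI values:

ha-manager groupadd critical-services --nodes "pve-node01:100,pve-node02:90,pve-node03:80" --restricted 1
ha-manager groupadd standard-services --nodes "pve-node01:80,pve-node02:90,pve-node03:100"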

7.2 Configure HA Resources

For each VM that needs HA protection:

  1. Navigate to Datacenter → HA → Resources → Add

  2. Configure the resource:

    • Resource ID: vm:100 (example VM ID)

    • Group: critical-services

    • Max Restart: 3

    • Max Relocate: 1

    • State: started

7.3 Test HA Functionality

Create a test VM and enable HA:

# Create a test VM
qm create 999 --name test-ha-vm --memory 512 --net0 virtio,bridge=vmbr0 --bootdisk scsi0 --scsi0 nfs-vms:2,format=qcow2

# Add VM to HA
ha-manager add vm:999 --state started --group critical-services --max_restart 3 --max_relocate 1
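
Then confirm that the resource is registered and see where it is running:

ha-manager status
ha-manager config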

Step 8: Monitoring and Alerting Configuration

8.1 Configure Email Notifications

Set the notification e-mail address on the root account; this writes the entry to /etc/pve/user.cfg (which should not be edited by hand):

pveum user modify root@pam --email admin@example.com

8.2 Configure Notification Settings

Navigate to Datacenter → Options → Notification and configure how alerts are delivered, at a minimum confirming the target e-mail address set above.

8.3 Set Up Log Monitoring

Create a log monitoring script /usr/local/bin/ha-monitor.sh. The HA services log to the systemd journal, so the script follows journalctl rather than a log file:

#!/bin/bash
# Watch the HA manager services and mail any error or fencing event

ALERT_EMAIL="admin@example.com"

journalctl -f -u pve-ha-crm -u pve-ha-lrm | while read -r line; do
    if echo "$line" | grep -qE "ERROR|FAILED|fence"; then
        echo "$line" | mail -s "Proxmox HA Alert" "$ALERT_EMAIL"
    fi
done

Make it executable and create a systemd service:

chmod +x /usr/local/bin/ha-monitor.sh
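
A minimal unit for this script could look like the following sketch, saved as /etc/systemd/system/ha-monitor.service (the name and description are placeholders):

[Unit]
Description=Proxmox HA event monitor
After=pve-ha-lrm.service

[Service]
ExecStart=/usr/local/bin/ha-monitor.sh
Restart=always

[Install]
WantedBy=multi-user.target

Enable and start it:

systemctl daemon-reload
systemctl enable --now ha-monitor.service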

Step 9: Testing and Validation

9.1 Test Cluster Connectivity

Verify corosync communication:

corosync-cfgtool -s
corosync-cmapctl | grep ring

9.2 Test Fence Operations

Controlled failure test. The simplest reliable way to exercise fencing and HA recovery is to cut power to one node out-of-band:

# From pve-node01, hard power-off pve-node02 through its BMC
# WARNING: this immediately powers off the node and everything running on it
ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power off

Monitor the operation:

journalctl -f -u pve-ha-crm -u pve-ha-lrm

9.3 Test VM Failover

  1. Create a test VM on pve-node01

  2. Enable HA for the VM

  3. Simulate node failure by shutting down pve-node01

  4. Verify VM starts on another node

  5. Check the HA logs for proper operation (see the command sketch after this list)
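
While the failover runs, the commands below on a surviving node show the resource being recovered (VM 999 is the test VM from step 7.3):

# Watch HA state transitions
watch -n 2 ha-manager status

# Confirm the VM is running on the new node
qm list | grep 999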

9.4 Validate Storage Access

Ensure all nodes can access shared storage:

# On each node
pvesm status
touch /mnt/pve/nfs-vms/test-file-$(hostname)
ls -la /mnt/pve/nfs-vms/test-file-*

Step 10: Backup and Documentation

10.1 Backup Cluster Configuration

Create configuration backup:

# Backup cluster configuration
tar -czf /root/cluster-backup-$(date +%Y%m%d).tar.gz /etc/pve/

# Backup to external location
scp /root/cluster-backup-$(date +%Y%m%d).tar.gz user@backupserver:/backups/proxmox/
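
To take this backup automatically, a nightly cron entry such as the following sketch can be added with crontab -e as root (the backup host and path are the placeholders used above; note that % must be escaped in crontab):

# m h dom mon dow command
30 2 * * * tar -czf /root/cluster-backup-$(date +\%Y\%m\%d).tar.gz /etc/pve/ && scp /root/cluster-backup-$(date +\%Y\%m\%d).tar.gz user@backupserver:/backups/proxmox/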

Maintenance and Monitoring

Daily Tasks

  • Check cluster status: pvecm status

  • Monitor HA resources: ha-manager status

  • Review system logs: journalctl -u pve-ha-lrm
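
These checks are easy to combine into one small script run after logging in to any node; a sketch (the path is a suggestion):

#!/bin/bash
# /usr/local/bin/daily-check.sh - quick cluster health summary
echo "=== Cluster membership ==="
pvecm status
echo "=== HA resources ==="
ha-manager status
echo "=== Storage ==="
pvesm status
echo "=== HA log entries from the last 24 hours ==="
journalctl -u pve-ha-lrm --since "24 hours ago" --no-pager | tail -n 20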

Weekly Tasks

  • Test backup procedures

  • Review performance metrics

  • Update documentation

Monthly Tasks

  • Test fencing operations

  • Perform planned failover tests

  • Review and update monitoring thresholds

Troubleshooting Common Issues

Cluster Communication Problems

# Check corosync status
systemctl status corosync

# View corosync logs
journalctl -u corosync

# Check network connectivity
ping 192.168.10.102  # From pve-node01

Storage Access Issues

# Check NFS mounts
df -h | grep nfs
mount | grep nfs

# Test storage connectivity
pvesm status

Fencing Problems

# Test IPMI connectivity
ipmitool -H 192.168.100.102 -U admin -P ProxmoxIPMI02! power status

# Check fence agent logs
journalctl -u pve-ha-lrm | grep fence

Conclusion

This comprehensive guide provides a complete implementation of a 3-node Proxmox HA cluster with IPMI fencing. The configuration includes redundant networking, shared storage, and proper fencing mechanisms to ensure high availability and data protection.

Regular testing and maintenance of the cluster components are essential for maintaining reliability. Always document changes and maintain current configuration backups to ensure rapid recovery in case of issues.
