⚖️ Why High Availability Needs a Third Arbitrator — Especially in SD-WAN Deployments

Ronald Bartels

High Availability (HA) in networking is often built on the assumption that two devices, whether routers, firewalls, or SD-WAN edges, are enough to ensure continuity. The devices run in an active/standby arrangement, monitoring each other over heartbeat links and swapping roles when one fails.

But this two-device model hides a dangerous flaw: what happens when both devices are alive, but the link between them is down? Who decides who should remain active? Without a clear answer, you risk one of the worst outcomes in HA systems: split-brain.


🧠 Split-Brain | The Invisible Threat to Network Stability

Split-brain occurs when two HA nodes lose communication with each other but both remain operational. Each node assumes the other is dead and transitions to the active state. This leads to:

  • 🔁 Conflicting routing decisions

  • 🔌 Duplicate tunnels or firewall states

  • ❌ Network loops or IP conflicts

  • 💥 Application failures and dropped sessions

In legacy systems, split-brain is often detected only when a user complains, and recovery usually requires manual intervention. In modern SD-WAN environments, this is unacceptable.
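To make the failure mode concrete, here is a minimal Python sketch of a two-node pair in which each node decides on its own from heartbeat silence. It is a toy model, not any vendor's HA protocol, and the `Node` class, function names, and node names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    role: str  # "active" or "standby"
    alive: bool = True


def peer_looks_dead(peer: Node, heartbeat_link_up: bool) -> bool:
    # From the inside, a dead peer and a dead heartbeat link look identical:
    # in both cases the heartbeats simply stop arriving.
    return not (peer.alive and heartbeat_link_up)


def two_node_failover(a: Node, b: Node, heartbeat_link_up: bool) -> List[str]:
    """Let each node decide alone, then return the names of all active nodes."""
    a_promotes = a.alive and peer_looks_dead(b, heartbeat_link_up)
    b_promotes = b.alive and peer_looks_dead(a, heartbeat_link_up)
    if a_promotes:
        a.role = "active"
    if b_promotes:
        b.role = "active"
    return [n.name for n in (a, b) if n.alive and n.role == "active"]


# Both nodes healthy, only the heartbeat link is cut: both end up active.
print(two_node_failover(Node("edge-a", "active"), Node("edge-b", "standby"),
                        heartbeat_link_up=False))
```

With the link up, the same function leaves a single active node, and with a genuinely dead peer the standby takes over cleanly. Only the link-down case produces two active nodes, and neither node has enough information to notice.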


🧭 Enter the Third Arbitrator (a.k.a. the Tie-Breaker or Witness Node)

To prevent split-brain, you need a third party—an arbitrator—who can observe both devices and decide who should be active. This ensures:

  • A single source of truth

  • Consistent HA decision-making

  • Deterministic failover and failback

In SD-WAN, this arbitrator is typically the cloud orchestrator or controller node. Fusion’s SD-WAN architecture makes use of this model, where the orchestrator acts as a neutral third party that maintains heartbeat monitoring and link state awareness across all nodes.

If two edge devices lose sight of each other, the orchestrator can still talk to both and enforce correct active/standby behaviour.
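A rough sketch of how such a tie-break could work is shown below. It is an illustration under stated assumptions, not Fusion's actual implementation: the function names, the priority list, and the rule that a node unable to reach the arbitrator demotes itself are all hypothetical design choices.

```python
from typing import Dict, List, Optional


def arbitrate(reachable: Dict[str, bool],
              current_active: Optional[str],
              priority: List[str]) -> Optional[str]:
    """Arbitrator side: pick the single node that should be active."""
    # Keep the current active node if the arbitrator can still reach it,
    # which avoids needless role flapping.
    if current_active is not None and reachable.get(current_active, False):
        return current_active
    # Otherwise choose the highest-priority node that is still reachable.
    for name in priority:
        if reachable.get(name, False):
            return name
    return None


def node_role(name: str, current_role: str, peer_heartbeat_ok: bool,
              arbiter_reachable: bool, arbiter_choice: Optional[str]) -> str:
    """Edge side: decide the local role after consulting the arbitrator."""
    if peer_heartbeat_ok:
        return current_role  # nothing has changed, keep the current role
    if not arbiter_reachable:
        return "standby"     # assumption: an isolated node never self-promotes
    return "active" if arbiter_choice == name else "standby"


# The heartbeat link is down, but the arbitrator still reaches both edges.
choice = arbitrate({"edge-a": True, "edge-b": True}, "edge-a", ["edge-a", "edge-b"])
for edge, role in (("edge-a", "active"), ("edge-b", "standby")):
    print(edge, node_role(edge, role, False, True, choice))
```

The key property is that promotion needs agreement from a party outside the pair, so a cut heartbeat link on its own can no longer produce two active nodes.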


🌐 Why This Matters More in SD-WAN

SD-WAN networks often consist of:

  • Multiple underlays (fibre, LTE, wireless)

  • Distributed branches with no IT personnel

  • Dynamic tunnels and real-time failover

This complexity makes the traditional “two box” HA design prone to confusion without a third judge. Fusion’s SD-WAN architecture avoids this by:

✅ Using the orchestrator to coordinate HA roles
✅ Monitoring WAN and LAN health metrics
✅ Providing context-aware failover across all underlays
✅ Maintaining control even when direct links between nodes fail


🔍 The Arbitrator Detects More Than Just Failures

A third arbitrator doesn’t only help during total link failure. It also:

  • 🐦 Detects brownouts (e.g., high latency, packet loss) before they escalate

  • ⚠️ Warns of degraded underlay conditions using metrics like jitter and MOS

  • 🔁 Guides graceful failover before users feel the pain

  • 🔒 Maintains state consistency during the transition

Brownouts are the canaries in the coalmine—early warning signs of potential outages. With proper arbitration and monitoring, SD-WAN can respond to brownouts in real time, proactively switching underlays or adjusting traffic policies.
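As an illustration, brownout detection from latency, jitter, and loss could be sketched as follows. The R-factor estimate is a widely used simplification rather than the full ITU-T G.107 E-model, and the 3.6 MOS threshold, the three-sample window, and all names are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSample:
    latency_ms: float
    jitter_ms: float
    loss_pct: float


def estimate_mos(s: LinkSample) -> float:
    """Approximate a MOS score from one probe sample."""
    # Simplified R-factor estimate (an approximation, not the full E-model).
    effective_latency = s.latency_ms + 2 * s.jitter_ms + 10
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10
    r -= 2.5 * s.loss_pct
    r = max(0.0, min(100.0, r))
    # R-to-MOS conversion as given in ITU-T G.107, floored at 1.0.
    return max(1.0, 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r))


def classify(s: LinkSample, mos_threshold: float = 3.6) -> str:
    if s.loss_pct >= 100:
        return "down"      # total failure, the case simple pings also catch
    if estimate_mos(s) < mos_threshold:
        return "brownout"  # link is up but quality has degraded
    return "healthy"


def should_fail_over(samples: List[LinkSample], consecutive: int = 3) -> bool:
    # Require several bad samples in a row so one noisy probe does not
    # trigger a switch.
    recent = samples[-consecutive:]
    return len(recent) == consecutive and all(classify(s) != "healthy" for s in recent)


good = LinkSample(latency_ms=20, jitter_ms=2, loss_pct=0)
bad = LinkSample(latency_ms=180, jitter_ms=30, loss_pct=5)
print(classify(good), classify(bad), should_fail_over([good, bad, bad, bad]))
```

Requiring consecutive degraded samples is one simple way to trade a little detection speed for stability; a production system would likely tune the window and threshold per underlay.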


🔧 Legacy Solutions Don’t Measure Up

Legacy failover methods, whether VRRP, HSRP, or manual routing changes, rely on direct connectivity between peers (hello or advertisement packets) or on periodic pings. They don't handle:

  • Partial failures (e.g., congestion, misconfigured firewalls)

  • Indirect outages (e.g., DNS resolution issues, conntrack table exhaustion)

  • LAN issues or Wi-Fi degradation

  • External threats or cybersecurity breaches

Without a third arbitrator, failover is reactive, slow, and prone to error.


💼 Business Impact | Your MSP’s Reputation Is on the Line

In the MSP world, your customer expects you to detect the outage before they do.

If you rely on legacy networking, you’ll often be caught off guard. You won’t see the packet loss, jitter, or DNS delays until the customer phones in angry. Worse, you may misdiagnose the problem or waste time chasing false leads.

With Fusion’s SD-WAN:

  • 🌐 The orchestrator acts as referee, judge, and coordinator

  • 📊 Metrics are always available to prove SLA compliance

  • 🔍 You gain real-time diagnostics and root cause analysis

  • 📉 You reduce support tickets, truck rolls, and troubleshooting time


🧠 Wrap

If you think two is enough, think again.
Without a third arbitrator, high availability becomes high risk.

Fusion SD-WAN builds this third brain into the heart of its architecture, ensuring:

  • Resilient operations

  • Predictable failover

  • Proactive detection of brownouts and instability

  • Crystal-clear visibility into what’s going wrong and why

In a world where milliseconds matter and expectations are sky-high, a two-device HA model is obsolete. It’s time to make the smart move.

👁️ Get eyes everywhere.
⚖️ Get arbitration built in.
🔐 Get SD-WAN with true integrity.
🚀 Get Fusion.
