Docker Series – Part 19: High Availability in Docker Swarm – Multiple Managers, Fault Tolerance & Node Management


When we talk about container orchestration in production, high availability is not a “nice to have” — it’s a necessity.
Imagine you have a single manager node in your Docker Swarm cluster, and it goes down. Without redundancy, your orchestration layer fails, applications lose coordination, and scaling decisions halt. That’s where multiple manager nodes come into play.
In this article, we’ll explore:
Why high availability matters in Docker Swarm
Adding multiple managers to a cluster
Leader election and fault tolerance
Node promotion/demotion
Maintenance strategies with drain mode
Understanding Manager & Worker Nodes
In a Docker Swarm cluster:
Manager Nodes: Handle orchestration — scheduling containers, scaling services, maintaining cluster state.
Worker Nodes: Run the actual application containers.
By default, all management actions (like scaling) must be performed on a manager node. Thanks to Swarm's routing mesh, clients can reach a published service by connecting to any node in the cluster.
The risk? Single Point of Failure (SPOF).
If the only manager node goes down, the cluster loses its orchestration ability.
High Availability & Fault Tolerance
To avoid SPOF, we add more manager nodes.
Example:
With 3 managers, the cluster can still function if 1 fails.
This resilience is called Fault Tolerance.
The formula:
Fault Tolerance = floor((N - 1) / 2)
Where N is the number of managers. Because the cluster needs a majority (a quorum) of managers to keep operating, an odd number of managers is recommended.
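The formula is easy to verify with shell arithmetic (integer division acts as the floor; the node counts below are just examples):

```shell
# Fault tolerance = floor((N - 1) / 2) for a few manager counts
for N in 1 3 5 7; do
  echo "Managers: $N -> can lose $(( (N - 1) / 2 ))"
done
```

Notice that going from 3 to 4 managers does not improve fault tolerance (both tolerate 1 failure), which is why odd manager counts are the usual recommendation.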
Adding more managers also enables leader election via the Raft Consensus Algorithm, ensuring that at any time exactly one manager is the Leader while the others remain Reachable.
Setting up Multiple Managers on AWS EC2
Step 1: Launch New Instances
We launch 2 additional Amazon Linux EC2 instances to join as managers.
Step 2: Install Docker on New Nodes
sudo yum install -y docker
sudo systemctl enable docker --now
Step 3: Generate Manager Join Token
On the leader node:
docker swarm join-token manager
Step 4: Join New Managers
Run the token command on the new instances:
docker swarm join --token <manager-token> <leader-ip>:2377
Now, listing the nodes:
docker node ls
shows multiple managers, with one marked as Leader and the others as Reachable.
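On a three-manager cluster, the output looks roughly like this (IDs and hostnames are illustrative and will differ in your cluster):

```shell
docker node ls
# ID            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
# x1abc... *    manager1   Ready    Active         Leader
# y2def...      manager2   Ready    Active         Reachable
# z3ghi...      manager3   Ready    Active         Reachable
```

The `*` marks the node you are currently connected to, and the MANAGER STATUS column is empty for worker nodes.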
Leader Election in Action
When the leader node is stopped:
sudo systemctl stop docker
Another reachable manager becomes the leader automatically.
This is how Docker Swarm ensures continuity without downtime.
Scaling Services
We can scale services either:
- One at a time:
docker service update --replicas=5 <service-name>
- Multiple at once:
docker service scale service1=3 service2=5
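After scaling, you can confirm the new replica counts; the service names here (`web`, `api`) are placeholders for your own services:

```shell
docker service scale web=3 api=5   # scale two services in one command
docker service ls                  # REPLICAS column should converge to 3/3 and 5/5
docker service ps web              # lists each task and the node it was scheduled on
```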
Promoting & Demoting Nodes
You can promote a worker to a manager:
docker node promote <worker-hostname>
Or demote a manager to a worker:
docker node demote <manager-hostname>
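Promotion and demotion can be combined to rotate manager duties; `node3` below is a hypothetical hostname:

```shell
docker node promote node3   # worker -> manager (MANAGER STATUS becomes Reachable)
docker node ls              # verify the change
docker node demote node3    # manager -> worker; keep an odd manager count for quorum
```

Always demote a manager before removing it from the cluster, so the remaining managers keep their Raft quorum.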
Maintenance with drain Mode
If you need to perform maintenance on a node (manager or worker):
docker node update --availability drain <hostname>
This gracefully stops the service tasks running on that node and reschedules them onto other active nodes — keeping your services up without interruption.
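A typical maintenance cycle looks like this (`node2` is a placeholder hostname); note the final step that returns the node to the scheduler:

```shell
docker node update --availability drain node2    # stop scheduling; move tasks away
docker node ls                                   # AVAILABILITY now shows Drain for node2
# ... perform maintenance (patching, reboot, Docker upgrade) ...
docker node update --availability active node2   # make the node schedulable again
```

Existing tasks do not automatically move back after reactivation; the node simply becomes eligible for new and rescheduled tasks.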
Key Takeaways
Multiple managers = fault tolerance + high availability.
Leader election ensures cluster orchestration continuity.
Raft protocol maintains synchronized cluster state.
Use drain mode for safe node maintenance.
Promote/demote nodes as needed for flexible scaling.
Next in the series: We’ll explore Overlay Networks & Stacks in Swarm to enable multi-host container communication and secure deployments.
Written by

Nitin Dhiman
Self-taught DevOps enthusiast on a journey from beginner to pro. Passionate about demystifying complex tools like Docker, AWS, CI/CD & Kubernetes into clear, actionable insights. Fueled by curiosity, driven by hands-on learning, and committed to sharing the journey. Always building, always growing 🚀