Docker Series – Part 17: High Availability with Docker Swarm — Fault Tolerance, Load Balancing, and Scaling

Nitin Dhiman

In our previous article, we set up a Docker Swarm cluster on AWS EC2 with a master-slave architecture (a manager node plus worker nodes, in Swarm's terms). Now, let’s push things further into real-world scenarios.

What happens when:

  • A container crashes?

  • A node goes down?

  • Traffic suddenly spikes?

That’s where Swarm's built-in high availability, auto-healing, load balancing, and scaling come into play.

Concepts Covered

  • Multi-tier container architecture and its risks

  • Docker Swarm task replication & fault tolerance

  • Service creation & built-in load balancing

  • Exposing services with --publish

  • Verifying failover through manual deletion

  • Horizontal scaling with docker service scale


Why a Single Container Isn’t Enough

Imagine you’re running a WordPress + MySQL setup with just one container per tier. If either container crashes, your entire application goes down. That’s a single point of failure, and it’s dangerous.

Docker Swarm solves this by:

  • Replicating services across nodes

  • Automatically restarting containers on failure

  • Providing a built-in load balancer that spreads incoming traffic across the replicas


Swarm Recap: Our Cluster Setup

We continued from our previous session with a working Swarm cluster of three nodes.

docker node ls

Each node showed a STATUS of Ready and an AVAILABILITY of Active, and the master (manager) node's MANAGER STATUS read Leader.
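To check the same things from a script, `docker node ls` accepts a `--format` template. A minimal sketch, assuming it runs on a manager node; `swarm_leader` and `unready_nodes` are hypothetical helper names, not Docker commands:

```shell
# Hypothetical helpers around `docker node ls` (run them on a manager node).

# Print the hostname of the node whose MANAGER STATUS is "Leader".
swarm_leader() {
  docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' |
    awk '$2 == "Leader" { print $1 }'
}

# Print every node whose STATUS is not "Ready" (no output = all nodes Ready).
unready_nodes() {
  docker node ls --format '{{.Hostname}} {{.Status}}' |
    awk '$2 != "Ready" { print $1 " (" $2 ")" }'
}
```

Worker nodes have an empty MANAGER STATUS, so only managers can ever match "Leader".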


Creating a Service in Swarm

Let’s create a simple Apache web server service:

docker service create --name myweb httpd

Check its status:

docker service ls
docker service ps myweb

By default a service runs a single replica, so you’ll see one container (task) running on one of the nodes. Now let’s test Swarm’s fault tolerance.
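To see exactly where each task landed, `docker service ps` can be limited to tasks that are meant to be running and formatted to show the node. A small sketch; `task_nodes` is a hypothetical helper name:

```shell
# Hypothetical helper: show the node each running task of a service is on.
# Usage: task_nodes myweb
task_nodes() {
  # desired-state=running hides old, shut-down tasks kept in the task history
  docker service ps "$1" \
    --filter desired-state=running \
    --format '{{.Name}} on {{.Node}} ({{.CurrentState}})'
}
```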


Auto-Healing in Action

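Let’s manually delete the container running the service. docker ps and docker rm only see containers on the local node, so run this on the node shown in the NODE column of docker service ps myweb: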

docker rm -f <container_id>

Now watch closely. Run:

docker service ps myweb

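The old task will show a desired state of Shutdown and a current state of Failed, and Swarm automatically launches a replacement task on whichever node the scheduler picks (another node or the same one). That’s Swarm’s self-healing magic!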
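To repeat the experiment without copying container IDs by hand, a helper can look up the local container through the `com.docker.swarm.service.name` label that Swarm attaches to task containers. A sketch under that assumption; `kill_service_container` is a hypothetical name, and it only finds containers on the node where it runs:

```shell
# Hypothetical helper: force-remove one local container of a Swarm service
# so the orchestrator has to replace the task.
# Usage: kill_service_container myweb
kill_service_container() {
  local service="$1" cid
  # Swarm labels each task container with the name of its service
  cid=$(docker ps -q --filter "label=com.docker.swarm.service.name=${service}" | head -n 1)
  if [ -z "$cid" ]; then
    echo "no local container for service '${service}' on this node" >&2
    return 1
  fi
  docker rm -f "$cid"
}
```

After running it, docker service ps myweb should list the failed task next to its replacement.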


Exposing the Service Publicly

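By default, a Swarm service's ports are not published, so nothing outside the cluster can reach it. A service named myweb already exists, so remove it first and recreate it with a published port (alternatively, add the port to the running service with docker service update --publish-add 8080:80 myweb):

docker service rm myweb
docker service create --name myweb --publish 8080:80 httpd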

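Now access it through the public IP of any node in the swarm; the routing mesh forwards the request to a node running a task: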

curl http://<public-ip>:8080
# Output: <html><body><h1>It works!</h1></body></html>

Even from your browser:
http://<public-ip>:8080 ➝ It works!
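Right after creation the task may still be pulling its image, so the first request can fail. A small retry loop avoids guessing. This is a sketch; `wait_for_http` is a hypothetical helper, and the 15 attempts and 2-second pause are arbitrary defaults. Because of the routing mesh, the URL can point at any node's IP, not only the one running the task:

```shell
# Hypothetical helper: poll a URL until it answers, or give up.
# Usage: wait_for_http http://<public-ip>:8080 [tries]
wait_for_http() {
  local url="$1" tries="${2:-15}" i=1
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors as failures, -s: no progress output
    if curl -fs --max-time 3 "$url" >/dev/null 2>&1; then
      echo "up after ${i} attempt(s): ${url}"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "no response from ${url} after ${tries} attempts" >&2
  return 1
}
```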


Load Balancing at Work

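Swarm automatically load-balances every service. Each service gets a virtual IP (VIP) inside the cluster, and a published port is served by the ingress routing mesh on every node.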

It handles:

  • Routing requests from clients

  • Distributing them to containers (tasks)

  • Sending traffic only to running tasks (and, if the image defines a HEALTHCHECK, only to healthy ones), so failed tasks are routed around
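The virtual IP and published ports that Swarm assigned to a service are visible in `docker service inspect`. A sketch using a Go template over the `Endpoint` section of the inspect output; `show_endpoint` is a hypothetical helper, and a service without published ports or attached networks may list no VIPs:

```shell
# Hypothetical helper: print a service's virtual IPs and published ports.
# Usage: show_endpoint myweb
show_endpoint() {
  # println separates its arguments with spaces and ends each entry with a newline
  docker service inspect --format \
    '{{range .Endpoint.VirtualIPs}}{{println "vip" .NetworkID .Addr}}{{end}}{{range .Endpoint.Ports}}{{println "port" .PublishedPort "->" .TargetPort .Protocol}}{{end}}' \
    "$1"
}
```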


Custom Image Deployment

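You can use your own Docker image too. The myweb name is already taken, so either remove the existing service first (docker service rm myweb) or switch it to the new image in place with docker service update --image vimal13/apache-webserver-php myweb. Starting fresh looks like this:

docker service create --name myweb --publish 8080:80 vimal13/apache-webserver-php

This deploys a PHP-enabled Apache web server from the vimal13/apache-webserver-php image. Through the routing mesh it answers on port 8080 of every node, and it is ready to be scaled out across multiple nodes.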
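Because `docker service create` fails when a service with the same name already exists, a small helper can choose between creating the service and rolling it to a new image. A sketch; `deploy_or_update` is a hypothetical name, and on update it leaves the existing port mappings untouched:

```shell
# Hypothetical helper: create a service, or update its image if it exists.
# Usage: deploy_or_update myweb vimal13/apache-webserver-php 8080:80
deploy_or_update() {
  local name="$1" image="$2" ports="$3"
  if docker service inspect "$name" >/dev/null 2>&1; then
    # Service exists: roll its tasks to the new image (ports unchanged)
    docker service update --image "$image" "$name"
  else
    docker service create --name "$name" --publish "$ports" "$image"
  fi
}
```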


Scaling Horizontally

Let’s scale our service to 5 replicas:

docker service scale myweb=5

You’ll see multiple containers running across available nodes — all managed by Swarm:

docker service ps myweb

The load balancer spreads incoming connections across the replicas, roughly round-robin.
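To observe the distribution, send a batch of requests and count identical responses. This sketch assumes the page contains something replica-specific, such as the container hostname; with the stock httpd page every response is identical and only the total is informative. `sample_lb` is a hypothetical helper. Each curl call opens a new connection, which matters because Swarm balances per connection: a browser reusing a keep-alive connection may keep hitting the same replica.

```shell
# Hypothetical helper: send N requests and count identical response bodies.
# Usage: sample_lb http://<public-ip>:8080 20
sample_lb() {
  local url="$1" n="${2:-20}" i=0
  while [ "$i" -lt "$n" ]; do
    # flatten each body to one line so it is counted as a single response
    curl -fs --max-time 3 "$url" | tr -d '\n'
    echo
    i=$((i + 1))
  done | sort | uniq -c
}
```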

Want more traffic handling? Just do:

docker service scale myweb=10

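And BOOM! Your service's replica count just doubled, in a single command. The new replicas share the nodes you already have; adding machines still means joining more nodes to the swarm.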
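To see how Swarm spread the replicas over the available nodes, count the running tasks per node. A sketch; `replicas_per_node` is a hypothetical helper name:

```shell
# Hypothetical helper: count running replicas of a service on each node.
# Usage: replicas_per_node myweb
replicas_per_node() {
  docker service ps "$1" --filter desired-state=running --format '{{.Node}}' |
    sort | uniq -c
}
```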


Scaling In (Removing Instances)

If you want to scale back:

docker service scale myweb=4

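Swarm stops the extra replicas while the remaining ones keep serving traffic, so the service stays available. Each removed container first receives a stop signal and a grace period (10s by default, configurable with --stop-grace-period) before it is killed.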
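`docker service ls` reports replicas as running/desired (for example 4/4), which makes a simple convergence check after scaling in or out. A sketch; `replicas_converged` is a hypothetical helper that compares the service name exactly, because the `name` filter also matches services whose names merely start with the same text:

```shell
# Hypothetical helper: succeed when running replicas == desired replicas.
# Usage: until replicas_converged myweb; do sleep 2; done
replicas_converged() {
  docker service ls --filter "name=$1" --format '{{.Name}} {{.Replicas}}' |
    awk -v svc="$1" '
      # exact-name match, then compare "running/desired"
      $1 == svc { split($2, r, "/"); ok = (r[1] == r[2]) }
      END { exit (ok ? 0 : 1) }'
}
```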


Final Notes

Feature                  | Purpose
------------------------ | ----------------------------------------------
Swarm services           | Manage and scale containers across nodes
--publish                | Expose container ports outside the cluster
Auto-healing             | Automatically recover failed containers
Load balancer (built-in) | Distribute traffic across replicas
docker service scale     | Scale horizontally up or down with one command

Why This Matters

In production environments, resilience, auto-recovery, and scaling are non-negotiable. Docker Swarm gives you this power natively — without third-party tools.

Whether you're:

  • Running microservices

  • Handling real-world traffic

  • Hosting enterprise workloads

Swarm is your entry point to resilient infrastructure.

Got thoughts, doubts, or questions about Docker Swarm services, scaling, or auto-healing?
Let’s talk in the comments — I’m always happy to support fellow DevOps learners!


Written by

Nitin Dhiman

Self-taught DevOps enthusiast on a journey from beginner to pro. Passionate about demystifying complex tools like Docker, AWS, CI/CD & Kubernetes into clear, actionable insights. Fueled by curiosity, driven by hands-on learning, and committed to sharing the journey. Always building, always growing 🚀