Quick POC for "leader election" using ZooKeeper

Hariom YadavHariom Yadav
4 min read

What Problem Does Leader Election Solve?

In distributed systems, many nodes (servers) work together to perform tasks. However, some tasks require coordination by a single node to avoid conflicts or inconsistencies. For example:

  • Database Writes : Only one node should handle writes to prevent data corruption.

  • Task Scheduling : One node should assign tasks to avoid duplicate work.

  • Cluster Coordination : A single node should manage cluster-wide decisions like failover or rebalancing.

Without a mechanism to elect a leader, these systems can face chaos :

  • Multiple nodes might try to perform the same task simultaneously.

  • Data could become inconsistent or corrupted.

  • The system might deadlock or behave unpredictably.

Leader election ensures that one node is always in charge , even if nodes fail or network issues occur.

Solution with ZooKeeper

ZooKeeper ensures a smooth transition by electing a new leader:

  1. Before Failure :

    • Node A creates an ephemeral sequential node in ZooKeeper: /election/leader_0000000000.

    • Nodes B and C create their own nodes: /election/leader_0000000001 and /election/leader_0000000002.

    • Node A is the leader because it has the smallest sequence number.

  2. Node A Crashes :

    • Node A’s ephemeral node (leader_0000000000) is automatically deleted.

    • Node B detects this via a watch and checks if it’s now the smallest node (leader_0000000001).

  3. New Leader Elected :

    • Node B becomes the new primary and starts handling write operations.

    • Node C sets a watch on Node B in case it fails.

Outcome

  • The system remains consistent and operational, even after the failure of the primary node.

  • There’s no split-brain or ambiguity about who’s in charge.


Example Real-World Scenario: Database Failover

Imagine you’re running a distributed database with three nodes: Node A, Node B, and Node C. Node A is the primary node responsible for handling all write operations, while Nodes B and C act as backups (replicas).

Problem

What happens if Node A crashes? Without a leader election mechanism:

  • Nodes B and C might both try to become the new primary.

  • This could lead to split-brain , where two nodes think they’re in charge, causing data inconsistency or corruption.


Step-by-step POC for leader election using ZooKeeper

Simulate 3 nodes competing to become the leader and observe how ZooKeeper handles failover.

Step 1: Start ZooKeeper

brew install zookeeper
zkServer start

Step 2: Simulate 3 Nodes Competing for Leadership

Open 3 terminal windows to represent 3 nodes.

Terminal 1 (Node 1):

zkCli.sh
create /election "election"  # Create parent node (persistent)
create -e -s /election/leader_ "node1"  # Create ephemeral + sequential node

Output: Created /election/leader_0000000000
  • A session is created for node1.

  • The zkCli.sh client automatically sends heartbeats to ZooKeeper.

Terminal 2 (Node 2):

zkCli.sh
create -e -s /election/leader_ "node2"

Output: Created /election/leader_0000000001
  • If a node isn’t the leader, it sets a watch on the node immediately before it in the sequence

    • Node 2 (leader_0000000001) watches leader_0000000000.

Terminal 3 (Node 3):

zkCli.sh
create -e -s /election/leader_ "node3"

Output: Created /election/leader_0000000002
  • If a node isn’t the leader, it sets a watch on the node immediately before it in the sequence

    • Node 3 (leader_0000000002) watches leader_0000000001.

Step 3: Determine the Leader

  1. List all leader nodes: In any terminal:

     ls /election
    
     Output: [leader_0000000000, leader_0000000001, leader_0000000002]
    
  2. Leader Logic:

    • The node with the smallest sequence number (leader_0000000000) becomes the leader.


Step 4: Simulate Leader Failure

Kill Node 1 (Terminal 1):

  1. Press Ctrl+C to terminate the ZooKeeper CLI session.

  2. The ephemeral node /election/leader_0000000000 is deleted.


Step 5: Observe New Leader Election

  1. List remaining nodes:

     ls /election
    
     Output: [leader_0000000001, leader_0000000002]
    
  2. New Leader:

    • The next smallest node (leader_0000000001) becomes the new leader.


Concepts

  1. Data is stored in a tree-like structure (like a filesystem)

  2. Persistent Nodes : Survive even if the client disconnects.

  3. Ephemeral Nodes : Automatically deleted when the client session ends (e.g., a service crashing).

  4. Sequential Nodes : Automatically appended with a monotonically increasing counter (e.g., /leader_0000000001).

  5. Watches to notify waiting clients. If the leader’s node is deleted (due to failure), the next node in line takes over.

Key Benefits of ZooKeeper’s Approach

  • Automatic Failover : If the leader dies, the next node in line takes over seamlessly.

  • Deterministic Order : Sequence numbers ensure there’s no ambiguity about who should lead.

  • Fault Tolerance : Ephemeral nodes ensure failed nodes are automatically removed.


  • NOTE:

    • ZooKeeper provides the building blocks (sequential nodes, watches, and atomic operations).

    • Your application (or a client library) must implement the logic to:

      • Sort nodes to find the smallest sequence number.

      • Watch the predecessor node for changes.

ComponentHandled by ZooKeeperHandled by Application/Library
Node creation (sequential)
Ordering guarantee
Finding the smallest node✅ (via sorting)
Watching predecessor node✅ (via watches)

This POC mirrors how systems like Kubernetes or distributed databases elect leaders for high availability.🚀


0
Subscribe to my newsletter

Read articles from Hariom Yadav directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Hariom Yadav
Hariom Yadav

🏞️ Love photography, Love spring, Love Simplicity, Love Meditation 😌 ✌️