Know the Essentials of Raft

Recently, I was setting up a Vault server to manage secrets. As I was reading their documentation, I got to know that their storage backend uses a consensus algorithm called Raft. Once I set up the server, I spent some time learning about Raft. The following are my simplified notes on Raft, which I hope helps you understand this simple yet powerful mechanism in a short amount of time. Enjoy!

Background

Consensus algorithms typically arise in the context of replicated state machines, a general approach to building fault-tolerant systems. But why is consensus required? Coherence. Put simply, to make a decision that affects each entity involved in a system, it is important to take their individual choices into account before making a final decision. Consensus algorithms are used in clock synchronization, load balancing, data replication, transaction validation, are more.

Motivation

Paxos, the dominant consensus algorithm, is quite difficult to understand. The primary goal of designing Raft was understandability. The authors, Diego Ongaro and John Ousterhout, applied techniques such as decomposition and state space reduction to achieve this goal.

Novelty

  1. Strong leadership.

  2. Leader election based on randomized timers.

  3. Membership changes based on joint consensus.

How it works

  1. Node States

    1. Follower: Default state for all nodes.

    2. Candidate: If a Follower does not hear from a Leader, it becomes a Candidate.

    3. Leader: A Candidate becomes a Leader with votes from a majority of nodes.

  2. Leader Election

    1. Election Timeout: Amount of time a follower waits until becoming a candidate. This is randomized between 150-300 ms.

      1. After a follower’s election timeout, the node becomes a Candidate and starts a new election term, votes for itself, and sends out Request Vote messages to other nodes.

      2. If receiving nodes have not voted yet in that term, they will vote for the candidate that sent the Request Vote message, whose election timer will reset.

      3. Once a candidate has a majority of votes, it becomes the Leader and sends out Append Entries messages to its followers.

      4. Each election term continues until a follower stops receiving heartbeats and becomes a Candidate.

      5. When two or more nodes’ election timeout simultaneously, a split vote can occur. In this case, a majority of votes will not be attained and all nodes (followers) must wait for the next election term.

    2. Heartbeat Timeout: Amount of time a Leader waits before sending Append Entries messages to its followers. This is randomized between 150-300 ms.

  3. Log Replication

    1. A leader sends Append Entries messages to its followers to communicate state changes and commits them once a majority of followers acknowledge the change.

    2. In case of partitions between nodes, there can be a newly elected leader who attempts to communicate changes based on client requests but these changes will not be committed due to a majority not being present. Once the partition is rectified, the original leader is recognized and can communicate accurate changes to bring all nodes up to speed. This involves rolling back uncommitted logs on the nodes that were out of sync due to the partition.

References

  1. https://raft.github.io/ (Paper)

  2. https://thesecretlivesofdata.com/raft/

  3. https://www.baeldung.com/cs/consensus-algorithms-distributed-systems

  4. https://en.wikipedia.org/wiki/Consensus_(computer_science)

0
Subscribe to my newsletter

Read articles from Ashwin Venkatakrishnan directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Ashwin Venkatakrishnan
Ashwin Venkatakrishnan