Know the Essentials of Raft

Table of contents

Recently, I was setting up a Vault server to manage secrets. As I was reading their documentation, I got to know that their storage backend uses a consensus algorithm called Raft. Once I set up the server, I spent some time learning about Raft. The following are my simplified notes on Raft, which I hope helps you understand this simple yet powerful mechanism in a short amount of time. Enjoy!
Background
Consensus algorithms typically arise in the context of replicated state machines, a general approach to building fault-tolerant systems. But why is consensus required? Coherence. Put simply, to make a decision that affects each entity involved in a system, it is important to take their individual choices into account before making a final decision. Consensus algorithms are used in clock synchronization, load balancing, data replication, transaction validation, are more.
Motivation
Paxos, the dominant consensus algorithm, is quite difficult to understand. The primary goal of designing Raft was understandability. The authors, Diego Ongaro and John Ousterhout, applied techniques such as decomposition and state space reduction to achieve this goal.
Novelty
Strong leadership.
Leader election based on randomized timers.
Membership changes based on joint consensus.
How it works
Node States
Follower: Default state for all nodes.
Candidate: If a
Follower
does not hear from aLeader
, it becomes aCandidate
.Leader: A
Candidate
becomes aLeader
with votes from a majority of nodes.
Leader Election
Election Timeout: Amount of time a follower waits until becoming a candidate. This is randomized between
150-300 ms
.After a follower’s election timeout, the node becomes a
Candidate
and starts a new election term, votes for itself, and sends out Request Vote messages to other nodes.If receiving nodes have not voted yet in that term, they will vote for the candidate that sent the Request Vote message, whose election timer will reset.
Once a candidate has a majority of votes, it becomes the
Leader
and sends out Append Entries messages to its followers.Each election term continues until a follower stops receiving heartbeats and becomes a
Candidate
.When two or more nodes’ election timeout simultaneously, a split vote can occur. In this case, a majority of votes will not be attained and all nodes (followers) must wait for the next election term.
Heartbeat Timeout: Amount of time a
Leader
waits before sending Append Entries messages to its followers. This is randomized between150-300 ms
.
Log Replication
A leader sends Append Entries messages to its followers to communicate state changes and commits them once a majority of followers acknowledge the change.
In case of partitions between nodes, there can be a newly elected leader who attempts to communicate changes based on client requests but these changes will not be committed due to a majority not being present. Once the partition is rectified, the original leader is recognized and can communicate accurate changes to bring all nodes up to speed. This involves rolling back uncommitted logs on the nodes that were out of sync due to the partition.
References
Subscribe to my newsletter
Read articles from Ashwin Venkatakrishnan directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
