Leaderless Replication


Single-leader and multi-leader replication are based on the principle that a client sends a write request to one node (the leader), and the database system copies that write to the other replicas. The leader determines the order in which writes should be processed, and followers apply the leader's writes in the same order.
Some data storage systems take a different approach: they abandon the concept of a leader and allow any replica to accept writes directly from clients. Some of the earliest replicated data systems were leaderless, but the idea was mostly forgotten during the era of relational-database dominance.
Leaderless replication regained popularity after Amazon used it for its in-house Dynamo system.
Riak, Cassandra, and Voldemort are open-source data stores with leaderless replication models inspired by Dynamo, so this kind of database is also known as Dynamo-style.
In some leaderless implementations, the client directly sends its writes to several replicas, while in others, a coordinator node does this on behalf of the client.
Writing to the Database when a node is down
Imagine you have a database with three replicas, and one of them goes down.
In leader-based replication, if you want to continue writing, you may need to perform a failover.
In a leaderless configuration, failover does not exist. The client sends the write request to all three replicas in parallel, and the two available nodes accept it.
It is sufficient for two out of three replicas to acknowledge the write for it to be considered successful. The client simply ignores the fact that one of the replicas missed the write.
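The write path above can be sketched in a few lines. This is a minimal illustration, not a real client library: the replica names, the in-memory stores, and the `DOWN` set are all assumptions made for the example.

```python
# Hypothetical sketch of a leaderless write: the client sends the write to
# every replica and treats it as successful once at least `w` acknowledge.
REPLICAS = {"node1": {}, "node2": {}, "node3": {}}
DOWN = {"node3"}  # simulate one unavailable replica

def write(key, value, w=2):
    """Send the write to all replicas; succeed if at least w acks arrive."""
    acks = 0
    for name, store in REPLICAS.items():
        if name in DOWN:
            continue  # unreachable replica: no acknowledgment
        store[key] = value
        acks += 1
    return acks >= w

ok = write("cart:42", "3 items")
print(ok)                 # True: 2 of 3 replicas acknowledged
print(REPLICAS["node3"])  # {}: the down replica missed the write
```

The write succeeds even though `node3` never saw it, which is exactly the state that makes stale reads possible when that node comes back.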
Now imagine that the unavailable node comes back online and clients start reading from it. Any writes that happened while the node was down are missing from it. Thus, if you read from that node, you may get stale (outdated) values as responses.
To solve that problem, when a client reads from the database, it doesn't just send its request to one replica: read requests are also sent to several nodes in parallel. The client may get different responses from different nodes.
Read Repair and anti-entropy
The replication scheme should ensure that eventually all the data is copied to every replica.
Two mechanisms are often used in Dynamo-style datastores:
Read Repair
When a client reads from several nodes in parallel, it can detect any stale responses. The client sees that the recovered replica has a stale value and writes the newer value back to that replica.
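A sketch of that mechanism, under the assumption that each replica stores a `(version, value)` pair (a simple counter standing in for the vector clocks Dynamo-style stores actually use):

```python
# Illustrative read repair: read from several replicas, pick the
# highest-versioned value, and push it back to any stale replica.
replicas = [
    {"k": (7, "new")},  # up to date
    {"k": (7, "new")},  # up to date
    {"k": (6, "old")},  # was down, missed the latest write
]

def read_with_repair(key):
    responses = [(i, r[key]) for i, r in enumerate(replicas)]
    # The freshest value is the one with the highest version number.
    latest_version, latest_value = max(vv for _, vv in responses)
    # Read repair: write the latest value back to stale replicas.
    for i, (version, _) in responses:
        if version < latest_version:
            replicas[i][key] = (latest_version, latest_value)
    return latest_value

print(read_with_repair("k"))  # "new"
print(replicas[2]["k"])       # (7, "new"): the stale replica was repaired
```

Note that the repair happens as a side effect of an ordinary read, which is why rarely read keys don't benefit from it.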
Anti-entropy Process
In addition, some data stores have a background process that constantly looks for differences in the data between replicas and copies any missing data from one replica to another.
Not all systems implement both of these; for example, Voldemort currently does not have an anti-entropy process. Note that without an anti-entropy process, values that are rarely read may be missing from some replicas and thus have reduced durability, because read repair is only performed when a value is read by the application.
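The idea of an anti-entropy pass can be shown with a toy version. Real Dynamo-style stores compare replicas efficiently using Merkle trees and reconcile conflicting versions; this sketch only copies keys that one replica is missing entirely, which is an assumption made to keep the example small.

```python
# Toy anti-entropy pass: compare two replicas and copy over any keys
# the other one is missing. (Real systems diff via Merkle trees and
# must also reconcile keys present on both sides with different versions.)
def anti_entropy(replica_a, replica_b):
    for key, value in list(replica_a.items()):
        replica_b.setdefault(key, value)
    for key, value in list(replica_b.items()):
        replica_a.setdefault(key, value)

a = {"x": 1, "y": 2}
b = {"y": 2, "z": 3}
anti_entropy(a, b)
print(a == b)  # True: both replicas now hold x, y, and z
```

Unlike read repair, this runs in the background regardless of read traffic, so it also converges values that are never read.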
Quorums for reading and writing
Every successful write is guaranteed to be present on at least two out of the three replicas, which means at most one replica can be stale.
Thus, if we read from at least two replicas, we can be sure that at least one of the two is up to date.
Even if the third replica is down or stale, reads will continue to return up-to-date values.
More generally: if there are n replicas, every write must be confirmed by at least w nodes to be considered successful, and every read must query at least r nodes. As long as w + r > n, we expect to get an up-to-date value when reading. Reads and writes that obey these r and w values are called quorum reads and writes.
The quorum condition, w + r > n, allows the system to tolerate unavailable nodes as follows:
If w < n, we can still process writes if a node is unavailable.
If r < n, we can still process reads if a node is unavailable.
With n = 3, w = 2, r = 2, we can tolerate one unavailable node.
With n = 5, w = 3, r = 3, we can tolerate two unavailable nodes.
Limitations of Quorum Consistency
If you have n replicas and you choose w and r such that w + r > n, the set of nodes you have written to and the set of nodes you have read from must overlap in at least one node. This overlap is what the up-to-date-read guarantee rests on.
Often, r and w are chosen to be a majority (more than n/2) of the nodes, because that ensures w + r > n while still tolerating up to n/2 node failures.
Other quorum assignments are also possible, which allows some flexibility in the design of distributed algorithms.
You may also set w + r ≤ n. Reads and writes are still sent to n nodes, but a smaller number of successful responses is required for an operation to succeed. With smaller values of w and r, you are more likely to read stale values, because it is more likely that your read didn't include the node with the latest value.
However, even with w + r > n, there are edge cases where stale values may be returned:
If a sloppy quorum is used, the w writes may end up on different nodes than the r reads, so there is no longer a guaranteed overlap between the r nodes and the w nodes.
If two writes occur concurrently, it is not clear which one happened first. If the winner is picked based on a timestamp, writes can be lost due to clock skew.
If a write happens concurrently with a read, the write may be reflected on only some of the replicas. It is undetermined whether the read returns the old or the new value.
If a write succeeded on some replicas but failed on others, and overall succeeded on fewer than w replicas, it is not rolled back on the replicas where it succeeded.
If a node carrying a new value fails, and its data is restored from a replica carrying an old value, the number of replicas storing the new value may fall below w, breaking the quorum condition.
That's it from my side. There are plenty of topics to discuss, such as:
Sloppy Quorums and Hinted Handoff.
Detecting Concurrent Writes.
Merging Concurrently Written Values.
Written by Arpan Das