Quorum Reads and Writes Explained: How They Shape CAP Theorem Tradeoff

Well, we all have read about Quorums sometimes in our lives but I guess just like me a few of you also might have not really paid too much attention to them (I hope I am not the only one who skipped on these). Well lets just agree that we did not even if you do understand them for the next few minutes

(you saw what I did there?)

Some terms before we get going

Cluster: Whenever I mention cluster, think of a Postgres DB cluster until and unless specified otherwise.
Primary/Leader: The node that will handle the writes (maybe reads)
Replicas (Followers / Standbys / Secondaries): Nodes that handle reads and replicate writes from the leader.
Synchronizer: In this context, we’ll treat it as the Primary for simplicity.

It’s always the CAP

Well, just like all the other demon spawns in distributed system, the need for Quorum read/writes also comes from the CAP theorem. In pure technical terms, CAP says, that, “in a distributed computer system can only guarantee two out of three desirable properties: Consistency, Availability, and Partition Tolerance“

I can hear you say from the other side of the screen, “yeah I know that, duh!”
But just humor me for a second and get the basic information down. Now usually, when we talk about CAP, we usually will chose between Consistency and Availability as you shouldn’t rely on the fact whether some nodes are up or not.

Setting up the problem

What are we trying to solve here? So let’s say we have a system, spanning across the globe. A write request happens in some region. This request will be served by some nodes in that region (for now let’s just say they do), in that particular region. But here is the catch, you want this data to be available globally in your service and to make it even worse, lets say we are not using DynamoDB or Casandra which you probably know ship quorum based read/writes out of the box. Lets say we use Postgres (even though it gives a lot of Quorum related features they still need to be configured and coupled with external tooling). Then such a scenario how do you actually define whether the system behaves in CP or AP format? Well Quorum based read/writes are integral for that and now we will see how

Quorum Definition

The word Quorum doesn’t really help us here, it basically means a group but what matters is what these groups represent, i.e. Quorum is a group of “nodes” that come together to “agree” whether an operation is valid or not. This operation, can be anything in theory.

Formally, we say that in a cluster of N nodes, Quorum is satisfied, if the sum of the number of read nodes R and the number of write nodes W is greater than the total number of nodes N

R + W > N

This means that some nodes have both the read and write done to them. Again, lets just focus on this fact for now. This mean, that at least 1 node in such a setup will have the latest data.

Coordinating the operations

Now comes the main point on how this actually works in practice. So when you read/write from such a system, the operations doesn’t happen on 1 node. Rather a Quorum of nodes needs to agree on the operation. This means, a read will be done by asking many nodes and similarly a write will be done on many nodes. If all these nodes agree on the completion of the operation, only then, do we reply to the client with a successful response.

For such a coordination to happen and to make sure what data should be served to the client, a coordinator comes into play. Let’s think of a read to understand how this works.

Let’s say, N=5, R=3, W=3, this is a strongly consistent system (we will see later how different values change the characteristic of our system)

Now a client want to read a value, the coordinator will go and ask for data from the nodes. As soon as the coordinator has R number of responses, it knows that it has enough information to return the consistent value.

But what if all read nodes return different data? Well, this is what happens when your system has suffered a partition and needs repair. So in such a situation you coordinator will pick the latest value (based on a timestamp or a vector time or some other metric) and return that and then coordinate a repair operation on the other nodes.

Well if it can identify the latest data based on the timestamp then why do I even need a Quorum? You need a Quorum to speed up the system and to not have a fragile system. If you don’t have a Quorum, then your reads will have to read from all nodes to get the latest data and if even 1 goes down you won’t be able to ensure consistent reads. Similarly you can follow for writes.

I want consistency

Lets say we chose to prefer consistent systems over available systems. This means, that all the reads should reply with the most recent writes, i.e. don’t server stale data. How do we ensure this?

Going back to our definition before, we have N=5, R=3, W=3 as we saw before, any operation, whether read or write will need to be acknowledge by a quorum to be considered “done”. As we have R=3, W=3 that is not a concern as we saw above as there is at least 1 overlapping node in both read and write sets.

But now we start tweaking R and W values, we will see how the system characteristic change (for N = 7)

What we see here is that as the value of R+W goes higher, the system becomes more redundant as more nodes will have overlap in read and writes. But the real insight is the fact that CAP theorem itself is a consequence of Quorums. It gets enacted because Quorums define how much a system is consistent or available. By tweaking the read and write node counts in Quorum, you can decide what your system prioritizes of the two.

Node Failures: How Many Can You Survive?

We already saw how stale nodes can block reads and the coordinator makes sure to pick the latest data in most modern system based on some timestamp based mechanism. One more issue that can happen is node failures, i.e. nodes go down and the availability is less than the desired number that can satisfy a quorum.
For a stable read we say max allowed failures = N - R and for writes max allowed failures = N - W
That means as long as the number of failing nodes is less than those numbers we can individually offer those operations. For scenarios, where we want to ensure both read and write should succeed we say the max allowed failures = N - max(R,W)

Conclusion

CAP theorem that we usually read about does is one of the biggest constraint of the distributed systems. Quorums are how we enforce or tune consistency and availability tradeoffs under CAP. Just by changing the size of your Quorum configuration you can change how your system will behave and how redundant can it be. So next time you see people talking about CAP, remember it makes sense only with Quorums and how they are responsible for ensuring your cat memes have the correct number of like and comments.

What the hell even is a Quorum