CAP Theorem and its limitation

Distributed data systems are the reality of modern applications. Gone are the days when a single database server instance would be enough to serve all the needs. The proliferation of internet across the world with billions of users make applications inherently data intensive. In such a world, any meaningful application would intend its database to scale out horizontally instead of scaling up vertically. This distribution of nodes across network would not in anyway mean that one would want to compromise on authenticity and consistency of data. Ideally, database will always be expected to be available and consistent even if it is distributed across a network which, anyone likes or not, would occasionally faulter.

"Ideally" is the operative word here. It has been observed that a database can not achieve all these desired features simultaneously. This is where CAP theorem, propounded by Gilbert and Lynch, comes into picture to help us understand the nuances of distributed database system.

What is CAP ?

CAP stands for Consistency, Availability and Partition Tolerance. According to this theorem, a distributed data store can simultaneously provide only two out of the above three guarantees. Before we understand this theorem lets first understand the individual terms. We will use the following distributed data system to illustrate.

Here database has 2 instances A and B connected over network both initialized with value x = 0. Client, say C, can make request to either of the two instances.

Consistency

Gilbert and Lynch's definition of consistency.

Any read operation that begins after a write operation completes, must return that value, or the result of a later write operation.

So, in a consistent system every read, receives the most recent write. In other words, all nodes in the system reflect the same data at the same time. Whenever a user reads data from the system, they should see the latest updated value, regardless of which node they are accessing.

As shown in this illustration, here is how the transaction plays out

Client C makes a write operation to instance A and updates value of x to 1
Instance A forwards the value to instance B and updates the value of x there as well.
Client C now makes a read request to instance B and reads value x = 1.

This database is consistent. Read from all the instances/nodes is the latest write value.

Availability

Gilbert and Lynch's definition of availability.

Every request received by a non-failing node in the system must result in a response.

Thus if a node of database is not crashed or down for some reason it must return client's response. Database cannot deny the response to client when it is in healthy state.

In this above illustration both the instances A and B are available to client C.

Partition Tolerance

Gilbert and Lynch's definition of partition.

The network will be allowed to lose arbitrarily many messages sent from one node to another.

So two nodes connected over network might drop some messages and system should be fine with that. Effectively we have partition tolerant database. Note that partition here means network partition which means fault in network connecting the nodes of database. A completely broken/partitioned network would look like this where all the messages will be dropped.

Putting it all together!

Now that we have definitions of all the three clear, we will understand how only two of the three are feasible in a database. Lets look at the trade-offs in CAP theorem.

Consistency and Partition Tolerance (CP) without Availability

If we prefer consistency we will have to compromise on avaialbility in partition tolerant system(messages allowed to drop). This is because, if we consider the above illustration, and lets say the forwarded message x=1 is dropped then instance B will not be updated. Now if we want to ensure that our system is always consistent no matter what, we will have to make instance B unavailable till the network is restored. So any request bieng made to instance B will be declined. Instance A will continue to serve updated value of x and thus making the system consistent.

Example: Distributed databases like HBase, Redis and MongoDB configured for strong consistency. These are used where data accuracy is very critical.

Availability and Partition Tolerance (AP) without Consistency

On the other hand if we prioritise availability over consistency in the same illustration we can allow instance B to continue to serve read requests and thus make all the instances available. This however will come at a cost of consistency as read request served from instance B will continue to return stale value.

Example: NoSQL databases like Cassandra and DynamoDB that prioritize availability and partition tolerance. These are preferred where it is a large data distributed system and need to serve huge number of users.

Consistency and Availability (CA) without Partition Tolerance

And finally, if we assume that our system is not partition tolerant, which in effect means all messages are received by all the nodes, and thus all nodes have updated and latest write values, then we make our system consistent as well as available as all nodes can continue to serve requests as long as they are up and running.

Example: Traditional relational databases like MySQL when not used in a distributed manner.

CAP theorem limitation and redefined

CAP theorem is pretty good in explaining the tradeoffs between the three desired properties of database in theory. However it suffers from limitation.

Practically speaking there will always be fault in network between the nodes of database as we have moved to distributed data stores. As a result we will always drop some messages once in a while. What that means is essentially we will have two phases in a running distributed system.

Phase 1 : Network operating properly and thus no messages dropped. In this phase we will have a database which is both consistent as well as available.
Phase 2 : Faulty network(Partitioned) and thus messages being dropped. In this phase, we will have to prioritise either consistency or availability.

So a better way to define and understand CAP theorem would be

In a Partition Tolerant system, we can only ensure either Consistency or Availability but not both.

This definition and perspective makes more sense in the real world applications. And you can effectively design your applications as per your need!

Conclusion

The CAP theorem provides a crucial framework for understanding the limitations and trade-offs in distributed systems. By recognizing that it's impossible to achieve consistency, availability, and partition tolerance simultaneously, system architects can make informed decisions that align with their application's specific needs. Whether prioritizing consistency, availability, or partition tolerance, understanding the implications of the CAP theorem helps in designing robust, reliable, and efficient distributed systems.

If you liked this blog, don't forget to give it a like. Also, follow my blog and subscribe to my newsletter. I share content related to software development and scalable application systems regularly on Twitter as well.

Do reach out to me!!

CAP Theorem and its limitation!

Table of contents