CAP Theorem - A toolkit essential

Have you ever wondered why the view and like counts on your Instagram reel or post take some time to update, while you can still use every other feature of the app? The answer might lie in a core principle of distributed system design: the CAP theorem.

Let’s dive deep!

What is the CAP Theorem?

Proposed by Eric Brewer in 2000, the CAP theorem states that it is impossible for a distributed data system to simultaneously guarantee all three of the following properties:

  1. Consistency (C)

  2. Availability (A)

  3. Partition Tolerance (P)

At best, you can guarantee only two of them at the same time. These trade-offs compel engineers to make conscious architectural decisions, based on non-functional requirements, early in the design journey.

Let’s look at each of the three pillars in detail.

Consistency (C)

This means that every read request must return the most recent write or an error.

  • Imagine you are booking the last ticket for your favourite singer’s concert. If the system is consistent, no one else will see that seat as available once you have reserved it.
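One common way replicated systems provide this guarantee is quorum replication: with N replicas, a write must be acknowledged by W of them and a read must consult R of them, and choosing R + W > N means every read quorum overlaps the most recent write quorum. The sketch below is a minimal, single-process illustration under that assumption; the `Replica` class and the `quorum_write`/`quorum_read` helpers are hypothetical names, not any particular database's API, and it ignores concurrent writers and failure recovery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """One copy of the data: a value tagged with a monotonically increasing version."""
    value: Optional[str] = None
    version: int = 0


N, W, R = 3, 2, 2
assert R + W > N  # every read quorum intersects every write quorum

replicas = [Replica() for _ in range(N)]


def quorum_write(value: str, version: int, targets: list[int]) -> None:
    """Write to W of the replicas that acknowledged; reject if too few did."""
    if len(targets) < W:
        raise RuntimeError("not enough replicas acknowledged; write rejected")
    for i in targets[:W]:
        replicas[i].value = value
        replicas[i].version = version


def quorum_read(targets: list[int]) -> Optional[str]:
    """Read from R replicas and return the value with the highest version."""
    if len(targets) < R:
        raise RuntimeError("not enough replicas reachable; read rejected")
    newest = max((replicas[i] for i in targets[:R]), key=lambda r: r.version)
    return newest.value


# You reserve the last seat; replicas 0 and 1 acknowledge the write.
quorum_write("seat 42: reserved by you", version=1, targets=[0, 1])

# Another fan's read hits replicas 1 and 2. Replica 2 never saw the write,
# but replica 1 did, so the newest version wins.
print(quorum_read([1, 2]))  # seat 42: reserved by you
```

If fewer than R replicas are reachable, the read raises an error instead of guessing, which is exactly the "most recent data or an error" behaviour described above.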

Availability (A)

This means that every request received by a non-failing node gets a (non-error) response, even if that response does not reflect the most recent write.

  • Even when parts of the system are down, the system continues to respond, perhaps with slightly outdated data.
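A common source of this behaviour is asynchronous replication: a leader acknowledges a write immediately and ships it to followers later, so a follower can always answer, but may answer with an older value until it catches up. The sketch below assumes a hypothetical leader/follower pair in one process, with a queue standing in for the replication log; it models the idea rather than any specific database.

```python
from collections import deque


class Follower:
    """A read replica that applies updates only when replication catches up."""

    def __init__(self) -> None:
        self.likes = 0

    def read(self) -> int:
        # Always answers from the local copy, never with an error.
        return self.likes


class Leader:
    """Accepts writes immediately and queues them for the follower."""

    def __init__(self, follower: Follower) -> None:
        self.likes = 0
        self.follower = follower
        self.pending: deque[int] = deque()  # replication log not yet shipped

    def like(self) -> None:
        self.likes += 1
        self.pending.append(self.likes)  # shipped later, not right now

    def replicate(self) -> None:
        # Drain the log; the follower ends up with the latest value.
        while self.pending:
            self.follower.likes = self.pending.popleft()


follower = Follower()
leader = Leader(follower)

for _ in range(3):
    leader.like()

print(leader.likes, follower.read())  # 3 0  (available, but stale)
leader.replicate()
print(leader.likes, follower.read())  # 3 3  (caught up)
```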

Partition Tolerance (P)

A partition occurs when two or more nodes (replicas) cannot communicate with each other because of a network failure, and are forced to operate in isolation.

Partition tolerance means that the system keeps operating even when messages between its nodes are dropped or delayed by a network failure.
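To make "keeps operating" concrete, here is a sketch of a view counter, like the one in the Instagram example, that keeps accepting views on both sides of a partition and reconciles once the network heals. It uses a grow-only counter (G-Counter), one well-known conflict-free replicated data type; the replica names and the `GCounter` class are illustrative assumptions, and real systems may reconcile counts differently.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of how often or in what order they sync.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)


us = GCounter("us-east")
eu = GCounter("eu-west")

# Network partition: each side keeps counting views independently.
us.increment(5)
eu.increment(3)
print(us.value(), eu.value())  # 5 3

# Partition heals: replicas exchange state in both directions.
us.merge(eu)
eu.merge(us)
print(us.value(), eu.value())  # 8 8
```

During the partition each side reports only the views it has seen, which is why counts can lag for a while before converging.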

Why can’t we have all three ?

In a distributed system, partitions are inevitable.

Servers are often spread across continents, and network failures between data centres are common. When a partition happens, we have two options:

  1. Keep serving requests from the partitioned node, possibly with stale data, so the system is always available (sacrificing consistency).

  2. Stop serving requests until data across all nodes is updated to the most recent value, to maintain strong consistency (sacrificing availability).
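The two options can be sketched as a single replica that is cut off from its peers and must decide how to answer a read. The `PartitionedReplica` class, its `mode` flag and the `Unavailable` exception are hypothetical; in real systems this choice is built into the replication protocol rather than toggled per replica like this.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class PartitionedReplica:
    def __init__(self, mode: str, local_value: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.local_value = local_value  # may be stale during a partition
        self.can_reach_peers = True

    def read(self) -> str:
        if self.can_reach_peers:
            # Simplification: assume peers have confirmed this value is current.
            return self.local_value
        if self.mode == "AP":
            # Option 1: stay available, possibly serving stale data.
            return self.local_value
        # Option 2: stay consistent by refusing to answer.
        raise Unavailable("cannot confirm latest value during partition")


for mode in ("AP", "CP"):
    replica = PartitionedReplica(mode, local_value="balance: 100")
    replica.can_reach_peers = False  # the network partition begins
    try:
        print(mode, "->", replica.read())
    except Unavailable as err:
        print(mode, "->", "error:", err)
```

Running it prints `AP -> balance: 100` followed by `CP -> error: cannot confirm latest value during partition`: the same situation, answered two different ways.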

So in practice, a distributed system must be partition tolerant, which leaves us choosing between the other two, consistency and availability, whenever a partition occurs.

Some examples of what to choose, and when:

| Example | What | Why |
|---|---|---|
| Banking system | CP | Prioritise consistency over availability, as they cannot afford to show stale data such as account balances or interest rates. |
| Messaging apps | AP | Prioritise availability over consistency, as it’s better to show older messages than to make the whole app unavailable. |
| Ticket reservation systems (flight booking, train booking) | CP | Prioritise consistency over availability, as they cannot afford the hassle of a single seat being booked multiple times by multiple users. |
| Streaming services like Netflix | AP | Prioritise availability over consistency, as they can keep serving existing content even while newer content is still being uploaded and propagated. |

Key Takeaways

  • You must handle partitions in distributed systems, so P is a must.

  • This means that in practice, whenever a partition occurs, you're choosing between C and A.

  • Choose based on which failure mode your system can tolerate: stale data or errors.

A related theorem, PACELC (proposed by Daniel Abadi), extends CAP: if there is a Partition (P), the system must choose between Availability (A) and Consistency (C); Else (E), when the system is running normally without a partition, it must choose between low Latency (L) and Consistency (C).
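The latency-versus-consistency trade-off in PACELC shows up even when the network is healthy. Waiting for every replica to confirm a write (favouring consistency) makes the write as slow as the slowest replica, while acknowledging right after the local write (favouring latency) is fast but lets other replicas lag behind. The round-trip times below are made-up illustrative numbers, not measurements, and the model ignores processing time, retries and quorum-based middle grounds.

```python
# Hypothetical round-trip times (ms) from the leader to each follower replica.
follower_rtt_ms = {"same-region": 2, "other-region": 80, "other-continent": 150}
LOCAL_WRITE_MS = 1  # hypothetical cost of the leader's own local write


def sync_write_latency(rtts: dict[str, int]) -> int:
    """Consistency choice: acknowledge only after every follower confirms.

    Followers are assumed to be contacted in parallel, so the slowest one
    decides the total latency.
    """
    return LOCAL_WRITE_MS + max(rtts.values())


def async_write_latency(rtts: dict[str, int]) -> int:
    """Latency choice: acknowledge after the local write; followers catch up later.

    `rtts` is unused because the client never waits for followers.
    """
    return LOCAL_WRITE_MS


print("synchronous :", sync_write_latency(follower_rtt_ms), "ms")   # 151 ms
print("asynchronous:", async_write_latency(follower_rtt_ms), "ms")  # 1 ms
```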


Written by

Prabhsimran Bajaj