DDIA - Chapter 9 - Consistency and Consensus - Thoughts and notes


Disclaimer: this blog has been written with the help of LLM, please do not hate me
I never imagined that consensus would be such a difficult problem, in a democracy one has consensus right, it is so easy. Even if there are multiple failure points the leader gets elected.
Anyway, this chapter speaks a lot about linearisability, consistency, consensus, how to be fault tolerant during consensus. Thing is that the tradeoff with making the system consistent is performance. There wasn’t much to discover in this chapter, I did find it a bit boring. The basic idea while building a distributed system should be that time is linear. I’d always use epochs.
Linearizability: Core Concepts
Definition
Makes a distributed system appear as a single replica without replication lag
Ensures all clients see the most recent value once it's written
Requires consistent data across all replicas at the same time
Key Characteristics
Strict consistency model
Once a client sees a new value, all other clients must see that same new value
Critical for scenarios like:
Leader election in single-leader systems
Choosing unique usernames
Seat booking systems
Trade-Offs
Performance-intensive
Often sacrificed for availability
Costly in multi-data center environments
Not many systems provide true linearizability
Limitations
High performance overhead
Difficult to implement across distributed systems
Network delays can significantly impact implementation
Distributed Transactions
Types
Database-internal transactions
Used within same database system
Easier to implement
Heterogeneous distributed transactions
Involve multiple, different systems
Focus on ensuring atomic commits
Key Mechanism: Two-Phase Commit (2PC)
Two phases: Prepare and Commit
Transaction coordinator manages process
Ensures all nodes commit or abort together
Challenges
Performance bottlenecks
Complexity of coordinating across systems
Potential for system-wide blocking if coordinator fails
Consensus
Definition
Getting multiple nodes to agree on a value with specific properties:
Uniform agreement
Integrity
Validity
Termination
Key Characteristics
Requires majority of nodes to be functioning
Uses epoch numbering for leader selection
Implements total order broadcast
Limitations
Requires strict majority of nodes
Difficult to add/remove nodes dynamically
Relies on timeout mechanisms
Performance can be impacted by frequent leader elections
Coordination Services (e.g., Zookeeper, etcd)
Features
Linearizable atomic operations
Total ordering of operations
Failure detection
Change notifications
Service discovery
Subscribe to my newsletter
Read articles from Vivek Khatri directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Vivek Khatri
Vivek Khatri
I am still deciding what should I write here.