The CAP Theorem and MongoDB: Navigating Trade-offs in Distributed Databases
Table of contents
In the world of distributed databases, the CAP theorem, also known as Brewer's theorem, provides a foundational framework for understanding the trade-offs between different system characteristics. Before diving into CAP Theorem we need some knowledge about distributed system. Let's talk about that first.
What is Distributed System?
A distributed system is a collection of independent computers that appear to the users of the system as a single coherent system. These systems are designed to share resources and coordinate their actions by passing messages. Distributed systems are ubiquitous in modern computing, powering everything from large-scale web applications to cloud services and IoT networks.
Key Characteristics of Distributed Systems
Scalability: Distributed systems are designed to handle increasing loads by adding more resources (e.g., servers, storage) to the system. This can be achieved through horizontal scaling (adding more machines) or vertical scaling (upgrading existing machines).
Fault Tolerance: Distributed systems are designed to continue functioning even if some components fail. This is achieved through redundancy, replication, and failover mechanisms.
Concurrency: Distributed systems often need to handle multiple tasks simultaneously. Concurrency control mechanisms ensure that these tasks do not interfere with each other, maintaining data consistency and integrity.
Transparency: Users and applications should interact with the distributed system as if it were a single, coherent system. This transparency hides the complexity of the underlying distributed architecture.
Heterogeneity: Distributed systems often consist of diverse hardware and software components. The system must be able to integrate and manage these heterogeneous elements seamlessly.
Challenges in Distributed Systems
Network Latency: Communication between nodes in a distributed system can introduce delays, which can affect performance and consistency.
Partial Failures: Unlike centralized systems, distributed systems can experience partial failures where some components fail while others continue to operate. Handling these failures gracefully is a significant challenge.
Consistency: Ensuring that all nodes in a distributed system have a consistent view of the data is complex, especially in the presence of network partitions and failures.
Security: Distributed systems are often more vulnerable to security threats due to their distributed nature. Ensuring secure communication and data integrity is crucial.
Coordination: Coordinating the actions of multiple nodes to achieve a common goal can be complex and requires sophisticated algorithms and protocols.
Databases like MongoDB, Cassandra, and CockroachDB are designed to handle distributed data storage and retrieval, providing scalability and fault tolerance.
What is the CAP Theorem?
In distributed systems, you can only have two out of Consistency, Availability, and Partition Tolerance. You have to choose which two matter most to you
What is Consistency , Availability & Partition Tolerance??!!
Consistency: Ensuring that every read returns the most recent write means that the system must synchronize and agree on the latest state of the data across all nodes. This requires coordination and communication between nodes to ensure that all updates are seen by all nodes.
Availability: To be available, a system must respond to every request, even if it cannot guarantee that the data returned is the most recent. This means that the system continues to serve requests without interruption, which often involves accepting reads and writes even when there might be discrepancies between nodes.
Partition Tolerance: A distributed system is subject to network failures or partitions, where some nodes cannot communicate with others. Partition tolerance means the system must continue operating even when parts of it are isolated. However, maintaining this while also ensuring consistency and availability poses significant challenges.
Why Only Two? What's the problem with Three??!!
Here’s why achieving all three properties simultaneously is typically infeasible:
Consistency and Partition Tolerance (CP)
When focusing on Consistency and Partition Tolerance, the system must ensure that all nodes have the same data despite network partitions. This often means that if a partition occurs, the system might sacrifice Availability. For example, during a network partition, if nodes cannot communicate to agree on the latest state of the data, the system may reject requests (both reads and writes) to avoid serving outdated or inconsistent data.
Long Story Short : Network partition happened so one node can't communicate with other but i still need the latest data, but server doesn't have right now. so it will reject the request hence sacrificing the Availability.
Availability and Partition Tolerance (AP)
For Availability and Partition Tolerance, the system must continue to operate and serve requests even if some nodes are isolated. This might lead to temporary inconsistencies if nodes are operating independently of each other. To maintain availability, the system might serve stale or conflicting data because it cannot wait to ensure all nodes have synchronized before responding to requests.
Long Story Short : Network partition happened so one node can't communicate with other but i still need the latest data, As we configured as AP system so it will give me the latest data from that node even all the node is not synchronized and this data you get from the server might be rolled back. Hence sacrificing the Consistency.
Consistency and Availability (CA)
To ensure Consistency and Availability, the system must respond to every request with the most recent data. However, this is typically not feasible in the presence of network partitions. To achieve consistency and availability, the system needs to coordinate all nodes, but network partitions can prevent nodes from communicating, making it impossible to maintain both properties simultaneously.
Long Story Short : If you want consistent data and available at the same time in all nodes. you have to make sure that the do not stop communicating with each other. Because if network partition occurs then your nodes can't be consistent or available. So you need consistency and availability sacrifice the chance of partition [in real life it is not practical by the way.]
MongoDB and the CAP Theorem
MongoDB is a popular NoSQL database designed for high performance and flexibility. To understand how MongoDB aligns with the CAP theorem, let's explore how it balances consistency, availability, and partition tolerance:
Consistency: MongoDB offers strong consistency by default. In a replica set, writes are acknowledged only after they are written to the primary and replicated to the majority of the secondary nodes. This ensures that all reads will see the most recent write once the write is acknowledged.
Availability: MongoDB aims for high availability by using replica sets. If the primary node fails, one of the secondary nodes is automatically promoted to primary, ensuring continuous operation and minimizing downtime.
Partition Tolerance: MongoDB is designed to handle network partitions. During a partition, the system will try to maintain availability and continue operations on the available nodes. However, if a partition occurs and a majority of nodes cannot be reached, MongoDB may become unavailable for writes until the partition is resolved and a majority quorum is re-established.
Given these features, MongoDB generally aligns with the AP (Availability and Partition Tolerance) aspect of the CAP theorem. It provides high availability and can handle network partitions, but it may sacrifice some consistency during partitions. So If Network partition happened so one node can't communicate with other but i still need the latest data, As we configured as AP system so it will give me the latest data from that node even all the node is not synchronized and this data you get from the server might be rolled back. Hence sacrificing the Consistency. Yeah !! I said it before.!!!
But Can I Change it??!!
Yes!!!! you can . In order to do that you need to write the read and write concern. If you want to know about them here is my another blog link
Optimizing for Consistency and Partition Tolerance
// Connect to MongoDB shell
use yourDatabaseName;
// Insert a document with write concern "majority"
db.yourCollectionName.insertOne(
{ key: "value" },
{ writeConcern: { w: "majority" } }
);
// Find the document with read concern "majority"
const result = db.yourCollectionName.findOne(
{ key: "value" },
{ readConcern: { level: "majority" } }
);
printjson(result);
Consistency: Data is written and read from the majority of nodes.
Partition Tolerance: During partitions, operations may be delayed or unavailable if the majority of nodes cannot be reached.
2. Optimizing for Availability and Partition Tolerance
use yourDatabaseName;
// Insert a document with write concern of 1 (acknowledged by the primary)
db.yourCollectionName.insertOne(
{ key: "value" },
{ writeConcern: { w: 1 } }
);
// Find the document with read concern "local"
const result = db.yourCollectionName.findOne(
{ key: "value" },
{ readConcern: { level: "local" } }
);
printjson(result);
Availability: Writes and reads are acknowledged quickly, even if not all nodes are up-to-date.
Partition Tolerance: The system remains operational and responsive, but data might be inconsistent during network partitions.
In MongoDB, achieving CA typically isn't feasible in a distributed setup due to the inherent nature of distributed systems and the need to handle network partitions. Network failures are a part of the operational environment in which distributed databases function. They are not caused by the database itself but are inherent to the complexities of distributed systems and network communication.
Subscribe to my newsletter
Read articles from MD. TANVIR RAHMAN directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by