Database Replication Explained: Techniques and Challenges


Database Replication
Introduction
The term "database replication" Most of the time, it is associated with database scaling.
Simply put, it means copying the database and placing it on different machines so we can handle the load resulting from scaling.
Before we dive into replication methods, let’s go over some important terms.
Leader and Follower
Basically, in database replication, we use the term “Leader” for the database instance that the user writes to directly and, in some cases, may reads from. The “Follower” refers to the database instance that the user reads from.
But wait, how do we ensure consistency when reading from the Follower? Well, to achieve the consistency that a single database instance provides, we will face challenges, but we are forced to find a solution to handle the load, remember?
However, the Leader simply writes to the followers. So, it’s basically like this:
And when we talk about communication between two nodes via network, this is where sync and async come in.
Synchronous vs Asynchronous Replication
The replication happens in two ways sync and async, each one has advantage and disadvantage
Lets take a look on the following diagram that showing a user making an update query on a point of time then the leader store the data and send it to two followers, one in a sync way and another for async way (this way is called semi-synchronous), we see that the user gets a feedback before the second follower store the data
So from the diagram We can conclude some advantages and disadvantages for both ways,
Synchronous way
Advantages:
The follower is guaranteed to have an up-to-date copy of the data (Consistency)
If the leader fails for any reason, we can easily replace it with any synchronous follower
Disadvantages:
The user has to wait until all synchronous followers are updated
If a follower fails for any reason, the leader has to wait until the follower becomes available again, which can lead to a completely system failure
Asynchronous way
Advantages:
The user doesn’t have to wait until all asynchronous followers are updated (Availability)
If an asynchronous follower fails, we can simply ignore it, and once it becomes available again, it’s easy to bring it up-to-date (check follower failure)
Disadvantages:
- If the leader fails for any reason, it’s a big problem because there’s no guarantee that the follower replica we promote to leader will have an up-to-date copy of the data
The disadvantages of the synchronous way make it impractical to have all followers synchronized, In practice, enabling synchronous replication in a database typically means one follower is synchronous, while the others are asynchronous. This is sometimes called semi-synchronous.
However a full asynchronous way is often used in “Single-Leader replication” method, this is mean a write is not guaranteed to be durable even if it has been confirmed to the client.
Weakening durability may sound like a bad trade-off, but it’s widely used, especially when there are many followers or they are geographically distributed
Follower Failure
Any node in the system can go down, So we need to ask: what if a Follower down? How do we bring it back?
Actually, each Follower keeps a log of data changes on its local disk. So, if the Follower fails, it can recover easily from its log, then reconnecting to the Leader and requesting the data changes that occurred while it was disconnected, this way called “Catch-up recovery”, Bringing the Follower back sounds easy, but what about the Leader?
Leader Failure
As we said, any node in the system can go down, so let’s think about the leader.
If the leader fails we need one of the follower that is up-to date with the leader(synchronized) to be promoted to be the new leader, so the system needs to reconfigure to send the writes to the new leader and the followers need to start consuming the data changes from the new leader, this process called “Failover”.
So, it happens in three steps:
Determining that the leader has failed
Choosing new leader
Reconfigure the system to use the new leader
But there is some issues here:
What if async replication was used? the new leader may not have received all the writes from the old leader, and what if the old leader return back after a new leader has been chosen? the new leader may have conflicting writes in the meantime, the solution is to discard the old leader’s un-replicated writes. which violates client’s durability expectation
Another issue is happening because of the solution of the first issue :-)
Discarding writes is a bad idea, especially when other storage systems connected to the database need to stay aligned with the database contents.
A real-world example of this happened at GitHub: an out-of-date Follower was promoted to Leader, the database was using auto-incrementing IDs to assign primary keys to the new records.
and because the new Leader’s counter lagged behind the old Leader’s, it reused some primary keys that had already been assigned. These same primary keys were also used in the Redis store, causing inconsistency between MySQL and Redis, which resulted some private data to be disclosed to the wrong users.
There are some other issues, but they’re not covered here. Still, you get the idea :-)
Replication Lag
Replication lag is the period it takes for a write to be fully replicated to all Followers. What can happen during this period?
To make it clearer, let’s look at the diagram below:-
The user inserts the data, which returns a success message, and then tries to read it, but from Follower 2 (which is temporarily outdated). So, The red area represents the period when replication lag may have occurred.
This can cause the data to appear (insert query return with success) and then disappear (read query).
One of the solutions to deal with replication lag, is called “Read-After-Write” Or “Read-your-Writes-Consistency“ which basically make the user read his own writes from the leader but there is no guarantees about the other users.
Based on the diagram above, the following issues may arise due to replication lag:
A user might insert data and immediately try to read it, but fail to find it
A user may read a piece of data once, then read it again later and find it missing (Monotonic reads)
A user could read data in the wrong order, seeing newer data before older data (Consistent prefix reads)
Single-Leader Replication
In this method clients send all writes to a single node (Leader), which sends stream of data change events to the other replicas (Followers).
Reads can be preformed on any replica but might read stale data
Single leader replication would be something like this:
Advantages
All writes go through a single leader, which simplifies consistency guarantees.
Reads are distributed across all followers, which enhances performance under read-heavy workloads
Disadvantages
All writes are handled by the leader, which may cause performance issues if the application is write-intensive
If the leader goes down, writes will not be accepted until a synchronized follower is promoted to leader (not fault-tolerant for writes)
Without a fast failover process, a leader failure can become a single point of failure for the entire system
Scalability issues may occur when adding replicas across different regions, as this increases the load on the leader to write data to followers, leading to higher latency
When to use?
If the system is Read-Intensive with less writes.
When consistency is more important than availability
If the system is not geo-distributed
Multi-Leader Replication
In this method clients send each write to one of several leaders, any of which can accept writes. The leaders send streams of data change events to each other and to any followers nodes.
So it will be something like this:
Advantages
High Availability and Fault Tolerance because if one leader fails there are other leaders can handling writes
No single write bottleneck because the writes is distributed across all the leaders (Write-Scalability)
Lower latency for distributed systems because the user can write to the nearest leader
Disadvantages
Concurrent writes to the same data item across leaders can collide. and Resolving conflicts is complex and requires strategies like Last-Write-Wins , app-specific logic, CRDTs, timestamps, or consensus protocols. (Conflict Resolution is not covered here)
Increasing the system complexity because of conflict detection and resolution mechanisms
different leaders may temporarily have inconsistent data (Eventual Consistent)
When to use?
When availability is more important than consistency
When the system is Write-Intensive
When the system is Geographically Distributed
Leader-less Replication (Dynamo style)
This method is slightly different because there is no follower or leader concepts, So any node can accept reads and writes, and the data is synchronized across replicas without a single leader.
So, how are reads and writes handled in this replication method?
Here is how it going, writes are sent to all nodes in the same time and the reads as well, so how we guarantees strong consistency?
Quorum
Quorum is the way that we depend on it to decide if the write operation is successful or not
The quorum is represented by the following mathematical inequality:
w + r > n
Which:
n
is the total number of nodes
w
Is the minimum nodes that must acknowledge a write to be considered as a successful write operation
r
Is the minimum nodes that must respond to a read to be considered as a successful read operation
So for example:
If we have 5 nodes the quorum may be something like 3+3 > 5
So here we must have 3 successful nodes in write operation and 3 successful nodes in read operation
but what if the reachable nodes is less than w in writes operation and less than r in read operation?
Here is come the sloppy quorum
Sloppy Quorum
In leader-less replication the database designers face a trade-off:
Is it better to return an error to all requests because we can’t reach the quorum for read and write?
Or should we accept writes in any available nodes in write operation and once the other nodes are back available it will take the updates from the up-to-date nodes, and in case of read operation its okay to read a stale data?
The sloppy quorum is the second one and they choose it because the reason of leaderless replication is the high availability so it doesn’t making sense to return an error or make the user waiting till the nodes return available again and this process is called Hinted Handoff
So the sloppy quorum is not as the normal quorum, its for assurance data durability
Advantages
Writes succeed as long as some nodes are available which give the system a very high availability
New nodes can be added seamlessly, making scalability straightforward
Any node failure doesn't block operations which give the system high fault tolerance
Disadvantages
Reads may return stale data until synchronization completes (Eventual consistency)
Conflict resolution is require (since writes are processed by all nodes, concurrent writes can result in conflicts) techniques like LWW ,CRDTs or other
Requires background processes to fix inconsistencies (Read-repair and hinted handoff)
Conclusion
Database replication is a critical technique for scalability to enhance the performance, availability, and fault tolerance of database systems. However, it also introduces challenges such as ensuring consistency, handling replication lag, and managing leader and follower issues. Different replication methods, including single-leader ,multi-leader or leader-less replication and each one of them introduces a trade-offs between consistency, availability and scalability.
Understanding these methods and their trade-offs to be able to choose between them and design a robust and efficient systems that meet specific application requirements.
References
https://www.amazon.eg/-/en/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
https://www.amazon.science/publications/dynamo-amazons-highly-available-key-value-store
Subscribe to my newsletter
Read articles from Ahmed Elsayed directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
