Safety and Liveness in Distributed Systems
Safety: Something that must never happen in a correct system.
Liveness: Something that must eventually happen in a correct system.
We will use a banking system as an example in this article.
Safety: We can't let users withdraw or transfer more money than they have.
Liveness: If users transfer their money to another valid account, the transfer should eventually be successful. This could happen immediately, in 5 minutes, the next day, or even next week. The timing doesn't matter, but it should happen eventually.
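To make the two properties concrete, here is a minimal sketch of a toy in-memory bank. The `Bank` class, its method names, and the `destination_reachable` callback are hypothetical names invented for this illustration, not part of any real banking system. The overdraft check stands in for the safety property, and the retry queue stands in for the liveness property.

```python
from collections import deque


class Bank:
    """Toy in-memory bank; every name here is illustrative only."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.pending = deque()  # transfers accepted but not yet applied

    def submit(self, src, dst, amount):
        # Safety: refuse anything that would overdraw the source account.
        if amount <= 0 or self.balances.get(src, 0) < amount:
            return False
        # Reserve the money now so a later request cannot spend it twice.
        self.balances[src] -= amount
        self.pending.append((dst, amount))
        return True

    def process_pending(self, destination_reachable):
        # Liveness: a transfer stays queued and is retried until it succeeds.
        for _ in range(len(self.pending)):
            dst, amount = self.pending.popleft()
            if destination_reachable(dst):
                self.balances[dst] = self.balances.get(dst, 0) + amount
            else:
                self.pending.append((dst, amount))  # try again on the next run
```

Calling `process_pending` repeatedly (for example, from a scheduled job) models "eventually": a transfer may sit in the queue for a while, but it is never dropped, and at no point can an account go negative.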
In everyday decision-making, we usually favor safety over liveness: Safety > Liveness.
There is a natural tension between safety and liveness properties. In distributed systems, where nodes can crash and messages can be delayed or lost, some problems cannot be solved in a way that always satisfies both types of properties. In those cases, we usually compromise on some liveness properties to maintain safety.
Banking systems choose Safety over Liveness; for money-related matters, they prioritize correctness above everything else. That is why it is not uncommon to see delays of several days when you transfer money from one account to another.
Like a bank, every distributed system algorithm needs to balance safety and liveness. Distributed system algorithms are challenging, and it's easy to make wrong assumptions because when reading an algorithm's definition, we often focus only on the "happy path."
The 2PC (Two-Phase Commit) and 3PC (Three-Phase Commit) protocols illustrate this trade-off. Neither is perfect: each one gives up one property to keep the other.
| Property | Two-Phase Commit (2PC) | Three-Phase Commit (3PC) |
| --- | --- | --- |
| Safety | Yes | No (violations are unlikely, but they are still possible) |
| Liveness | No | Yes |
| Description | 2PC ensures that all participants either commit or abort a transaction. However, it can block indefinitely if a node fails at the wrong moment, most notably when the coordinator crashes after participants have voted. So it violates the liveness property. | 3PC aims to solve 2PC's blocking issue by splitting the commit phase into two steps: pre-commit and final commit. It stays non-blocking even if a node fails. However, 3PC does not ensure safety in all corner cases, for example when the network partitions. |
| Trade-off | Prioritizes safety over liveness. | Prioritizes liveness over safety. |
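The 2PC column can be illustrated with a small single-process simulation. This is only a sketch under simplifying assumptions: there is no real network, the crash is modeled by a flag, and the names `Participant`, `two_phase_commit`, and `coordinator_crashes_after_voting` are hypothetical. It shows why 2PC is safe (nobody commits unless everyone voted yes) and where it stops being live (prepared participants cannot decide on their own).

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"  # init -> prepared -> committed / aborted

    def prepare(self):
        # Phase 1: vote. A "yes" vote is a promise to commit if told to.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants, coordinator_crashes_after_voting=False):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    if coordinator_crashes_after_voting:
        # The decision is never delivered. A prepared participant cannot
        # safely commit or abort by itself, so it waits: safe, but not live.
        return "blocked"

    # Phase 2: commit only if everyone voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

In the crash case every participant that voted yes is left in the `prepared` state. Nothing inconsistent has happened, which is the safety guarantee, but nothing can finish either until the coordinator comes back, which is the liveness cost.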