Step by step, dissecting the most challenging part of database transactions — Isolation
When it comes to database transactions, most people's first reaction is ACID. However, the four properties are not equally important or equally complex. The most challenging one to comprehend is Isolation (the I). One reason is the historical entanglement between the definitions of the various isolation levels and their actual implementations; another is that different vendors frequently use inconsistent, confusing terminology. This article attempts to clarify several common isolation levels in terms of locks, using a deliberately imprecise narrative to help you build an intuitive understanding.
Background
Databases offer isolation guarantees through transactions, which reduces the complexity of programming on the application side. The highest isolation level, serializability, gives users a strong guarantee: at any given point in time, they can assume that only one transaction is actively running.
When first learning about the various isolation levels, the difficulty in grasping their progressive relationship often comes from not finding the right perspective to look at them. I want to emphasize again that:
"If you approach things in the right way, nothing should be hard to understand."
Drawing from my work experience, dissecting these isolation levels from an implementation perspective can assist us in forming a consistent understanding.
The four isolation levels defined by ANSI SQL, from weakest to strongest, are: Read Uncommitted, Read Committed, Repeatable Read, and Serializability. You can grasp their progressive relationship by examining how transactions are implemented using locks.
Locks
The strongest level of isolation — Serializability, can be understood as a global mutex lock. In this case, each transaction obtains the lock when it begins and releases it when it finishes (either through commit or rollback). Clearly, this level of isolation has the worst performance.
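To make that intuition concrete, here is a minimal Python sketch (purely illustrative, not how real engines implement it): one process-wide mutex is taken at begin and released at commit or rollback, so transactions can never overlap.

import threading

# One global mutex: holding it for the whole transaction trivially gives
# serializability, at the cost of running transactions one at a time.
GLOBAL_LOCK = threading.Lock()
db = {"x": 4, "y": 4}

def run_transaction(work):
    GLOBAL_LOCK.acquire()              # begin: take the global lock
    try:
        work(db)                       # read and write freely; nobody else runs
    finally:
        GLOBAL_LOCK.release()          # released on commit or rollback alike
                                       # (undoing changes on rollback is omitted)

# Example: this transfer can never interleave with another transaction.
run_transaction(lambda d: d.update(x=d["x"] - 1, y=d["y"] + 1))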
Conversely, the weaker isolation levels can be understood as measures to improve performance: narrowing the scope of locking (from predicate locks to object locks), shortening the duration of locking (from holding until the transaction completes to releasing immediately after use), and lowering the strength of locking (from exclusive locks to shared locks). These adjustments trade consistency for performance.
That is to say, the locking scheme varies along three dimensions:
Locking strength: Exclusive Lock (also referred to as Write Lock or Mutex Lock) versus Shared Lock (also known as Read Lock).
Locking duration: long-duration locks (held until the end of the transaction) versus short-duration locks (acquired when accessing the data and released immediately after use).
Locking granularity: Object Lock (analogous to a Row Lock in relational databases, locking a single row) versus Predicate Lock (locking the subset of data matching a condition).
KV Data Model
Databases are often structured at the storage layer around a Key-Value (KV) model. For instance, pages in B+ trees and blocks in LSM-trees are both collections of KV entries. In a relational database that stores data by rows, a KV entry's key typically corresponds to the primary key, and its value to a single row of data. Consequently, for the remainder of this discussion, the data a transaction touches can be regarded as either:
Individual object: A single KV pair.
A group of objects: For instance, expressions like "where x > 5 and y < 6" will determine a set of KV entries.
Therefore, when discussing the data granularity below, it's all based on the KV model.
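As a toy illustration (the table and values are made up), a row store seen through the KV lens is just a dict keyed by primary key; an individual object is one entry, and a predicate selects a group of entries:

# A row store viewed as KV: key = primary key, value = one row.
table = {
    1: {"x": 4, "y": 7},
    2: {"x": 6, "y": 3},
    3: {"x": 9, "y": 5},
}

row = table[2]                                      # individual object: one KV pair

# group of objects: "where x > 5 and y < 6" determines a set of KV entries
matching = {k: v for k, v in table.items() if v["x"] > 5 and v["y"] < 6}
print(matching)                                     # keys 2 and 3 match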
Weak Isolations
The best-performing approach is to not use any locks at all, which provides essentially no isolation. At this level, a transaction will directly access data that is currently being modified by other transactions. This offers exceptionally high concurrency and low latency, because transactions don't block each other, but it also introduces some potential issues:
Dirty Read:
initial: x=4
transaction 1:--W(x=5)--->rollback
transaction 2:----->R(x=5)--->commit
// Transaction 2 has observed the value x=5, which should not have existed.
// This is because transaction 2 has read the uncommitted value.
Dirty Write:
initial: x=4, y=4
transaction 1:--W(x=5)--------W(y=5)--->commit
transaction 2:----W(x=6)--W(y=6)->commit
// The result is x=6, y=5.
// But the correct result should be that both x and y are either 5 or 6.
// This is because the uncommitted modification in transaction 1
// has been overridden by transaction 2.
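The same interleaving can be replayed on a plain dict with no locking at all, which is exactly how the outcome above arises (purely illustrative):

db = {"x": 4, "y": 4}

db["x"] = 5    # T1: W(x=5), still uncommitted
db["x"] = 6    # T2: W(x=6), overwrites T1's uncommitted write
db["y"] = 6    # T2: W(y=6)
               # T2: commit
db["y"] = 5    # T1: W(y=5), overwrites T2's write
               # T1: commit

print(db)      # {'x': 6, 'y': 5}: neither "both 5" nor "both 6"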
To prevent dirty writes, you can employ long-duration write locks on the objects you wish to modify, while still reading data without taking any locks. This isolation level is referred to as "Read Uncommitted" (RU). Nevertheless, even in this setup, dirty reads can still occur:
Dirty Read:
initial x = 4
transaction 1:----WL(x),W(x=5)-->rollback, UL(x)
transaction 2:---------------->R(x=5)--->commit
Note: RL = Read Lock; WL = Write Lock; UL = Unlock
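A rough sketch of RU under this locking scheme (per-key locks, all names are mine): writes take an exclusive lock per key and hold it until the transaction finishes, so concurrent writers cannot clobber each other's uncommitted changes, but reads bypass the lock table entirely, so dirty reads remain possible.

import threading

db = {"x": 4}
write_locks = {"x": threading.Lock()}        # one exclusive (write) lock per key

class ReadUncommittedTxn:
    def __init__(self):
        self.held = []

    def write(self, key, value):
        lock = write_locks[key]
        if lock not in self.held:
            lock.acquire()                   # long-duration write lock
            self.held.append(lock)
        db[key] = value                      # immediately visible to everyone

    def read(self, key):
        return db[key]                       # no read lock: may observe dirty data

    def finish(self):                        # commit or rollback
        for lock in self.held:
            lock.release()
        self.held.clear()

t1 = ReadUncommittedTxn()
t1.write("x", 5)                             # T1 holds the write lock on x
print(ReadUncommittedTxn().read("x"))        # another txn still reads 5: dirty read
t1.finish()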
To prevent dirty reads, you can additionally apply short-duration read locks to the objects in the read set. The isolation level then becomes "Read Committed" (RC). Nevertheless, because the read locks are short-lived, a transaction that reads an object 'x' twice with an interval in between may see different values if another transaction modifies 'x' and commits during the gap. This problem is commonly known as a "non-repeatable read".
Non Repeatable Read:
initial x = 4
transaction 1: -RL(x),R(x=4),UL(x)-------------------------->RL(x),R(x=5),UL(x)--->commit
transaction 2:-------------------WL(x),W(x=5)-->commit,UL(x)
To address this issue, long-duration read locks should also be applied when reading data. The isolation level that solves the non-repeatable read problem is called "Repeatable Read" (RR).
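In this lock-based telling, RC and RR differ only in how long the read lock is held. A rough sketch (one plain lock per key stands in for a shared read lock; the helper names are mine): under rc_read, another transaction can change and commit x between two reads; under rr_read, it cannot.

import threading

db = {"x": 4}
locks = {"x": threading.Lock()}     # stand-in for a per-key shared (read) lock

# Read Committed: short-duration read lock, released right after the read.
def rc_read(key):
    with locks[key]:
        return db[key]

# Repeatable Read: long-duration read lock, held until the transaction ends.
def rr_read(key, held):
    if locks[key] not in held:
        locks[key].acquire()
        held.append(locks[key])
    return db[key]

def rr_finish(held):                # commit or rollback: release everything
    for lock in held:
        lock.release()
    held.clear()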
Up to the Repeatable Read level, locking is performed on individual data items. Under this isolation level, if a transaction performs the same range query twice, such as executing "select count(x) where x > 5" two times, and another transaction inserts a record with x = 6 between the two queries, the two results will differ. This phenomenon is known as a "phantom read." To resolve phantom reads, long-duration predicate read locks are needed for range queries. The isolation level that addresses phantom reads is called "Serializability."
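A predicate lock covers a condition rather than a concrete key, including rows that do not exist yet. A highly simplified sketch of the conflict check (names are mine; real engines usually approximate this with index-range or next-key locks):

# Each predicate lock is (owning transaction, condition over a row).
predicate_locks = []

def acquire_predicate_lock(txn, predicate):
    predicate_locks.append((txn, predicate))

def may_write(txn, row):
    # An insert/update/delete conflicts with any predicate lock held by
    # another transaction whose condition the row would satisfy.
    return all(owner == txn or not pred(row) for owner, pred in predicate_locks)

# T1 runs "select count(x) where x > 5" and takes a predicate read lock:
acquire_predicate_lock("T1", lambda row: row["x"] > 5)

# T2 now tries to insert a record with x = 6 between T1's two queries:
print(may_write("T2", {"x": 6}))    # False: blocked, so no phantom appears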
In the Serializability isolation level, both read and write locks must be held until the transaction concludes. This locking approach is known as "Two-Phase Locking" (2PL).
In 2PL, there are two phases:
In the first phase (during transaction execution), only locking is allowed.
In the second phase (during transaction commit or rollback), only lock release is permitted.
Strictly speaking, this is strong strict two-phase locking (SS2PL). If you are interested, please search for more details about the differences between SS2PL and 2PL.
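Putting these pieces together, a rough sketch of SS2PL (per-key exclusive locks only, without distinguishing read from write locks; all names are mine): locks are only acquired while the transaction runs and only released when it finishes.

import threading
from collections import defaultdict

db = {"x": 4, "y": 4}
locks = defaultdict(threading.Lock)       # one lock per key, created on demand

class Txn:
    def __init__(self):
        self.held = {}                    # growing phase: this set only grows

    def _lock(self, key):
        if key not in self.held:
            locks[key].acquire()
            self.held[key] = locks[key]

    def read(self, key):
        self._lock(key)
        return db[key]

    def write(self, key, value):
        self._lock(key)
        db[key] = value

    def finish(self):                     # shrinking phase: commit or rollback
        for lock in self.held.values():
            lock.release()
        self.held.clear()

# Usage: both the read lock and the write lock are held until finish().
t = Txn()
t.write("x", t.read("x") + 1)
t.finish()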
Generalize it
From a more abstract perspective, each transaction must handle a subset of data objects during its execution. The actions may involve reading and writing. If two concurrent transactions touch non-overlapping data subsets, there is no need for any special handling, and they can safely execute concurrently.
When the data subsets overlap, dependencies arise between transactions. You can abstract transactions as points and the dependencies between them as directed edges, forming a dependency graph; the transactions can be executed in a serializable way exactly when this graph has no cycles, i.e., when it is a directed acyclic graph (DAG).
The ideal interface that transactions present to the outside world is that each transaction can be condensed to a single point on the timeline, appearing to complete instantaneously; this is the "A" in ACID (atomicity). All transactions can then be naturally ordered along the timeline, thus achieving serializability.
However, in practice, transactions require some time to execute, so we should treat a transaction as a time interval. Overlapping time intervals introduce various concurrency and isolation (or visibility) issues. So, how do we ensure that physically concurrent transactions appear as if they execute logically in a sequential and atomic manner? The solution lies in maintaining specific invariants before and after transaction execution.
These invariants are the "C" in ACID, which stands for consistency. From an application-layer perspective, you can also think of it as causality: dependencies between transactions that touch overlapping data must be managed correctly. For instance, if transaction T1 relies on reading certain content to make a decision and then performs a modification, a concurrent transaction T2 must not alter that content; otherwise, the decision made by T1 would be incorrect. You can refer to Chapter 7 of the book "Designing Data-Intensive Applications" (DDIA) for the problem of doctors' on-call schedules.
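As a tiny illustration of that DDIA example (values invented, no real scheduling here, just the logical interleaving): both doctors read the same data, decide based on it, and then each modifies something the other's decision depended on.

# Invariant we want to keep: at least one doctor stays on call.
on_call = {"alice": True, "bob": True}

# Both run the same "transaction" concurrently:
# read "is someone else covering?", and if so, take myself off call.
alice_sees_cover = sum(on_call.values()) >= 2    # True
bob_sees_cover = sum(on_call.values()) >= 2      # True: reads the same state

if alice_sees_cover:
    on_call["alice"] = False
if bob_sees_cover:
    on_call["bob"] = False

print(on_call)   # {'alice': False, 'bob': False}: the invariant is broken,
                 # because each decision relied on data the other then changed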
We typically use two approaches to maintain this consistency:
Pessimistic approach: Locking, which ensures that the same data subset cannot be accessed by multiple transactions simultaneously.
Optimistic approach: Multi-Version Concurrency Control (MVCC), where each data object stores multiple immutable versions, and modifying an object creates a new version. Each transaction reads and writes without coordinating with others; conflicts are detected at commit time and resolved through retries.
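A minimal sketch of the optimistic flavor (logical timestamps, a private write buffer, first-committer-wins at commit; every name here is hypothetical, and details such as garbage collection are ignored):

import itertools

ts = itertools.count(1)                   # global logical clock
store = {"x": [(0, 4)]}                   # key -> list of immutable (commit_ts, value)

class MvccTxn:
    def __init__(self):
        self.start_ts = next(ts)
        self.writes = {}                  # private write buffer, invisible to others

    def read(self, key):
        if key in self.writes:            # read your own writes
            return self.writes[key]
        # otherwise: newest version committed before this transaction started
        return [v for t, v in store[key] if t <= self.start_ts][-1]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # first-committer-wins: abort if someone else committed a newer
        # version of anything in our write set after we started
        for key in self.writes:
            versions = store.get(key, [])
            if versions and versions[-1][0] > self.start_ts:
                return False              # conflict: caller retries the transaction
        commit_ts = next(ts)
        for key, value in self.writes.items():
            store.setdefault(key, []).append((commit_ts, value))
        return True

t1 = MvccTxn()
t2 = MvccTxn()
t1.write("x", 5)
t2.write("x", 6)
print(t1.commit(), t2.commit())           # True False: the second committer must retry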
This explanation simplifies many details, so if you're interested, you can explore further by watching my video.
The Last Thing
The four isolation levels discussed above do not cover another common isolation level, "Snapshot Isolation" (SI). SI builds on the optimistic approach mentioned earlier, MVCC. Because the two come from different implementation philosophies, Snapshot Isolation and Repeatable Read sit on a partial order of isolation-level strength: it is not accurate to claim that one is inherently stronger than the other. There is room for a more in-depth exploration of this topic if I have time.
Written by
muniao
A software engineer focused on large-scale data systems, including data processing, storage, and databases.