DDIA - Chapter 7 - Transactions - thoughts and notes

Vivek KhatriVivek Khatri
7 min read

Disclaimer: This blog has been partially written by the help of LLM.

The more I read DDIA the more my trust on databases is vanishing. :’)

This chapter on transactions was not that interesting, honestly. Most of it was just how the keywords in ACID are mostly for show and they mean different things in different situations.

A lot of the chapter was just talking about how things need to be atomic. Trade-offs, trade-offs, trade-offs. The next chapter will be mind boggling because this chapter focussed on transactions on a single node application. Imagine if things are distributed, god save us then.

Transactions

Put multiple commands in a single block and if any command fails the whole block execution gets aborted. Easy, right? Right. We will see that things are not what they seem.

ACID

  • Atomicity: A transaction must either complete fully or not at all. If any part fails, the entire transaction is rolled back, leaving the database unchanged.
  • Consistency: Ensures that the database moves from one valid state to another, but this is primarily the application's responsibility, not the database's. The application uses atomicity and isolation to maintain consistency.

  • Isolation: Transactions are executed independently without interference, appearing as if they were run sequentially, even if executed in parallel. Full isolation is rare due to performance costs.

  • Durability: Once a transaction is committed, the changes are permanent, even in the event of system failures. This is often ensured through mechanisms like write-ahead logging.

Single-Object Writes: Atomicity and isolation for single-object writes are simple, using locks for isolation and logs for crash recovery. Operations like INCREMENT are not full transactions but offer ACID-like guarantees.

Multi-Object Transactions: Multi-object updates (e.g., foreign keys, denormalized data, secondary indexes) often require transactions to ensure consistency. Without them, error handling becomes complex.

Error Handling: Transactions allow abort and retry to prevent incomplete operations. However, systems like leaderless replication use a "best-effort" model, leaving error recovery to the application.

Challenges:

  • Network failures may mislead users about transaction status.

  • Retrying during overload can worsen issues.

  • Retries are useless for permanent errors.

  • Side effects may still occur after an aborted transaction.

Weak Isolation Levels

Concurrency bugs are hard to detect and reproduce. Databases use transaction isolation to address them, but full serializability is costly. To balance performance, weaker isolation levels are often used, though they come with trade-offs and potential subtle bugs. Understanding these trade-offs is essential.

Read Committed Isolation:

  • No dirty reads: A transaction only sees data committed by other transactions, preventing confusion from partially updated or aborted transactions.

  • No dirty writes: Prevents a transaction from overwriting uncommitted data by delaying the second write until the first transaction is complete.

Implementation:

  • No dirty writes: Achieved using row-level locks that hold until the transaction is committed or aborted.

  • No dirty reads: The database provides the old value for reads while a transaction is in progress and switches to the new value after the transaction commits.

Snapshot Isolation and Repeatable Read: Even with these isolation levels, concurrency bugs like nonrepeatable reads (or read skew) can occur, where reading data during concurrent commits results in inconsistent data.

Situations where inconsistencies are critical:

  • Backups: If writes happen during a backup, restoring from it may cause permanent inconsistencies.

  • Analytics and integrity checks: Queries could return nonsensical results if they access data at different points in time.

Snapshot Isolation: Provides a consistent view of the database for each transaction by reading from a snapshot, preventing nonrepeatable reads. It uses write locks to prevent dirty writes and relies on multi-version concurrency control (MVCC) to maintain multiple versions of data.

  • Read Committed: Uses a separate snapshot for each query.

  • Snapshot Isolation: Uses the same snapshot for the entire transaction.

In some systems, snapshot isolation is called serialisable (Oracle) or repeatable read (PostgreSQL, MySQL).

Preventing Lost Updates: Lost updates occur when two concurrent transactions modify the same data, leading to one update being overwritten. This can happen during operations like incrementing values. Strategies to prevent lost updates include:

  • Atomic write operations: Some databases support atomic operations (like incrementing a counter) that happen without a read-modify-write cycle, using exclusive locks.

  • Explicit locking: Transactions explicitly lock the data, preventing access until the first transaction completes its update.

  • Automatic detection: Allows transactions to run in parallel, but the system detects lost updates and forces a retry.

  • Compare-and-set: Ensures data is modified only if it hasn’t changed since it was last read, restarting the cycle if changes are detected.

  • Conflict resolution in replication: In replicated databases, conflicts are resolved using application logic or special data structures. Commutative atomic operations are helpful in these cases.

  • Last-Write-Wins (LWW): Common in many databases, but it does not prevent lost updates, as the last update overwrites previous ones.

Write Skews and Phantoms:

Write skews occur when two concurrent transactions read one set of data and update another based on that, leading to inconsistent states. Unlike lost updates, write skews happen on different objects, making them harder to detect. For example, two doctors going off-call based on their view of the on-call list could cause scheduling issues.

Phantoms occur when a transaction's write affects the result of a search query in another transaction (e.g., a doctor going off-call changes the count of available doctors). Solving this requires locking rows returned by the query, but this doesn't work for rows that don’t exist (e.g., when no doctors are on-call).

Solution:

  • Materialising conflicts: Create rows for potential scenarios (e.g., pre-create empty calendar slots) and lock them. However, this approach is complex and impacts the data model, so it's typically a last resort.

Serializability

The serializable isolation level ensures transactions produce results as if they were executed one after another, preventing race conditions even when run in parallel.

Actual Serial Execution:

  • Executing transactions serially is re-emerging due to cheaper, higher-capacity RAM, which reduces disk I/O operations.

  • Online Transaction Processing (OLTP) transactions are short and require less data, making serial execution viable.

Stored Procedures:

  • Applications often require multi-step transactions that involve back-and-forth communication with the database.

  • For example, in a doctor's on-call scenario, conditional logic can be moved into a stored procedure to reduce network overhead, allowing faster execution without blocking other transactions.

  • Pros and Cons:

    • Cons:

      • Each database vendor has its own procedural language, often outdated.

      • Limited libraries and tools lead to a poor developer experience.

      • Poorly written procedures can degrade overall system performance.

Partitioning:

  • Single-threaded transactions can be optimized by partitioning data and assigning separate CPUs to each partition, allowing parallel transaction processing.

  • While transactions across multiple partitions require locks, effective partitioning strategies can minimise performance issues.

Two-Phase Locking (2PL): In 2PL, writers block both other writers and readers, while snapshot isolation allows readers and writers to operate without blocking each other.

Implementation of Two-Phase Locking:

  • Readers acquire shared mode locks, while writers obtain exclusive mode locks.

  • A shared lock prevents writers from acquiring an exclusive lock.

  • Locks are released when a transaction commits.

  • The database detects deadlocks and the application retries any aborted transactions.

Performance of Two-Phase Locking:

  • There is overhead associated with acquiring and releasing locks.

  • Concurrency is reduced, leading to unstable latencies, especially at high percentiles.

  • Aborted transactions waste resources.

  • Deadlocks occur more frequently in 2PL compared to lock-based read committed methods.

Serialisable Snapshot Isolation (SSI)

SSI aims to achieve full serialisability with minimal performance overhead compared to snapshot isolation. It is used in single-node systems (like PostgreSQL 9.1) and distributed environments, potentially becoming the new default.

Pessimistic vs. Optimistic Concurrency Control:

  • 2PL: Pessimistic, as it locks resources upfront.

  • SSI: Optimistic, where the database checks for isolation violations only upon transaction commit.

Performance Considerations:

  • Optimistic techniques perform better when there's spare capacity and low contention between transactions.

  • SSI builds on snapshot isolation by adding an algorithm to detect serialization conflicts among writes and determine which transactions to abort.

Detecting Stale MVCC Reads:

  • The database tracks when transactions ignore writes based on MVCC visibility rules.

  • It checks if any ignored writes have been committed and aborts if they have.

  • Read-only transactions do not cause write skew and may wait until committing.

Detecting Writes Affecting Prior Reads:

  • The database maintains ephemeral information about transactions reading data at the index or table level.

  • When a transaction writes, it notifies existing readers of the data.

  • It checks if other writes commit when the reader commits.

Performance of Serialisable Snapshot Isolation:

  • The rate of aborts significantly impacts SSI's overall performance.

  • SSI is best suited for short read-write transactions and is less sensitive to slow transactions compared to two-phase locking or serial execution.

If you reached here then thanks!

0
Subscribe to my newsletter

Read articles from Vivek Khatri directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Vivek Khatri
Vivek Khatri

I am still deciding what should I write here.