Distributed Transaction Management – Keeping Data Consistent Across the System


This is Part 3 of the "Distributed DBs: A Clear Guide" series. In this article, we break down how distributed systems manage transactions across multiple nodes while ensuring accuracy, atomicity, and consistency — even when things go wrong.
1. What Is a Transaction?
A transaction is a sequence of operations performed as a single logical unit of work. A good transaction follows the ACID properties:
Atomicity – all or nothing
Consistency – maintains data integrity
Isolation – no interference from other transactions
Durability – once committed, data stays saved
2. The Challenge in Distributed Systems
In centralized databases, transactions are straightforward. But in distributed systems, the data involved in a transaction may span multiple nodes.
Real-world example:
Imagine booking a hotel and a flight in one transaction. The hotel service and the flight service are on different servers. Both must succeed — or both must roll back. That’s a distributed transaction.
3. Distributed Transaction Management Goals
Atomicity across sites
Coordination of commit/abort decisions
Failure recovery
Concurrency control across nodes
4. Concurrency Control in Distributed Databases
Concurrency control ensures multiple transactions don’t interfere with one another.
4.1 Two-Phase Locking (2PL)
A transaction:
Acquires all locks it needs (growing phase)
Releases them only after commit (shrinking phase)
This prevents conflicts but can cause deadlocks.
5. Deadlocks in Distributed Systems
A deadlock occurs when two transactions wait for each other to release resources. In distributed systems, this is harder to detect.
Techniques to detect and resolve deadlocks:
Centralized detection
Timeout-based detection
Hierarchical detection
Phantom deadlocks can happen if messages are delayed, showing a false wait cycle.
6. Two-Phase Commit Protocol (2PC)
6.1 Why 2PC?
To ensure that a transaction either commits on all nodes or none at all.
6.2 The Two Phases:
Phase 1: Prepare
Coordinator asks all participants if they’re ready to commit
Participants write logs and reply YES/NO
Phase 2: Commit or Abort
If all said YES, coordinator tells everyone to COMMIT
If any said NO, coordinator tells all to ABORT
Coordinator
↓
+---------------+
| PREPARE? |
+---------------+
/ | \
Site A Site B Site C
↓ ↓ ↓
YES YES YES
↓
COMMIT!
7. Limitations of 2PC
Blocking: If the coordinator crashes after Phase 1, the system can’t proceed
Synchronous: All nodes must reply — slows things down
7.1 3PC – Three-Phase Commit
Introduces a "pre-commit" step to avoid blocking.
Used less often due to overhead.
8. Failure Handling
8.1 Types of Failures:
Site failure: node goes down
Link failure: network break
Message failure: messages delayed or lost
8.2 Recovery Techniques:
Write-ahead logging (WAL)
Redo and undo logs
Participant state records
9. Global Serializability
Even with transactions across sites, we must ensure the global outcome is serializable — behaves like if transactions ran one after the other.
This is harder in distributed settings but essential for correctness.
10. Visual Summary: 2PC Protocol
Client Coordinator Site A Site B
| | | |
Begin TX | | |
|----------→ Begin --------→ |
| |---------→ Begin ----→|
| | | |
| |<- PREPARE --| |
| |<----------- | |
| | -- COMMIT ->| |
| |------------>| |
| | | |
11. Real-World Analogy: Banking System
Imagine transferring money from your checking to savings account. If the system withdraws from checking but fails before depositing into savings, your money is lost. Distributed transaction management prevents this by committing only if all steps succeed.
12. Wrapping Up
Distributed transaction management is the glue that ensures integrity in a fragmented system. It handles coordination, failure, and concurrency so that developers can focus on building robust applications.
In the next part of this series, we’ll dive into Distributed Query Processing — how to run efficient queries across the distributed setup.
Subscribe to my newsletter
Read articles from Muhammad Sajid Bashir directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Muhammad Sajid Bashir
Muhammad Sajid Bashir
I'm a versatile tech professional working at the intersection of Machine Learning, Data Engineering, and Full Stack Development. With hands-on experience in distributed systems, pipelines, and scalable applications, I translate complex data into real-world impact.