Microservice Communication - 2PC
Concept
Two-phase commit (2PC) is a protocol that ensures atomicity across multiple applications that use different database connections. This protocol (a specialized type of consensus protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used Managed by a transaction manager, 2PC transparently handles commitment or rollback monitoring without user intervention.
Let us understand the problem with a real-world problem
Let's say you are the head of a chain of medical stores, you have to ensure there is a proper balance of painkillers available in all of your stores. You run a query to the stores and based on their reply you order them to rebalance the stock. A singular transaction, denoted as T, carries out the task across various components, such as Tn at the nth store. The manager's location, represented by store S0, corresponds to T0. The activities undertaken by transaction T unfold in the following sequence:
a) Component of transaction( T ) T0 is created at the head-site(head-office).
b) T0 sends messages to all the stores to order them to create components Ti.
c) Every Ti executes a query at the store “i” to discover the quantity of available painkillers inventory and reports this number to To.
d) Each store receives instruction and update the inventory level and made shipment to other stores where require.
But there are a few problems...
The guarantee of atomicity can be compromised if multiple instructions are sent to a store (Sn), potentially resulting in an inconsistent database state. To preserve atomicity, Transaction T must either universally commit across all sites or uniformly abort across all sites.
In the event of a system crash at store Tn or failure to deliver instructions from T0 due to network issues or other causes, uncertainty arises concerning the fate of the distributed transaction. The question of whether the transaction will proceed with committing or aborting, as well as the potential for recovery, becomes a consideration.
2PC to rescue
In a "typical scenario" of an undisrupted execution of a distributed transaction (when no failures occur, which is the usual case), the protocol involves two main stages:
Commit-Request Phase (or Voting Phase): The coordinator process initiates this phase, aiming to prepare all involved processes (referred to as participants, cohorts, or workers) for the required actions: committing or aborting the transaction. Participants cast their votes, indicating "Yes" to commit if their local execution went smoothly, or "No" to abort if any local issues arose.
Commit Phase: Following the participants' votes, the coordinator determines whether to proceed with committing (only if all votes are "Yes") or to abort the transaction (if any "No" votes are present). The coordinator then communicates the decision to all participants. Subsequently, participants take the necessary steps (committing or aborting) within their local resources (like databases) and address their roles in the transaction's overall outcome (if applicable).
Let's see the process with a simple code snippet written in Python
# COORDINATOR SERVICE
async def beginTransaction() -> bool:
async with httpx.AsyncClient(timeout=10000) as client:
payment_response = await client.post('http://localhost:5002/prepare/payment')
prepare_order_response = await client.post('http://localhost:5001/prepare/order')
return True
async def abortTransaction():
async with httpx.AsyncClient(timeout=10000) as client:
# Make asynchronous requests
client.post('http://localhost:5001/abort/order')
client.post('http://localhost:5002/abort/payment')
return True
@app.post('/place-order')
async def placeOrder():
print("Transaction started")
try:
await beginTransaction()
except httpx.ConnectTimeout:
await abortTransaction()
return False
except httpx.ConnectError:
await abortTransaction()
return False
return "Order placed successfully"
Here is how a simple order service will look like
# ORDER SERVICE
@app.route('/prepare/order', methods=['POST'])
async def prepare():
transaction_id = random.randint(1, 1000000)
cur = conn.cursor()
cur.execute("BEGIN;")
print("Prepare Order started - Transaction ID: " + str(transaction_id))
cur.execute("INSERT INTO orders (order_id, name) VALUES (%s, %s)", (transaction_id, 'on'))
time.sleep(3)
return 'OK'
@app.route('/commit/order', methods=['POST'])
async def commit():
cur = conn.cursor()
cur.execute("COMMIT;")
return 'OK'
@app.route('/abort/order', methods=['POST'])
async def abort():
print("Aborting order")
cur = conn.cursor()
cur.execute("ROLLBACK;")
return 'OK'
And here is the basic payment service code
# PAYMENT SERVICE
@app.route('/prepare/payment', methods=['POST'])
async def prepare():
cur = conn.cursor()
transaction_id = random.randint(1, 1000000)
cur.execute("BEGIN;")
print("Prepare Payment started - Transaction ID: " + str(transaction_id))
cur.execute("INSERT INTO payments (id, amount) VALUES (%s, %s)", (transaction_id, 145))
time.sleep(3)
return 'OK'
@app.route('/commit/payment', methods=['POST'])
async def commit():
cur = conn.cursor()
cur.execute("COMMIT;")
return 'OK'
@app.route('/abort/payment', methods=['POST'])
async def abort():
print("Aborting payment")
cur = conn.cursor()
cur.execute("ROLLBACK;")
return 'OK'
Please note: This is the simplest implementation of a two-phase commit system by using the database's transaction control protocol.
The overall flow will look like this
Coordinator Participant
QUERY TO COMMIT
-------------------------------->
VOTE YES/NO prepare*/abort*
<-------------------------------
commit*/abort* COMMIT/ROLLBACK
-------------------------------->
ACKNOWLEDGEMENT commit*/abort*
<--------------------------------
end
Drawbacks of 2-Phase Commit
We will now explore the disadvantages of the 2-Phase Commit. The following are the major drawbacks of using 2-PC in distributed systems:-
Latency: As we saw the Transaction Coordinator waits for responses from all the participant servers. Only then it carries on with the second phase of the commit. This increases the latency and the client may experience slowness in execution. Hence, 2-PC is not a good choice for performance-critical applications.
Transaction Coordinator: The Transaction Coordinator becomes a single point of failure at times. The Transaction Coordinator may go down before sending a commit message to all the participants. In such cases, all the transactions running on the participants will go in a blocked state. They would commit only once the coordinator comes up & sends a commit signal.
Participant dependency: A slow participant affects the performance of other participants. Total transaction time is proportional to the time taken by the slowest server. If the transaction fails on a single server, it has to be rolled back on all other servers. This may lead to wastage of resources.
All these problems of the two-phase commit protocol can be overcome by SAGA Pattern. Saga pattern typically follows eventual consistency to handle distributed transactions.
We will discuss that in an upcoming article.
Subscribe to my newsletter
Read articles from Subhajit Dutta directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Subhajit Dutta
Subhajit Dutta
Staff software engineer specializing in highly scalable, backend-heavy applications. Passionate about building cloud-native, resilient and fault-tolerant applications.