Saga Pattern using AWS Step Functions
When working with distributed systems in a microservice architecture, we must always design our services with failure in mind. When a transaction spans multiple microservices, any step can fail, and we have to handle that failure. The Saga pattern is one way to recover from failures in distributed transactions. In this article, we will implement the Saga pattern using AWS Step Functions.
Introduction
The Saga pattern is a design pattern for managing distributed transactions across multiple microservices or components in a distributed system. Traditional two-phase commit protocols can become complex and brittle in distributed environments, where failures are common, and coordinating transactions across multiple services can lead to increased latency and potential for deadlock.
The Saga pattern addresses these challenges by breaking down a long-lived transaction into a series of smaller, independent transactions, known as saga steps. Each saga step represents a unit of work that can be completed or compensated independently of other steps. If a failure occurs during the execution of a saga, compensating transactions are used to undo the work performed by previously completed steps, ensuring eventual consistency.
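To make the idea concrete, here is a minimal in-memory sketch of the pattern, independent of AWS. The `run_saga` helper and the step functions are hypothetical names chosen for illustration; the only behavior shown is that completed steps are compensated in reverse order when a later step fails.

```python
from typing import Callable, List, Tuple

# Each saga step pairs an action with the compensation that undoes it.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], context: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(context)
            completed.append(step)
        except Exception as exc:
            context["failed_step"] = name
            context["error"] = str(exc)
            for _, _, compensate in reversed(completed):
                compensate(context)
            return False
    return True


def reserve(ctx):
    ctx["log"].append("reserve")


def release(ctx):
    ctx["log"].append("release")


def charge(ctx):
    ctx["log"].append("charge")


def refund(ctx):
    ctx["log"].append("refund")


def ship(ctx):
    raise RuntimeError("carrier unavailable")  # simulated failure


def cancel_shipment(ctx):
    ctx["log"].append("cancel_shipment")


ctx = {"log": []}
ok = run_saga(
    [("reserve", reserve, release), ("charge", charge, refund), ("ship", ship, cancel_shipment)],
    ctx,
)
print(ok, ctx["log"])  # False ['reserve', 'charge', 'refund', 'release']
```

The failed step itself is not compensated in this sketch because it never completed; whether a failed step also needs cleanup for partial work depends on the step.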
Implementing the Saga Pattern with AWS Step Functions
In AWS Step Functions, you can model a saga as a state machine where each state represents a saga step. The state machine transitions between states based on the success or failure of each step, executing compensating transactions as needed to maintain consistency.
To implement the Saga pattern with AWS Step Functions:
- Define Saga Steps: Identify the individual steps of the saga and map them to state machine states.
- Handle Success and Failure: Define transitions between states based on the success or failure of each step.
- Implement Compensating Transactions: For each saga step, define compensating transactions to undo the effects of completed steps in case of failure.
- Manage State and Data: Use input and output data to track the state of the saga and pass information between steps.
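The four points above can be sketched as a small generator that turns a list of steps into a state machine definition. This is one possible layout, not the only one: `build_saga_definition`, the `Compensate` naming scheme, and the ARNs are assumptions made for illustration. The field names (`Type`, `Resource`, `Next`, `End`, `Catch`, `ErrorEquals`, `ResultPath`) are standard Amazon States Language fields.

```python
import json
from typing import Dict, List, Tuple


def build_saga_definition(steps: List[Tuple[str, str, str]]) -> Dict:
    """Build a definition from (step_name, step_arn, compensation_arn) tuples.

    A failed step jumps to its own compensation state, and each compensation
    chains to the compensation of the step before it.
    """
    states: Dict[str, Dict] = {}
    for index, (name, arn, compensation_arn) in enumerate(steps):
        task = {
            "Type": "Task",
            "Resource": arn,
            # Keep the state's input and attach the error under $.error
            "Catch": [{
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": f"Compensate{name}",
            }],
        }
        if index + 1 < len(steps):
            task["Next"] = steps[index + 1][0]
        else:
            task["End"] = True
        states[name] = task

        compensation = {"Type": "Task", "Resource": compensation_arn}
        if index > 0:
            compensation["Next"] = f"Compensate{steps[index - 1][0]}"
        else:
            compensation["End"] = True
        states[f"Compensate{name}"] = compensation

    return {"Comment": "Generated saga", "StartAt": steps[0][0], "States": states}


# Hypothetical ARNs, for illustration only.
prefix = "arn:aws:lambda:us-east-1:111122223333:function:"
definition = build_saga_definition([
    ("StepA", prefix + "StepA", prefix + "UndoStepA"),
    ("StepB", prefix + "StepB", prefix + "UndoStepB"),
])
print(json.dumps(definition, indent=2))
```

Generating the definition keeps the forward path and the compensation path in sync when steps are added or reordered.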
Design Considerations
When designing sagas with AWS Step Functions, consider the following best practices:
- Error Handling: Handle errors gracefully at each step of the saga and implement retry logic where appropriate.
- Timeouts: Set timeouts for each step to prevent long-running tasks from blocking the execution of the saga.
- Idempotency: Ensure that each step of the saga is idempotent to handle retries and duplicate requests.
- Data Consistency: Use consistent data models and transactional updates to maintain data consistency across saga steps.
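Two of these practices can be sketched briefly. `with_retry_and_timeout` adds the Amazon States Language `TimeoutSeconds` and `Retry` fields to a Task state; the specific values are arbitrary examples. `idempotent` is a hypothetical wrapper that reuses the first result for a repeated key; it uses an in-memory dictionary as a stand-in for a durable store, which a real Lambda function would need because memory is not shared across invocations.

```python
from typing import Callable, Dict


def with_retry_and_timeout(task_state: Dict, timeout_seconds: int = 30) -> Dict:
    """Return a copy of a Task state with a timeout and a retry policy added."""
    hardened = dict(task_state)
    hardened["TimeoutSeconds"] = timeout_seconds
    hardened["Retry"] = [{
        "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }]
    return hardened


# In-memory stand-in for a durable store such as a database table (assumption).
_processed: Dict[str, Dict] = {}


def idempotent(key_field: str, handler: Callable[[Dict], Dict]) -> Callable[[Dict], Dict]:
    """Wrap a step so repeated calls with the same key reuse the first result."""
    def wrapper(event: Dict) -> Dict:
        key = f"{handler.__name__}:{event[key_field]}"
        if key not in _processed:
            _processed[key] = handler(event)
        return _processed[key]
    return wrapper


charges = []


def charge_payment(event: Dict) -> Dict:
    charges.append(event["orderId"])  # side effect that must happen only once
    return {"paymentId": f"PAY-{event['orderId']}"}


state = with_retry_and_timeout({"Type": "Task", "Resource": "arn:example"}, timeout_seconds=10)
safe_charge = idempotent("orderId", charge_payment)
first = safe_charge({"orderId": "123456"})
second = safe_charge({"orderId": "123456"})  # simulated retry of the same request
print(state["TimeoutSeconds"], first == second, len(charges))  # 10 True 1
```

Retries and idempotency go together: a retry policy is only safe when repeating the step cannot charge or reserve twice.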
Walkthrough Example: Implementing a Saga Pattern for Order Processing
In this example, we'll implement a saga pattern to manage the order processing workflow in an e-commerce system. The order processing workflow consists of three main steps:
- Place Order: Initiates the order processing and reserves inventory.
- Charge Payment: Charges the customer's payment method.
- Fulfill Order: Ships the ordered items to the customer.
If any step fails, we'll need to compensate by undoing the work of the previously completed steps.
Step 1: Define the State Machine
First, let's define the state machine using the Amazon States Language, the JSON-based language of AWS Step Functions. Each state represents a step in the order processing saga. When a step fails, its Catch block routes to that step's compensation, and each compensation then chains to the compensation of the step before it, so all earlier work is undone. Setting ResultPath to $.error in each Catch keeps the state's input and adds the error details to it, instead of replacing the input with the error.
{
  "Comment": "Order Processing Saga",
  "StartAt": "PlaceOrder",
  "States": {
    "PlaceOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:123456789012:function:PlaceOrderFunction",
      "Next": "ChargePayment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CompensatePlaceOrder"
      }]
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:123456789012:function:ChargePaymentFunction",
      "Next": "FulfillOrder",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CompensateChargePayment"
      }]
    },
    "FulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:123456789012:function:FulfillOrderFunction",
      "End": true,
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CompensateFulfillOrder"
      }]
    },
    "CompensatePlaceOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:123456789012:function:CompensatePlaceOrderFunction",
      "End": true
    },
    "CompensateChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:123456789012:function:CompensateChargePaymentFunction",
      "Next": "CompensatePlaceOrder"
    },
    "CompensateFulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:123456789012:function:CompensateFulfillOrderFunction",
      "Next": "CompensateChargePayment"
    }
  }
}
Step 2: Implement Lambda Functions
Next, implement the Lambda functions for each step of the saga:
- PlaceOrderFunction: Initiates the order processing and reserves inventory.
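import boto3  # bundled with the Lambda Python runtime; unused by the dummy logic

def place_order(order_details):
    # Dummy logic to place order and reserve inventory
    order_id = '123456'
    return order_id

def lambda_handler(event, context):
    order_details = event['order_details']
    order_id = place_order(order_details)
    # Merge the result into the incoming event so the next state
    # still receives payment_details along with orderId
    return {**event, 'orderId': order_id}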
- ChargePaymentFunction: Charges the customer's payment method.
import boto3  # bundled with the Lambda Python runtime; unused by the dummy logic

def charge_payment(order_id, payment_details):
    # Dummy logic to charge payment
    payment_id = 'PAY-789'
    return payment_id

def lambda_handler(event, context):
    order_id = event['orderId']
    payment_details = event['payment_details']
    payment_id = charge_payment(order_id, payment_details)
    # Keep orderId in the output so FulfillOrder and the compensations can read it
    return {**event, 'paymentId': payment_id}
- FulfillOrderFunction: Ships the ordered items to the customer.
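import boto3  # bundled with the Lambda Python runtime; unused by the dummy logic

def fulfill_order(order_id):
    # Dummy logic to fulfill order and ship items
    return True

def lambda_handler(event, context):
    order_id = event['orderId']
    success = fulfill_order(order_id)
    if not success:
        # A Catch block only fires when the function raises an error
        raise RuntimeError('Order fulfillment failed')
    return {**event, 'success': success}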
- CompensatePlaceOrderFunction: Compensates for the PlaceOrder step (e.g., releases reserved inventory).
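import boto3  # bundled with the Lambda Python runtime; unused by the dummy logic

def compensate_place_order(order_id):
    # Dummy logic to release reserved inventory
    return True

def lambda_handler(event, context):
    # orderId is absent when PlaceOrder itself failed, so there may be nothing to undo
    order_id = event.get('orderId')
    compensated = compensate_place_order(order_id) if order_id else True
    # The saga as a whole did not succeed; report which compensation ran
    return {**event, 'success': False, 'compensatePlaceOrder': compensated}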
- CompensateChargePaymentFunction: Compensates for the ChargePayment step (e.g., refunds the charged amount).
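import boto3  # bundled with the Lambda Python runtime; unused by the dummy logic

def compensate_charge_payment(payment_id):
    # Dummy logic to refund charged amount
    return True

def lambda_handler(event, context):
    # paymentId is absent when ChargePayment itself failed, so there may be nothing to refund
    payment_id = event.get('paymentId')
    compensated = compensate_charge_payment(payment_id) if payment_id else True
    # The saga as a whole did not succeed; report which compensation ran
    return {**event, 'success': False, 'compensateChargePayment': compensated}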
- CompensateFulfillOrderFunction: Compensates for the FulfillOrder step (e.g., cancels the order shipment).
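import boto3  # bundled with the Lambda Python runtime; unused by the dummy logic

def compensate_fulfill_order(order_id):
    # Dummy logic to cancel order shipment
    return True

def lambda_handler(event, context):
    # Tolerate a missing orderId so the function is safe to call from any failure path
    order_id = event.get('orderId')
    compensated = compensate_fulfill_order(order_id) if order_id else True
    # The saga as a whole did not succeed; report which compensation ran
    return {**event, 'success': False, 'compensateFulfillOrder': compensated}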
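Before deploying, the control flow can be checked locally. The sketch below is a deliberately simplified walker, not an implementation of the Amazon States Language: it follows only Next, End, and the first Catch entry, and it treats every exception as matching that Catch. The `run_locally` helper, the trimmed two-step definition, and the stub handlers are all hypothetical stand-ins for illustration.

```python
from typing import Callable, Dict, List, Tuple


def run_locally(definition: Dict, handlers: Dict[str, Callable[[Dict], Dict]],
                event: Dict) -> Tuple[List[str], Dict]:
    """Walk Task states and return (visited state names, final output)."""
    visited: List[str] = []
    name = definition["StartAt"]
    while True:
        state = definition["States"][name]
        handler = handlers[name]
        visited.append(name)
        try:
            result = handler(event)
        except Exception as exc:
            catchers = state.get("Catch")
            if not catchers:
                raise
            # Simplification: keep the state's input and add the error under "error"
            event = {**event, "error": {"Error": type(exc).__name__, "Cause": str(exc)}}
            name = catchers[0]["Next"]
            continue
        event = result
        if state.get("End"):
            return visited, event
        name = state["Next"]


definition = {
    "StartAt": "PlaceOrder",
    "States": {
        "PlaceOrder": {"Type": "Task", "Next": "ChargePayment",
                       "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "CompensatePlaceOrder"}]},
        "ChargePayment": {"Type": "Task", "End": True,
                          "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "CompensateChargePayment"}]},
        "CompensateChargePayment": {"Type": "Task", "Next": "CompensatePlaceOrder"},
        "CompensatePlaceOrder": {"Type": "Task", "End": True},
    },
}


def place_order(event):
    return {**event, "orderId": "123456"}


def charge_payment(event):
    raise RuntimeError("card declined")  # simulated failure


def compensate_charge_payment(event):
    return {**event, "compensateChargePayment": True}


def compensate_place_order(event):
    return {**event, "compensatePlaceOrder": True}


handlers = {
    "PlaceOrder": place_order,
    "ChargePayment": charge_payment,
    "CompensateChargePayment": compensate_charge_payment,
    "CompensatePlaceOrder": compensate_place_order,
}
visited, output = run_locally(definition, handlers, {"order_details": {"item_id": "123"}})
print(visited)
```

A walker like this only checks routing and data hand-off between handlers; retries, timeouts, and service limits still need a test against a deployed state machine.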
Step 3: Test the Saga
Once the state machine and Lambda functions are implemented, test the saga by simulating different scenarios:
- Scenario 1 (Success): All steps are completed successfully.
Input
{
"order_details": {
"item_id": "123",
"quantity": 2,
"customer_id": "456"
},
"payment_details": {
"amount": 100,
"payment_method": "credit_card"
}
}
Output
{
"orderId": "123456",
"paymentId": "PAY-789",
"success": true
}
- Scenario 2 (Partial Failure): PlaceOrder and ChargePayment are completed successfully, but FulfillOrder fails. Ensure that compensating transactions are triggered to undo the completed steps.
Input
{
"order_details": {
"item_id": "789",
"quantity": 1,
"customer_id": "123"
},
"payment_details": {
"amount": 50,
"payment_method": "paypal"
}
}
Output
{
  "success": false,
  "compensateFulfillOrder": true,
  "compensateChargePayment": true,
  "compensatePlaceOrder": true
}
- Scenario 3 (Complete Failure): PlaceOrder fails. Ensure that compensating transactions are triggered to undo any partially completed steps.
Input
{
"order_details": {
"item_id": "789",
"quantity": 3,
"customer_id": "123"
},
"payment_details": {
"amount": 150,
"payment_method": "credit_card"
}
}
Output
{
  "success": false,
  "compensatePlaceOrder": true
}
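Each scenario can be run by starting an execution with the scenario's input. The helper below is a sketch that wraps the Step Functions `start_execution` call; `start_order_saga`, the execution name prefix, and the ARN are assumptions for illustration. The client is passed in as a parameter, so the example runs offline against a small fake; with AWS credentials configured, `boto3.client("stepfunctions")` would be passed instead.

```python
import json
import uuid
from typing import Any, Dict


def start_order_saga(sfn_client: Any, state_machine_arn: str, payload: Dict) -> str:
    """Start one execution with the given input and return its ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"order-saga-{uuid.uuid4()}",  # execution names must be unique
        input=json.dumps(payload),
    )
    return response["executionArn"]


class FakeStepFunctionsClient:
    """Offline stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        arn = kwargs["stateMachineArn"].replace(":stateMachine:", ":execution:")
        return {"executionArn": f"{arn}:{kwargs['name']}"}


# Hypothetical state machine ARN, for illustration only.
machine_arn = "arn:aws:states:ap-south-1:123456789012:stateMachine:OrderProcessingSaga"
scenario_input = {
    "order_details": {"item_id": "123", "quantity": 2, "customer_id": "456"},
    "payment_details": {"amount": 100, "payment_method": "credit_card"},
}
client = FakeStepFunctionsClient()
execution_arn = start_order_saga(client, machine_arn, scenario_input)
print(execution_arn)
```

Against a real client, the returned ARN can be passed to `describe_execution` to read the execution's status and output once it finishes.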
Conclusion
In conclusion, implementing the Saga pattern using AWS Step Functions provides a powerful mechanism for managing distributed transactions in serverless applications. By breaking down long-lived transactions into smaller, independent steps and orchestrating them with Step Functions, you can achieve eventual consistency and fault tolerance in your distributed systems. Experiment with different saga designs and workflows to find the best approach for your application needs.
Written by
Rahul Lokurte
I am a Lead Engineer from India. Love to blog about serverless and help teams design and develop serverless architecture. An AWS cloud practitioner. Blogs about AWS Services utilising AWS CDK, CloudFormation, Terraform.