Handling Partial Failures in Distributed Payment Systems with the Saga Pattern

The Problem: Partial Failures in Payments
Modern payment systems often involve multiple services:
1. Reserve Funds (Bank API)
2. Charge Customer (Payment Gateway)
3. Update Inventory (Inventory Service)
4. Send Confirmation (Email Service)
Partial Failure Example:
- Funds are reserved, but the payment gateway fails.
- Inventory is deducted, but the email service is down.
Without a recovery mechanism, either failure leaves the systems in an inconsistent state.
What is the Saga Pattern?
A Saga is a sequence of local transactions where:
Each step has a compensating action to undo it.
Services collaborate via events (choreography) or a central orchestrator (orchestration).
Example Saga for Payments:
    [Reserve Funds]    → [Charge Customer] → [Update Inventory] → [Send Email]
     │                    │                   │                    │
     └─[Release Funds]    └─[Refund]          └─[Restock]          └─[N/A]
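The flow in the diagram can be sketched in plain C# before any Azure-specific code is introduced. This is a minimal illustration, not the Durable Functions implementation: `SagaStep`, `SagaRunner`, and the console messages are hypothetical names chosen for this example. Each step pairs a forward action with its compensating action, and a failure undoes the completed steps in reverse order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical type: one saga step is a forward action plus the action that undoes it.
public sealed class SagaStep
{
    public SagaStep(string name, Func<Task> execute, Func<Task> compensate)
    {
        Name = name;
        Execute = execute;
        Compensate = compensate;
    }

    public string Name { get; }
    public Func<Task> Execute { get; }
    public Func<Task> Compensate { get; } // null when there is nothing to undo
}

public static class SagaRunner
{
    // Runs the steps in order. If one throws, the steps that already completed
    // are compensated in reverse order and the method returns false.
    public static async Task<bool> RunAsync(IEnumerable<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        try
        {
            foreach (var step in steps)
            {
                await step.Execute();
                completed.Push(step);
            }
            return true;
        }
        catch (Exception)
        {
            // A stack pops the most recently completed step first.
            while (completed.Count > 0)
            {
                var step = completed.Pop();
                if (step.Compensate != null)
                    await step.Compensate();
            }
            return false;
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var steps = new List<SagaStep>
        {
            new SagaStep("Reserve Funds",
                () => { Console.WriteLine("Funds reserved"); return Task.CompletedTask; },
                () => { Console.WriteLine("Funds released"); return Task.CompletedTask; }),
            new SagaStep("Charge Customer",
                () => throw new InvalidOperationException("Gateway unavailable"),
                () => { Console.WriteLine("Charge refunded"); return Task.CompletedTask; }),
        };

        bool succeeded = await SagaRunner.RunAsync(steps);
        Console.WriteLine(succeeded ? "Saga completed" : "Saga compensated");
    }
}
```

Running this prints "Funds reserved", "Funds released", and "Saga compensated". The charge never completed, so no refund is attempted. The sketch does not handle a compensating action that itself fails; a production saga needs retries or an alert for that case.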
Implementing the Saga Pattern with Azure Durable Functions
We’ll use the orchestration-based Saga for centralized control and easier debugging.
Step 1: Define the Orchestrator
    [FunctionName("PaymentSagaOrchestrator")]
    public static async Task<string> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var paymentRequest = context.GetInput<PaymentRequest>();

        // Declared outside the try block so the catch block can see
        // which steps completed before the failure.
        var sagaContext = new SagaContext { PaymentRequest = paymentRequest };

        try
        {
            // Step 1: Reserve Funds
            sagaContext.ReservationId = await context.CallActivityAsync<Guid>(
                "ReserveFundsActivity",
                paymentRequest);

            // Step 2: Charge Customer
            sagaContext.ChargeId = await context.CallActivityAsync<string>(
                "ChargeCustomerActivity",
                paymentRequest);

            // Step 3: Update Inventory
            await context.CallActivityAsync(
                "UpdateInventoryActivity",
                paymentRequest.Items);
            // Assumption: SagaContext has an InventoryUpdated flag.
            sagaContext.InventoryUpdated = true;

            // Step 4: Send Confirmation
            await context.CallActivityAsync(
                "SendConfirmationActivity",
                paymentRequest.UserEmail);

            return "Payment Completed Successfully";
        }
        catch (Exception)
        {
            // Compensate for completed steps
            await Compensate(context, sagaContext);
            return "Payment Failed - Compensated";
        }
    }

    private static async Task Compensate(
        IDurableOrchestrationContext context,
        SagaContext sagaContext)
    {
        // Undo completed steps in reverse order.
        // Assumption: a RestockInventoryActivity exists, matching the
        // [Restock] compensation in the diagram.
        if (sagaContext.InventoryUpdated)
            await context.CallActivityAsync("RestockInventoryActivity", sagaContext.PaymentRequest.Items);
        if (sagaContext.ChargeId != null)
            await context.CallActivityAsync("RefundChargeActivity", sagaContext.ChargeId);
        if (sagaContext.ReservationId != Guid.Empty)
            await context.CallActivityAsync("ReleaseFundsActivity", sagaContext.ReservationId);
    }
Step 2: Implement Compensating Actions
Release Reserved Funds
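    // Instance method: _bankService is assumed to be a field set through
    // constructor injection on the enclosing class.
    // Activities can run more than once, so the release call should be safe to repeat.
    [FunctionName("ReleaseFundsActivity")]
    public async Task ReleaseFunds(
        [ActivityTrigger] Guid reservationId,
        ILogger log)
    {
        await _bankService.ReleaseReservationAsync(reservationId);
        log.LogInformation("Released funds for reservation {ReservationId}", reservationId);
    }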
Refund Customer Charge
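    // Instance method: _paymentGateway is assumed to be a field set through
    // constructor injection on the enclosing class.
    [FunctionName("RefundChargeActivity")]
    public async Task RefundCharge(
        [ActivityTrigger] string chargeId,
        ILogger log)
    {
        await _paymentGateway.RefundAsync(chargeId);
        log.LogInformation("Refunded charge {ChargeId}", chargeId);
    }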
Key Features
1. Compensation Guarantees
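Rollback: Each completed step is undone by its compensating action if a later step fails. This is a semantic undo rather than ACID atomicity, since other services can observe intermediate states before compensation runs.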
Order: Compensate in reverse order (e.g., refund before releasing funds).
2. Retry Policies
Add retries to steps that can fail transiently (e.g., payment gateway timeouts):
    var retryOptions = new RetryOptions(
        firstRetryInterval: TimeSpan.FromSeconds(2),
        maxNumberOfAttempts: 3);

    // Use the generic overload so the charge ID is still captured for compensation.
    var chargeId = await context.CallActivityWithRetryAsync<string>(
        "ChargeCustomerActivity",
        retryOptions,
        paymentRequest);
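Retry intervals can also grow between attempts so that a struggling gateway is not hit again immediately. The sketch below assumes the Durable Functions 2.x `RetryOptions` properties `BackoffCoefficient` and `MaxRetryInterval`; the class name `PaymentRetryPolicies` and the specific values are illustrative choices, not recommendations from the original article.

```csharp
using System;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// Hypothetical helper that keeps the retry policy for gateway calls in one place.
public static class PaymentRetryPolicies
{
    public static RetryOptions ForGatewayCalls()
    {
        return new RetryOptions(
            firstRetryInterval: TimeSpan.FromSeconds(2),
            maxNumberOfAttempts: 3)
        {
            // Each retry waits longer than the previous one
            // (roughly 2s, then 4s with these values).
            BackoffCoefficient = 2.0,
            // Upper bound on the wait between two attempts.
            MaxRetryInterval = TimeSpan.FromSeconds(30)
        };
    }
}
```

The orchestrator would then pass `PaymentRetryPolicies.ForGatewayCalls()` as the `retryOptions` argument of `CallActivityWithRetryAsync`. Retrying a charge is only safe when the charge activity is idempotent.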
3. Idempotency
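Use idempotency keys to avoid duplicate charges and refunds, which matters most once retries are enabled.
Track processed requests in Azure SQL or Cosmos DB:
    var isDuplicate = await _repository.Exists(request.IdempotencyKey);
    if (isDuplicate) throw new DuplicateRequestException();
A check like this is only safe if the store also enforces uniqueness on the key, since two concurrent requests can both pass the Exists check before either one is recorded.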
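An alternative to rejecting a duplicate is to return the result of the first call, which lets a retried request succeed without charging twice. The following is a minimal in-memory sketch of that idea. `IdempotentExecutor` and `ExecuteOnceAsync` are hypothetical names, and a real system would back this with a durable store such as Azure SQL or Cosmos DB rather than a dictionary in process memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: runs an operation at most once per idempotency key
// and hands the same result to every caller that uses that key.
public sealed class IdempotentExecutor
{
    // Maps a key to the (possibly still running) result of the first call.
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _results =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    public Task<string> ExecuteOnceAsync(string idempotencyKey, Func<Task<string>> operation)
    {
        // GetOrAdd may build more than one Lazy under contention, but only one
        // is stored, and Lazy guarantees the operation itself starts once.
        var lazy = _results.GetOrAdd(
            idempotencyKey,
            _ => new Lazy<Task<string>>(operation));
        return lazy.Value;
    }
}
```

Calling `ExecuteOnceAsync("order-42", ChargeAsync)` twice, where `ChargeAsync` stands in for the real gateway call, runs the charge once and returns the same charge ID both times. One limitation of this sketch is that a failed task is cached as well, so a key whose first attempt failed would need to be evicted before a retry can run.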
Best Practices
1. Logging & Monitoring
Log saga state changes to Application Insights.
Use Durable Functions HTTP APIs to query orchestration status.
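Besides the built-in HTTP endpoints, the status of a saga can be read from code through the durable client. The sketch below assumes the in-process Durable Functions 2.x API (`IDurableOrchestrationClient.GetStatusAsync`); the function name, the route, and the shape of the response object are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class PaymentSagaStatus
{
    // Hypothetical endpoint: GET /api/payments/{instanceId}/status
    [FunctionName("GetPaymentSagaStatus")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get",
            Route = "payments/{instanceId}/status")] HttpRequest req,
        [DurableClient] IDurableOrchestrationClient client,
        string instanceId)
    {
        DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId);

        // GetStatusAsync returns null when no orchestration has this ID.
        if (status == null)
            return new NotFoundResult();

        return new OkObjectResult(new
        {
            runtimeStatus = status.RuntimeStatus.ToString(),
            createdTime = status.CreatedTime,
            lastUpdatedTime = status.LastUpdatedTime,
            output = status.Output // e.g. "Payment Failed - Compensated"
        });
    }
}
```

Because the orchestrator returns a string for both the success and the compensated outcome, the `output` field is what distinguishes a completed payment from a compensated one. `runtimeStatus` reads `Completed` in both cases.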
2. Alerting
Trigger alerts for:
Uncompensated failures.
Prolonged orchestrations.
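One way to detect a prolonged orchestration from inside the saga is to race a step against a durable timer. The sketch below follows the timer pattern from the Durable Functions documentation and assumes the in-process 2.x API; `SagaTimeouts` and `CallWithTimeoutAsync` are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class SagaTimeouts
{
    // Calls an activity but stops waiting once the timeout elapses.
    // Returns true if the activity finished in time, false if the timer won.
    public static async Task<bool> CallWithTimeoutAsync(
        IDurableOrchestrationContext context,
        string activityName,
        object input,
        TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            // Use the replay-safe clock inside an orchestrator, not DateTime.UtcNow.
            DateTime deadline = context.CurrentUtcDateTime.Add(timeout);

            Task activityTask = context.CallActivityAsync(activityName, input);
            Task timerTask = context.CreateTimer(deadline, cts.Token);

            Task winner = await Task.WhenAny(activityTask, timerTask);
            if (winner == activityTask)
            {
                cts.Cancel();        // cancel the pending timer so the orchestration can finish
                await activityTask;  // rethrows the activity's exception, if any
                return true;
            }

            return false;
        }
    }
}
```

When the method returns false, the activity may still be running, so the orchestrator has to decide whether to raise an alert, start compensation, or both.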
3. Testing
Chaos Engineering: Use Azure Chaos Studio to simulate failures.
Unit Tests: Mock compensating actions.
4. Tooling
Azure Service Bus: For event-driven compensation triggers.
Azure Logic Apps: For human-in-the-loop approvals (e.g., manual refunds).
Real-World Use Case: Travel Booking Platform
Problem: Flight bookings failed after hotel reservations succeeded, leaving customers charged without bookings.
Solution:
Saga Orchestrator:
Reserve Hotel → Reserve Flight → Charge Customer.
Compensate: Refund → Cancel Flight → Cancel Hotel.
Results:
Reduced customer complaints by 90%.
Automated recovery for 95% of partial failures.
Saga vs. Other Patterns
Pattern | Use Case | Pros | Cons |
Saga | Distributed transactions | No distributed locks, scalable | Complex compensation logic |
2PC | ACID transactions across multiple databases or resource managers | Strong consistency | Poor scalability, blocking |
Event Sourcing | Audit trails, replay ability | Temporal debugging | High storage costs |
When to Use the Saga Pattern
Multi-Service Transactions: Payments, order fulfillment, travel bookings.
Eventual Consistency: Systems where temporary inconsistency is acceptable.
Long-Running Processes: Transactions spanning minutes/hours.
Conclusion
The Saga Pattern, combined with Azure Durable Functions, provides a robust way to handle partial failures in payment systems. By automating compensation and leveraging Azure’s serverless ecosystem, you can build resilient, self-healing workflows that maintain data consistency without monolithic transactions.
Written by Anurag Dakalia