Navigating the Dual Write Problem: Implementing the Outbox Pattern for Data Consistency in Distributed Systems

Siddhartha SSiddhartha S
4 min read

Introduction

Maintaining data consistency in distributed systems is a challenge due to the possibility of service or component failures at any time.

In a previous article, I provided a detailed explanation of CQRS with Event Sourcing, discussing these two patterns in principle and elaborating on their implementation using .NET. A caveat that was left unanswered in that article was the possibility of failure when publishing an event after writing to the event store.

In this article, we will explore the classic distributed system problem known as the Dual Write Problem and a possible solution to it.

Dual Write Problem

Most of us are familiar with database transactions, where all Data Manipulation Language (DML) statements executed within the scope of a transaction are either fully committed or entirely reverted.

Now, imagine achieving a similar result across two entirely different systems. Consider the common scenario in which a microservice needs to update its database and then publish an event to an event bus (such as Kafka or an MQ). Refer to the diagram below:

Considering what can go wrong:

  1. It’s a positive outcome if both transactions are successful.

  2. However, if both transactions fail, it still presents a happy situation from a data consistency perspective.

  3. If the database transaction succeeds but an issue arises before event publishing, leaving the event unpublished, this leads to an inconsistent state in our system.

  4. Reversing the order of these operations alters the situation, but the underlying issue remains.

If the event fails to be placed onto the event bus due to its unavailability or a network issue, the event will be lost, resulting in inconsistent data.

Traditional databases address transaction problems through transaction logs, checkpointing, etc. However, in this case, we are dealing with two distinct systems.

The solution

The solution is to execute these transactions in a series. The state of the data should progress from its original state to an intermediate state and finally to the desired distributed state.

Here is a diagram to help you visualize the various scenarios:

  1. The happy case occurs when the data flows to the database and then to the event bus.

  2. In another happy scenario, data does not flow to the database (due to the database being down or some network issue), and the event bus also does not receive any events. The data state remains consistent.

  3. The case where data writes are successful to the database but fail when attempting to publish to the event bus (due to network issues or an unavailable event bus) creates an intermediate state. Here, the intermediate data (I.D.) is recorded in the database. Whenever the event bus becomes available, we can replay the intermediate data, ensuring that the system reaches a consistent state.

Outbox Pattern

The Outbox Pattern (also known as the Transactional Outbox Pattern) is an implementation of the solution described above.

💡
It operates similarly to an email outbox. When an email is sent, it is first stored in the outbox, and once the email is successfully sent, the message is removed and placed in the sent items. Hence the name!

Refer to the diagram below:

The service writes data to the database and also records the event in an outbox table or collection. A worker process continuously monitors the outbox. As soon as an entry is made, it attempts to push it to the event bus and subsequently removes that entry from the outbox.

If the worker cannot place an entry onto the event bus due to issues like downtime or network problems, the event item will remain in the outbox and will be cleared once the worker successfully pushes it to the event bus.

A few points worth considering are:

  • The worker process may go down after publishing to the event bus but before removing the event item from the outbox. Therefore, any consuming service (not shown in the diagram) must be idempotent, as they may receive the same event multiple times.

  • In some implementations, the task of the worker process is handled by the service itself.

Conclusion

In this article, we examined the Dual Write Problem in distributed systems and explored a corresponding solution. We determined that the Transactional Outbox Pattern is a viable solution for the Dual Write Problem, guaranteeing at least once delivery to the event bus, necessitating that consumer services be idempotent. This pattern enhances the consistency of our system's data.

0
Subscribe to my newsletter

Read articles from Siddhartha S directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Siddhartha S
Siddhartha S

With over 18 years of experience in IT, I specialize in designing and building powerful, scalable solutions using a wide range of technologies like JavaScript, .NET, C#, React, Next.js, Golang, AWS, Networking, Databases, DevOps, Kubernetes, and Docker. My career has taken me through various industries, including Manufacturing and Media, but for the last 10 years, I’ve focused on delivering cutting-edge solutions in the Finance sector. As an application architect, I combine cloud expertise with a deep understanding of systems to create solutions that are not only built for today but prepared for tomorrow. My diverse technical background allows me to connect development and infrastructure seamlessly, ensuring businesses can innovate and scale effectively. I’m passionate about creating architectures that are secure, resilient, and efficient—solutions that help businesses turn ideas into reality while staying future-ready.