Building a Resilient Site-Aware Synchronization Architecture


The Challenge of Multi-Site Data Autonomy
In distributed environments, especially in industrial or regulated contexts, applications are often deployed across multiple geographically separated sites. These sites must remain fully operational even in the event of a network outage lasting several days.
What the heck is the challenge?
Ensuring local application autonomy while maintaining data consistency and eventual synchronization across all sites.
A Hybrid, On-Prem, Multi-DB Ecosystem
In our case, we manage several on-premise sites, each with its own local database. Some use PostgreSQL, others Oracle. A cloud-native solution wasn't viable due to strict network constraints, and we needed every site to operate independently without relying on real-time connectivity.
Classic Replication vs Resilient Architecture
I explored traditional database replication methods, including cross-database replication tools like GoldenGate or SymmetricDS. These tools inevitably introduce operational complexity. Moreover, synchronization conflicts must be addressed regardless.
Instead, I chose to design a resilient, event-driven architecture based on asynchronous communication and eventual consistency, with near real-time propagation when neither conflicts nor network failures are present.
Event-Driven, Site-Aware Synchronization
I designed an architecture that combines the Outbox Pattern, a RabbitMQ cluster with 3 nodes, and an event fanout strategy to broadcast every event to all sites. Each site manages its own data and produces events locally.
Key Concepts:
Outbox Pattern: Events are stored in a dedicated outbox table in the local DB, within the same local transaction as the business write (a minimal sketch follows this list).
RabbitMQ Cluster: Three brokers deployed in a logical grouping with quorum queues to ensure reliability across nodes.
Fanout Mode: Events are broadcast to all consumers, including the originating site.
Deduplication: Each event has a globally unique identifier (id) to support idempotent processing.
Historization: A dedicated queue receives all events, which are stored in the reference site's database for auditing and replay purposes.
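To make the Outbox Pattern concrete, here is a minimal sketch of the local write step, assuming a PostgreSQL site accessed with psycopg2; the users and outbox tables and their columns are illustrative, not the actual schema.

```python
import json
import uuid
from datetime import datetime, timezone

import psycopg2

SITE_NAME = "London"  # the originating (propagating) site

def add_user_with_outbox_event(conn, username: str) -> None:
    """Write the business row and its outbox event in the same local transaction."""
    event = {
        "id": f"{SITE_NAME}_{uuid.uuid4()}",           # globally unique event ID
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "origin": SITE_NAME,
        "type": "User_Added",
        "data": {"username": username},
    }
    with conn:  # commits on success, rolls back on error: both writes or neither
        with conn.cursor() as cur:
            cur.execute("INSERT INTO users (username) VALUES (%s)", (username,))
            cur.execute(
                "INSERT INTO outbox (event_id, payload, created_at) VALUES (%s, %s, %s)",
                (event["id"], json.dumps(event), event["timestamp"]),
            )

# Usage (connection parameters are placeholders):
# conn = psycopg2.connect("dbname=london_app user=app password=secret host=localhost")
# add_user_with_outbox_event(conn, "alice")
```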
Architecture Overview
Event Propagation Flow – London as Propagator, Paris as Golden Source
Below is a simple Event Propagation Flow design involving three sites. In this setup, London acts as the Propagator, while Paris, which also processes events, serves as the Golden Source responsible for archiving all events. This architecture follows an eventual consistency model and does not include automatic conflict resolution.
In case of a conflict, events are routed to a Dead Letter Queue (DLQ) for manual inspection and processing.
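As a sketch of how this routing could be wired, the snippet below declares a site's quorum queue with a dead-letter exchange so that rejected events land in a DLQ; the exchange and queue names are assumptions, not the ones used in the actual setup.

```python
import pika

# Connection parameters are placeholders for a local broker.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Dead-letter exchange and queue for events a site rejects (e.g., on conflict).
channel.exchange_declare(exchange="events.dlx", exchange_type="fanout", durable=True)
channel.queue_declare(queue="events.dlq", durable=True)
channel.queue_bind(exchange="events.dlx", queue="events.dlq")

# Per-site quorum queue: rejected (nacked, non-requeued) messages go to the DLX.
channel.queue_declare(
    queue="SITE_LONDON_EVENTS",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-dead-letter-exchange": "events.dlx",
    },
)
```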
Principal Actors
Local Write: The application (API) writes to the local database and inserts a new event with a unique ID into the outbox table within the same transaction.
Outbox Dispatcher: A background service reads pending events from the outbox table and publishes them to a RabbitMQ fanout exchange (see the sketch after this list).
Fanout Exchange: Broadcasts the event to all sites.
Event Handling Consumer: Each site checks whether it has already processed the event (using the unique ID). If not, it applies the corresponding change to its local database.
Self-Cleanup: The originating site deletes the event from its outbox table once it has been successfully consumed from its queue, confirming that the event was propagated correctly.
Historization: The event is also sent to the SITE_PARIS_HISTO queue and stored in the Paris database for auditing and replay.
Retention: A scheduled task deletes historical events older than 90 days (configurable via application settings).
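Below is a minimal sketch of how the Outbox Dispatcher step could look with pika and psycopg2; the exchange name and the outbox columns are assumptions for illustration.

```python
import pika
import psycopg2

FANOUT_EXCHANGE = "site.events"  # assumed exchange name

def dispatch_outbox_once(conn, channel) -> None:
    """Publish pending outbox rows (stored as JSON text) to the fanout exchange, oldest first."""
    with conn.cursor() as cur:
        cur.execute("SELECT event_id, payload FROM outbox ORDER BY created_at LIMIT 100")
        rows = cur.fetchall()
    for event_id, payload in rows:
        channel.basic_publish(
            exchange=FANOUT_EXCHANGE,
            routing_key="",  # ignored by a fanout exchange
            body=payload,
            properties=pika.BasicProperties(
                delivery_mode=2,       # persistent message
                message_id=event_id,   # carries the globally unique event ID
            ),
        )
    # Rows are NOT deleted here: the originating site removes an outbox row only
    # after consuming the event back from its own queue (self-cleanup).

# Wiring (placeholders):
# conn = psycopg2.connect("dbname=london_app user=app password=secret host=localhost")
# channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
# channel.exchange_declare(exchange=FANOUT_EXCHANGE, exchange_type="fanout", durable=True)
# dispatch_outbox_once(conn, channel)  # run periodically by a background scheduler
```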
Event Payload Structure
Each event includes a fixed first-level structure:
id: A unique identifier for the event, composed of the site name followed by a UUID (e.g., London_9f8a4c2e-3b1d-4a6b-91d9-df4b2e6e8a7f)
timestamp: The date and time when the event was created.
origin: The originating site name (i.e., the propagating site).
type: A unique event name, such as User_Added, Dinner_Reserved, etc.
data: The payload containing the event-specific information.
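For illustration only, a User_Added event following this structure might look like the sketch below (all values are made up):

```python
# Hypothetical User_Added event produced by the London site (illustrative values).
event = {
    "id": "London_9f8a4c2e-3b1d-4a6b-91d9-df4b2e6e8a7f",
    "timestamp": "2024-05-17T09:32:41Z",
    "origin": "London",
    "type": "User_Added",
    "data": {
        "username": "alice",
        "email": "alice@example.com",
    },
}
```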
Key Design Choices
Decentralized event production: Any site can produce events.
Resilience first: The system tolerates network outages thanks to the outbox pattern.
Replayability: Events stored in the Paris archive can be replayed to recover a site.
Scalability: New sites can be added with minimal configuration.
Idempotency: Guaranteed through the use of globally unique event IDs.
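To illustrate how idempotent consumption could work, here is a minimal sketch assuming a processed_events table keyed by the event ID; the table, the apply_change helper, and the queue name are hypothetical.

```python
import json

import pika
import psycopg2

def apply_change(cur, event: dict) -> None:
    """Site-specific business update derived from the event (placeholder logic)."""
    if event["type"] == "User_Added":
        cur.execute("INSERT INTO users (username) VALUES (%s)", (event["data"]["username"],))

def handle_event(conn, channel, method, properties, body) -> None:
    """Apply an event exactly once; duplicates are acknowledged and skipped."""
    event = json.loads(body)
    with conn:
        with conn.cursor() as cur:
            # Deduplication: has this event ID already been processed on this site?
            cur.execute("SELECT 1 FROM processed_events WHERE event_id = %s", (event["id"],))
            if cur.fetchone() is None:
                apply_change(cur, event)
                cur.execute("INSERT INTO processed_events (event_id) VALUES (%s)", (event["id"],))
    channel.basic_ack(delivery_tag=method.delivery_tag)
    # On a detected conflict, reject without requeueing so the message is
    # dead-lettered to the DLQ for manual inspection:
    # channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

# Wiring (placeholders):
# channel.basic_consume(
#     queue="SITE_LONDON_EVENTS",
#     on_message_callback=lambda ch, m, p, b: handle_event(conn, ch, m, p, b),
# )
# channel.start_consuming()
```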
Advanced Option – Confirmed Event Processing
To further secure event handling, I introduced a confirmation event handler. Each site sends a confirmation after successfully processing its event. These confirmation events follow the same structure as the original business events:
id: A unique identifier for the confirmation event.
timestamp: The date and time when the event was created.
origin: The name of the consuming site (not the propagator).
type: A prefix like Confirmation_ followed by the original event type (e.g., Confirmation_User_Added).
data: Specific details of the confirmation event.
The confirmation event is propagated through a topic exchange to ensure it is delivered both to the original propagator and to the historization site (SITE_PARIS_HISTO).
This enables:
Reliable replay detection (without reprocessing).
Auditing of which site processed which event and when.
Easier debugging in multi-step processing chains.
This confirmation layer complements idempotency and ensures clarity in multi-site data flow.
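As a sketch of this confirmation flow, a consuming site (Paris in this example) could publish the confirmation through a topic exchange with a routing key derived from the original propagator; the exchange name and routing-key scheme below are assumptions, not the actual configuration.

```python
import json
import uuid
from datetime import datetime, timezone

import pika

CONFIRMATION_EXCHANGE = "site.confirmations"  # assumed topic exchange name
SITE_NAME = "Paris"  # the consuming site emitting the confirmation

def publish_confirmation(channel, original_event: dict) -> None:
    """Publish a Confirmation_* event routed to the propagator and to SITE_PARIS_HISTO."""
    confirmation = {
        "id": f"{SITE_NAME}_{uuid.uuid4()}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "origin": SITE_NAME,  # the consuming site, not the propagator
        "type": f"Confirmation_{original_event['type']}",
        "data": {"confirmed_event_id": original_event["id"]},
    }
    # A key such as "confirmation.London" lets the propagator's queue bind to
    # "confirmation.<its-name>" while SITE_PARIS_HISTO binds to "confirmation.#".
    routing_key = f"confirmation.{original_event['origin']}"
    channel.basic_publish(
        exchange=CONFIRMATION_EXCHANGE,
        routing_key=routing_key,
        body=json.dumps(confirmation),
        properties=pika.BasicProperties(delivery_mode=2, message_id=confirmation["id"]),
    )
```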
Final Thoughts
In distributed multi-site environments, ensuring data autonomy and consistency — especially during network disruptions — is essential. A resilient, event-driven architecture based on the Outbox Pattern, RabbitMQ Cluster, and eventual consistency offers a robust foundation for synchronizing data without sacrificing local control.
With features like decentralized event production, idempotency, scalability, and replayability, this approach enables reliable and maintainable cross-site communication, even in the face of failure.
Try It Yourself
I have created a GitHub repository that includes:
A simple flow diagram that illustrates the architecture
A Docker Compose setup to launch a RabbitMQ cluster with 3 nodes behind an HAProxy
A README file with detailed instructions about the whole setup
Feel free to fork the repository, run the setup locally, and customize it to fit your needs.
Thanks for reading!
If you enjoyed this post, follow me on Twitter or LinkedIn for more. Got feedback or suggestions? Drop a comment below—I’d love to hear from you 👌