Hybrid Store Design Pattern


The Hybrid Store is a design pattern for merging batch and real-time data sources into a single dataset. It is an abstraction composed of several versioned datasets, along with a current-version pointer (or symlink) that designates which one of them is serving reads.
The batch data source is a job that produces new versions of the dataset. The real-time data source, on the other hand, appends data to a real-time buffer, and this data eventually gets written to all versions of the dataset.
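The structure described above can be illustrated with a minimal in-memory sketch. This is not Venice's implementation. The class name `HybridStore` and its methods are hypothetical, the real-time buffer is modeled as an append-only Python list rather than a durable log, and real-time records are applied to every version immediately instead of "eventually".

```python
from typing import Dict, List, Optional, Tuple


class HybridStore:
    """Hypothetical sketch: versioned datasets plus a current-version pointer."""

    def __init__(self) -> None:
        self.versions: Dict[int, Dict[str, str]] = {1: {}}  # version number -> dataset
        self.current: int = 1                                # current version pointer
        self.buffer: List[Tuple[str, str]] = []              # real-time buffer (append-only)

    def write_realtime(self, key: str, value: str) -> None:
        # The real-time source appends to the buffer first...
        self.buffer.append((key, value))
        # ...and the record is applied to every version that currently exists.
        # (A real system would do this asynchronously; here it is immediate.)
        for dataset in self.versions.values():
            dataset[key] = value

    def read(self, key: str) -> Optional[str]:
        # Reads are always served by the version the pointer designates.
        return self.versions[self.current].get(key)


store = HybridStore()
store.write_realtime("user:1", "clicked")
print(store.read("user:1"))  # clicked
```

Because reads always go through `current`, clients in this sketch never need to know how many versions exist.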
When the batch data source runs and produces a new version of the dataset, the Hybrid Store does the following:
- Create a new version N+1 of the dataset (called a future version) and load the batch data into it.
- Replay recent data from the real-time buffer onto the future version N+1.
- Once the buffer replay has caught up, switch the current version pointer from N to N+1. Version N can be kept as a backup. There is then no future version until the next batch job runs.
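The three steps can be sketched by adding a `batch_push` method to a small in-memory model. All names are hypothetical, and two simplifications are assumed: the replay window is a fixed number of recent buffer records (`rewind`), and because the sketch is single-threaded, the replay is trivially "caught up" once the loop finishes, whereas a real system would have to wait for the future version's lag to shrink. Keeping exactly one backup version is also a policy assumption.

```python
from typing import Dict, List, Optional, Tuple


class HybridStore:
    """Hypothetical in-memory Hybrid Store with a batch push lifecycle."""

    def __init__(self, rewind: int = 100) -> None:
        self.versions: Dict[int, Dict[str, str]] = {1: {}}  # version number -> dataset
        self.current: int = 1                                # current version pointer
        self.future: Optional[int] = None                    # set only during a batch push
        self.buffer: List[Tuple[str, str]] = []              # real-time buffer (append-only)
        self.rewind = rewind  # assumption: how many recent buffer records to replay

    def write_realtime(self, key: str, value: str) -> None:
        self.buffer.append((key, value))
        # Applied to every existing version, including a future version if one exists.
        for dataset in self.versions.values():
            dataset[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.versions[self.current].get(key)

    def batch_push(self, batch: Dict[str, str]) -> None:
        previous = self.current
        # Step 1: create future version N+1 and load the batch data into it.
        future = previous + 1
        self.versions[future] = dict(batch)
        self.future = future
        # Step 2: replay recent real-time records on top of the batch data,
        # so data newer than the batch job's input is not lost.
        start = max(0, len(self.buffer) - self.rewind)
        for key, value in self.buffer[start:]:
            self.versions[future][key] = value
        # Step 3: the replay is caught up, so swap the pointer from N to N+1.
        self.current = future
        self.future = None
        # Keep version N as a backup and drop anything older.
        for version in [v for v in self.versions if v < previous]:
            del self.versions[version]


store = HybridStore(rewind=2)
store.write_realtime("a", "rt-1")
store.write_realtime("b", "rt-2")
store.batch_push({"a": "batch", "c": "batch"})
print(store.current, store.read("a"), store.read("c"))  # 2 rt-1 batch
```

With `rewind=1`, only the write to `b` would be replayed and `a` would keep its batch value. Under this sketch's assumptions, the replay window therefore needs to reach back at least to the point in the buffer where the batch job's input was taken.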
Hybrid Stores are a way to implement Lambda Architectures in which batch and real-time data are merged at write time rather than at read time. They can also support Kappa Architectures.
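The difference between read-time and write-time merging can be shown with two hypothetical functions over plain dictionaries. The names `batch_view` and `realtime_view` are illustrative, and the rule that newer real-time data simply overrides batch data is an assumption made for the sketch.

```python
from typing import Dict, Optional


def read_time_merge(batch_view: Dict[str, str], realtime_view: Dict[str, str], key: str) -> Optional[str]:
    # Lambda-style serving: every read consults both views, and the real-time
    # view (data newer than the last batch run) takes precedence.
    if key in realtime_view:
        return realtime_view[key]
    return batch_view.get(key)


def write_time_merge(batch_view: Dict[str, str], realtime_view: Dict[str, str]) -> Dict[str, str]:
    # Hybrid Store style: merge once, while building a version, so reads hit
    # a single dataset with no merge logic.
    merged = dict(batch_view)
    merged.update(realtime_view)
    return merged


batch = {"a": "1", "b": "2"}
realtime = {"b": "3", "c": "4"}
merged = write_time_merge(batch, realtime)
# Both approaches give the same answers; they differ in when the merge work happens.
for key in ["a", "b", "c", "d"]:
    assert read_time_merge(batch, realtime, key) == merged.get(key)
```

What the sketch highlights is where the merge cost is paid: a read-time merge runs on every query, while a write-time merge is paid once, when data is written or a batch version is built.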
Venice is an open-source database that implements the Hybrid Store design pattern, and some proprietary systems implement it as well.
Written by

Felix GV
Co-author of Venice