Hybrid Store Design Pattern

Felix GV

The Hybrid Store is a design pattern for merging batch and real-time data sources into a single dataset. It is an abstraction composed of several versioned datasets, along with a current version pointer (or symlink) that designates which one of them is being served.

The batch data source is a job that produces new versions of the dataset. The real-time data source, on the other hand, appends data to a real-time buffer, and this data eventually gets written to all versions of the dataset.
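As a rough illustration of the pieces described so far, here is a minimal in-memory sketch in Python. The class name `HybridStore`, its methods, and the use of plain dictionaries are assumptions made for this example, not the API of any real system. It also applies real-time writes to every version immediately, whereas a real implementation would do so asynchronously.

```python
class HybridStore:
    """Toy model: versioned datasets, a current-version pointer, a real-time buffer."""

    def __init__(self, initial_data):
        # Each version is a full copy of the dataset, keyed by version number.
        self.versions = {1: dict(initial_data)}
        # The "pointer" (or symlink) naming the version that serves reads.
        self.current = 1
        # Append-only log of real-time writes, kept so it can be replayed later.
        self.buffer = []

    def write_realtime(self, key, value):
        # Record the write in the buffer, then apply it to all existing versions.
        # (Simplification: applied synchronously rather than eventually.)
        self.buffer.append((key, value))
        for dataset in self.versions.values():
            dataset[key] = value

    def read(self, key):
        # Reads only ever see the version the pointer designates.
        return self.versions[self.current].get(key)


store = HybridStore({"a": "batch-a"})
store.write_realtime("b", "rt-b")
print(store.read("a"), store.read("b"))
```

Readers never consult the buffer directly: the merge of batch and real-time data has already happened inside the served version.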

When the batch data source runs and produces a new version of the dataset, the Hybrid Store performs the following steps:

  • Create a new version N+1 of the dataset (called a future version) and load the batch data into it.

  • Replay recent data from the real-time buffer onto the future version N+1.

  • After the buffer replay has caught up, switch the current version pointer from N to N+1. Version N can be kept as a backup. No future version exists until the next batch job.
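The three steps above can be sketched as a single function. This is a simplified, single-threaded illustration under stated assumptions: `push_batch_version` and the `replay_from` offset (standing in for "recent data") are hypothetical names, and datasets are plain dictionaries. A real system would also have to handle real-time writes that arrive while the replay is in progress.

```python
def push_batch_version(versions, buffer, batch_data, replay_from=0):
    """Build future version N+1 and return its number for the pointer swap.

    versions:    dict mapping version number -> dataset (a dict)
    buffer:      list of (key, value) real-time writes, oldest first
    batch_data:  output of the batch job
    replay_from: hypothetical buffer offset marking where "recent" data starts
    """
    # Step 1: create the future version and load the batch data into it.
    future = max(versions) + 1
    versions[future] = dict(batch_data)

    # Step 2: replay recent real-time writes on top of the batch data.
    for key, value in buffer[replay_from:]:
        versions[future][key] = value

    # Step 3: the caller swaps the current-version pointer to this number.
    return future


versions = {1: {"a": "old-a", "c": "rt-c"}}
buffer = [("c", "rt-c")]
current = 1

current = push_batch_version(versions, buffer, {"a": "new-a", "b": "new-b"})
print(current, versions[current])
```

Version 1 is left untouched in `versions`, so it remains available as a backup after the swap.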

Hybrid Stores are one way to implement Lambda Architectures, with the batch and real-time data merged at write time rather than at read time. They can also support Kappa Architectures.

Venice is an open-source database that implements the Hybrid Store design pattern, and some proprietary systems implement it as well.


Written by

Felix GV

Co-author of Venice