PeerDB Streams - Simple, Native Postgres Change Data Capture

Sai Srirampur

We spent the past 7 months building a solid experience for replicating data from Postgres to data warehouses and databases such as Snowflake, BigQuery, ClickHouse, and Postgres.

Now, we want to expand and bring a similar experience to Queues. In that spirit, we are excited to announce PeerDB Streams. PeerDB Streams provides a simple and native way to replicate changes as they happen in Postgres to Queues / message brokers such as Kafka, Redpanda, Google Pub/Sub, Azure Event Hubs, and so on. Under the hood, PeerDB Streams uses Postgres logical decoding to enable Postgres Change Data Capture (CDC).
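
To make "logical decoding" a little more concrete: Postgres ships a demo output plugin, test_decoding, that renders each row change as a line of text. The sketch below parses one such line into a structured event, roughly the kind of record a CDC tool forwards to a queue. It is only an illustration of the concept. The function name and event shape are made up here, and nothing in this post says PeerDB uses test_decoding or this format.

```python
import re

# Matches lines such as:
#   table public.data: INSERT: id[integer]:1 data[text]:'1'
LINE_RE = re.compile(
    r"^table (?P<schema>[^.]+)\.(?P<table>[^:]+): "
    r"(?P<op>INSERT|UPDATE|DELETE): (?P<cols>.*)$"
)
# One column: name[type]:value, where value is a quoted string or a bare token.
COL_RE = re.compile(r"(\w+)\[([^\]]+)\]:('(?:[^']|'')*'|\S+)")


def parse_change(line):
    """Parse one test_decoding row-change line into a dict (hypothetical shape).

    Returns None for lines that are not row changes, e.g. BEGIN / COMMIT.
    Simplification: old-key / new-tuple sections of UPDATEs are not separated.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    columns = {}
    for name, pg_type, raw in COL_RE.findall(match.group("cols")):
        if raw.startswith("'"):
            # Strip the surrounding quotes and undo SQL-style quote doubling.
            value = raw[1:-1].replace("''", "'")
        else:
            value = raw
        columns[name] = {"type": pg_type, "value": value}
    return {
        "schema": match.group("schema"),
        "table": match.group("table"),
        "op": match.group("op"),
        "columns": columns,
    }


if __name__ == "__main__":
    print(parse_change("table public.data: INSERT: id[integer]:1 data[text]:'1'"))
```

Values are kept as strings on purpose; a real pipeline would map Postgres types to the target format's types.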

The Problem

We selected Queues as our next target because we've heard from multiple Postgres users that existing CDC tools are complex to adopt and operate. Debezium is the most common technology for this use case. It is proven and has large production usage. However, a common pain point among our users is that Debezium has a significant learning curve and requires institutional knowledge to set up and manage in production. It can take a few months to fully deploy Debezium in production. A few common issues users report:

  1. Interacting through a command-line interface or configuration files, understanding the various options and settings, and learning best practices for running Debezium in production all involve a significant learning curve. Debezium UI, released to address usability concerns, is still in an incubating state. Additionally, the volume of Debezium docs and resources can be overwhelming for someone just getting started.

  2. Supporting data formats (e.g., MsgPack) and transformations is not trivial and adds to the learning curve. You need to write a Java project, build JAR packages, and set up a runtime path for the Kafka Connect plugin. It isn’t as simple as plugging in a premade template or writing a few lines of code.

  3. Debezium is built primarily around Kafka. For other types of message brokers it is less native and does not offer the same level of configurability. For example, with Event Hubs, it is difficult to define custom partitioning schemes and to stream to topics spread across namespaces and subscriptions.

TL;DR: We believe that Debezium aims to give engineers a comprehensive way to implement CDC rather than a dead-simple one. So you can do a lot with Debezium, but you need to know a lot about it.

PeerDB Streams - Simple, Native Postgres Change Data Capture (CDC)

This is what we want to address with PeerDB. We are building a Simple, yet Comprehensive experience for Postgres Change Data Capture (CDC). The goal is to enable engineers to implement production-grade Postgres CDC with a minimal learning curve, within a few days.

PeerDB’s feature set isn't at Debezium's level yet, and as PeerDB evolves, we might face similar usability challenges. However, we're putting Simplicity/Usability at the forefront and we believe that we can achieve the above goal. Here is how we are doing it:

Simple Postgres CDC Using PeerDB UI

First and foremost, PeerDB offers a simple UI to set up source and target data sources (such as Postgres and Kafka) by creating PEERs and initiating CDC by creating a MIRROR.

Through the UI, users can monitor the progress of CDC, including throughput (per table) and latency; search through logs; set up alerts to Slack or Email based on replication slot growth; and investigate Postgres-specific metrics, including slot size, wait events for replication, and more. The UI also offers advanced features, including tuning MIRRORs, pausing MIRRORs, adding tables to MIRRORs, and more. We have strived to make these features as intuitive as possible for users, for example, by using information toolbars and simple language. Below is a demo showing the PeerDB UI in action. Here is a link to the quick start so you can try PeerDB Streams in just a few minutes.

Enhanced CLI Experience: Intuitive SQL Layer for Managing Postgres CDC

Second, for users who prefer a CLI over the UI, we provide a Postgres-compatible SQL layer to initiate and manage CDC. This SQL layer is as comprehensive as the UI, and we believe that it is far more intuitive and user-friendly than bash scripts and configuration files.

Simple Lua Scripts for Row-Level Transformations

Third, users can perform row-level transformations before streaming CDC changes to Kafka by writing Lua scripts. This enables powerful features such as encrypting/masking personally identifiable information (PII), supporting various data formats (JSON, MsgPack, FlatBuffers, Protobuf, etc.), and more. To make it very simple for users, we offer a script editor along with a bunch of useful templates. Applying a transformation is optional; the default data format is JSON.
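
As a rough illustration of what a row-level transformation does, here is a minimal sketch that masks PII columns and serializes the change as JSON. It is written in Python as a stand-in and is not PeerDB's Lua scripting API. The record shape, the `PII_FIELDS` set, and the function names are all assumptions made for this example.

```python
import hashlib
import json

# Hypothetical list of columns to mask; a real script would pick its own.
PII_FIELDS = {"email", "phone"}


def mask(value):
    """Replace a value with a short, stable SHA-256 digest prefix."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return digest[:16]


def transform(record):
    """Return the JSON payload for one change record, with PII masked.

    `record` is an assumed shape: {"table": str, "op": str, "row": dict}.
    """
    row = {
        column: mask(value) if column in PII_FIELDS and value is not None else value
        for column, value in record["row"].items()
    }
    payload = {"table": record["table"], "op": record["op"], "row": row}
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    change = {
        "table": "public.users",
        "op": "INSERT",
        "row": {"id": 1, "email": "jane@example.com", "plan": "pro"},
    }
    print(transform(change))
```

Hashing keeps the masked value stable across events, so downstream consumers can still join on it without seeing the original. Swapping `json.dumps` for another serializer is where a different output format would plug in.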

Native Connectors to non-Kafka targets

Fourth, we offer native connectors to non-Kafka targets, including Google Pub/Sub and Azure Event Hubs. Behind the scenes, we utilize the native Go APIs/libraries provided by these services to build our connectors, instead of relying on the less developed Kafka-compatible layer over these queues. We support advanced features specific to these services. For example, with Azure Event Hubs, users can perform CDC to topics distributed across different namespaces and subscriptions.

PeerDB Streams is Postgres Native

Finally, we are laser-focused on Postgres and, as of now, don't support any other databases. This allows us to implement many Postgres-native optimizations. For example, we provide Postgres-native metrics and alerts, including replication slot growth, wait events for logical decoding, number of connections and so on. Features such as parallel snapshotting for 10x faster initial loads and decoding in-flight transactions are in private beta.
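
To show what a "replication slot growth" metric can look like in practice, the sketch below computes how much WAL a slot is holding back from two LSNs and compares it to a threshold. It assumes the LSNs were already read from Postgres (for example `pg_current_wal_lsn()` and the `restart_lsn` column of `pg_replication_slots`). The function names and the 1 GiB threshold are hypothetical, and this is not a description of how PeerDB computes its own metrics or alerts.

```python
def lsn_to_int(lsn):
    """Convert a Postgres LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high and low
    32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_wal_lsn, restart_lsn):
    """Bytes of WAL between the slot's restart point and the current position."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(restart_lsn)


def should_alert(lag_bytes, threshold_bytes=1024 ** 3):
    """True when retained WAL meets or exceeds the (assumed) 1 GiB threshold."""
    return lag_bytes >= threshold_bytes


if __name__ == "__main__":
    lag = retained_wal_bytes("1/0", "0/C0000000")
    print(lag, should_alert(lag))
```

Postgres can compute the same difference server-side with `pg_wal_lsn_diff`; doing it in the client is just convenient when the two values come from different queries.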

Try PeerDB Streams

Check out this 10-minute quickstart to try PeerDB for Postgres CDC to Kafka.

Separately, you can try PeerDB through one of three offerings: the Open Source offering; PeerDB Cloud, our fully managed service; and a self-hosted enterprise offering that includes production-grade Helm charts.

Our vision is to provide the world’s best data-movement experience for Postgres. PeerDB Streams is another step in that direction. We built PeerDB Streams in close design partnership with a few Fintech and IoT customers implementing Postgres CDC for their transactional outbox use cases. The product has been battle-tested at scale and is constantly evolving. We would love to get your feedback on the product experience, our thesis, and anything else that comes to mind. It would be super useful for us. Thank you!
