Unify Your Data Streams: The Real-Time Path to BigQuery with RisingWave

Heng Ma

For organizations serious about analytics, Google BigQuery is the central nervous system. The challenge is feeding it with fresh, real-time data from a growing number of diverse sources. Your operational data might live in PostgreSQL or MySQL, real-time events could be flowing through Kafka or Pulsar, and notifications from SaaS platforms or internal services might arrive via webhooks.

Stitching these systems together often results in a complex web of disparate pipelines. You might use the native BigQuery Transfer Service for one database, a custom script for Kafka, and yet another tool for webhooks. This creates a brittle, hard-to-manage data architecture that slows down innovation.

What if you could replace that complexity with a single, unified platform?

This post introduces a modern approach that uses RisingWave, a unified stream processing platform, as a universal data hub. We'll show how you can stream data from virtually any source into BigQuery in real time, all through a consistent, SQL-based workflow. We will use PostgreSQL as our primary example, but the principles apply across the board.

The Common Method for a Single Source: BigQuery Data Transfer Service

For a source like PostgreSQL, a popular and well-supported method is Google's own BigQuery Data Transfer Service. This managed service automates batch data loads, taking an initial snapshot and then performing periodic updates.

While excellent for automating traditional ETL, it is fundamentally a batch process. According to its documentation, the minimum frequency for a transfer is 15 minutes. Furthermore, the service's performance is tied to the connection capacity of your source database, meaning frequent, large data pulls could add load to your operational server. For many non-urgent use cases, this is a suitable solution. But for a complete, real-time picture, you need more.

The Modern Approach: A Unified, 3-Step Real-Time Pipeline

RisingWave offers a modern alternative that replaces pipeline complexity with a simple, SQL-native workflow. It acts as a central hub, using its extensive library of connectors to ingest data from all your sources and sink it into BigQuery.

While the example below uses PostgreSQL, imagine swapping the connector for MySQL, Kafka, Pulsar, or a generic webhook to achieve the same result.

Step 1: Ingest from Any Source

First, you tell RisingWave where to get the data with a CREATE SOURCE statement. This command is where you define the connection to your external system.

For our PostgreSQL example, you would capture changes from an orders table like this:

CREATE SOURCE orders_stream ( ... )
WITH (
    connector = 'postgres-cdc',
    hostname = 'your_postgres_host', ...
);
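For reference, a fuller version of this statement might look like the sketch below. It is illustrative rather than definitive: the column list, hostnames, and credentials are placeholders, and depending on your RisingWave version the native postgres-cdc connector may expect the change stream to be declared as a table with a primary key (so upstream updates and deletes are applied); check the CDC source documentation for the exact form.

-- A minimal sketch of ingesting the "orders" table via the postgres-cdc connector.
CREATE TABLE orders_stream (
    order_id    BIGINT PRIMARY KEY,
    customer_id BIGINT,
    order_total NUMERIC,
    created_at  TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    -- Connection details for the upstream PostgreSQL instance (placeholders).
    hostname = 'your_postgres_host',
    port = '5432',
    username = 'your_user',
    password = 'your_password',
    -- Identify the exact table whose changes should be captured.
    database.name = 'your_database',
    schema.name = 'public',
    table.name = 'orders'
);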

To ingest from a Kafka topic or a webhook instead, you would simply change the connector details:

-- Example for Kafka
CREATE SOURCE events_stream ( ... )
WITH (
    connector = 'kafka',
    topic = 'your_kafka_topic', ...
);
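And here is a fuller, hypothetical Kafka source for a stream of user clicks (referenced again in Step 2). The topic, broker address, and columns are placeholders, and FORMAT PLAIN ENCODE JSON assumes the messages are plain JSON:

-- A sketch of a Kafka source carrying user click events as JSON.
CREATE SOURCE clicks_stream (
    customer_id BIGINT,
    page_url    VARCHAR,
    clicked_at  TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user_clicks',
    properties.bootstrap.server = 'your_kafka_broker:9092',
    -- Read from the beginning of the topic; use 'latest' to take only new events.
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;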

Step 2: Process and Transform All Streams Centrally

With your data streams flowing into RisingWave, you can now process them in real time using standard SQL, regardless of their origin. With a CREATE MATERIALIZED VIEW, you can join a stream of PostgreSQL orders with a Kafka stream of user clicks, filter for high-value events, and aggregate the results. The simplest case, filtering for high-value orders, looks like this:

CREATE MATERIALIZED VIEW high_value_orders_mv AS
SELECT
    order_id,
    customer_id,
    order_total
FROM
    orders_stream
WHERE
    order_total > 1000;

This single materialized view provides a continuously updated, analytics-ready view of your most important data.
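The view above filters a single stream. To illustrate the cross-source case mentioned earlier, here is a sketch that joins the orders from PostgreSQL with the hypothetical clicks_stream Kafka source from Step 1 and aggregates per customer; the join key and column names are assumptions about your schema:

-- A sketch of a cross-source streaming join and aggregation.
CREATE MATERIALIZED VIEW customer_activity_mv AS
SELECT
    o.customer_id,
    COUNT(DISTINCT o.order_id) AS order_count,
    SUM(o.order_total)         AS total_spend,
    COUNT(c.clicked_at)        AS click_count
FROM orders_stream AS o
LEFT JOIN clicks_stream AS c
    ON o.customer_id = c.customer_id
GROUP BY o.customer_id;

RisingWave keeps this view incrementally updated as new orders and clicks arrive, so it is always ready to be sunk to BigQuery.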

Step 3: Sink Unified Data into BigQuery

Finally, you direct the final, processed stream to BigQuery. A CREATE SINK statement sends the data from your materialized view into your target table.

CREATE SINK high_value_orders_bq_sink
FROM high_value_orders_mv
WITH (
    connector = 'bigquery',
    type = 'append-only',
    bigquery.project = 'your-gcp-project-id', ...
);
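A more complete sink definition might look like the following. The dataset, table, and credential path are placeholders, and the bigquery.* option names should be verified against the BigQuery sink documentation for your RisingWave version:

-- A sketch of a BigQuery sink fed by the materialized view above.
CREATE SINK high_value_orders_bq_sink
FROM high_value_orders_mv
WITH (
    connector = 'bigquery',
    type = 'append-only',
    -- Needed if the upstream view can emit updates or deletes
    -- but only inserts should be written to BigQuery.
    force_append_only = 'true',
    -- Path to a service-account key with write access to the dataset.
    bigquery.local.path = '/path/to/service-account-key.json',
    bigquery.project = 'your-gcp-project-id',
    bigquery.dataset = 'your_dataset',
    bigquery.table = 'high_value_orders'
);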

With these steps, you have a single, coherent pipeline for processing and delivering data to BigQuery.
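Because RisingWave speaks the PostgreSQL wire protocol, you can also sanity-check the processed data with any psql-compatible client before (or after) it lands in BigQuery:

SELECT * FROM high_value_orders_mv ORDER BY order_total DESC LIMIT 10;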

Why Choose This Unified Approach?

Adopting RisingWave as a central hub for BigQuery isn't just about speed; it's about architectural simplicity and scalability.

  • Unified Hub for All Real-Time Data: Instead of managing separate pipelines for each source—from databases like MySQL to message queues like Kafka and generic webhooks—you get one platform and one methodology (SQL) to rule them all. This dramatically simplifies development, monitoring, and maintenance.

  • From Complex Code to Simple SQL: The entire logic is defined with familiar SQL commands, making real-time data processing accessible to any developer or analyst.

  • Truly Real-Time, Cross-Source Insights: Join data from your production database with events from your message queue on the fly. Your insights in BigQuery are not only instant but also more holistic.

Get Started with a Unified Streaming Platform

Stop building pipelines one by one. By leveraging RisingWave, you can create a single, robust, and low-latency data hub that feeds your BigQuery warehouse with real-time data from every corner of your organization.

Ready to simplify your data architecture?

  • Check out our official documentation on our supported sources and the BigQuery sink.

  • Try RisingWave Today: Get hands-on and build this pipeline yourself.

  • Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.

  • Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.



Written by Heng Ma