Build a CDC Pipeline from MySQL to ClickHouse with BladePipe
Are you troubled by moving data to ClickHouse? In this blog, we will learn how to move data from relational databases to ClickHouse.
Before diving into the details, here we introduce a powerful tool called BladePipe, which enables you to replicate data in an easy way.
About BladePipe
BladePipe is a professional Change Data Capture(CDC) integration platform, simplifying your data movement between diverse data sources, including databases, message queues, real-time data warehouses, etc.
By using the technique of CDC, BladePipe can track, capture and deliver data changes automatically and accurately in seconds, greatly improving the efficiency of data integration. It provides sound solutions for use cases requiring real-time data replication, fueling data-driven decision-making and business agility.
Highlights
The key features of a MySQL-ClickHouse data pipeline in BladePipe include:
Add
_version
and_sign
fields to ensure accurate merging in ClickHouse.All DML operations are written as INSERT statements, ensuring good synchronization performance.
Support for DDL synchronization.
Schema Migration
When performing schema migration with ClickHouse as the target database, the default table engine selected is ReplacingMergeTree. If replicas are involved, ReplicatedReplacingMergeTree is automatically chosen.
The sort key for ClickHouse tables defaults to the primary key fields of the source table. If the source table has no primary key, tuple() is used as the sort key.
Additional fields _version
and _sign
are added as merge fields. During synchronization, BladePipe automatically fills in these fields based on the DML statements to ensure data consistency between the source and target.
For example:
CREATE TABLE console.worker_stats
(
`id` Int64,
`gmt_create` DateTime,
`worker_id` Int64,
`cpu_stat` String,
`mem_stat` String,
`disk_stat` String,
`_sign` UInt8 DEFAULT 0,
`_version` UInt64 DEFAULT 0,
INDEX `_version_minmax_idx` `_version` TYPE minmax GRANULARITY 1
)
ENGINE = ReplacingMergeTree(`_version`,`_sign`)
ORDER BY id
SETTINGS index_granularity = 8192
Data Writing
In both data migration and synchronization, all DML statements are converted to INSERT statements, which are written in standard batches.
The
_version
field values increment according to the order of data changes.The
_sign
field values are set to 0 for Insert and Update statements, and 1 for Delete statements.
The two additional fields comply with the ClickHouse ReplacingMergeTree definition.
Step-by-step Guide
Only in several steps, the data can be moved from MySQL to ClickHouse with BladePipe.
Step 1: Install BladePipe
Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.
Step 2: Add DataSources
Log in to the BladePipe Cloud.
Click DataSource > Add DataSource, and add 2 DataSources.
Step 3: Create a DataJob
Click DataJob > Create DataJob.
Select the source and target DataSources, and click Test Connection to ensure the connection to the source and target DataSources are both successful.
In the Advanced configuration of the target DataSource, choose the table engine as ReplacingMergeTree (or ReplicatedReplacingMergeTree).
Select Incremental for DataJob Type, together with the Full Data option.
Note: In the Specification settings, make sure that you select a specification of at least 1 GB. Allocating too little memory may result in Out of Memory (OOM) errors during DataJob execution.
Select the tables and columns to be replicated.
Confirm the DataJob creation.
Now the data pipeline from MySQL to ClickHouse is created and will start in seconds, and BladePipe will automatically run the following DataTasks:
Schema Migration: The schemas of the source tables will be migrated to ClickHouse.
Full Data Migration: All existing data of the source tables will be fully migrated to ClickHouse.
Incremental Synchronization: Ongoing data changes will be continuously synchronized to the target database.
Step 4: Verify the Data
Now it’s time to check the results of data movement. Follow the steps to see how accurate the data is.
- Stop data write in the source database and wait for ClickHouse to merge data.
Note: Due to the unpredictable timing of ClickHouse’s automatic merging, you can manually trigger a merging by running the
OPTIMIZE TABLE xxx FINAL;
command. Note that there is a chance that this manual merging may not always succeed.Alternatively, you can run the
CREATE VIEW xxx_v AS SELECT * FROM xxx FINAL;
command to create a view and perform queries on the view to ensure the data is fully merged.
- Create a Verification DataJob. Once the Verification DataJob is completed, review the results to confirm that the data in ClickHouse is the same as that in MySQL.
Conclusion
BladePipe offers an innovative solution for MySQL-ClickHouse data synchronization. With BladePipe, you can easily create a data pipeline in a few steps and enjoy the fast and accurate data integration without toil. If you are interested and would like to have a try, please visit https://www.bladepipe.com for free trial.
Subscribe to my newsletter
Read articles from BladePipe directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
BladePipe
BladePipe
A real-time end-to-end data replication tool. Simplify data movement between 30+ databases, message queues, search engines and more, with ultra-low latency. Free trial at https://www.bladepipe.com