⚡️ Fast Replacing of Millions of Rows Daily in ClickHouse – A Partitioned Merge Tree Solution

🚨 The Challenge:
We had to replace millions of rows daily, covering data from the past 60 days, and run aggregations on top. Initially, we explored:

  • AggregatingMergeTree: Worked well only with incremental data, but not suited for full data replacements.

  • ReplacingMergeTree: Works with row replacement, but using FINAL made queries unbearably slow due to heavy merges.

💡 The Solution: Partitioning with MergeTree

We pivoted to using a plain MergeTree with monthly partitioning on the downloaded date. Here’s the smart trick we used:

  1. Create a new staging table with fresh data for the last 2 months.

  2. Exchange partitions from the staging table to the main table using ALTER TABLE … EXCHANGE PARTITION.

  3. Old partitions get dropped, and new ones swapped in — updates happen in milliseconds.

  4. Query performance remains lightning fast due to partition pruning.

📈 Benefits:

  • ✅ No need to rely on FINAL or deduplication.

  • ✅ Updates are atomic and fast.

  • ✅ Scalable and clean data management.

  • ✅ SELECT queries are fast due to smart partitioning.

🛠 Sample Partition Exchange Query:

sqlCopyEditALTER TABLE main_table
EXCHANGE PARTITION '2025-06' WITH TABLE staging_table;

✨ Conclusion:
Partition-based data replacement is a game changer when dealing with bulk updates in ClickHouse. It avoids performance penalties and keeps your queries fast.

clickhouse_partition_swap_demo.sql

sqlCopyEdit-- Step 1: Create the main table partitioned by month
DROP TABLE IF EXISTS main_table;

CREATE TABLE main_table
(
    id UInt32,
    downloaded_date Date,
    value String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(downloaded_date)
ORDER BY id;

-- Step 2: Insert sample old data
INSERT INTO main_table VALUES 
(1, '2025-06-10', 'OldData1'),
(2, '2025-06-11', 'OldData2'),
(3, '2025-07-10', 'OldData3');

-- Step 3: Create a staging table with fresh data
DROP TABLE IF EXISTS staging_table;

CREATE TABLE staging_table
(
    id UInt32,
    downloaded_date Date,
    value String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(downloaded_date)
ORDER BY id;

-- Step 4: Insert new data into the staging table
INSERT INTO staging_table VALUES 
(1, '2025-06-10', 'NewData1'),
(2, '2025-06-11', 'NewData2'),
(3, '2025-07-10', 'NewData3');

-- Step 5: Exchange partitions for June and July
ALTER TABLE main_table EXCHANGE PARTITION '202506' WITH TABLE staging_table;
ALTER TABLE main_table EXCHANGE PARTITION '202507' WITH TABLE staging_table;

-- Step 6: Final result check
SELECT * FROM main_table ORDER BY downloaded_date, id;

📝 Notes:

  • This simulates replacing June and July 2025 data.

  • The old data gets swapped out with new data from staging_table in milliseconds.

  • This works well even for millions of rows, as ClickHouse handles partitions efficiently.

1
Subscribe to my newsletter

Read articles from Kiruthiga Jaiganesh directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Kiruthiga Jaiganesh
Kiruthiga Jaiganesh