⚡️ Fast Replacing of Millions of Rows Daily in ClickHouse – A Partitioned Merge Tree Solution

🚨 The Challenge:
We had to replace millions of rows daily, covering data from the past 60 days, and run aggregations on top. Initially, we explored:
✅ AggregatingMergeTree: Worked well only with incremental data, but not suited for full data replacements.
✅ ReplacingMergeTree: Works with row replacement, but using
FINAL
made queries unbearably slow due to heavy merges.
💡 The Solution: Partitioning with MergeTree
We pivoted to using a plain MergeTree
with monthly partitioning on the downloaded date. Here’s the smart trick we used:
Create a new staging table with fresh data for the last 2 months.
Exchange partitions from the staging table to the main table using
ALTER TABLE … EXCHANGE PARTITION
.Old partitions get dropped, and new ones swapped in — updates happen in milliseconds.
Query performance remains lightning fast due to partition pruning.
📈 Benefits:
✅ No need to rely on
FINAL
or deduplication.✅ Updates are atomic and fast.
✅ Scalable and clean data management.
✅ SELECT queries are fast due to smart partitioning.
🛠 Sample Partition Exchange Query:
sqlCopyEditALTER TABLE main_table
EXCHANGE PARTITION '2025-06' WITH TABLE staging_table;
✨ Conclusion:
Partition-based data replacement is a game changer when dealing with bulk updates in ClickHouse. It avoids performance penalties and keeps your queries fast.
✅ clickhouse_partition_swap_demo.sql
sqlCopyEdit-- Step 1: Create the main table partitioned by month
DROP TABLE IF EXISTS main_table;
CREATE TABLE main_table
(
id UInt32,
downloaded_date Date,
value String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(downloaded_date)
ORDER BY id;
-- Step 2: Insert sample old data
INSERT INTO main_table VALUES
(1, '2025-06-10', 'OldData1'),
(2, '2025-06-11', 'OldData2'),
(3, '2025-07-10', 'OldData3');
-- Step 3: Create a staging table with fresh data
DROP TABLE IF EXISTS staging_table;
CREATE TABLE staging_table
(
id UInt32,
downloaded_date Date,
value String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(downloaded_date)
ORDER BY id;
-- Step 4: Insert new data into the staging table
INSERT INTO staging_table VALUES
(1, '2025-06-10', 'NewData1'),
(2, '2025-06-11', 'NewData2'),
(3, '2025-07-10', 'NewData3');
-- Step 5: Exchange partitions for June and July
ALTER TABLE main_table EXCHANGE PARTITION '202506' WITH TABLE staging_table;
ALTER TABLE main_table EXCHANGE PARTITION '202507' WITH TABLE staging_table;
-- Step 6: Final result check
SELECT * FROM main_table ORDER BY downloaded_date, id;
📝 Notes:
This simulates replacing June and July 2025 data.
The old data gets swapped out with new data from
staging_table
in milliseconds.This works well even for millions of rows, as ClickHouse handles partitions efficiently.
Subscribe to my newsletter
Read articles from Kiruthiga Jaiganesh directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
