Clickhouse Optimization

ClickHouse Optimization Tips I Learned Today
Today, I dove deep into ClickHouse — one of the fastest columnar OLAP databases out there — and picked up several performance-focused insights. Whether you're new to ClickHouse or using it in production, these optimization techniques can seriously level up your query speed and storage efficiency.
🧱 1. Columnar Database 101
ClickHouse is a column-based database, which means:
Only the columns needed for a query are read from disk.
This leads to:
Faster analytics queries
Better compression
Efficient storage
Pro tip: Don’t store JSON strings — instead, break them into real typed columns (or use Map
, Array
, Tuple
, Nested
).
🧮 2. Aggregation vs Join — What to Use?
Goal | Best Tool |
Summarize data (e.g., COUNT , SUM ) | Aggregation |
Combine multiple tables | Joins (use carefully) |
Joins can be slow, especially on large datasets. Whenever possible:
Use pre-aggregated tables
Or denormalize your schema
ClickHouse is built for speed, but joins are inherently heavier than columnar aggregations.
⚙️ 3. Use ORDER BY
Smartly
ClickHouse’s ORDER BY
is not just for sorting — it defines the primary key and how data is stored on disk.
Example:
ORDER BY (event_name, toDate(event_time), tag_id, muid)
This helps:
Speed up queries with filters or ranges on those fields
Improve compression
❌ You can’t change
ORDER BY
after creating a table.
✅ To update it, you must create a new table, copy the data, and rename it.
🛠️ 4. Adding Columns Efficiently
ClickHouse supports altering schemas dynamically:
ALTER TABLE events
ADD COLUMN device_type String DEFAULT 'unknown' AFTER event_name;
Things to know:
Columns can be added without rewriting existing data
Add a
DEFAULT
to avoid NULLsUse
AFTER
to logically order columns (not required for performance)
📦 5. Avoid Raw JSON — Use Real Columns or Map
Parsing JSON strings on the fly slows down queries:
-- Slower
SELECT JSONExtractString(json_col, 'user.city') FROM events;
-- Better: store 'user_city' as a column directly
If you need flexible schemas, use:
Map(String, String) or Nested structures
📉 6. Query Less, Compress More
ClickHouse compresses columnar data extremely well when:
You use consistent types (e.g.,
LowCardinality(String)
for enums or repeated strings)Your data is sorted via a good
ORDER BY
The better your ORDER BY
, the more compression you get.
🧪 What I’ll Explore Next
Materialized Views
for pre-aggregationPARTITION BY
for large time-series dataClickHouse performance tuning flags and
system
tables
💬 Final Thoughts
ClickHouse is blazing fast — but only if you design your tables and queries thoughtfully. Columnar magic only works if you think in columns and optimize for reads.
If you're using ClickHouse or exploring data warehousing, follow me for more deep dives, breakdowns, and real-world use cases! 💻✨
Subscribe to my newsletter
Read articles from DIPIKESH KUMAR directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
