Clickhouse Optimization

DIPIKESH KUMARDIPIKESH KUMAR
3 min read

ClickHouse Optimization Tips I Learned Today

Today, I dove deep into ClickHouse — one of the fastest columnar OLAP databases out there — and picked up several performance-focused insights. Whether you're new to ClickHouse or using it in production, these optimization techniques can seriously level up your query speed and storage efficiency.


🧱 1. Columnar Database 101

ClickHouse is a column-based database, which means:

Only the columns needed for a query are read from disk.

This leads to:

  • Faster analytics queries

  • Better compression

  • Efficient storage

Pro tip: Don’t store JSON strings — instead, break them into real typed columns (or use Map, Array, Tuple, Nested).


🧮 2. Aggregation vs Join — What to Use?

GoalBest Tool
Summarize data (e.g., COUNT, SUM)Aggregation
Combine multiple tablesJoins (use carefully)

Joins can be slow, especially on large datasets. Whenever possible:

  • Use pre-aggregated tables

  • Or denormalize your schema

ClickHouse is built for speed, but joins are inherently heavier than columnar aggregations.


⚙️ 3. Use ORDER BY Smartly

ClickHouse’s ORDER BY is not just for sorting — it defines the primary key and how data is stored on disk.

Example:

ORDER BY (event_name, toDate(event_time), tag_id, muid)

This helps:

  • Speed up queries with filters or ranges on those fields

  • Improve compression

❌ You can’t change ORDER BY after creating a table.
✅ To update it, you must create a new table, copy the data, and rename it.


🛠️ 4. Adding Columns Efficiently

ClickHouse supports altering schemas dynamically:

ALTER TABLE events
ADD COLUMN device_type String DEFAULT 'unknown' AFTER event_name;

Things to know:

  • Columns can be added without rewriting existing data

  • Add a DEFAULT to avoid NULLs

  • Use AFTER to logically order columns (not required for performance)


📦 5. Avoid Raw JSON — Use Real Columns or Map

Parsing JSON strings on the fly slows down queries:

-- Slower
SELECT JSONExtractString(json_col, 'user.city') FROM events;

-- Better: store 'user_city' as a column directly

If you need flexible schemas, use:

Map(String, String) or Nested structures

📉 6. Query Less, Compress More

ClickHouse compresses columnar data extremely well when:

  • You use consistent types (e.g., LowCardinality(String) for enums or repeated strings)

  • Your data is sorted via a good ORDER BY

The better your ORDER BY, the more compression you get.


🧪 What I’ll Explore Next

  • Materialized Views for pre-aggregation

  • PARTITION BY for large time-series data

  • ClickHouse performance tuning flags and system tables


💬 Final Thoughts

ClickHouse is blazing fast — but only if you design your tables and queries thoughtfully. Columnar magic only works if you think in columns and optimize for reads.


If you're using ClickHouse or exploring data warehousing, follow me for more deep dives, breakdowns, and real-world use cases! 💻✨


0
Subscribe to my newsletter

Read articles from DIPIKESH KUMAR directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

DIPIKESH KUMAR
DIPIKESH KUMAR