Optimizing Iceberg Table Performance – A Strategic Guide

In our previous chapters, we established that Apache Iceberg's intelligent metadata layer is the key to its reliability and performance. This metadata allows query engines to create smart plans and avoid costly full table scans. However, this is only the beginning.
True performance optimization is an active, ongoing process. The layout of your data files, the organization of data within those files, and your strategy for handling updates are all critical levers you can pull to enhance query speed and reduce compute costs.
This chapter is a deep dive into the practical strategies for tuning Iceberg tables. We will explore everything from fundamental table maintenance like compaction and sorting to advanced features like Puffin files and hidden partitioning, giving you the tools to make your Lakehouse not just reliable, but exceptionally fast and cost-efficient.
1. The Foundation: Compaction and Solving the "Small File Problem"
The single biggest threat to performance in any data lake is the "small file problem." Streaming ingestion or frequent, small updates can create thousands of tiny files. Querying this data is incredibly inefficient, as the engine incurs significant overhead opening, reading, and closing each file.
Compaction is the process of rewriting these small files into fewer, larger, and more optimal files (ideally 100MB - 1GB).
How Compaction Interacts with Iceberg Metadata
Crucially, compaction in Iceberg is a safe, transactional operation. It doesn't modify data in place. Instead, it works by creating a new table snapshot.
1. The compaction job reads a set of small files.
2. It writes their contents into one or more new, larger files.
3. It then commits a new snapshot to the table. This new snapshot points to the new, large files and logically removes the old, small files from the "current" view of the table.
This is committed atomically, meaning that concurrent reads are never disrupted and will see the old data until the moment the compaction transaction completes.
Visualizing the Impact: Before and After Snapshots
Let's see this in action. First, let's look at the history of a table that has just had several small streaming inserts.
```sql
-- View the table's snapshots BEFORE compaction
SELECT snapshot_id, parent_id, operation
FROM my_streaming_table.snapshots;

-- Result:
-- | snapshot_id         | parent_id           | operation |
-- |---------------------|---------------------|-----------|
-- | 8333017788700497002 | 7327164675870333694 | append    |
-- | 7327164675870333694 | 5139476312242609518 | append    |
-- | 5139476312242609518 | ...                 | append    |
```
We see a chain of append operations, each likely creating small files. Now, let's run a compaction job.
```sql
-- Run a simple bin-pack compaction
CALL catalog.system.rewrite_data_files(table => 'my_streaming_table');

-- View the table's snapshots AFTER compaction
SELECT snapshot_id, parent_id, operation
FROM my_streaming_table.snapshots;

-- Result:
-- | snapshot_id         | parent_id           | operation |
-- |---------------------|---------------------|-----------|
-- | 9988776655443322110 | 8333017788700497002 | rewrite   | <-- NEW SNAPSHOT
-- | 8333017788700497002 | 7327164675870333694 | append    |
-- | 7327164675870333694 | ...                 | append    |
```
A new snapshot with the rewrite operation has been added. All new queries will use this optimized snapshot with fewer files. The old snapshots (and their small files) are preserved for time travel until they are explicitly expired.
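Beyond the default call, rewrite_data_files accepts an options map to tune its behavior. A minimal sketch, assuming the same my_streaming_table and Spark's Iceberg procedures; the specific option values here are illustrative:

```sql
-- Bin-pack compaction with explicit tuning options
CALL catalog.system.rewrite_data_files(
  table => 'my_streaming_table',
  options => map(
    'target-file-size-bytes', '536870912',  -- aim for ~512 MB output files
    'min-input-files', '5'                  -- skip file groups with fewer than 5 candidates
  )
);
```

Raising min-input-files keeps frequent maintenance runs cheap by avoiding rewrites that would yield little benefit.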
2. Intelligent Data Layout: Sorting and Z-Ordering
While compaction fixes file sizes, sorting organizes the data within those files. By sorting data based on frequently filtered columns, you allow the query engine to be far more effective at file pruning, scanning less data and therefore reducing costs.
When to Use Which Compaction Strategy: A Decision Guide
Choosing the right compaction strategy is a trade-off between the cost of the maintenance job and the benefit to your read queries.
Bin-Pack Strategy:
What it does: Combines files to reach the target size without any global sorting.
When to use it: When write speed is paramount and your primary goal is simply to fix the small file problem. Ideal for frequent maintenance on streaming tables to meet SLAs.
Cost Impact: Fastest and cheapest compaction jobs.
Sort Strategy:
What it does: Combines files while also sorting the data globally according to a specified key (e.g., ORDER BY team).
When to use it: When your read queries consistently filter on a specific, low-to-medium cardinality column. This co-locates related data, dramatically improving read performance.
Cost Impact: Slower, more expensive compaction jobs, but yields significant savings on read queries.
Z-Order Strategy:
What it does: Interleaves the data from multiple columns, creating a multi-dimensional sort order.
When to use it: For queries that have multiple, equally important filter columns (e.g., WHERE city = '...' AND product_category = '...').
Cost Impact: The most computationally expensive compaction strategy, but can provide massive performance gains for specific multi-filter query patterns.
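A z-order rewrite uses the same procedure as a sort, with a zorder(...) expression in the sort order. A sketch matching the multi-filter example above; the table name is hypothetical:

```sql
-- Z-order compaction on two equally important filter columns
CALL catalog.system.rewrite_data_files(
  table => 'retail.store_sales',  -- hypothetical table
  strategy => 'sort',
  sort_order => 'zorder(city, product_category)'
);
```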
Example: Running a Sort Compaction Job
This job compacts files and sorts all the data by team, ensuring all records for a single team are co-located in the fewest possible files.

```sql
CALL catalog.system.rewrite_data_files(
  table => 'sports.nfl_players',
  strategy => 'sort',
  sort_order => 'team ASC NULLS LAST'
);
```
3. The Ultimate Pruning Tool: Hidden Partitioning
Partitioning is the most powerful tool for pruning data, but traditional methods were cumbersome. Iceberg's hidden partitioning is a revolutionary improvement.
It allows you to define partitions based on a transform of a column, without creating a separate physical column. The query engine understands this relationship, so users can filter on the raw column while still getting the full benefit of partition pruning.
Example: Creating a Table with Hidden Partitioning
```sql
CREATE TABLE retail.orders (
  order_id BIGINT,
  customer_id BIGINT,
  order_ts TIMESTAMP
)
USING iceberg
-- Partition by month, derived from the order_ts column.
-- The user never has to know about or query a separate 'order_month' column.
PARTITIONED BY (months(order_ts));
```
A user can now run SELECT * FROM retail.orders WHERE order_ts BETWEEN '2025-01-01' AND '2025-01-31', and Iceberg will automatically prune to scan only the partitions relevant to January 2025. Iceberg also supports partition evolution, allowing you to change your partitioning strategy over time without rewriting the entire table.
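For instance, if query patterns later shift toward daily granularity, the partition spec can be evolved in place. A sketch using Spark SQL's Iceberg DDL extensions:

```sql
-- Evolve the partition spec from monthly to daily granularity.
-- Existing files keep their old layout; only new writes use the new spec.
ALTER TABLE retail.orders REPLACE PARTITION FIELD months(order_ts) WITH days(order_ts);
```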
4. Advanced Tuning & Modern Innovations
Puffin Files: A recent innovation that decouples large statistics (like Bloom filters) into separate files. This allows statistics to be added or updated without rewriting the much larger data files, saving significant compute.
Bloom Filters: For high-cardinality columns where min/max statistics are not effective, Bloom filters can tell a query engine if a value definitively does not exist in a file, allowing it to be skipped. This avoids reading the entire file, saving significant I/O and compute.
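Bloom filters are enabled per column through a table property. A sketch reusing the retail.orders table, assuming customer_id is the high-cardinality filter column:

```sql
-- Write Parquet Bloom filters for customer_id on all future data files
ALTER TABLE retail.orders SET TBLPROPERTIES (
  'write.parquet.bloom-filter-enabled.column.customer_id' = 'true'
);
```

The filter applies only to files written after the property is set; a compaction pass can rewrite older files so they pick it up as well.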
5. Essential Housekeeping: Automation and Preventing "Cost Drift"
An optimized Lakehouse is a well-maintained one. Without regular maintenance, performance degrades and storage costs creep upward over time, a phenomenon known as "cost drift."
Expiring Snapshots: Every write creates a new snapshot. Periodically run expire_snapshots to clean up old versions and allow the underlying data files to be physically deleted, reclaiming storage costs.
Removing Orphan Files: Failed write jobs can leave untracked files in your data directory. remove_orphan_files periodically scans for and deletes these files.
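Both tasks are exposed as catalog procedures in Spark. A sketch with illustrative retention values; tune the cutoffs to your own time-travel and recovery requirements:

```sql
-- Expire snapshots older than the cutoff, but always keep the last 10
CALL catalog.system.expire_snapshots(
  table => 'retail.orders',
  older_than => TIMESTAMP '2025-01-01 00:00:00',
  retain_last => 10
);

-- Delete untracked files left behind by failed writes
CALL catalog.system.remove_orphan_files(
  table => 'retail.orders',
  older_than => TIMESTAMP '2025-01-01 00:00:00'
);
```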
The Automation Endgame:
While you can script these jobs, the industry is moving towards managed services. Platforms like Amazon S3 Tables and Snowflake Iceberg Tables now handle tasks like compaction and snapshot expiration automatically. This is a powerful way to prevent cost drift and reduce operational overhead.
Quick Reference: Key Iceberg Tuning Properties
Here are some key table properties you can set to control performance behavior.
| Property | Example Value | Purpose |
| --- | --- | --- |
| write.target-file-size-bytes | 536870912 (512 MB) | Sets the target size for files written by compaction and other operations. |
| write.distribution-mode | hash | Prevents small files during large parallel writes by routing records for the same partition to the same writer. |
| write.parquet.bloom-filter-enabled.column.<col_name> | true | Enables Bloom filter creation for a specific column to improve pruning. |
| write.update.mode | merge-on-read | Switches the update strategy from Copy-on-Write to Merge-on-Read for write-heavy workloads. |
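These properties are set with standard DDL. A sketch applying two of them to the retail.orders table from earlier; the values are illustrative starting points, not universal recommendations:

```sql
ALTER TABLE retail.orders SET TBLPROPERTIES (
  'write.target-file-size-bytes' = '536870912',  -- target ~512 MB data files
  'write.distribution-mode' = 'hash'             -- cluster records per partition before writing
);
```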
Final Thoughts
Performance tuning in Apache Iceberg is a systematic process, not a magic trick. By following a clear order of operations, you can achieve massive gains in speed and cost-efficiency.
Fix small files first. Use bin-pack compaction to solve the most urgent problem.
Then, optimize your file layout. Analyze your query patterns and use sort or z-order compaction and hidden partitioning to improve data pruning.
Then, layer on advanced indexing. Use features like Bloom filters for specific high-cardinality columns where needed.
And finally, automate it all. Whether through an orchestration tool or a managed service, automation is the key to preventing cost drift and maintaining a high-performance Lakehouse for the long term.
Written by Sriram Krishnan