Introduction
Every team that streams data into Apache Iceberg hits the same wall: small files. A streaming job that commits every 30 seconds generates 2,880 commits per day. Each commit writes data files, and those files are often far smaller than the optimal 256 MB to 512 MB range that query engines prefer. Within a week, you have tens of thousands of tiny Parquet files, and your once-fast analytical queries now crawl.
The small file problem is not a bug. It is a direct consequence of how streaming works: low-latency commits produce many small writes, and Iceberg faithfully tracks every one. The solution is compaction, the process of rewriting many small files into fewer large ones while maintaining data consistency.
This guide covers the three main compaction strategies for Iceberg (bin-packing, sort-order, and z-order), explains when to use each for streaming workloads, and shows how RisingWave handles compaction automatically so you can focus on building your pipeline instead of maintaining it.
Why Does Streaming Create So Many Small Files in Iceberg?
To understand compaction, you first need to understand why the problem exists. Iceberg writes data files at commit time. Each commit produces one or more Parquet (or ORC) files. The size of these files depends on how much data accumulates between commits.
The math of streaming commits
Consider a streaming pipeline that ingests clickstream data at 10,000 events per second, with each event averaging 500 bytes:
- Data rate: 10,000 events/sec x 500 bytes = 5 MB/sec
- Commit interval: 30 seconds
- Data per commit: 5 MB/sec x 30 sec = 150 MB per commit (uncompressed)
- Parquet compression (typically 3:1 to 5:1): ~30 to 50 MB per commit
- Partitioning: If you partition by hour, each commit writes to the current hour partition. But if you partition by a higher-cardinality field (like region or event type), the data splits across multiple partitions, producing even smaller files per partition.
At 50 MB per commit (single partition), you are below the optimal 256 MB file size. With multiple partitions, each file might be 5 to 10 MB. Over 24 hours, that is 2,880 commits producing potentially tens of thousands of files.
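The arithmetic above is easy to sanity-check in a few lines of Python. The figures are the example's assumptions (event rate, event size, compression ratio), not measurements:

```python
# Back-of-the-envelope sizing for the example streaming pipeline.
events_per_sec = 10_000
bytes_per_event = 500
commit_interval_sec = 30
compression_ratio = 4       # Parquet typically compresses 3:1 to 5:1

data_rate_mb = events_per_sec * bytes_per_event / 1_000_000       # 5.0 MB/sec
raw_per_commit_mb = data_rate_mb * commit_interval_sec            # 150 MB uncompressed
compressed_per_commit_mb = raw_per_commit_mb / compression_ratio  # ~37.5 MB on disk
commits_per_day = 86_400 // commit_interval_sec                   # 2,880 commits

print(compressed_per_commit_mb, commits_per_day)
```

Partitioning multiplies the problem: if each commit fans out across 10 partitions, the per-file size drops to a tenth of the per-commit figure.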
Why small files hurt performance
Small files degrade query performance in three ways:
Metadata overhead: Each data file has an entry in the manifest. More files mean larger manifests, which increases query planning time. A table with 100,000 files takes significantly longer to plan than one with 1,000 files.
I/O overhead: Object storage (S3, GCS, Azure Blob) charges per request and has per-request latency overhead. Opening 10,000 small files is much slower than opening 100 large files, even if the total data volume is identical.
Lost statistics advantage: Iceberg stores min/max statistics per column per file. Small files actually carry tighter per-file value ranges, which in theory sharpens predicate pushdown. In practice, the planning and I/O overhead of evaluating statistics across thousands of files cancels out that precision advantage.
The delete file problem
If your streaming pipeline handles updates or deletes (common with CDC workloads), Iceberg writes equality delete files or positional delete files. These accumulate alongside data files and add read amplification: every query must check delete files to determine which rows are still valid. RisingWave's compaction resolves delete files by merging them into the base data files, eliminating this overhead.
What Is Bin-Packing Compaction and When Should You Use It?
Bin-packing is the default and simplest compaction strategy in Iceberg. It groups small files together into larger files without changing the data order within those files.
How bin-packing works
The bin-packing algorithm:
- Scans the table for data files smaller than the target file size (default: 512 MB).
- Groups small files from the same partition into bins that, combined, approach the target size.
- Reads the data from the small files and writes it into new, larger files.
- Atomically commits the new files and marks the old files for deletion.
-- Run bin-packing compaction via Spark
CALL catalog.system.rewrite_data_files(
    table => 'analytics.clickstream',
    strategy => 'binpack',
    options => map(
        'target-file-size-bytes', '536870912',  -- 512 MB
        'min-file-size-bytes', '67108864',      -- 64 MB (files smaller than this get compacted)
        'max-file-size-bytes', '805306368'      -- 768 MB
    )
);
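The grouping step can be sketched as a greedy first-fit pass over the undersized files in a partition. This is an illustration of the idea, not Iceberg's actual planner, which also accounts for delete files and max-file-size limits:

```python
def bin_pack(file_sizes_mb, target_mb=512, min_mb=64):
    """Greedy first-fit grouping of undersized files into bins near the target.

    Sketch of the grouping step only, operating within a single partition.
    """
    # Only files below the min threshold are candidates for compaction.
    small = sorted((s for s in file_sizes_mb if s < min_mb), reverse=True)
    bins = []
    for size in small:
        for b in bins:                      # first bin with room wins
            if sum(b) + size <= target_mb:
                b.append(size)
                break
        else:
            bins.append([size])             # no bin fits: open a new one
    return bins

# A day of ~40 MB commit files collapses into 240 files of ~480 MB each.
bins = bin_pack([40] * 2880)
print(len(bins), sum(bins[0]))
```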
When to use bin-packing
- Append-only streaming: When your pipeline only inserts new data and write order carries no special significance.
- Quick compaction cycles: Bin-packing is the fastest strategy because it does not sort data. This matters when you need to compact frequently to keep up with streaming volume.
- Mixed partition cardinality: When different partitions have different file sizes, bin-packing efficiently handles the variety without requiring sort configuration.
Performance characteristics
Bin-packing compaction is I/O-bound (read small files, write large files) with minimal CPU overhead. It does not improve query performance beyond reducing file count and I/O operations. If your queries filter on specific columns, sort-order or z-order compaction provides additional benefits.
What Is Sort-Order Compaction and How Does It Help Queries?
Sort-order compaction rewrites files so that rows are physically ordered by one or more columns. This clustering effect dramatically improves query performance for predicates on the sort columns because entire files can be skipped based on min/max statistics.
How sort-order compaction works
- Reads data files from the target partition(s).
- Sorts all rows by the specified column(s).
- Writes sorted data into new files of the target size.
- Updates the table's metadata to reflect the new file boundaries and statistics.
-- Define a sort order on the Iceberg table
ALTER TABLE analytics.clickstream
WRITE ORDERED BY event_time, user_id;

-- Run sort compaction
CALL catalog.system.rewrite_data_files(
    table => 'analytics.clickstream',
    strategy => 'sort',
    options => map(
        'target-file-size-bytes', '536870912',
        'rewrite-all', 'true'
    )
);
Why sorting matters for streaming data
Streaming data arrives roughly ordered by time, but not perfectly. Events from different regions, delayed mobile clients, or out-of-order Kafka partitions create temporal gaps. After sort compaction by event_time, a query like:
SELECT COUNT(*), event_type
FROM analytics.clickstream
WHERE event_time >= '2026-03-28 00:00:00'
  AND event_time < '2026-03-29 00:00:00'
GROUP BY event_type;
This query can skip every file whose event_time max is before 2026-03-28 00:00:00 or whose event_time min is at or after 2026-03-29 00:00:00. Without sorting, every file in the partition might overlap the query range, forcing a full scan.
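The skipping logic itself is simple enough to sketch with hypothetical per-file statistics (file names and ranges below are made up for illustration):

```python
# Min/max file pruning: each tuple is (file_name, event_time_min, event_time_max)
# as it would appear in the manifest. ISO-8601 strings compare correctly as text.
files = [
    ("f1.parquet", "2026-03-27 04:00:00", "2026-03-27 23:59:59"),
    ("f2.parquet", "2026-03-27 22:00:00", "2026-03-28 06:00:00"),
    ("f3.parquet", "2026-03-28 06:00:00", "2026-03-28 23:59:00"),
    ("f4.parquet", "2026-03-29 00:00:00", "2026-03-29 12:00:00"),
]

lo, hi = "2026-03-28 00:00:00", "2026-03-29 00:00:00"

# A file is skipped when its range cannot overlap [lo, hi):
# either max < lo, or min >= hi.
scanned = [name for name, fmin, fmax in files if fmax >= lo and fmin < hi]
print(scanned)  # only f2 and f3 need to be read
```

After sort compaction, most files look like f1 or f4 relative to any given range and drop out of the scan entirely.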
When to use sort-order compaction
- Time-series queries: When most queries filter on a timestamp column, sorting by that column provides the biggest benefit.
- Known query patterns: When you know which columns appear most frequently in WHERE clauses.
- After initial bin-packing stabilization: Some teams run bin-packing frequently (every hour) to reduce file count quickly, then run sort compaction less frequently (daily or weekly) to optimize query performance.
Trade-offs
Sort compaction is more expensive than bin-packing because it requires a full sort operation. For large tables, this can consume significant CPU and memory. It also does not help queries that filter on columns other than the sort key.
What Is Z-Order Compaction and When Is It Worth the Cost?
Z-order compaction extends sorting to multiple dimensions simultaneously. Instead of sorting by column A then column B (which favors column A queries), z-order interleaves the binary representations of values from multiple columns into a single sortable value. This provides balanced pruning across all z-ordered columns.
How z-order compaction works
Z-order uses a space-filling curve to map multi-dimensional data to a single dimension:
- For each row, take the binary representations of the z-order columns.
- Interleave the bits to create a z-value.
- Sort rows by z-value.
- Write sorted data into files.
The result is that rows with similar values across all z-order columns tend to end up in the same file, enabling file pruning on any combination of those columns.
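The bit-interleaving step can be sketched in Python for the two-dimensional integer case. This is a minimal Morton-code illustration; production z-ordering first normalizes timestamps, strings, and other types to fixed-width unsigned integers:

```python
def z_value(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of two unsigned ints into a single Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x takes the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y takes the odd bit positions
    return z

# Rows close in BOTH dimensions get nearby z-values, so sorting by
# z_value clusters them into the same files.
rows = [(3, 5), (3, 6), (200, 9), (2, 5)]
rows.sort(key=lambda r: z_value(*r))
print(rows)  # the three near (3, 5) cluster together; (200, 9) sorts far away
```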
-- Z-order compaction via Spark
CALL catalog.system.rewrite_data_files(
    table => 'analytics.clickstream',
    strategy => 'sort',
    sort_order => 'zorder(event_time, user_id, region)',
    options => map(
        'target-file-size-bytes', '536870912'
    )
);
When z-order outperforms sort-order
Consider a table with queries that filter on different column combinations:
-- Query 1: Filter by time
SELECT * FROM clickstream WHERE event_time BETWEEN '2026-03-28' AND '2026-03-29';
-- Query 2: Filter by user
SELECT * FROM clickstream WHERE user_id = 12345;
-- Query 3: Filter by time AND region
SELECT * FROM clickstream WHERE event_time >= '2026-03-28' AND region = 'us-east';
With sort-order compaction on (event_time, user_id), Query 1 benefits from pruning, but Query 2 gets almost no benefit (user_id values are spread across all files within each time range). With z-order on (event_time, user_id, region), all three queries benefit from file pruning.
When to use z-order compaction
- Multi-dimensional query patterns: When users query on different column combinations and no single sort order covers all patterns.
- Ad-hoc analytics: When the query patterns are not predictable in advance.
- Spatial or geospatial data: Z-order naturally handles 2D (latitude, longitude) queries.
Trade-offs
Z-order compaction is the most expensive strategy:
- Higher CPU cost: Computing z-values and sorting is more intensive than single-column sorting.
- Compromise on each dimension: Because z-order balances across all columns, it is less effective than sort-order for queries that only filter on a single column.
- Diminishing returns above 3-4 columns: The effectiveness of z-order decreases as you add more columns. Stick to 2-4 high-value columns.
How Does RisingWave Handle Iceberg Compaction Automatically?
Managing compaction as a separate batch process adds operational complexity. You need to schedule Spark or Flink jobs, monitor their execution, handle failures, and tune parameters. RisingWave takes a different approach by running compaction as a built-in background process within the streaming engine itself.
Two-branch architecture
RisingWave uses a two-branch architecture for Iceberg writes:
- Ingestion branch: The streaming writer appends data to an ingestion branch. This isolates write operations from compaction.
- Main branch: The compactor reads from the ingestion branch, compacts files, resolves delete files, and atomically commits clean snapshots to the main branch.
This separation means the writer is never blocked by compaction, and readers on the main branch always see optimized data.
Built-in compaction service
RisingWave implements a schedulable compaction service built on Apache DataFusion. It runs in the background and periodically:
- Merges equality delete files into base data files (eliminating read amplification)
- Combines small data files into larger ones (bin-packing)
- Removes orphan files and expired snapshots
-- In RisingWave: create a streaming sink to Iceberg with automatic compaction
CREATE SINK sensor_iceberg FROM sensor_readings_mv WITH (
    connector = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'https://your-catalog-endpoint',
    warehouse = 'iot_warehouse',
    database.name = 'telemetry',
    table.name = 'sensor_readings',
    commit_checkpoint_interval = 60,
    primary_key = 'sensor_id, reading_time'
);
-- RisingWave automatically compacts the resulting Iceberg files
-- No separate Spark job needed
For a detailed walkthrough of setting up streaming Iceberg pipelines, see the RisingWave Iceberg documentation.
Performance comparison
| Approach | Setup complexity | Compaction latency | Resource efficiency |
| --- | --- | --- | --- |
| Spark batch job (hourly) | High (scheduling, monitoring) | 1 hour+ | Separate cluster needed |
| Flink compaction task | Medium (Flink job management) | Minutes | Shared with streaming job |
| RisingWave built-in | Low (automatic) | Minutes | Embedded, no extra cluster |
| AWS S3 Tables compaction | Low (managed) | Minutes | Managed, AWS only |
How Should You Monitor Compaction Effectiveness?
Running compaction is not enough. You need to monitor whether it is keeping up with your streaming write rate and actually improving query performance.
Key metrics to track
File count per partition: The primary indicator. If file count grows continuously, compaction is not keeping up. Target: stable or slowly growing between compaction cycles.
Average file size: Should stay close to your target file size (typically 256 MB to 512 MB). If the average is below 64 MB, you need more frequent or aggressive compaction.
Manifest file count: Too many manifest files slow query planning. Monitor the number of manifests and run rewrite_manifests when needed.
Delete file ratio: For CDC workloads, track the ratio of delete files to data files. A high ratio indicates delete files are accumulating faster than compaction resolves them.
Query latency trends: Ultimately, compaction effectiveness shows up in query performance. Track p50 and p99 query latencies for your most common queries.
-- Check file statistics for an Iceberg table (Spark)
SELECT
    partition,
    COUNT(*) AS file_count,
    AVG(file_size_in_bytes) / 1048576 AS avg_file_size_mb,
    SUM(file_size_in_bytes) / 1073741824 AS total_size_gb,
    SUM(record_count) AS total_records
FROM catalog.analytics.clickstream.files
GROUP BY partition
ORDER BY file_count DESC;
Compaction scheduling guidelines
| Streaming throughput | Bin-pack frequency | Sort/Z-order frequency | Snapshot expiration |
| --- | --- | --- | --- |
| Low (< 1 MB/sec) | Every 4 hours | Weekly | Daily |
| Medium (1-10 MB/sec) | Hourly | Daily | Every 12 hours |
| High (10-100 MB/sec) | Every 15 minutes | Every 6 hours | Every 6 hours |
| Very high (> 100 MB/sec) | Continuous (use RisingWave) | Daily | Every 4 hours |
What Are Best Practices for Iceberg Compaction in Streaming?
Based on production experience across streaming lakehouse deployments, here are the practices that matter most:
1. Start with bin-packing, add sorting later
Bin-packing provides 80% of the compaction benefit at 20% of the cost. Get file sizes under control first, then add sort-order compaction for your most queried columns.
2. Match commit interval to data volume
If your streaming pipeline produces less than 64 MB per commit, increase the commit interval. There is no value in committing 5 MB files every 10 seconds when a 60 MB file every 2 minutes works better. RisingWave's commit_checkpoint_interval parameter controls this.
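As a rough sizing aid, the minimum commit interval for a target file size falls out of the same arithmetic used earlier. This is an illustrative sketch; the compression ratio is an assumption, and real pipelines also cap the interval to meet freshness requirements:

```python
def min_commit_interval_sec(target_file_mb: float,
                            events_per_sec: float,
                            bytes_per_event: float,
                            compression_ratio: float = 4.0) -> float:
    """Smallest commit interval yielding on-disk files of at least target_file_mb."""
    compressed_mb_per_sec = (events_per_sec * bytes_per_event
                             / 1_000_000 / compression_ratio)
    return target_file_mb / compressed_mb_per_sec

# The example pipeline (10,000 events/sec at 500 bytes each) needs a
# commit interval of roughly 51 seconds to produce 64 MB files.
interval = min_commit_interval_sec(64, 10_000, 500)
print(round(interval, 1))
```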
3. Compact within partitions, not across them
Compaction should operate within partition boundaries. Cross-partition compaction can move data between files in ways that break partition pruning assumptions.
4. Expire snapshots aggressively
Each compaction creates new snapshots. If you do not expire old snapshots, metadata grows without bound. Set snapshot expiration to keep only what you need for time travel (for example, 7 days).
-- Expire snapshots older than 7 days
CALL catalog.system.expire_snapshots(
    table => 'analytics.clickstream',
    older_than => TIMESTAMP '2026-03-22 00:00:00',
    retain_last => 100
);
5. Monitor and alert on file count
Set alerts when per-partition file count exceeds your threshold (for example, 10,000 files). This catches situations where compaction fails silently or cannot keep up with write volume.
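A monitoring job along these lines can apply thresholds to the per-partition stats from the .files query shown earlier. This is a sketch; the threshold values and the tuple format are assumptions, not a RisingWave or Iceberg API:

```python
# Illustrative alerting thresholds -- tune for your workload.
FILE_COUNT_THRESHOLD = 10_000   # per-partition file count ceiling
MIN_AVG_FILE_MB = 64            # files smaller than this signal lagging compaction

def partitions_needing_attention(stats):
    """stats: list of (partition, file_count, avg_file_size_mb) tuples."""
    return [
        (part, count, avg_mb)
        for part, count, avg_mb in stats
        if count > FILE_COUNT_THRESHOLD or avg_mb < MIN_AVG_FILE_MB
    ]

alerts = partitions_needing_attention([
    ("2026-03-27", 1_200, 410.0),   # healthy partition
    ("2026-03-28", 14_500, 8.5),    # compaction falling behind
])
print(alerts)
```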
6. Use separate compute for compaction
If using Spark or Flink for compaction, do not run compaction jobs on the same cluster as your streaming ingestion. Compaction is CPU and I/O intensive and can starve streaming jobs of resources. RisingWave's embedded approach avoids this by running compaction in a separate thread pool.
FAQ
What is Apache Iceberg compaction?
Iceberg compaction is the process of rewriting many small data files into fewer, larger files to improve query performance and reduce metadata overhead. It is essential for streaming workloads that create many small files through frequent commits.
How often should I compact Iceberg tables with streaming data?
The frequency depends on your write throughput. For medium throughput (1-10 MB/sec), compact every hour with bin-packing and daily with sort-order. For high throughput, consider continuous compaction with tools like RisingWave that handle compaction automatically.
What is the difference between bin-packing and z-order compaction?
Bin-packing groups small files into larger ones without reordering data, making it fast but only reducing I/O overhead. Z-order compaction reorders data across multiple columns simultaneously, enabling file pruning for queries that filter on any combination of those columns. Z-order is more expensive but provides better query performance for multi-dimensional filtering.
Does RisingWave handle Iceberg compaction automatically?
Yes. RisingWave includes a built-in compaction service that runs as a background process. It automatically merges small files, resolves equality delete files, and manages snapshots without requiring a separate Spark or Flink cluster.
Conclusion
Compaction is not optional for streaming Iceberg workloads. Without it, query performance degrades, storage costs increase, and metadata management becomes a bottleneck. Here are the key takeaways:
- Bin-packing is the fastest and simplest strategy. Use it as your baseline for all streaming tables.
- Sort-order compaction provides significant query speedups when your access patterns consistently filter on specific columns, especially timestamps.
- Z-order compaction is best for ad-hoc analytics with unpredictable multi-column filtering patterns. Limit it to 2-4 columns.
- Automate everything: Schedule compaction, snapshot expiration, and manifest rewriting. Better yet, use a tool like RisingWave that handles these tasks automatically.
- Monitor continuously: Track file count, average file size, and query latency to ensure compaction keeps pace with your streaming volume.
Ready to try this yourself? RisingWave provides automatic Iceberg compaction built into the streaming engine. No Spark clusters, no cron jobs. Try RisingWave Cloud free, no credit card required. Sign up here.
Join our Slack community to ask questions and connect with other stream processing developers.

