The right Iceberg partition strategy can reduce query scan costs by 10–100x and prevent the small-file problem that degrades streaming pipeline performance. For streaming data written by RisingWave, partition by event time using Iceberg's hidden partitioning transforms—never by raw timestamp columns—and align partition granularity with your most common query patterns.
Why Partitioning Matters More for Streaming
Batch pipelines write large files infrequently. Streaming pipelines write small files constantly. Without careful partitioning, a streaming sink produces thousands of tiny Parquet files scattered across hundreds of partitions, turning simple analytical queries into S3 list operations that cost more than the actual data reads.
Iceberg's hidden partitioning is the solution. You define partition transforms (HOUR, DAY, MONTH, TRUNCATE, BUCKET) on columns, and Iceberg handles the physical layout transparently. Queries that filter on those columns automatically skip irrelevant partitions without requiring query writers to know the partition scheme.
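As a concrete sketch, here is how those transforms look in Spark SQL with the Iceberg extensions (table and column names are illustrative; note that Spark's DDL puts the transform width first, e.g. bucket(16, user_id), while some engines such as Trino put the column first):

```sql
-- Illustrative Spark SQL DDL using hidden partition transforms
CREATE TABLE lakehouse.page_views (
    user_id    STRING,
    region     STRING,
    event_time TIMESTAMP
)
USING iceberg
PARTITIONED BY (
    days(event_time),      -- enables pruning for date-range filters
    bucket(16, user_id)    -- caps a high-cardinality key at 16 buckets
);

-- Filters on event_time prune automatically; the query never names a partition:
SELECT count(*)
FROM lakehouse.page_views
WHERE event_time >= TIMESTAMP '2026-03-01';
```

Because the transform lives in table metadata rather than in a physical partition column, query writers filter on `event_time` directly and still get pruning.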
Partition Strategy Comparison
| Strategy | Write Amplification | Query Selectivity | Compaction Need | Best For |
|---|---|---|---|---|
| No partition | Low | None | High | < 1 GB tables |
| DAY(event_time) | Low | High (time-filtered) | Medium | Daily reporting |
| HOUR(event_time) | Medium | Very high | High (streaming) | Real-time monitoring |
| MONTH(event_time) + BUCKET(user_id, 16) | Low | High (multi-dim) | Low | Multi-tenant analytics |
| TRUNCATE(region, 4) + DAY(event_time) | Medium | High (geo+time) | Medium | Regional dashboards |
| BUCKET(device_id, 64) | High | High (device queries) | Low | IoT per-device queries |
Setting Up a Well-Partitioned Streaming Pipeline
Step 1: Ingest Events from Kafka
CREATE SOURCE clickstream_events (
event_id VARCHAR,
user_id VARCHAR,
region VARCHAR,
product_id VARCHAR,
event_type VARCHAR,
amount DECIMAL(12,2),
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'clickstream.events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
Step 2: Pre-Aggregate Before Writing to Iceberg
Aggregation before the Iceberg sink reduces file count and partition spread dramatically. A 1-minute tumble window collapses per-event writes into per-minute summaries.
CREATE MATERIALIZED VIEW clickstream_minutely AS
SELECT
w.region,
w.product_id,
w.event_type,
w.window_start AS minute_bucket,
COUNT(*) AS event_count,
COUNT(DISTINCT w.user_id) AS unique_users,
SUM(w.amount) AS total_amount
FROM TUMBLE(
clickstream_events,
event_time,
INTERVAL '1 MINUTE'
) AS w
GROUP BY w.region, w.product_id, w.event_type, w.window_start;
Step 3: Sink to Iceberg with Explicit Partitioning Hints
When the RisingWave sink creates the Iceberg table, the table gets its schema from the SELECT but no custom partition spec. To control the layout, either pre-create the table with the desired spec, or define the spec in the Iceberg catalog after the initial creation.
CREATE SINK clickstream_iceberg AS
SELECT
region,
product_id,
event_type,
minute_bucket,
event_count,
unique_users,
total_amount
FROM clickstream_minutely
WITH (
connector = 'iceberg',
type = 'append-only',
-- the windowed aggregation emits updates; convert them to appends
force_append_only = 'true',
catalog.type = 'rest',
catalog.uri = 'http://iceberg-catalog:8181',
warehouse.path = 's3://analytics-lakehouse/warehouse',
s3.region = 'us-east-1',
database.name = 'clickstream',
table.name = 'minutely_summary'
);
For Iceberg table creation with partition spec, use Spark or Trino to pre-create the table:
-- Run in Spark or Trino to pre-create with partition spec
CREATE TABLE clickstream.minutely_summary (
region STRING,
product_id STRING,
event_type STRING,
minute_bucket TIMESTAMP,
event_count BIGINT,
unique_users BIGINT,
total_amount DECIMAL(12,2)
)
USING iceberg
PARTITIONED BY (days(minute_bucket), region);
Evolving Your Partition Strategy
One of Iceberg's most powerful features is partition evolution—you can change the partition spec without rewriting existing data. Old files retain their original layout; new writes use the updated spec. Queries transparently handle both layouts.
This is critical for streaming pipelines that start with fine-grained partitioning (HOUR) during development and need to coarsen to DAY as data volume grows.
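Assuming the Spark-created table from this guide and Spark's Iceberg SQL extensions, that HOUR-to-DAY coarsening is a single statement:

```sql
-- Spark SQL (Iceberg extensions): coarsen hourly partitioning to daily.
-- Metadata-only change; files written under the old spec are untouched.
ALTER TABLE clickstream.minutely_summary
    REPLACE PARTITION FIELD hours(minute_bucket) WITH days(minute_bucket);

-- Fields can also be added or dropped independently:
-- ALTER TABLE clickstream.minutely_summary ADD PARTITION FIELD region;
-- ALTER TABLE clickstream.minutely_summary DROP PARTITION FIELD hours(minute_bucket);
```

Queries spanning both layouts plan each file against the spec it was written under, so no rewrite or downtime is needed.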
Handling Streaming Write Patterns
Streaming writes create the "small files problem" even with good partitioning. Each RisingWave checkpoint (default 60 seconds) produces at least one Parquet file per active partition. With 50 active regions and hourly partitions, that's 50 files per checkpoint and 1,440 checkpoints per day: 60 files per region-hour partition and roughly 72,000 files per day across the table.
Mitigation strategies:
- Increase checkpoint interval: For non-latency-sensitive sinks, set `commit_checkpoint_interval = 300` (5 minutes). Reduces file count by 5x.
- Aggregate more aggressively: A 5-minute tumble window instead of 1-minute reduces partition spread and file count by 5x.
- Run compaction regularly: Schedule Iceberg's `rewriteDataFiles` to merge small files into 128–512 MB targets. Most production deployments run compaction every 15 minutes for hot partitions.
- Use sorted compaction: Compaction that sorts by the most common filter column (e.g., `region`) enables Iceberg's min/max statistics to skip entire files, providing an additional 2–5x query speedup.
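The compaction and sorting strategies above can be scheduled as Iceberg's rewrite_data_files procedure in Spark SQL. This sketch assumes the Spark Iceberg extensions are enabled; the catalog name, option values, and the where filter (narrowing the rewrite to recent hot partitions) are illustrative:

```sql
-- Spark SQL: merge small files and sort by the common filter column
CALL spark_catalog.system.rewrite_data_files(
    table      => 'clickstream.minutely_summary',
    strategy   => 'sort',
    sort_order => 'region ASC',
    options    => map(
        'target-file-size-bytes', '268435456',  -- 256 MB, inside the 128-512 MB band
        'min-input-files', '5'                  -- skip partitions with few small files
    ),
    where      => "minute_bucket >= TIMESTAMP '2026-03-07 00:00:00'"
);
```

Running this on a schedule (e.g., every 15 minutes against the current day's partitions) keeps file counts bounded without touching cold historical data.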
Querying Partition-Pruned Data
With proper partitioning, analytical queries only scan relevant partitions:
-- This query benefits from DAY(minute_bucket) + region partitioning
-- Iceberg skips all partitions outside 2026-03-01 to 2026-03-07 and outside 'US-WEST'
SELECT
DATE_TRUNC('day', minute_bucket) AS day,
product_id,
SUM(event_count) AS daily_events,
SUM(unique_users) AS daily_users,
SUM(total_amount) AS daily_revenue
FROM clickstream.minutely_summary
WHERE minute_bucket BETWEEN TIMESTAMP '2026-03-01' AND TIMESTAMP '2026-03-07'
AND region = 'US-WEST'
GROUP BY DATE_TRUNC('day', minute_bucket), product_id
ORDER BY daily_revenue DESC;
Without partitioning, this query scans the entire table. With DAY(minute_bucket) partitioning, it scans 7 daily partitions. With DAY(minute_bucket) + region, it scans only 7 partitions out of potentially hundreds of region-day combinations.
FAQ
Q: Should I partition by event time or by ingestion time?
A: Almost always event time. Iceberg's partition pruning is based on the partition column values, and queries filter on event time (not when data was ingested). Using ingestion time means your business queries won't benefit from pruning.
Q: How many partitions is too many for streaming?
A: There is no hard limit, but more than 1,000 active partitions per checkpoint creates metadata overhead. If your data has high cardinality in the partition key, use BUCKET transforms to cap partition count at a fixed number (e.g., 64 or 128 buckets).
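For instance, a hypothetical IoT table combining a daily time transform with a 64-way bucket keeps the active partition count bounded no matter how many devices report (Spark SQL; names are illustrative):

```sql
-- Millions of device_ids land in at most 64 buckets per day
CREATE TABLE iot.device_metrics (
    device_id  STRING,
    reading    DOUBLE,
    event_time TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(event_time), bucket(64, device_id));
```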
Q: Can I change the partition strategy without downtime?
A: Yes. Iceberg partition evolution is a metadata-only operation: change the spec with ALTER TABLE ... ADD PARTITION FIELD, DROP PARTITION FIELD, or REPLACE PARTITION FIELD ... WITH ... (Spark SQL syntax). Existing data retains its original layout; new writes use the new spec.
Q: Does RisingWave respect Iceberg partition specs when writing?
A: Yes. The sink's Iceberg writer reads the partition spec from the table metadata and routes rows into the correct partition files, so the partition scheme does not need to be declared in the CREATE SINK statement.
Q: What is the optimal Parquet file size for Iceberg?
A: 128 MB to 512 MB is the standard target. Files smaller than 64 MB create excessive S3 list and metadata operations; files larger than 1 GB reduce read parallelism and force selective queries to read more data than they need.
Get Started
Design your partition strategy and start streaming to Iceberg with the RisingWave quickstart guide. Share your partition layouts and get feedback in the RisingWave Slack community.

