How Often Should You Commit to Apache Iceberg?

For most streaming workloads, commit every 60 seconds. That single setting controls how fresh your Iceberg data is, how large your files grow, and how fast your catalog metadata bloats. Getting it wrong in either direction costs you in query performance, cloud spend, or operational headache. Here is how to find the right number for your use case.

What an Iceberg Commit Actually Does

Every time a writer commits to Apache Iceberg, three things happen: new data files are written to object storage, a new manifest file listing those data files is written, and the snapshot pointer in the catalog's metadata.json is atomically advanced.

That snapshot pointer advance is the visibility gate. Data written before the commit is invisible to readers. Data written after the commit is visible. There is no partial visibility, which is what makes Iceberg's ACID guarantees work.

The cost of this guarantee is that every commit touches the catalog. At 10 commits per minute across 50 tables, that is 500 metadata writes per minute. At scale, catalog pressure becomes a real bottleneck.
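
You can watch this snapshot history accumulate directly. Here is a minimal Spark SQL sketch against Iceberg's built-in snapshots metadata table, assuming the events.raw_events table used in the sink examples later in this article:

-- List the most recent commits (snapshots) for a table
SELECT committed_at, snapshot_id, operation
FROM events.raw_events.snapshots
ORDER BY committed_at DESC
LIMIT 10;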

The Core Trade-off

Commit frequency is a direct dial between two opposing costs:

Shorter intervals reduce data freshness latency. Readers see new data sooner. But each commit produces at least one data file per partition. Short intervals with many partitions produce many small files. Small files hurt read performance because each requires a separate S3 GET. They also inflate the manifest list, which Iceberg must read in full to open a table.

Longer intervals let more data accumulate per commit. Each file is larger, reads are faster, and the manifest list grows more slowly. The downside is that data sits buffered in the writer's memory or local disk longer before becoming queryable.

There is no universally correct answer. The right interval depends on your throughput, partition count, latency requirements, and whether you have automatic compaction running.

Dimension 1: File Size

The most concrete way to reason about commit frequency is target file size.

A general rule for Iceberg is to target files between 128 MB and 512 MB for analytical read performance. Files below 32 MB start to hurt query planning. Files above 1 GB offer diminishing returns.

At 10 MB/s total throughput writing to a single partition:

  • 10-second commit interval: ~100 MB files. Manageable, but you accumulate files quickly over hours.
  • 60-second commit interval: ~600 MB files. Near-optimal for most query engines.
  • 5-minute commit interval: ~3 GB files. Larger than ideal; some engines struggle with oversized row groups.

With multiple partitions, divide throughput per partition. If you have 20 active partitions at 10 MB/s total, each partition sees 0.5 MB/s. A 60-second interval produces 30 MB files, which is already in the small-file danger zone. In that scenario, either increase the commit interval or rely on compaction to merge files after the fact.
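
That arithmetic is worth sanity-checking against your own numbers. A back-of-envelope sketch, runnable in any SQL engine, using the figures above (10 MB/s total, 20 partitions, 60-second interval):

-- Estimated file size per partition per commit:
--   (total MB/s / active partitions) * commit interval in seconds
SELECT (10.0 / 20) * 60 AS file_mb_per_partition_per_commit;  -- 30 MB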

Dimension 2: Catalog Metadata Pressure

Every commit writes a new snapshot into the table metadata along with a new manifest list. Iceberg reads this metadata every time a table is opened, so as the snapshot count grows, table open latency grows with it.

With 60-second commits running 24 hours a day, you accumulate 1,440 snapshots per day per table. Without snapshot expiration and compaction, a week-old table carries 10,000+ snapshot entries in its metadata, and opening the table requires reading and parsing all of them.
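
If the table is already live, you can measure the accumulation rate rather than estimate it. A Spark SQL sketch against the snapshots metadata table, reusing the example table name:

-- Count snapshots created per day to spot accumulation
SELECT to_date(committed_at) AS day, count(*) AS snapshots
FROM events.raw_events.snapshots
GROUP BY to_date(committed_at)
ORDER BY day DESC;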

Iceberg's expire_snapshots procedure addresses this by removing old snapshot entries and their associated manifest files. But you need to run it regularly, or catalog bloat erases the query performance benefits you were trying to achieve with a short commit interval.

Fewer commits per day means less snapshot history to manage and a naturally smaller manifest list.

Dimension 3: S3 API Cost

Each commit triggers multiple S3 PUT requests: one or more for data files, one for the new manifest file, and one for the updated metadata.json.

At 60-second intervals writing to 10 tables, that is roughly 1,440 commits/day/table, or 14,400 commits across all tables. Each commit involves at minimum 3 PUT requests, so around 43,000 PUT requests per day. At AWS S3 pricing of $0.005 per 1,000 PUT requests, that is about $0.22 per day, or ~$80 per year, just for metadata writes on 10 tables.
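
The same estimate as a runnable back-of-envelope query, using the figures above (1,440 commits/day/table, 10 tables, 3 PUTs per commit, $0.005 per 1,000 PUTs):

-- Daily PUT cost = commits * PUTs per commit * price per PUT
SELECT 1440 * 10 * 3                 AS puts_per_day,  -- 43,200
       1440 * 10 * 3 * 0.005 / 1000 AS usd_per_day;   -- ~$0.22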

At 10-second intervals, that cost multiplies by 6. Across hundreds of tables, S3 API cost from over-committing becomes meaningful.

Dimension 4: Query Freshness Latency

Commit frequency sets a lower bound on data freshness. If you commit every 60 seconds, the worst-case latency for a new row to become queryable is 60 seconds plus any upstream ingestion lag.

For most analytical workloads, 60 seconds is fine. Business dashboards refreshed every 5 minutes do not benefit from 10-second commit intervals.

For near-real-time alerting or monitoring dashboards, you might want 15-30 seconds. Below 10 seconds, the operational overhead rarely justifies the latency savings unless you have a specific SLA requirement.
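
One way to verify the freshness you are actually delivering is to compare the newest committed event against the clock. A sketch in Spark SQL syntax, assuming the events.raw_events table and event_time column from the sink examples below (interval arithmetic varies by engine):

-- Freshness lag: time since the newest committed event
SELECT current_timestamp() - max(event_time) AS freshness_lag
FROM events.raw_events;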

Recommendations by Use Case

Use Case                             | Recommended Interval | Rationale
-------------------------------------|----------------------|-----------------------------------------
Near-real-time monitoring / alerting | 15-30 seconds        | Low latency; compensate with compaction
Operational dashboards               | 60 seconds           | Balanced file size and freshness
Analytics and reporting              | 5-15 minutes         | Larger files, lower catalog pressure
Daily batch ingestion                | 1-4 hours            | Maximum file size efficiency
Archival / compliance                | Per batch job        | Write once; file size maximized

For any interval below 60 seconds, pair with automatic compaction. Without it, small files accumulate faster than manual maintenance can keep up.

Configuring Commit Frequency

The Flink Iceberg connector exposes sink.commit-interval to control how often the sink commits to Iceberg, independent of Flink's checkpoint interval.

CREATE TABLE iceberg_sink (
  id BIGINT,
  event_time TIMESTAMP(3),
  payload STRING
) WITH (
  'connector' = 'iceberg',
  'catalog-type' = 'rest',
  'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
  'uri' = 'http://iceberg-catalog:8181',
  'warehouse' = 's3://my-bucket/warehouse',
  'database-name' = 'events',
  'table-name' = 'raw_events',
  'sink.commit-interval' = '60000'  -- milliseconds, 60 seconds
);

Setting sink.commit-interval to a value larger than the Flink checkpoint interval means the sink batches multiple checkpoints into a single Iceberg commit. This is the key mechanism: decouple Flink's fault-tolerance checkpoint frequency from Iceberg's commit frequency.

For example, with Flink checkpointing every 10 seconds and sink.commit-interval set to 60 seconds, Flink checkpoints 6 times before each Iceberg commit. You get 10-second fault tolerance with 60-second Iceberg commit granularity.

# flink-conf.yaml
execution.checkpointing.interval: 10s
execution.checkpointing.mode: EXACTLY_ONCE

-- Per-table override in the sink DDL for high-throughput sinks
'sink.commit-interval' = '300000'  -- 5 minutes for analytics tables

RisingWave with Iceberg Sink

RisingWave is a PostgreSQL-compatible streaming database written in Rust and released under the Apache 2.0 license. It natively supports writing to Apache Iceberg via a sink connector and provides the commit_checkpoint_interval parameter to decouple internal checkpoint frequency from Iceberg commit frequency.

CREATE SINK iceberg_events_sink
FROM events_source
WITH (
  connector = 'iceberg',
  type = 'append-only',
  warehouse.path = 's3://my-bucket/warehouse',
  s3.region = 'us-east-1',
  database.name = 'events',
  table.name = 'raw_events',
  catalog.type = 'rest',
  catalog.uri = 'http://iceberg-catalog:8181',
  commit_checkpoint_interval = 60
);

commit_checkpoint_interval = 60 means RisingWave commits to Iceberg once every 60 internal checkpoints. If RisingWave's checkpoint interval is set to 1 second, that produces an Iceberg commit every 60 seconds. If the checkpoint interval is 10 seconds, that produces an Iceberg commit every 600 seconds (10 minutes).

This design is intentional. RisingWave can maintain sub-second internal processing checkpoints for fault tolerance while batching those into much less frequent Iceberg commits. You get fast recovery without incurring Iceberg catalog pressure at every second.

-- For near-real-time: commit every 30 seconds
-- Assuming 1s internal checkpoint interval
commit_checkpoint_interval = 30

-- For analytics tables: commit every 5 minutes
commit_checkpoint_interval = 300

To check the current checkpoint interval in RisingWave:

SHOW checkpoint_interval;

How RisingWave Decouples Checkpoint and Commit Frequency

This decoupling deserves its own explanation because it is architecturally significant.

In Flink, checkpointing and committing to Iceberg are closely related. The Iceberg sink participates in Flink's two-phase commit protocol, which means Iceberg commits are tied to Flink checkpoint completion by default. The sink.commit-interval parameter overrides this with a time-based batching layer on top.

RisingWave takes a different approach. Its internal checkpoint mechanism is entirely separate from sink output commits. The system can checkpoint its internal state (materialized view state, operator state) every second for fast recovery, while the Iceberg sink accumulates data from multiple checkpoints and writes a single Iceberg commit on a separate cadence.

This means you do not have to choose between fast fault recovery and reasonable Iceberg commit frequency. You get both.

Monitoring Commit Health

Once you set a commit interval, monitor these signals to verify it is working correctly.

Files per commit: If a single commit is writing hundreds of small files (under 10 MB each), your throughput per partition is too low relative to your commit interval. Either increase the interval or reduce partition count.
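
Iceberg's files metadata table exposes per-file sizes, so you can check this directly. A Spark SQL sketch flagging files under 10 MB, using the example table:

-- Count small files and report their average size
SELECT count(*) AS small_files,
       round(avg(file_size_in_bytes) / 1048576, 1) AS avg_mb
FROM events.raw_events.files
WHERE file_size_in_bytes < 10 * 1024 * 1024;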

Manifest list size: Query metadata.json for your table and check the manifest list file count. If it exceeds a few thousand entries, snapshot expiration is not running frequently enough.
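
The manifests metadata table gives you that count directly, e.g. in Spark SQL:

-- Manifest count and total manifest bytes for the current snapshot
SELECT count(*) AS manifest_count, sum(length) AS total_manifest_bytes
FROM events.raw_events.manifests;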

S3 PUT request rate: A sudden spike in PUT requests typically indicates a commit interval that is too short or a partition explosion. Track this metric in your cloud provider's S3 dashboard.

Table open latency: If your query engine reports slow table planning times, it is often the manifest list size. Run CALL system.expire_snapshots(...) (Spark) or the equivalent for your engine.

For RisingWave, you can check the number of committed data files per sink checkpoint:

SELECT * FROM rw_sink_iceberg_statistics
WHERE sink_name = 'iceberg_events_sink'
ORDER BY committed_at DESC
LIMIT 20;

Running Compaction Alongside Streaming Ingestion

No commit interval setting eliminates the need for compaction. Even with 5-minute commits, a partitioned table with many active partitions will accumulate small files over days.

Run compaction during off-peak hours or on a scheduled basis using your query engine's rewrite procedure:

-- Spark
CALL system.rewrite_data_files(
  table => 'events.raw_events',
  strategy => 'binpack',
  options => map('target-file-size-bytes', '536870912')  -- 512 MB
);

-- Also expire old snapshots to trim the manifest list
CALL system.expire_snapshots(
  table => 'events.raw_events',
  older_than => TIMESTAMP '2026-04-16 00:00:00.000',
  retain_last => 100
);

Treat compaction as a maintenance task that runs independently of your commit interval, not as a replacement for choosing a reasonable commit interval.

Decision Framework

Start with these questions:

  1. What is your latency SLA? If data must be queryable within 30 seconds, set the commit interval to 15-20 seconds and budget for compaction overhead.
  2. What is your per-partition throughput? Multiply throughput (MB/s) by your target commit interval (seconds) to estimate file size. Aim for 128-512 MB per file.
  3. How many partitions are active at peak? More partitions mean more files per commit. Compensate with a longer interval or aggressive compaction.
  4. Do you have automatic compaction? If yes, you can safely use shorter intervals. If no, lean toward longer intervals to keep file sizes healthy.

For most teams starting with Iceberg streaming ingestion, 60 seconds is the right default. It produces reasonable file sizes at typical throughput levels, keeps catalog pressure manageable, and gives you a latency budget that covers the vast majority of analytical use cases.

FAQ

Does a shorter commit interval improve query accuracy?

No. Iceberg's snapshot isolation means readers see a consistent point-in-time view regardless of how often commits happen. Commit frequency affects data freshness latency (when new data becomes visible), not query correctness.

Can I change the commit interval without recreating the sink?

In RisingWave, you can ALTER SINK to update the commit_checkpoint_interval. In Flink, you typically need to restart the job with the updated configuration, though some deployment environments support dynamic configuration updates.

What happens if a commit fails mid-write?

Iceberg's atomic commit protocol ensures partial writes are never visible. If the metadata pointer update fails, the written data files are orphaned but not visible. You should run remove_orphan_files periodically to clean them up. Neither readers nor subsequent writers are affected by a failed commit.
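
A hedged Spark example of that cleanup, reusing the table from the compaction section (the cutoff timestamp is illustrative):

-- Delete files in the table location not referenced by any snapshot
CALL system.remove_orphan_files(
  table => 'events.raw_events',
  older_than => TIMESTAMP '2026-04-09 00:00:00.000'
);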

Should I use one Iceberg table per stream or partition by event type?

Use a single table with event type as a partition column when event types share a schema. This reduces the number of tables you manage and the amount of catalog metadata you need to keep in sync. Separate tables make sense when schemas diverge significantly or when different event types have different retention or access patterns.

How does commit frequency interact with time travel?

Each commit creates a snapshot, so more commits mean finer-grained time travel. If you need to query data as it existed at a specific second, you need commits at least that frequent. For most time travel use cases (recovering from a bad write, auditing changes), snapshot-per-minute granularity is more than sufficient.
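
For reference, time travel queries in Spark SQL syntax against the example table (the timestamp and snapshot ID are illustrative):

-- Query the table as it existed at a point in time
SELECT count(*) FROM events.raw_events TIMESTAMP AS OF '2026-04-16 12:00:00';

-- Or pin to an exact snapshot ID taken from the snapshots metadata table
SELECT count(*) FROM events.raw_events VERSION AS OF 1234567890123456789;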
