Introduction
If you are building a streaming data pipeline that lands data into a lakehouse, you face a choice between two dominant table formats: Apache Iceberg and Delta Lake. Both provide ACID transactions on object storage, schema evolution, and time travel. Both claim to handle streaming well. But the details matter, and the details diverge significantly when you look at how each format handles the specific challenges of continuous data ingestion.
This comparison focuses on what matters for streaming: how each format handles high-frequency writes, schema changes during continuous ingestion, partition management, small file compaction, and change data capture. We present both formats fairly, note where each wins, and explain how RisingWave integrates with Iceberg for teams building streaming lakehouses.
By the end, you will know which format fits your streaming architecture and where the tradeoffs lie.
How Do Iceberg and Delta Lake Handle Schema Evolution Differently?
Schema changes are inevitable in streaming pipelines. Upstream producers add fields, change types, or restructure messages. The table format must handle these changes without breaking active consumers.
Apache Iceberg
Iceberg treats schema evolution as a first-class concern with column-level tracking:
- Column IDs: Each column gets a unique integer ID at creation. This ID persists even if the column is renamed. Readers map data files to the current schema using these IDs, not column names or positions.
- Safe type promotion: Iceberg supports widening types (int to long, float to double) without rewriting data files. Existing files are simply read with the wider type.
- Add, drop, rename, reorder: All operations are metadata-only. No data files are rewritten. This is critical for streaming because a schema change takes effect instantly without pausing ingestion.
- Nested schema support: Iceberg supports schema evolution within structs, maps, and lists. You can add a field to a nested struct without touching the parent schema.
-- Adding a column to an Iceberg table during active streaming ingestion
ALTER TABLE events ADD COLUMN device_type STRING AFTER user_agent;
-- Renaming a column without breaking existing data files
ALTER TABLE events RENAME COLUMN user_agent TO browser_agent;
Delta Lake
Delta Lake also supports schema evolution but with a different approach:
- Column names as identifiers: Delta uses column names (not IDs) to track schema. Renaming a column requires enabling column mapping mode, which was added later.
- Schema enforcement by default: Delta rejects writes that do not match the current schema unless you enable mergeSchema or overwriteSchema. This is a safety feature but requires explicit handling in streaming pipelines.
- Type widening: Delta added type widening as a preview feature. It supports byte to short, short to int, int to long, float to double, and date to timestamp.
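As a sketch of what a rename looks like in Delta, assuming a Spark SQL session with Delta Lake (the exact reader/writer protocol versions depend on your Delta release):

```sql
-- Enable column mapping so renames become metadata-only (required once per table)
ALTER TABLE events SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
);

-- With column mapping enabled, the rename does not rewrite data files
ALTER TABLE events RENAME COLUMN user_agent TO browser_agent;
```

Without the column mapping property, the rename fails, which is the operational difference from Iceberg's always-on ID-based approach.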
Verdict
Iceberg has a more mature and flexible schema evolution model, particularly for streaming. The column ID approach means schema changes never require coordination between writers and readers. Delta's name-based approach works well but requires more careful management during renames and reorganizations.
How Do Partitioning Strategies Compare for Streaming?
Partitioning determines how data files are organized in object storage. For streaming, the partitioning strategy directly impacts write throughput, query performance, and the severity of the small file problem.
Apache Iceberg: Hidden Partitioning
Iceberg uses a concept called hidden partitioning, where partition transforms are defined at the table level and applied automatically. Users write queries against the raw columns, and Iceberg applies the partition filter transparently.
-- Create a table with hourly partitioning on event_time
CREATE TABLE events (
event_id BIGINT,
user_id INT,
event_type VARCHAR,
event_time TIMESTAMP,
payload VARCHAR
) PARTITIONED BY (hour(event_time));
Key advantages for streaming:
- Partition evolution: You can change the partition scheme (for example, from hourly to daily) without rewriting existing data. Old files keep their original partitioning, and new files use the updated scheme. This is valuable when streaming volume changes over time.
- No user-maintained partition columns: You do not need to add explicit partition columns to your data. The transform is applied at write time.
- Partition pruning: Query engines automatically apply partition filters based on predicates on the source column.
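Partition evolution is a metadata operation. A sketch using Spark SQL with the Iceberg extensions (catalog and table names are illustrative):

```sql
-- Switch from hourly to daily partitioning; existing data files are untouched
ALTER TABLE events ADD PARTITION FIELD day(event_time);
ALTER TABLE events DROP PARTITION FIELD hour(event_time);
```

Files written before the change keep their hourly layout; new files use the daily scheme, and query engines prune both correctly.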
Delta Lake: Explicit Partitioning and Liquid Clustering
Delta takes a more traditional approach to partitioning, requiring explicit partition columns. However, Delta recently introduced Liquid Clustering as a modern alternative:
- Explicit partitioning: You specify partition columns directly. Changing the partition scheme requires rewriting the table.
- Liquid Clustering: Introduced as a replacement for static partitioning and Z-ORDER. You define clustering keys, and Delta automatically organizes data. You can change clustering keys without rewriting existing data, which Delta then applies incrementally.
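A sketch of Liquid Clustering in Delta SQL (availability of this syntax depends on your Delta/Databricks version):

```sql
-- Define clustering keys at table creation instead of static partitions
CREATE TABLE events (
  event_id BIGINT,
  user_id INT,
  event_time TIMESTAMP
) CLUSTER BY (event_time);

-- Change clustering keys later; data is reorganized incrementally, not rewritten up front
ALTER TABLE events CLUSTER BY (user_id, event_time);
```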
Verdict
For streaming, Iceberg's hidden partitioning and partition evolution provide more flexibility. You can start with hourly partitioning and switch to daily as your pipeline matures, without any downtime or data rewriting. Delta's Liquid Clustering is a strong response but is newer and still maturing in the open-source version.
How Does Change Data Capture (CDC) Support Compare?
CDC is essential for streaming workloads that need to capture inserts, updates, and deletes from upstream databases and apply them to lakehouse tables.
Apache Iceberg
Iceberg supports row-level updates and deletes through two mechanisms:
- Copy-on-write (CoW): Rewrites the entire data file when any row is updated or deleted. Simple but expensive for high-frequency updates.
- Merge-on-read (MoR): Writes delete files (either positional or equality deletes) alongside data files. Reads merge deletes at query time. More efficient for writes but adds read overhead.
For CDC streaming:
-- In RisingWave: stream CDC changes from PostgreSQL to Iceberg
CREATE SOURCE pg_orders WITH (
connector = 'postgres-cdc',
hostname = 'your-pg-host',
port = '5432',
username = 'replication_user',
password = 'your_password',
database.name = 'production',
table.name = 'orders'
);
CREATE SINK orders_iceberg FROM pg_orders WITH (
connector = 'iceberg',
type = 'upsert',
primary_key = 'order_id',
catalog.type = 'rest',
catalog.uri = 'https://your-catalog-endpoint',
warehouse = 'lakehouse',
database.name = 'production',
table.name = 'orders'
);
Iceberg's incremental read API allows downstream consumers to read only new appended data since a given snapshot. However, reading incremental updates and deletes (not just appends) requires additional tooling.
Delta Lake
Delta provides the Change Data Feed (CDF) feature:
- Change Data Feed: When enabled, Delta records change events (insert, update_preimage, update_postimage, delete) alongside data files. Consumers can read a stream of changes between two versions.
- Built into Spark Structured Streaming: CDF integrates natively with Spark Structured Streaming as both a source and sink.
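A sketch of enabling and consuming CDF, using Databricks-style SQL (table name and version range are illustrative):

```sql
-- Enable the Change Data Feed on an existing table
ALTER TABLE orders SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

-- Read all change events between table versions 2 and 5,
-- including update_preimage and update_postimage rows
SELECT * FROM table_changes('orders', 2, 5);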
Verdict
Delta Lake has a more mature CDC consumption story through Change Data Feed, which provides a complete change stream including pre- and post-images. Iceberg's approach works well for append-heavy streaming but requires merge-on-read and additional tooling for full CDC roundtrips. RisingWave bridges this gap by handling CDC transformations in the streaming layer before sinking clean data to Iceberg.
How Do Compaction Strategies Differ?
Streaming ingestion creates many small files, often hundreds per hour. Without compaction, query performance degrades as engines must open thousands of small files instead of scanning a few large ones.
Apache Iceberg
Iceberg provides multiple compaction strategies through its rewrite_data_files action:
- Bin-packing: Groups small files into larger ones without changing data order. Fastest option, ideal when write order matters.
- Sort compaction: Rewrites files sorted by specified columns. Improves query performance for range predicates.
- Z-order compaction: Interleaves values from multiple columns to enable efficient pruning across multiple dimensions.
Compaction runs as a separate process (typically Spark or Flink) and is decoupled from the write path. RisingWave runs compaction as a built-in background process, eliminating the need for a separate Spark job.
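For pipelines that do run their own compaction, a sketch of the Spark procedure calls (the catalog name prod and table db.events are placeholders):

```sql
-- Bin-pack: combine small files into ~512 MB targets without reordering data
CALL prod.system.rewrite_data_files(
  table => 'db.events',
  strategy => 'binpack',
  options => map('target-file-size-bytes', '536870912')
);

-- Sort compaction: rewrite files ordered by event_time for range predicates
CALL prod.system.rewrite_data_files(
  table => 'db.events',
  strategy => 'sort',
  sort_order => 'event_time ASC'
);
```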
Delta Lake
Delta Lake provides the OPTIMIZE command:
- Bin-packing: Similar to Iceberg, combines small files into target-sized files.
- Z-ORDER BY: Orders data by multiple columns using z-order curves for multi-dimensional pruning.
- Liquid Clustering: The newest approach, which incrementally reorganizes data based on clustering keys. Designed to replace both partitioning and Z-ORDER.
- Auto-compaction: Databricks (but not open-source Delta) can automatically trigger compaction after writes.
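The equivalent Delta maintenance commands, as a sketch (column choices are illustrative):

```sql
-- Bin-pack small files into target-sized files
OPTIMIZE events;

-- Z-order by multiple columns for multi-dimensional pruning
OPTIMIZE events ZORDER BY (user_id, event_time);
```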
Verdict
Both formats provide solid compaction capabilities. Iceberg's sort compaction and Z-order are well-established. Delta's Liquid Clustering is innovative but primarily available in the Databricks commercial offering. For open-source streaming pipelines, Iceberg compaction is more accessible and better supported by third-party tooling. RisingWave's automatic compaction is a significant advantage for teams that do not want to manage separate compaction infrastructure.
How Does Multi-Engine Support Compare?
Streaming architectures typically involve multiple engines: a streaming processor for ingestion, an OLAP engine for analytics, a notebook environment for data science, and possibly a BI tool for dashboards.
Apache Iceberg
Iceberg was designed from the start for multi-engine support:
| Engine | Read | Write | Streaming |
| --- | --- | --- | --- |
| Apache Spark | Yes | Yes | Yes |
| Apache Flink | Yes | Yes | Yes |
| Trino/Presto | Yes | Yes | No |
| RisingWave | Yes | Yes | Yes |
| StarRocks | Yes | No | No |
| DuckDB | Yes | No | No |
| Snowflake | Yes | Yes | No |
| BigQuery | Yes | Yes | No |
Delta Lake
Delta Lake started tightly coupled to Spark but has broadened through the Delta UniForm initiative:
| Engine | Read | Write | Streaming |
| --- | --- | --- | --- |
| Apache Spark | Yes | Yes | Yes |
| Apache Flink | Yes (connector) | Yes (connector) | Yes |
| Trino/Presto | Yes | Limited | No |
| DuckDB | Yes | No | No |
| Snowflake | Yes (UniForm) | No | No |
| BigQuery | Yes (UniForm) | No | No |
Verdict
Iceberg has broader native multi-engine support. Delta Lake is closing the gap through UniForm (which generates Iceberg metadata alongside Delta metadata), but this adds complexity and may lag behind native Iceberg features. If your architecture includes multiple query engines or a non-Spark streaming processor like RisingWave, Iceberg provides a smoother integration path.
How Does Time Travel Compare?
Time travel allows you to query data as it existed at a previous point in time. For streaming workloads, this is valuable for debugging, auditing, and reprocessing.
Apache Iceberg
Iceberg maintains a chain of snapshots. Each commit creates a new snapshot pointing to the updated set of data files. You can query any snapshot by ID or timestamp:
-- Query the table as it was at a specific timestamp
SELECT * FROM events FOR SYSTEM_TIME AS OF TIMESTAMP '2026-03-28 12:00:00';
-- Query by snapshot ID
SELECT * FROM events FOR SYSTEM_VERSION AS OF 1234567890;
Iceberg also supports rollback operations, reverting the table to a previous snapshot without deleting data files.
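A rollback is a single Spark procedure call (catalog and snapshot ID are placeholders):

```sql
-- Point the table back at an earlier snapshot; no data files are deleted
CALL prod.system.rollback_to_snapshot('db.events', 1234567890);
```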
Delta Lake
Delta maintains a transaction log with numbered JSON files (and periodic Parquet checkpoints). Time travel works similarly:
-- Query by version number
SELECT * FROM events VERSION AS OF 42;
-- Query by timestamp
SELECT * FROM events TIMESTAMP AS OF '2026-03-28 12:00:00';
Delta's RESTORE command reverts the table to a previous version.
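For example:

```sql
-- Revert the table to an earlier version (version number is illustrative)
RESTORE TABLE events TO VERSION AS OF 42;
```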
Verdict
Both formats provide robust time travel. The main difference is in snapshot management: Iceberg uses manifest-based snapshots while Delta uses a linear transaction log. For streaming workloads that generate many commits, Iceberg's snapshot expiration (expire_snapshots) and Delta's log cleanup (VACUUM) both need regular maintenance to prevent metadata bloat.
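A sketch of that maintenance in SQL (catalog name, table names, timestamps, and retention windows are illustrative; check your own retention requirements before shortening them):

```sql
-- Iceberg: remove snapshots older than a cutoff, freeing unreferenced files
CALL prod.system.expire_snapshots(
  table => 'db.events',
  older_than => TIMESTAMP '2026-03-21 00:00:00'
);

-- Delta: delete files no longer referenced by versions inside the retention window
VACUUM events RETAIN 168 HOURS;
```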
Feature Comparison Summary
| Feature | Apache Iceberg | Delta Lake |
| --- | --- | --- |
| Schema evolution | Column ID-based, fully flexible | Name-based, column mapping mode needed for renames |
| Partition evolution | Metadata-only, no rewrite | Requires rewrite (except Liquid Clustering) |
| Hidden partitioning | Yes | No (explicit partition columns) |
| CDC support | MoR with delete files | Change Data Feed (CDF) |
| Compaction | Bin-pack, sort, z-order | OPTIMIZE, Z-ORDER, Liquid Clustering |
| Auto-compaction | Via tools (RisingWave, etc.) | Databricks only (commercial) |
| Multi-engine support | Broad native support | Spark-first, expanding via UniForm |
| Time travel | Snapshot-based | Transaction log-based |
| Open governance | Apache Software Foundation | Linux Foundation (via Databricks) |
| Streaming ingestion | Flink, Spark, RisingWave | Spark Structured Streaming, Flink |
When Should You Choose Iceberg Over Delta Lake for Streaming?
Choose Iceberg when:
- You use multiple query engines beyond Spark (Trino, Flink, RisingWave, StarRocks)
- You need partition evolution without data rewrites
- You want vendor-neutral governance under the Apache Software Foundation
- Your streaming pipeline uses RisingWave or Flink as the primary ingestion engine
- You are building a streaming lakehouse architecture
Choose Delta Lake when:
- Your streaming stack is centered on Spark Structured Streaming
- You need Change Data Feed for downstream CDC consumption
- You are already invested in the Databricks ecosystem
- You want Liquid Clustering for automatic data organization (Databricks)
FAQ
What is the main difference between Iceberg and Delta Lake for streaming?
The main difference is in multi-engine support and partition management. Iceberg supports more query engines natively and allows partition evolution without data rewrites. Delta Lake has stronger Spark integration and a more mature Change Data Feed feature for CDC consumers.
Can I use both Iceberg and Delta Lake in the same architecture?
Yes. Delta UniForm can generate Iceberg-compatible metadata alongside Delta metadata, allowing Iceberg-compatible engines to read Delta tables. However, this adds complexity and may not support all Iceberg features. A cleaner approach is to pick one format and standardize.
How does RisingWave work with Apache Iceberg?
RisingWave is a streaming database that can ingest data from Kafka, CDC sources, or other streams, process it with SQL (including joins, aggregations, and windowing), and sink results directly into Iceberg tables. RisingWave also provides a managed Iceberg catalog with automatic compaction, eliminating the need for separate compaction infrastructure.
Is Apache Iceberg replacing Delta Lake?
Neither format is replacing the other. Both are actively developed and widely adopted. The market is trending toward interoperability (Delta UniForm, Apache XTable) rather than a single winner. However, Iceberg has broader multi-engine adoption and vendor-neutral governance, which gives it momentum in heterogeneous environments.
Conclusion
Both Apache Iceberg and Delta Lake are capable table formats for streaming lakehouses. The right choice depends on your specific architecture:
- Iceberg leads in multi-engine support, partition evolution, and vendor-neutral governance.
- Delta Lake leads in Spark ecosystem integration and Change Data Feed for CDC.
- For streaming ingestion, both formats handle high-frequency writes well, but Iceberg's hidden partitioning and partition evolution give it an edge for evolving pipelines.
- RisingWave simplifies Iceberg streaming by combining SQL-based stream processing, automatic compaction, and a managed REST catalog in a single platform.
Ready to try this yourself? Build a streaming lakehouse with RisingWave and Apache Iceberg. Try RisingWave Cloud for free; no credit card required. Sign up here.
Join our Slack community to ask questions and connect with other stream processing developers.

