What Is Apache Iceberg? A Complete Guide (2026)
Apache Iceberg is an open table format for huge analytical datasets on object storage (S3, GCS, ADLS). It adds ACID transactions, schema evolution, partition evolution, and time travel on top of data files (typically Parquet; ORC and Avro are also supported), capabilities that raw Parquet or Hive tables lack. Created at Netflix and donated to the Apache Software Foundation, Iceberg is now an industry standard supported by Spark, Flink, Trino, Snowflake, BigQuery, and DuckDB.
Why Iceberg Matters
Traditional data lake files (Parquet on S3) have no transactions, no schema management, and no way to handle concurrent reads/writes safely. Iceberg adds a metadata layer that provides database-like guarantees:
| Capability | Raw Parquet | Hive Table | Apache Iceberg |
|---|---|---|---|
| ACID transactions | ❌ | ❌ | ✅ |
| Schema evolution | Manual | Limited | ✅ Full (add/drop/rename) |
| Partition evolution | Requires rewrite | Requires rewrite | ✅ In-place |
| Time travel | ❌ | ❌ | ✅ Snapshot-based |
| Concurrent writes | Unsafe | Limited | ✅ Optimistic concurrency |
| Column statistics | Per-file only | ❌ | ✅ Manifest-level |
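The schema and partition evolution rows deserve emphasis: both are metadata-only operations in Iceberg, so no data files are rewritten. A hedged Spark SQL sketch (catalog, database, table, and column names here are illustrative, assuming a Spark session configured with an Iceberg catalog):

```sql
-- Schema evolution: metadata-only changes, no data rewrite.
ALTER TABLE prod.db.events ADD COLUMN device_type STRING;
ALTER TABLE prod.db.events RENAME COLUMN ts TO event_ts;

-- Partition evolution: change the partition spec in place.
-- Existing files keep the old layout; new writes use the new one.
ALTER TABLE prod.db.events ADD PARTITION FIELD days(event_ts);
```

With a Hive table, the equivalent partitioning change would require rewriting the entire dataset into a new directory layout.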
How Iceberg Works
Iceberg stores data in Parquet files on object storage. A hierarchical metadata structure tracks which files belong to which table:
metadata.json → manifest list → manifest files → data files (Parquet)
Each write creates a new snapshot. Readers see a consistent view at any snapshot. Writers use optimistic concurrency — if two writers conflict, one retries.
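Because every write produces a snapshot, time travel falls out naturally: a reader can pin any historical snapshot by ID or timestamp. A hedged sketch in Spark SQL (table name and snapshot ID are illustrative):

```sql
-- Read the table as of a specific snapshot ID...
SELECT * FROM prod.db.events VERSION AS OF 8744736658442914487;

-- ...or as of a wall-clock timestamp.
SELECT * FROM prod.db.events TIMESTAMP AS OF '2026-01-01 00:00:00';

-- List available snapshots via the snapshots metadata table.
SELECT snapshot_id, committed_at, operation
FROM prod.db.events.snapshots;
```

The same snapshot mechanism underpins rollback: pointing the table's current metadata at an older snapshot undoes a bad write without touching data files.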
Streaming Data into Iceberg
Stream processors (Flink, RisingWave) can write continuously to Iceberg tables:
```sql
-- RisingWave: Stream Kafka data to Iceberg
CREATE SINK events_to_iceberg AS SELECT * FROM events_stream
WITH (connector = 'iceberg', type = 'append-only', catalog.type = 'rest', ...);
```
RisingWave handles automatic compaction, solving the small files problem that plagues streaming ingestion.
Frequently Asked Questions
Is Apache Iceberg a database?
No. Iceberg is a table format — a specification for organizing data files and metadata on object storage. You still need a query engine (Trino, Spark, DuckDB) to read data and a stream processor (Flink, RisingWave) or batch tool (Spark) to write data.
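To make the separation of storage and compute concrete, here is a hedged sketch of reading an Iceberg table from DuckDB via its iceberg extension, with no cluster involved (the S3 path is illustrative):

```sql
-- DuckDB: query an Iceberg table directly from object storage.
INSTALL iceberg;
LOAD iceberg;

SELECT count(*)
FROM iceberg_scan('s3://warehouse/db/events');
```

The same table could simultaneously be written by Flink and queried by Trino; the table format is the shared contract between engines.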
Why is Iceberg winning over Delta Lake and Hudi?
Iceberg has the broadest multi-engine support (Spark, Flink, Trino, Snowflake, BigQuery, DuckDB), vendor-neutral governance (Apache Foundation), and unique features like partition evolution. Delta Lake is stronger in Databricks environments; Hudi is stronger for streaming CDC ingestion.
Does RisingWave support Apache Iceberg?
Yes. RisingWave has native Iceberg sink support with 5 catalog types (REST, Hive, JDBC, Storage, S3 Tables), automatic compaction, and both Merge-on-Read and Copy-on-Write modes.
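A fuller version of the sink shown earlier, filling in a REST catalog configuration. This is a sketch: option names follow RisingWave's Iceberg connector, but the URI, warehouse path, names, and credentials are all illustrative placeholders.

```sql
-- RisingWave Iceberg sink with a REST catalog (values are placeholders).
CREATE SINK events_to_iceberg AS SELECT * FROM events_stream
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://warehouse',
    database.name = 'db',
    table.name = 'events',
    s3.region = 'us-east-1',
    s3.access.key = 'xxx',
    s3.secret.key = 'xxx'
);
```

For changelog streams with updates and deletes, `type = 'upsert'` with a primary key would be used instead of `append-only`.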