Apache Iceberg v3 introduces native variant and geometry types, row-level encryption, default column values, multi-argument transforms, and binary deletion vectors. A table stays on v2 until its format version is explicitly upgraded, and v3-capable engines continue to read and write v2 tables. For RisingWave users, the most immediately useful v3 additions are default column values (safer schema evolution) and deletion vectors (cheaper reads on upsert-heavy tables).
Background: What Are Iceberg Format Versions?
Apache Iceberg's spec is versioned independently of any software release. A format version governs the structure of metadata files (manifests, manifest lists, table metadata JSON) and the semantics of operations. Format v1 introduced the foundational snapshot model. Format v2 added row-level deletes (enabling efficient upserts without full file rewrites). Format v3 is the latest spec, ratified in 2024.
Engines like Trino, Spark, Flink, and RisingWave each declare which format versions they can read and write. A v2 writer (like RisingWave's current stable release) can write to a v2 table; v3 tables require v3-capable engines for any v3-specific features.
Key Changes in Iceberg v3
1. Default Column Values
The most impactful change for streaming pipelines. In v2, adding a NOT NULL column to an existing table required a backfill — you had to rewrite every existing data file to include the new column. In v3, you can specify a default value for a new column:
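```sql
-- Spark SQL-style sketch: add a NOT NULL column with a default to a v3 table.
-- Exact syntax and support for column defaults vary by engine and version.
ALTER TABLE my_table ADD COLUMN discount_pct DOUBLE NOT NULL DEFAULT 0.0;
```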
Existing data files return 0.0 for discount_pct without any rewrite. RisingWave pipelines writing to the table only need to include the new column going forward — old records are implicitly handled.
2. New Data Types: Variant and Geometry
Variant is a schema-less JSON/semi-structured type stored efficiently in Parquet. Instead of flattening nested JSON into dozens of columns, you can store the entire payload as a single Variant column and query into it with path expressions. This is analogous to Snowflake's VARIANT or BigQuery's JSON type.
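As a rough illustration of path-based access, the sketch below uses Spark SQL's `parse_json` and `variant_get` functions (available in Spark 4.0). The catalog, table, and column names (`demo.db.events`, `payload`) are hypothetical, and the example assumes a Spark and Iceberg combination that supports variant columns in v3 tables.

```sql
-- Hypothetical v3 table with a single semi-structured column
CREATE TABLE demo.db.events (
    event_id BIGINT,
    payload  VARIANT
) USING iceberg
TBLPROPERTIES ('format-version' = '3');

-- parse_json turns a JSON string into a variant value
INSERT INTO demo.db.events
SELECT CAST(1 AS BIGINT), parse_json('{"user": {"id": "u-42"}, "amount": 19.99}');

-- variant_get extracts a typed value at a JSON path
SELECT
    event_id,
    variant_get(payload, '$.user.id', 'string') AS user_id,
    variant_get(payload, '$.amount', 'double')  AS amount
FROM demo.db.events;
```

Fields that are queried often can still be promoted to regular columns later; the variant column mainly avoids having to model every nested field up front.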
Geometry enables native geospatial data storage (points, polygons, lines) with WKB encoding and CRS metadata. Geospatial analytics that previously required PostGIS or custom serialization can now use native Iceberg columns.
3. Row-Level Encryption
v3 adds an encryption spec that allows individual columns or rows to be encrypted at rest within the Parquet file, with key management integrated via the catalog. This is distinct from S3-side encryption (SSE-S3/SSE-KMS) — it encrypts the data before it reaches object storage, enabling column-level access control without separate masking infrastructure.
4. Multi-Argument Transforms
v2 partition transforms (bucket[N], truncate[W], year, month, day, hour) each take a single column. v3 introduces multi-argument transforms that can derive partition values from combinations of columns. This enables more precise data skipping for compound partition keys.
5. Binary Deletion Vectors
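v3 replaces positional delete files with deletion vectors: compact bitmaps stored in Puffin files alongside the table's data files, with at most one vector per data file. Applying a deletion vector at read time is a bitmap lookup per row position rather than a merge against sorted position delete files, and one vector per data file means fewer small delete files to track in manifests. Equality delete files remain part of the spec in v3. For workloads that produce many position deletes, this makes high-frequency upsert tables noticeably cheaper to read.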
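Deletion vectors only come into play when an engine writes row-level changes in merge-on-read mode. As a sketch, the Iceberg table properties below switch Spark's DELETE, UPDATE, and MERGE to merge-on-read, and the `delete_files` metadata table lists the delete files that result. The table name `demo.db.orders` is hypothetical, and whether deletes are actually written as deletion vectors depends on the table being v3 and on the engine version.

```sql
-- Ask Spark to write deletes instead of rewriting whole data files
ALTER TABLE demo.db.orders SET TBLPROPERTIES (
    'write.delete.mode' = 'merge-on-read',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode'  = 'merge-on-read'
);

-- Inspect delete files: content 1 = position deletes, 2 = equality deletes
SELECT content, file_format, record_count, file_path
FROM demo.db.orders.delete_files;
```

On a v3 table, deletion vectors would be expected to show up here as position deletes with a Puffin file format rather than Parquet.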
v2 vs. v3 Feature Comparison
| Feature | v2 | v3 |
| --- | --- | --- |
| Row-level deletes | Equality + position delete files | Equality delete files + deletion vectors (replacing position delete files) |
| Default column values | Not supported | Supported |
| Variant type | Not supported | Supported |
| Geometry type | Not supported | Supported |
| Row-level encryption | Not supported | Supported |
| Multi-argument transforms | Not supported | Supported |
| Backward compatibility | v2 engines read v1 tables | v3 engines read v1 and v2 tables |
| Engine support (2025) | Broad | Growing |
What This Means for RisingWave Users
Today: RisingWave writes Iceberg v2 tables by default. The upsert mode uses v2 equality delete files, which are well-supported by all major query engines (Trino, Spark, Athena, DuckDB).
Migration path: When RisingWave adds v3 support, the upgrade path will be straightforward — Iceberg's spec allows in-place format version upgrades via a single metadata operation. Existing data files remain unchanged.
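For reference, this is roughly what the metadata-only upgrade looks like in Spark SQL with an Iceberg catalog. The table name `demo.db.inventory` is hypothetical, and the statement assumes an Iceberg release that accepts format version 3. Format versions cannot be downgraded, so it is worth confirming that every reader and writer of the table supports v3 first.

```sql
-- Upgrade a hypothetical table in place; only table metadata changes,
-- existing data files are left untouched.
ALTER TABLE demo.db.inventory SET TBLPROPERTIES ('format-version' = '3');
```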
Default values (v3 benefit): The default column values feature is particularly valuable for RisingWave pipelines because columns can be added to downstream Iceberg tables without backfilling history. Historical rows return the column's default at read time, writers that do not yet supply the column fall back to its write default, and a sink only needs to change once it should start emitting real values for the new column.
Deletion vectors (v3 benefit): For high-velocity CDC pipelines (e.g., replicating a busy PostgreSQL orders table), deletion vectors should reduce the read-time merge overhead when querying upserted Iceberg tables. They replace position delete files, not the equality delete files that the v2 upsert mode relies on, so the actual gain depends on how a future v3 sink writes its deletes.
Working with v2 Tables in RisingWave Today
RisingWave's current v2 sink is production-ready for the vast majority of use cases:
```sql
-- Create a standard v2 upsert pipeline.
-- RisingWave ingests PostgreSQL CDC through CREATE TABLE with the
-- postgres-cdc connector; no FORMAT ... ENCODE clause is used.
CREATE TABLE inventory_cdc (
    item_id      BIGINT,
    warehouse_id BIGINT,
    quantity     INT,
    last_updated TIMESTAMPTZ,
    -- Assumes the upstream table is keyed by (item_id, warehouse_id),
    -- matching the sink's primary_key below.
    PRIMARY KEY (item_id, warehouse_id)
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres.internal',
    port = '5432',
    username = 'cdc_user',
    password = 'secret',
    database.name = 'inventory',
    table.name = 'stock_levels'
);

-- Derive a stock status for every inventory row.
CREATE MATERIALIZED VIEW inventory_current AS
SELECT
    item_id,
    warehouse_id,
    quantity,
    last_updated,
    CASE WHEN quantity = 0 THEN 'OUT_OF_STOCK'
         WHEN quantity < 10 THEN 'LOW_STOCK'
         ELSE 'IN_STOCK'
    END AS stock_status
FROM inventory_cdc;

-- Upsert the view into an Iceberg v2 table through a REST catalog.
CREATE SINK inventory_iceberg_sink AS
SELECT * FROM inventory_current
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'item_id,warehouse_id',
    catalog.type = 'rest',
    catalog.uri = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://my-warehouse/data',
    s3.region = 'us-east-1',
    database.name = 'supply_chain',
    table.name = 'inventory'
);
```
This pipeline works today with v2 semantics: exactly-once upserts with equality delete files. Once RisingWave supports v3, the same SQL should be able to benefit from deletion vectors without modification, although any speedup depends on how the sink writes its deletes.
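To see how many delete files an upsert sink like this has produced, the table's `files` metadata table can be queried from Spark SQL. This is a sketch that assumes the same table is registered in a Spark catalog; the catalog name `lake` is hypothetical.

```sql
-- content: 0 = data file, 1 = position delete file, 2 = equality delete file
SELECT
    content,
    count(*)          AS file_count,
    sum(record_count) AS records
FROM lake.supply_chain.inventory.files
GROUP BY content
ORDER BY content;
```

A steadily growing count of equality delete files relative to data files is usually a sign that the table would benefit from compaction.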
FAQ
Q: Should I upgrade existing Iceberg tables to v3 today? A: Only if you need a v3-specific feature (variant type, encryption, defaults). For most workloads, v2 is stable and has broader engine support. Check your query engine's v3 compatibility before upgrading.
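Q: Is v3 backward compatible? Can v2 engines read v3 tables? A: Not in general. Iceberg does not use a per-feature "supported features" list; readers check the `format-version` field in table metadata, and an engine that only implements v2 is expected to reject a table whose format version is 3, even if the table uses no v3-only feature such as deletion vectors or variant columns. Compatibility runs in the other direction: v3-capable engines still read v2 tables.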
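Before pointing another engine at a table, it can help to confirm which format version the table actually uses. A minimal check, assuming the table is visible through a Spark catalog named `lake` or a Trino catalog named `iceberg` (both names are hypothetical):

```sql
-- Spark SQL with an Iceberg catalog: print the table's format version
SHOW TBLPROPERTIES lake.supply_chain.inventory ('format-version');

-- Trino: the generated DDL includes the format_version table property
SHOW CREATE TABLE iceberg.supply_chain.inventory;
```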
Q: When will RisingWave support Iceberg v3? A: Check the RisingWave changelog and GitHub roadmap for the latest status. The community is actively tracking spec v3 adoption. Join the Slack for real-time updates.
Q: Does v3 change the catalog protocol? A: No. The REST catalog API (Iceberg REST Catalog spec) is independent of the table format version. v2 and v3 tables are managed through the same catalog endpoints.
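Q: How do deletion vectors compare to Hudi's merge-on-read? A: Both are merge-on-read strategies. An Iceberg v3 deletion vector only records which row positions in a data file are deleted, stored as a bitmap in a Puffin file next to the unchanged Parquet data files. Hudi's merge-on-read tables instead append changed records to delta log files that are merged with the base files at read time. Deletion vectors are therefore typically small and cheap to apply, while the updated rows themselves are written as regular data files.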