Streaming to Apache Iceberg with SQL

Stream data directly into Apache Iceberg tables using SQL. RisingWave ingests Kafka and CDC streams, transforms them in flight, and sinks the results to Iceberg with exactly-once semantics and sub-minute latency.

SQL
No Java Required
Write streaming pipelines in SQL, not Kafka Connect configs or Flink Java
Exactly-Once
Delivery Guarantee
Coordinated checkpointing ensures no duplicates or data loss in Iceberg tables
Transform+Sink
Single System
Filter, join, aggregate, and sink in one pipeline — no Spark or Airflow needed
Sub-Second
Processing Latency
Continuous streaming replaces hourly batch loads with always-fresh Iceberg data

Why Streaming

Why is streaming to Apache Iceberg better than batch loading?

Batch ETL jobs load data into Iceberg on hourly or daily schedules, creating stale data and bursty compute costs. Streaming pipelines deliver data continuously with sub-minute freshness, eliminate batch scheduling complexity, and spread compute evenly across time for predictable resource usage.

Factor          | Batch ETL                           | Streaming (RisingWave)
Data Latency    | Hours                               | Seconds to minutes
Data Freshness  | Stale between runs                  | Always current
Compute Pattern | Bursty spikes                       | Steady, predictable
Complexity      | Scheduler + retries + orchestration | Single SQL pipeline
  • Eliminate Airflow DAGs and cron-based scheduling for Iceberg ingestion
  • Reduce end-to-end data latency from hours to seconds
  • Avoid large Spark cluster spin-ups for periodic batch loads
  • Enable real-time dashboards and analytics on Iceberg data

How It Works

How does RisingWave stream data into Iceberg tables?

RisingWave ingests data from Kafka, CDC connectors, or other streaming sources using SQL. You define transformations as materialized views, then create an Iceberg sink that continuously writes results to your Iceberg catalog. No Java code, no Spark jobs, no orchestrators required.

Kafka to Iceberg

Consume Kafka topics and sink transformed results directly into Iceberg tables with a single SQL statement
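A minimal sketch of this pattern: a source definition for the Kafka topic, followed by the single CREATE SINK statement that streams it into Iceberg. The topic, broker address, catalog URI, warehouse path, and table names are hypothetical placeholders; the connector options follow RisingWave's documented Kafka source and Iceberg sink parameters, so verify them against the docs for your version.

```sql
-- Ingest a Kafka topic as a streaming source
CREATE SOURCE user_events (
    user_id BIGINT,
    event_type VARCHAR,
    event_ts TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user_events',                          -- hypothetical topic
    properties.bootstrap.server = 'broker:9092',    -- hypothetical broker
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Continuously write the stream into an Iceberg table
CREATE SINK events_to_iceberg FROM user_events WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',       -- hypothetical catalog
    warehouse.path = 's3://my-warehouse/iceberg',   -- hypothetical warehouse
    database.name = 'analytics',
    table.name = 'user_events'
);
```

An append-only sink suits immutable event streams; for changelog-style data, use `type = 'upsert'` with a `primary_key`.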

CDC to Iceberg

Capture database changes via Debezium or direct CDC connectors and replicate them into Iceberg in real time
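A sketch of direct PostgreSQL CDC replication into Iceberg, with connection details and table names as placeholders; parameter names follow RisingWave's `postgres-cdc` connector and Iceberg sink options, and should be checked against your version's documentation.

```sql
-- Capture row-level changes from PostgreSQL via the built-in CDC connector
CREATE TABLE orders_cdc (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    status VARCHAR,
    amount NUMERIC
) WITH (
    connector = 'postgres-cdc',
    hostname = 'db.internal',      -- hypothetical host
    port = '5432',
    username = 'rw_user',
    password = 'secret',
    database.name = 'shop',
    schema.name = 'public',
    table.name = 'orders'
);

-- Replicate inserts, updates, and deletes into Iceberg in upsert mode
CREATE SINK orders_to_iceberg FROM orders_cdc WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-warehouse/iceberg',
    database.name = 'shop',
    table.name = 'orders'
);
```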

Transform + Sink

Apply SQL joins, aggregations, filters, and window functions before data lands in Iceberg
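For instance, raw events can be rolled up into per-minute counts before they ever land in Iceberg. A sketch, assuming a `user_events` source with an `event_ts` timestamp column already exists (all names hypothetical; `TUMBLE` is RisingWave's tumbling-window table function):

```sql
-- Aggregate raw events into per-minute counts
CREATE MATERIALIZED VIEW event_counts AS
SELECT
    window_start,
    event_type,
    COUNT(*) AS events
FROM TUMBLE(user_events, event_ts, INTERVAL '1 MINUTE')
GROUP BY window_start, event_type;

-- Sink the continuously updated aggregate into Iceberg
CREATE SINK counts_to_iceberg FROM event_counts WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start,event_type',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'event_counts'
);
```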

Exactly-Once Delivery

Coordinated checkpointing guarantees no duplicates or data loss in your Iceberg tables

Patterns

What streaming-to-Iceberg patterns does RisingWave support?

RisingWave supports append-only streaming, upsert patterns with primary keys, CDC replication from PostgreSQL and MySQL, multi-source joins before sinking, and time-partitioned writes. Each pattern is defined entirely in SQL with automatic state management and fault tolerance.

  • Append-only streaming for event logs, clickstreams, and sensor data
  • Upsert mode with primary keys for maintaining current-state tables in Iceberg
  • CDC replication from PostgreSQL, MySQL, and MongoDB into Iceberg
  • Multi-source joins that combine Kafka topics and database tables before sinking
  • Time-partitioned writes using Iceberg hidden partitioning for optimized query performance
  • Schema evolution support for adding and modifying columns without pipeline restarts
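As an illustration of the multi-source join pattern, the sketch below enriches a Kafka-backed `clicks` source with a CDC-backed `users` table before sinking (all names are hypothetical, and the two inputs are assumed to have been created beforehand):

```sql
-- Join streaming clicks with the current state of the users table
CREATE MATERIALIZED VIEW enriched_clicks AS
SELECT
    c.click_id,
    c.user_id,
    c.page,
    c.clicked_at,
    u.plan
FROM clicks AS c
JOIN users AS u ON c.user_id = u.user_id;

-- Sink the enriched stream into Iceberg
CREATE SINK enriched_to_iceberg FROM enriched_clicks WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'click_id',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'enriched_clicks'
);
```

RisingWave maintains the join state incrementally, so late-arriving updates to `users` are reflected in the sink without reprocessing.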

Frequently Asked Questions

Can RisingWave replace Kafka Connect for Iceberg?
Does RisingWave support Iceberg v2 features?
What catalogs does RisingWave work with?
How does exactly-once delivery to Iceberg work?

Ready to stream to Iceberg?

Start building streaming Iceberg pipelines with SQL in minutes.

Start Streaming to Iceberg