Apache Iceberg
Stream data directly into Apache Iceberg tables using SQL. RisingWave transforms Kafka and CDC streams, then sinks to Iceberg with exactly-once semantics and sub-minute latency.
Why Streaming
Batch ETL jobs load data into Iceberg on hourly or daily schedules, creating stale data and bursty compute costs. Streaming pipelines deliver data continuously with sub-minute freshness, eliminate batch scheduling complexity, and spread compute evenly across time for predictable resource usage.
| Factor | Batch ETL | Streaming (RisingWave) |
|---|---|---|
| Data Latency | Hours | Seconds to minutes |
| Data Freshness | Stale between runs | Always current |
| Compute Pattern | Bursty spikes | Steady, predictable |
| Complexity | Scheduler + retries + orchestration | Single SQL pipeline |
How It Works
RisingWave ingests data from Kafka, CDC connectors, or other streaming sources using SQL. You define transformations as materialized views, then create an Iceberg sink that continuously writes results to your Iceberg catalog. No Java code, no Spark jobs, no orchestrators required.
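The flow above can be sketched end to end in SQL. All names, the broker address, and the catalog settings below are placeholders; the exact `WITH` options (including storage credentials, omitted here) depend on your catalog type and RisingWave version:

```sql
-- 1. Ingest a Kafka topic as a streaming source (schema and broker are illustrative).
CREATE SOURCE orders_src (
    order_id BIGINT,
    amount DOUBLE PRECISION,
    order_ts TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- 2. Define the transformation as a materialized view.
CREATE MATERIALIZED VIEW large_orders AS
SELECT order_id, amount, order_ts
FROM orders_src
WHERE amount > 100;

-- 3. Continuously write results to an Iceberg table.
CREATE SINK large_orders_iceberg FROM large_orders
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',              -- placeholder; e.g. glue, jdbc, hive, ...
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'analytics',
    table.name = 'large_orders'
);
```

Once the sink is created, RisingWave keeps the Iceberg table continuously up to date; no scheduler or external job runner is involved.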
- Consume Kafka topics and sink transformed results directly into Iceberg tables with a single SQL statement
- Capture database changes via Debezium or direct CDC connectors and replicate them into Iceberg in real time
- Apply SQL joins, aggregations, filters, and window functions before data lands in Iceberg
- Coordinated checkpointing guarantees no duplicates or data loss in your Iceberg tables
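For the CDC path, a minimal sketch using RisingWave's `postgres-cdc` connector (`mysql-cdc` is analogous); connection details and table names are placeholders:

```sql
-- Replicate a PostgreSQL table into RisingWave via CDC.
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
) WITH (
    connector = 'postgres-cdc',
    hostname = 'db.example.com',
    port = '5432',
    username = 'rw_user',
    password = 'secret',
    database.name = 'appdb',
    schema.name = 'public',
    table.name = 'customers'
);

-- Mirror inserts, updates, and deletes into Iceberg as upserts.
CREATE SINK customers_iceberg FROM customers
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    catalog.type = 'rest',              -- placeholder catalog configuration
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'analytics',
    table.name = 'customers'
);
```

The `upsert` sink type with a `primary_key` is what lets updates and deletes from the source database propagate into the Iceberg table rather than accumulating as duplicate rows.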
Patterns
RisingWave supports append-only streaming, upsert patterns with primary keys, CDC replication from PostgreSQL and MySQL, multi-source joins before sinking, and time-partitioned writes. Each pattern is defined entirely in SQL with automatic state management and fault tolerance.
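As one sketch of the aggregation patterns, a tumbling-window rollup sunk to Iceberg; this assumes a source like `orders_src` with an `order_ts` timestamp column, and all names and catalog settings are illustrative:

```sql
-- Per-minute revenue rollup using a tumbling time window.
CREATE MATERIALIZED VIEW revenue_per_minute AS
SELECT
    window_start,
    SUM(amount) AS revenue,
    COUNT(*)    AS order_count
FROM TUMBLE(orders_src, order_ts, INTERVAL '1 minute')
GROUP BY window_start;

-- Each window row is keyed by its start time, so late-arriving
-- events update the existing row in Iceberg instead of appending.
CREATE SINK revenue_iceberg FROM revenue_per_minute
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start',
    catalog.type = 'rest',              -- placeholder catalog configuration
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'analytics',
    table.name = 'revenue_per_minute'
);
```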
Start building streaming Iceberg pipelines with SQL in minutes.
Start Streaming to Iceberg