Streaming to Apache Iceberg with SQL

Stream data directly into Apache Iceberg tables using SQL. RisingWave ingests Kafka and CDC streams, transforms them in flight, and sinks the results to Iceberg with exactly-once semantics and sub-minute latency.

SQL
No Java Required
Write streaming pipelines in SQL, not Kafka Connect configs or Flink Java
Exactly-Once
Delivery Guarantee
Coordinated checkpointing ensures no duplicates or data loss in Iceberg tables
Transform+Sink
Single System
Filter, join, aggregate, and sink in one pipeline — no Spark or Airflow needed
Sub-Second
Processing Latency
Continuous streaming replaces hourly batch loads with always-fresh Iceberg data

Why Streaming

Why is streaming to Apache Iceberg better than batch loading?

Batch ETL jobs load data into Iceberg on hourly or daily schedules, creating stale data and bursty compute costs. Streaming pipelines deliver data continuously with sub-minute freshness, eliminate batch scheduling complexity, and spread compute evenly across time for predictable resource usage.

Factor          | Batch ETL                           | Streaming (RisingWave)
Data Latency    | Hours                               | Seconds to minutes
Data Freshness  | Stale between runs                  | Always current
Compute Pattern | Bursty spikes                       | Steady, predictable
Complexity      | Scheduler + retries + orchestration | Single SQL pipeline
  • Eliminate Airflow DAGs and cron-based scheduling for Iceberg ingestion
  • Reduce end-to-end data latency from hours to seconds
  • Avoid large Spark cluster spin-ups for periodic batch loads
  • Enable real-time dashboards and analytics on Iceberg data

How It Works

How does RisingWave stream data into Iceberg tables?

RisingWave ingests data from Kafka, CDC connectors, or other streaming sources using SQL. You define transformations as materialized views, then create an Iceberg sink that continuously writes results to your Iceberg catalog. No Java code, no Spark jobs, no orchestrators required.

Kafka to Iceberg

Consume Kafka topics and sink transformed results directly into Iceberg tables with a single SQL statement
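A minimal sketch of this pattern: a source definition for the Kafka topic, followed by the single CREATE SINK statement that streams it into Iceberg. The topic, broker address, catalog URI, warehouse path, and table names are hypothetical placeholders; the connector options follow RisingWave's documented Kafka source and Iceberg sink parameters, so verify them against the docs for your version.

```sql
-- Ingest a Kafka topic as a streaming source
CREATE SOURCE user_events (
    user_id BIGINT,
    event_type VARCHAR,
    event_ts TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user_events',                          -- hypothetical topic
    properties.bootstrap.server = 'broker:9092',    -- hypothetical broker
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Continuously write the stream into an Iceberg table
CREATE SINK events_to_iceberg FROM user_events WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',       -- hypothetical catalog
    warehouse.path = 's3://my-warehouse/iceberg',   -- hypothetical warehouse
    database.name = 'analytics',
    table.name = 'user_events'
);
```

An append-only sink suits immutable event streams; for changelog-style data, use `type = 'upsert'` with a `primary_key`.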

CDC to Iceberg

Capture database changes via Debezium or direct CDC connectors and replicate them into Iceberg in real time
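A sketch of direct PostgreSQL CDC replication into Iceberg, with connection details and table names as placeholders; parameter names follow RisingWave's `postgres-cdc` connector and Iceberg sink options, and should be checked against your version's documentation.

```sql
-- Capture row-level changes from PostgreSQL via the built-in CDC connector
CREATE TABLE orders_cdc (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    status VARCHAR,
    amount NUMERIC
) WITH (
    connector = 'postgres-cdc',
    hostname = 'db.internal',      -- hypothetical host
    port = '5432',
    username = 'rw_user',
    password = 'secret',
    database.name = 'shop',
    schema.name = 'public',
    table.name = 'orders'
);

-- Replicate inserts, updates, and deletes into Iceberg in upsert mode
CREATE SINK orders_to_iceberg FROM orders_cdc WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-warehouse/iceberg',
    database.name = 'shop',
    table.name = 'orders'
);
```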

Transform + Sink

Apply SQL joins, aggregations, filters, and window functions before data lands in Iceberg
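For instance, raw events can be rolled up into per-minute counts before they ever land in Iceberg. A sketch, assuming a `user_events` source with an `event_ts` timestamp column already exists (all names hypothetical; `TUMBLE` is RisingWave's tumbling-window table function):

```sql
-- Aggregate raw events into per-minute counts
CREATE MATERIALIZED VIEW event_counts AS
SELECT
    window_start,
    event_type,
    COUNT(*) AS events
FROM TUMBLE(user_events, event_ts, INTERVAL '1 MINUTE')
GROUP BY window_start, event_type;

-- Sink the continuously updated aggregate into Iceberg
CREATE SINK counts_to_iceberg FROM event_counts WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start,event_type',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'event_counts'
);
```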

Exactly-Once Delivery

Coordinated checkpointing guarantees no duplicates or data loss in your Iceberg tables

Patterns

What streaming-to-Iceberg patterns does RisingWave support?

RisingWave supports append-only streaming, upsert patterns with primary keys, CDC replication from PostgreSQL and MySQL, multi-source joins before sinking, and time-partitioned writes. Each pattern is defined entirely in SQL with automatic state management and fault tolerance.

  • Append-only streaming for event logs, clickstreams, and sensor data
  • Upsert mode with primary keys for maintaining current-state tables in Iceberg
  • CDC replication from PostgreSQL, MySQL, and MongoDB into Iceberg
  • Multi-source joins that combine Kafka topics and database tables before sinking
  • Time-partitioned writes using Iceberg hidden partitioning for optimized query performance
  • Schema evolution support for adding and modifying columns without pipeline restarts
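As an illustration of the multi-source join pattern, the sketch below enriches a Kafka-backed `clicks` source with a CDC-backed `users` table before sinking (all names are hypothetical, and the two inputs are assumed to have been created beforehand):

```sql
-- Join streaming clicks with the current state of the users table
CREATE MATERIALIZED VIEW enriched_clicks AS
SELECT
    c.click_id,
    c.user_id,
    c.page,
    c.clicked_at,
    u.plan
FROM clicks AS c
JOIN users AS u ON c.user_id = u.user_id;

-- Sink the enriched stream into Iceberg
CREATE SINK enriched_to_iceberg FROM enriched_clicks WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'click_id',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-warehouse/iceberg',
    database.name = 'analytics',
    table.name = 'enriched_clicks'
);
```

RisingWave maintains the join state incrementally, so late-arriving updates to `users` are reflected in the sink without reprocessing.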

Frequently Asked Questions

Can RisingWave replace Kafka Connect for Iceberg?
Does RisingWave support Iceberg v2 features?
What catalogs does RisingWave work with?
How does exactly-once delivery to Iceberg work?

Ready to stream to Iceberg?

Start building streaming Iceberg pipelines with SQL in minutes.

Start Streaming to Iceberg