How to Sink Streaming Data to Apache Iceberg with SQL

Write streaming data from RisingWave to Apache Iceberg using SQL. Three statements: source, transform, sink.

Step-by-Step

1. Create a Source

CREATE SOURCE events (...) WITH (connector='kafka', topic='events', ...);
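The ellipses elide the schema and connection details. A fuller sketch might look like the following; the column names, broker address, startup mode, and JSON encoding are assumptions for illustration:

```sql
-- Hypothetical schema and Kafka connection details
CREATE SOURCE events (
  user_id BIGINT,
  amount DOUBLE PRECISION,
  event_time TIMESTAMPTZ
) WITH (
  connector = 'kafka',
  topic = 'events',
  properties.bootstrap.server = 'kafka:9092',
  scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```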

2. Transform (Optional)

CREATE MATERIALIZED VIEW clean_events AS
SELECT *, CASE WHEN amount > 0 THEN 'valid' ELSE 'invalid' END as quality
FROM events WHERE user_id IS NOT NULL;
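Because materialized views in RisingWave are incrementally maintained, you can query clean_events at any time to sanity-check the transform before attaching a sink, for example:

```sql
-- Quick check: how many rows were tagged valid vs. invalid?
SELECT quality, COUNT(*) AS n
FROM clean_events
GROUP BY quality;
```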

3. Create Iceberg Sink

CREATE SINK events_to_iceberg AS SELECT * FROM clean_events
WITH (
  connector = 'iceberg',
  type = 'append-only',
  catalog.type = 'rest',
  catalog.uri = 'http://iceberg-catalog:8181',
  warehouse.path = 's3://lakehouse/warehouse',
  s3.endpoint = 'https://s3.amazonaws.com',
  s3.region = 'us-east-1',
  database.name = 'analytics',
  table.name = 'events'
);

Done. Data flows continuously from Kafka → RisingWave → Iceberg with automatic compaction.

Sink Options

| Option | Values | Default |
| --- | --- | --- |
| type | append-only, upsert | Required |
| primary_key | Column name(s) | Required for upsert |
| catalog.type | rest, hive, jdbc, storage, s3_tables | Required |
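For changelog data where rows are updated or deleted, use an upsert sink with a primary_key instead of append-only. A sketch reusing the catalog settings from the append-only example above; the choice of user_id as the key and the table name are assumptions:

```sql
CREATE SINK events_upsert_to_iceberg AS SELECT * FROM clean_events
WITH (
  connector = 'iceberg',
  type = 'upsert',
  primary_key = 'user_id',   -- required for upsert; assumed key column
  catalog.type = 'rest',
  catalog.uri = 'http://iceberg-catalog:8181',
  warehouse.path = 's3://lakehouse/warehouse',
  s3.endpoint = 'https://s3.amazonaws.com',
  s3.region = 'us-east-1',
  database.name = 'analytics',
  table.name = 'events_upsert'
);
```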

Frequently Asked Questions

Does RisingWave handle Iceberg compaction?

Yes. RisingWave automatically compacts small Parquet files, unlike Flink and Spark, which require separate compaction jobs.
