You can have RisingWave streaming data into Apache Iceberg in under 10 minutes. Spin up RisingWave and an Iceberg REST catalog with Docker Compose, connect a Kafka source, define a materialized view, and create an Iceberg sink—your first lakehouse pipeline is live with five SQL statements.
What You Will Build
This tutorial walks through a complete end-to-end pipeline:
- A Kafka topic producing simulated user events
- A RisingWave source reading those events
- A materialized view computing real-time session metrics
- An Iceberg sink writing results to local MinIO (S3-compatible)
- A direct query against the Iceberg table from RisingWave v2.8
By the end, you will have a working streaming lakehouse you can extend with your own data.
Prerequisites
- Docker and Docker Compose installed
- psql CLI (or any PostgreSQL client)
- 10 minutes
Step 1: Start the Stack
Create a docker-compose.yml with RisingWave, Kafka, MinIO, and an Iceberg REST catalog (we use Project Nessie here, which exposes an Iceberg REST-compatible API).
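A minimal compose file might look like the sketch below. Treat the image tags, ports, and catalog wiring as assumptions to verify against current releases; in particular, Nessie serves its HTTP API on port 19120 by default, so we map it to 8181 to match the catalog URI used later in this tutorial.

```yaml
services:
  risingwave:
    image: risingwavelabs/risingwave:latest
    command: single_node          # all-in-one mode for local testing
    ports:
      - "4566:4566"               # Postgres-compatible SQL port

  kafka:
    image: apache/kafka:latest    # KRaft mode, no ZooKeeper required
    ports:
      - "9092:9092"

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"               # S3 API
      - "9001:9001"               # web console

  iceberg-catalog:
    image: ghcr.io/projectnessie/nessie:latest
    ports:
      - "8181:19120"              # Nessie listens on 19120; exposed as 8181
```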
docker compose up -d risingwave kafka minio iceberg-catalog
Once the stack is healthy (check with docker compose ps), connect to RisingWave:
psql -h localhost -p 4566 -d dev -U root
Step 2: Create a Kafka Source
CREATE SOURCE user_events (
    user_id VARCHAR,
    session_id VARCHAR,
    event_type VARCHAR,
    page VARCHAR,
    ts TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'user.events',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
)
FORMAT PLAIN ENCODE JSON;
Verify the source is reading data:
SELECT * FROM user_events LIMIT 5;
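If the query returns nothing, the topic is probably empty. One way to feed it is a small generator script; the sketch below is illustrative (the topic name user.events matches the source above, and the pipe assumes Kafka's CLI tools are on your PATH). It prints one JSON event per line, matching the source schema.

```python
import json
import random
import sys
import uuid
from datetime import datetime, timezone

PAGES = ["/home", "/search", "/product", "/cart", "/checkout"]
EVENT_TYPES = ["page_view", "click", "add_to_cart"]

def make_event(user_id: str, session_id: str) -> dict:
    """Build one simulated event matching the user_events source schema."""
    return {
        "user_id": user_id,
        "session_id": session_id,
        "event_type": random.choice(EVENT_TYPES),
        "page": random.choice(PAGES),
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def generate(n_sessions: int) -> list:
    """Generate a batch of events: n_sessions sessions, 1-5 events each."""
    events = []
    for _ in range(n_sessions):
        user = f"user-{random.randint(1, 50)}"
        session = str(uuid.uuid4())
        for _ in range(random.randint(1, 5)):
            events.append(make_event(user, session))
    return events

if __name__ == "__main__":
    for e in generate(100):
        sys.stdout.write(json.dumps(e) + "\n")
```

Pipe the output into the topic, e.g. `python gen_events.py | kafka-console-producer --bootstrap-server localhost:9092 --topic user.events`.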
Step 3: Build a Session Metrics View
RisingWave's SESSION window function groups events into sessions separated by a gap of inactivity. This is ideal for user behavior analytics.
CREATE MATERIALIZED VIEW user_session_metrics AS
SELECT
    user_id,
    session_id,
    window_start AS session_start,
    window_end AS session_end,
    COUNT(*) AS event_count,
    COUNT(DISTINCT page) AS unique_pages,
    EXTRACT(EPOCH FROM (window_end - window_start)) AS session_duration_sec
FROM SESSION(user_events, ts, INTERVAL '30 MINUTES')
GROUP BY user_id, session_id, window_start, window_end;
Check that the view is producing results:
SELECT * FROM user_session_metrics LIMIT 10;
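As a mental model, the gap-based grouping a SESSION window performs can be sketched in a few lines of Python. This is illustrative only; the exact window_end semantics (last event vs. last event plus the gap) are up to the engine.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)

def sessionize(timestamps: list) -> list:
    """Group event timestamps into sessions: a new session starts whenever
    the gap since the previous event exceeds GAP. Returns
    (session_start, session_end) pairs, mirroring window_start/window_end
    in the materialized view (here, session_end is the last event's time)."""
    sessions = []
    start = prev = None
    for ts in sorted(timestamps):
        if prev is None or ts - prev > GAP:
            if start is not None:
                sessions.append((start, prev))
            start = ts
        prev = ts
    if start is not None:
        sessions.append((start, prev))
    return sessions
```

Two events 10 minutes apart land in one session; an event two hours later opens a new one.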
Step 4: Create the Iceberg Sink
CREATE SINK user_sessions_to_iceberg AS
SELECT * FROM user_session_metrics
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://localhost:8181',
    warehouse.path = 's3://lakehouse/warehouse',
    s3.region = 'us-east-1',
    s3.endpoint = 'http://localhost:9000',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin',
    database.name = 'analytics',
    table.name = 'user_sessions'
);
RisingWave automatically creates the Iceberg table if it does not exist. Note that catalog.uri and s3.endpoint are resolved from inside the RisingWave container, so if everything runs on the same Docker Compose network they must be addresses reachable from that container (the Compose service names, or host networking), not necessarily localhost. Wait about 60 seconds for the first checkpoint to commit.
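To confirm that data files actually landed in the bucket, you can list it with MinIO's mc client from the host (the alias name local is arbitrary):

```shell
mc alias set local http://localhost:9000 minioadmin minioadmin
mc ls --recursive local/lakehouse/warehouse
```

You should see Parquet data files and JSON/Avro metadata files appear under the warehouse path after the first commit.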
Step 5: Query the Iceberg Table
In RisingWave v2.8, you can query Iceberg tables directly:
SELECT
    DATE_TRUNC('hour', session_start) AS hour,
    COUNT(*) AS session_count,
    AVG(session_duration_sec) AS avg_duration,
    AVG(unique_pages) AS avg_pages_per_session
FROM iceberg_scan(
    's3://lakehouse/warehouse',
    'analytics',
    'user_sessions'
)
GROUP BY DATE_TRUNC('hour', session_start)
ORDER BY hour DESC
LIMIT 24;
You now have a live hourly report backed by Iceberg storage.
Understanding What Just Happened
| Step | Component | Role |
| --- | --- | --- |
| CREATE SOURCE | RisingWave | Reads the Kafka topic in real time |
| CREATE MATERIALIZED VIEW | RisingWave | Continuously computes session aggregations |
| CREATE SINK | RisingWave | Commits results to Iceberg every checkpoint |
| Iceberg REST catalog | Nessie/Polaris | Tracks table metadata and snapshots |
| MinIO / S3 | Object storage | Stores Parquet data files durably |
| iceberg_scan() | RisingWave v2.8 | Queries Iceberg tables directly from SQL |
Extending the Tutorial
Once the basic pipeline works, you can:
Add schema evolution: When user_events gains a new field, add it to the source with ALTER SOURCE user_events ADD COLUMN, surface the matching column in your materialized view, and Iceberg handles the table's schema migration.
Enable upsert: Change type = 'append-only' to type = 'upsert' and declare a primary key in the sink's WITH clause (e.g. primary_key = 'user_id,session_id') to keep a current-state table rather than an event log.
Connect a BI tool: Point Trino, DuckDB, or Apache Spark at your MinIO bucket with the Nessie catalog URI to query the same Iceberg data from your preferred analytics tool.
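For instance, DuckDB's iceberg and httpfs extensions can read the table straight out of MinIO. A hedged sketch; the table path depends on how the catalog laid out the warehouse, so adjust it to what you see in the bucket:

```sql
INSTALL iceberg; LOAD iceberg;
INSTALL httpfs;  LOAD httpfs;

-- point DuckDB's S3 layer at local MinIO
SET s3_endpoint = 'localhost:9000';
SET s3_access_key_id = 'minioadmin';
SET s3_secret_access_key = 'minioadmin';
SET s3_use_ssl = false;
SET s3_url_style = 'path';

SELECT COUNT(*)
FROM iceberg_scan('s3://lakehouse/warehouse/analytics/user_sessions');
```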
Add more sources: Bring in a Postgres CDC source alongside your Kafka source. Join them in a materialized view for enriched analytics.
Common Mistakes to Avoid
Wrong catalog URI: The most common issue is a network misconfiguration between RisingWave and the catalog service. Remember that localhost inside the RisingWave container refers to the container itself, not the host, so catalog.uri usually needs the Compose service name. Confirm connectivity by running curl http://localhost:8181/v1/config from inside the RisingWave container, substituting the catalog address the sink actually uses.
Missing S3 credentials: For production S3, always pass s3.region, s3.access.key, and s3.secret.key (or use IAM roles via instance profile). Omitting these causes silent authentication failures.
Small file accumulation: Each checkpoint creates new Parquet files. Schedule Iceberg's rewriteDataFiles procedure daily to compact files for efficient querying.
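With Spark configured against the same catalog, the compaction call is a one-liner. The catalog name my_catalog and the target file size below are illustrative:

```sql
CALL my_catalog.system.rewrite_data_files(
    table   => 'analytics.user_sessions',
    options => map('target-file-size-bytes', '536870912')  -- 512 MiB
);
```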
FAQ
Q: Do I need to create the Iceberg table manually before creating the sink?
A: No. RisingWave automatically creates the Iceberg table based on the schema of your SELECT statement when the sink is created.
Q: What Iceberg catalog backends does RisingWave support?
A: RisingWave supports the Iceberg REST catalog spec (compatible with Nessie, Polaris, Tabular, and AWS Glue with the REST adapter), as well as Hive Metastore and JDBC catalogs.
Q: How do I monitor sink lag?
A: Check the sink's throughput and backpressure in RisingWave's monitoring dashboard to see whether it is keeping up with the materialized view. You can also monitor Iceberg snapshot creation frequency in the catalog.
Q: Can I write to multiple Iceberg tables from one RisingWave deployment?
A: Yes. Create multiple CREATE SINK statements, each pointing to a different table.name. RisingWave manages each sink independently.
Q: What is the minimum Iceberg version supported?
A: RisingWave supports the Iceberg v2 format, which includes the row-level deletes required for upsert semantics. Iceberg v1 tables are supported for append-only sinks.
Get Started
Follow the complete RisingWave documentation for production deployment guides, configuration references, and connector details. Ask questions and share your pipeline on the RisingWave Slack community.

