Introduction
Apache Iceberg has become the default open table format for data lakehouses. But querying Iceberg tables efficiently still presents a challenge: complex joins and aggregations over large Iceberg datasets are slow, and users want precomputed results they can query instantly. The standard answer is materialized views.
The problem is that most materialized view implementations on Iceberg rely on batch refresh. AWS Glue Data Catalog introduced Iceberg-based materialized views in 2025, but they refresh on a schedule (every 15 minutes to several hours). Databricks materialized views and Cloudera's Hive-based MVs follow the same pattern. Between refreshes, your materialized view is stale. For dashboards, alerting, and operational analytics, "15 minutes old" is not good enough.
RisingWave takes a different approach. Instead of periodically recomputing results from scratch, RisingWave creates streaming materialized views that update incrementally as new data arrives. You define a SQL query once, and RisingWave keeps the result up to date continuously, with sub-second latency. This article shows you how to build streaming materialized views on Apache Iceberg data using RisingWave, with complete SQL examples you can run today.
What Are Streaming Materialized Views on Iceberg?
A streaming materialized view is a precomputed query result that updates incrementally and continuously as the underlying data changes. Unlike a batch-refreshed materialized view that recomputes the entire result set on a schedule, a streaming materialized view processes only the new or changed rows, making updates faster and more efficient.
When applied to Apache Iceberg, this means you can define SQL queries (joins, aggregations, window functions) over your Iceberg tables and get results that stay fresh without scheduling refresh jobs, managing compute clusters, or accepting stale data between refresh cycles.
How Batch Refresh Works (and Why It Falls Short)
In a batch-refreshed materialized view system like AWS Glue MVs or Databricks, the workflow looks like this:
- You define a SQL query over one or more Iceberg tables.
- The system runs the full query and stores the result.
- On a schedule (or manually), the system reruns the query and replaces the stored result.
This approach has three issues:
- Staleness: Data is only as fresh as the last refresh. A 30-minute refresh cycle means your dashboard could be 30 minutes behind reality.
- Compute cost: Each refresh reprocesses all data (or at best does incremental detection, which still requires scanning metadata). Large tables mean expensive refresh jobs.
- Scheduling complexity: You need to manage refresh schedules, handle failures, and coordinate dependencies between views.
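To make the batch workflow concrete, here is a generic sketch (illustrative SQL only; `REFRESH MATERIALIZED VIEW` is PostgreSQL-style syntax, and the exact commands vary by platform):

```sql
-- Define the view once: the engine runs the full query and stores the result.
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT order_date, SUM(total_amount) AS revenue
FROM orders
GROUP BY order_date;

-- Then, on every refresh cycle (scheduled or manual), the engine reruns
-- the query and replaces the stored result wholesale.
REFRESH MATERIALIZED VIEW daily_revenue;
```

Everything between two `REFRESH` runs is invisible to queries against the view, which is exactly the staleness gap described above.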
How Streaming Materialized Views Work in RisingWave
RisingWave uses incremental view maintenance to keep materialized views up to date. The process works like this:
- RisingWave connects to your Iceberg catalog and reads the table data.
- You define a materialized view with a SQL query (aggregations, joins, filters, window functions).
- As new data arrives in the Iceberg table, RisingWave detects the changes and applies only the delta to the materialized view result.
- The materialized view is always queryable and always reflects the latest data.
This approach eliminates refresh scheduling entirely. There is no "refresh interval" to configure because updates happen continuously.
How Does RisingWave Compare to Other Iceberg MV Solutions?
Several platforms now offer materialized views on Iceberg data. Here is how they compare:
| Feature | RisingWave | AWS Glue MVs | Databricks MVs | Cloudera/Hive MVs |
| --- | --- | --- | --- | --- |
| Refresh model | Streaming (continuous) | Batch (scheduled) | Batch (triggered/scheduled) | Batch (periodic) |
| Latency | Sub-second to seconds | Minutes to hours | Minutes to hours | Minutes to hours |
| Incremental updates | Yes (delta processing) | Best-effort incremental | Incremental for supported ops | Full recomputation |
| Join support | Inner, left, temporal, interval | Limited | Yes | Limited |
| Window functions | Tumble, hop, session, sliding | No | Yes | No |
| SQL interface | Standard PostgreSQL-compatible SQL | Spark SQL | Spark SQL / Databricks SQL | HiveQL |
| Compute model | Always-on streaming engine | Serverless batch jobs | Serverless / cluster | Cluster-based |
| Upsert support | Yes (with primary keys) | No | Yes | No |
The key difference is the refresh model. RisingWave processes changes as they happen. Every other solution recomputes on a schedule, which means there is always a gap between when data changes and when the materialized view reflects those changes.
When Batch Refresh Is Acceptable
Batch-refreshed materialized views work well when:
- Query freshness requirements are relaxed (hourly or daily dashboards).
- The underlying data changes infrequently.
- You want to minimize always-on compute costs for rarely-queried views.
When You Need Streaming Materialized Views
Streaming materialized views are the right choice when:
- Dashboards need to reflect changes within seconds.
- You are building alerting or anomaly detection on Iceberg data.
- Multiple downstream consumers query the same aggregation, and you want to compute it once.
- Your Iceberg tables receive continuous writes from CDC or event streaming pipelines.
How to Build Streaming Materialized Views on Iceberg
Let's walk through a complete example. We will connect RisingWave to an Iceberg catalog, create a source, define materialized views with joins and aggregations, and query the results. All SQL examples target RisingWave v2.3.
Step 1: Create an Iceberg Source
First, connect RisingWave to your Iceberg table. This creates a source that reads from an existing Iceberg table in your catalog:
CREATE SOURCE iceberg_orders WITH (
connector = 'iceberg',
warehouse.path = 's3://my-lakehouse/warehouse',
database.name = 'ecommerce',
table.name = 'orders',
catalog.type = 'glue',
catalog.name = 'my_glue_catalog',
s3.access.key = '${AWS_ACCESS_KEY}',
s3.secret.key = '${AWS_SECRET_KEY}',
s3.region = 'us-west-2'
);
You do not need to specify columns. RisingWave reads the schema directly from the Iceberg metadata. You can verify the schema with:
DESCRIBE iceberg_orders;
Expected output:
Name | Type
------------------+-----------------------
order_id | bigint
customer_id | bigint
product_id | bigint
quantity | integer
total_amount | numeric
order_status | varchar
created_at | timestamptz
Now create a second source for the customers dimension table:
CREATE SOURCE iceberg_customers WITH (
connector = 'iceberg',
warehouse.path = 's3://my-lakehouse/warehouse',
database.name = 'ecommerce',
table.name = 'customers',
catalog.type = 'glue',
catalog.name = 'my_glue_catalog',
s3.access.key = '${AWS_ACCESS_KEY}',
s3.secret.key = '${AWS_SECRET_KEY}',
s3.region = 'us-west-2'
);
For more details on Iceberg source configuration, see the RisingWave Iceberg source documentation.
Step 2: Create Materialized Views with Aggregations
Now define a materialized view that computes real-time revenue by customer segment. This joins the orders and customers sources and aggregates the results:
CREATE MATERIALIZED VIEW revenue_by_segment AS
SELECT
c.segment,
c.region,
COUNT(*) AS total_orders,
SUM(o.total_amount) AS total_revenue,
AVG(o.total_amount) AS avg_order_value
FROM iceberg_orders o
JOIN iceberg_customers c ON o.customer_id = c.customer_id
WHERE o.order_status = 'completed'
GROUP BY c.segment, c.region;
This materialized view updates automatically as new orders or customer records appear in the Iceberg tables. There is no refresh button to click and no schedule to configure.
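Materialized views can also be layered: one view can read from another, and RisingWave maintains the whole chain incrementally. As a sketch (the revenue threshold here is arbitrary), a second view that keeps only high-revenue segment/region pairs from the view above:

```sql
-- A materialized view defined on top of another materialized view.
-- RisingWave propagates deltas through the chain automatically.
CREATE MATERIALIZED VIEW high_revenue_regions AS
SELECT segment, region, total_revenue
FROM revenue_by_segment
WHERE total_revenue > 1000000;
```

This replaces the dependency coordination you would otherwise script between batch refresh jobs.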
Step 3: Create a Materialized View with Time Windows
For time-series analysis, you can use time window functions to bucket rows into fixed intervals and aggregate within each bucket:
CREATE MATERIALIZED VIEW hourly_order_stats AS
SELECT
window_start,
window_end,
COUNT(*) AS order_count,
SUM(total_amount) AS hourly_revenue,
AVG(total_amount) AS avg_order_value
FROM TUMBLE(iceberg_orders, created_at, INTERVAL '1 hour')
GROUP BY window_start, window_end;
This creates hourly buckets of order statistics that update continuously as new orders flow into the Iceberg table.
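Tumbling windows are non-overlapping. For overlapping (sliding) windows, RisingWave provides HOP. A sketch that recomputes one-hour statistics every 15 minutes (the interval values are illustrative):

```sql
-- HOP(source, time_column, hop_size, window_size):
-- emits a one-hour window every 15 minutes, so each row
-- contributes to four overlapping windows.
CREATE MATERIALIZED VIEW sliding_order_stats AS
SELECT
    window_start,
    window_end,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM HOP(iceberg_orders, created_at, INTERVAL '15 minutes', INTERVAL '1 hour')
GROUP BY window_start, window_end;
```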
Step 4: Query the Materialized Views
Materialized views in RisingWave are queryable just like regular tables:
SELECT * FROM revenue_by_segment
WHERE segment = 'enterprise'
ORDER BY total_revenue DESC;
Expected output:
segment | region | total_orders | total_revenue | avg_order_value
------------+-----------+--------------+---------------+-----------------
enterprise | us-east | 12847 | 4521890.50 | 351.98
enterprise | eu-west | 8234 | 2987345.20 | 362.78
enterprise | ap-south | 5612 | 1845670.80 | 328.87
The results reflect the latest state of the Iceberg tables, not a snapshot from the last batch refresh.
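The windowed view from Step 3 is queryable the same way. For example, to pull the most recent hourly buckets (column names as defined in hourly_order_stats):

```sql
-- Last 24 hours of windowed statistics, newest first.
SELECT window_start, order_count, hourly_revenue
FROM hourly_order_stats
WHERE window_start >= NOW() - INTERVAL '24 hours'
ORDER BY window_start DESC;
```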
Step 5: Sink Results Back to Iceberg (Optional)
If you want downstream tools like Trino or Spark to query the materialized view results, you can sink them back to a separate Iceberg table:
CREATE SINK revenue_segment_sink FROM revenue_by_segment
WITH (
connector = 'iceberg',
type = 'upsert',
primary_key = 'segment,region',
warehouse.path = 's3://my-lakehouse/warehouse',
database.name = 'analytics',
table.name = 'revenue_by_segment',
catalog.type = 'glue',
catalog.name = 'my_glue_catalog',
s3.access.key = '${AWS_ACCESS_KEY}',
s3.secret.key = '${AWS_SECRET_KEY}',
s3.region = 'us-west-2',
create_table_if_not_exists = 'true'
);
This creates a continuously updated Iceberg table that any query engine can read. The type = 'upsert' setting ensures that updates to existing segments overwrite previous values rather than appending duplicates. For details on Iceberg sink configuration, see the RisingWave Iceberg sink documentation.
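Once the sink table exists, any Iceberg-aware engine can read it. As an example from Trino (assuming an Iceberg catalog registered there under the name iceberg; the catalog name depends on your Trino configuration):

```sql
-- Read the RisingWave-maintained results from Trino.
-- Qualified as catalog.schema.table per Trino conventions.
SELECT segment, region, total_revenue
FROM iceberg.analytics.revenue_by_segment
ORDER BY total_revenue DESC;
```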
What Architecture Patterns Work Best?
There are two common patterns for streaming materialized views on Iceberg:
Pattern 1: Iceberg as Source, RisingWave as Query Layer
In this pattern, data lands in Iceberg tables through existing batch or streaming pipelines. RisingWave reads from those tables and serves precomputed results to dashboards and applications:
Batch/Streaming pipelines → Iceberg tables → RisingWave (MVs) → Dashboards
This works well when you already have Iceberg as your data lake format and want to add real-time query capabilities without changing your ingestion pipeline.
Pattern 2: Kafka to RisingWave to Iceberg
In this pattern, RisingWave sits between your event stream and your lakehouse. It ingests from Kafka, computes materialized views, and sinks results to Iceberg:
Kafka → RisingWave (MVs) → Iceberg → Trino/Spark
This pattern is ideal when you want both real-time query results in RisingWave and historical analytics in your lakehouse. RisingWave serves as both the streaming ETL layer and the real-time query engine.
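In this pattern, ingestion starts from a Kafka source instead of an Iceberg one. A minimal sketch (topic name, broker address, and column list are placeholders for your own stream):

```sql
-- Ingest order events from Kafka; downstream materialized views
-- and Iceberg sinks are defined exactly as in the steps above.
CREATE SOURCE kafka_orders (
    order_id BIGINT,
    customer_id BIGINT,
    total_amount NUMERIC,
    created_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```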
For a deeper dive into the lakehouse integration architecture, see the RisingWave lakehouse guide.
FAQ
What is a streaming materialized view on Iceberg?
A streaming materialized view on Iceberg is a precomputed query result that updates incrementally and continuously as data in the underlying Iceberg tables changes. Unlike batch-refreshed views that recompute on a schedule, streaming views process only the changed rows, delivering sub-second freshness without manual refresh triggers.
How do RisingWave materialized views differ from AWS Glue materialized views?
RisingWave materialized views update continuously using incremental computation, while AWS Glue materialized views refresh on a batch schedule. RisingWave delivers sub-second freshness with delta processing, whereas AWS Glue MVs have latency measured in minutes to hours depending on the refresh interval.
Can I use streaming materialized views with any Iceberg catalog?
Yes. RisingWave supports multiple Iceberg catalog types, including AWS Glue, REST catalogs, and storage-based catalogs. You configure the catalog type when creating the Iceberg source, and the materialized view works identically regardless of which catalog backs your Iceberg tables.
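For example, a variant of the orders source backed by a REST catalog instead of Glue (the catalog URI is a placeholder; the remaining options match the Glue example in Step 1):

```sql
-- Same source definition, different catalog backend.
CREATE SOURCE iceberg_orders_rest WITH (
    connector = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-lakehouse/warehouse',
    database.name = 'ecommerce',
    table.name = 'orders',
    s3.access.key = '${AWS_ACCESS_KEY}',
    s3.secret.key = '${AWS_SECRET_KEY}',
    s3.region = 'us-west-2'
);
```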
Do streaming materialized views increase my Iceberg storage costs?
Streaming materialized views in RisingWave store their results in RisingWave's own storage, not in Iceberg. Your Iceberg storage costs remain unchanged. If you optionally sink the materialized view results back to a separate Iceberg table, you will incur additional storage for that output table, but the source tables are unaffected.
Conclusion
Key takeaways:
- Batch-refreshed materialized views on Iceberg (AWS Glue, Databricks, Cloudera) introduce staleness measured in minutes to hours. Streaming materialized views in RisingWave eliminate this gap.
- RisingWave connects directly to Iceberg catalogs, reads table schemas automatically, and lets you define materialized views with standard SQL.
- Joins, aggregations, and window functions all work in streaming materialized views, making them suitable for dashboards, alerting, and operational analytics.
- You can optionally sink materialized view results back to Iceberg tables for downstream consumption by Trino, Spark, or DuckDB.
- The combination of Iceberg (open storage) and RisingWave (streaming compute) gives you a lakehouse architecture with both historical depth and real-time freshness.
Ready to build streaming materialized views on your Iceberg data? Try RisingWave Cloud for free; no credit card required. Sign up here.
Join our Slack community to ask questions and connect with other stream processing developers.

