In RisingWave v2.8+, you can query Apache Iceberg tables directly using streaming SQL by registering them as sources. This enables lakehouse-style queries — joining Iceberg tables with live Kafka streams, reading historical snapshots, and building materialized views on top of Iceberg data — all in standard SQL without moving data out of the lake.
Why Query Iceberg with Streaming SQL?
Most Iceberg query engines (Athena, Trino, Spark SQL) are designed for batch analytics: you submit a query, it scans files, returns results. This is excellent for ad-hoc analysis and large batch jobs, but it has limitations for operational use cases:
- No continuous queries — You can't say "alert me when this metric crosses a threshold"
- No streaming joins — You can't join live Kafka data with historical Iceberg tables
- No incremental computation — Each query rescans the full dataset
RisingWave brings streaming semantics to Iceberg queries. You can register an Iceberg table as a source, build a materialized view on it, and RisingWave will incrementally update query results as the Iceberg table receives new snapshots.
Registering Iceberg Tables as Sources
-- Register an existing Iceberg table as a RisingWave source
CREATE SOURCE orders_iceberg
WITH (
connector = 'iceberg',
catalog.type = 'rest',
catalog.uri = 'http://iceberg-catalog:8181',
warehouse.path = 's3://data-lake/warehouse',
s3.region = 'us-east-1',
database.name = 'commerce',
table.name = 'orders'
);
Once registered, orders_iceberg behaves like any other RisingWave source. You can query it directly, join it with other sources, or build materialized views on top of it.
-- Simple query of Iceberg table
SELECT
shipping_country,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders_iceberg
WHERE created_at >= '2026-01-01'
GROUP BY shipping_country
ORDER BY revenue DESC
LIMIT 20;
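You can also build a materialized view directly on the Iceberg source, so the results stay up to date as new snapshots land. A minimal sketch reusing the columns from the query above (the view name is illustrative):

-- Incrementally maintained revenue rollup over the Iceberg source
CREATE MATERIALIZED VIEW revenue_by_country AS
SELECT
    shipping_country,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders_iceberg
WHERE created_at >= '2026-01-01'
GROUP BY shipping_country;

Unlike the ad-hoc query, this view is maintained incrementally: as new snapshots are committed to the Iceberg table, only the new data is processed, and repeated reads of the view never rescan the source.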
Time Travel Queries
Apache Iceberg's snapshot-based architecture enables time travel. Query the state of a table as it was at a specific snapshot or timestamp:
-- In Spark or Trino, time travel with Iceberg:
-- SELECT * FROM orders FOR SYSTEM_VERSION AS OF 123456789;
-- SELECT * FROM orders FOR TIMESTAMP AS OF '2026-01-15 00:00:00';
RisingWave reads from the table's current snapshot by default. For point-in-time analysis, you can look up historical snapshot IDs through your catalog's REST API and query those snapshots with a batch engine. For ongoing historical comparisons, use RisingWave's Iceberg sink to maintain a continuously updated materialized view, and query the historical Iceberg data separately.
Joining Live Streams with Historical Iceberg Data
This is one of the most powerful patterns enabled by RisingWave's Iceberg source support. You can enrich live streaming events with historical context stored in Iceberg:
-- Live source: current Kafka events
CREATE SOURCE live_events (
user_id BIGINT,
event_type VARCHAR,
page_url VARCHAR,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'live-events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
-- Historical source: user profiles stored in Iceberg
CREATE SOURCE user_profiles_iceberg
WITH (
connector = 'iceberg',
catalog.type = 'rest',
catalog.uri = 'http://iceberg-catalog:8181',
warehouse.path = 's3://data-lake/warehouse',
s3.region = 'us-east-1',
database.name = 'users',
table.name = 'user_profiles'
);
-- Join live events with historical user profiles
CREATE MATERIALIZED VIEW enriched_live_events AS
SELECT
e.user_id,
e.event_type,
e.page_url,
e.event_time,
p.user_segment,
p.account_age_days,
p.lifetime_value,
p.preferred_language
FROM live_events e
LEFT JOIN user_profiles_iceberg p ON e.user_id = p.user_id;
This materialized view is updated continuously as new events arrive in Kafka, using the user profile data from Iceberg for enrichment.
Incremental Reads from Iceberg
RisingWave can read Iceberg tables incrementally by monitoring new snapshots. When a new snapshot is committed to the Iceberg table (by RisingWave's own sink or by another engine like Spark), RisingWave reads only the new files — not the entire table.
This makes RisingWave an efficient consumer of Iceberg tables written by batch systems:
-- Iceberg table written by nightly Spark jobs
CREATE SOURCE batch_processed_features
WITH (
connector = 'iceberg',
catalog.type = 'rest',
catalog.uri = 'http://iceberg-catalog:8181',
warehouse.path = 's3://ml-lake/warehouse',
s3.region = 'us-east-1',
database.name = 'ml_features',
table.name = 'user_features'
);
-- Enriched real-time scoring input
CREATE MATERIALIZED VIEW scoring_input AS
SELECT
e.user_id,
e.session_id,
e.event_type,
f.feature_vector_json,
f.risk_score,
f.last_model_update
FROM live_events e
JOIN batch_processed_features f ON e.user_id = f.user_id;
Query Patterns for Iceberg Analytics
Pattern 1: Historical Period Aggregation
Use standard SQL aggregations to answer "what happened during a specific period?" questions:
SELECT
shipping_country,
COUNT(*) AS orders,
SUM(total_amount) AS revenue
FROM orders_iceberg
WHERE created_at BETWEEN '2026-03-01' AND '2026-03-31'
GROUP BY shipping_country;
Pattern 2: Incremental Aggregation with Tumble Windows
Build time-windowed aggregations on Iceberg sources:
CREATE MATERIALIZED VIEW iceberg_daily_revenue AS
SELECT
shipping_country,
window_start::DATE AS date,
COUNT(*) AS orders,
SUM(total_amount) AS revenue
FROM TUMBLE(orders_iceberg, created_at, INTERVAL '1 DAY')
GROUP BY shipping_country, window_start;
Pattern 3: Cross-Layer Lakehouse Analytics
Compare real-time streaming data with historical Iceberg data:
CREATE SOURCE historical_revenue_iceberg
WITH (
connector = 'iceberg',
catalog.type = 'rest',
catalog.uri = 'http://iceberg-catalog:8181',
warehouse.path = 's3://data-lake/warehouse',
s3.region = 'us-east-1',
database.name = 'analytics',
table.name = 'daily_revenue'
);
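The comparison query that follows reads from an hourly_analytics materialized view over a live stream, which is assumed to already exist. A minimal sketch of how such a view might be defined, using a hypothetical live orders topic (source name, topic, and columns are illustrative, not part of the original setup):

-- Assumed live orders stream feeding the hourly rollup
CREATE SOURCE live_orders (
    order_id BIGINT,
    shipping_country VARCHAR,
    total_amount DECIMAL,
    created_at TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'live-orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- Hourly revenue per country, updated continuously
CREATE MATERIALIZED VIEW hourly_analytics AS
SELECT
    shipping_country,
    window_start,
    SUM(total_amount) AS total_revenue
FROM TUMBLE(live_orders, created_at, INTERVAL '1 HOUR')
GROUP BY shipping_country, window_start;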
-- Compare today's revenue so far to the same day last week
SELECT
live.shipping_country,
live.total_revenue AS current_revenue,
hist.total_revenue AS last_week_revenue,
ROUND(100.0 * (live.total_revenue - hist.total_revenue) / NULLIF(hist.total_revenue, 0), 1) AS wow_growth_pct
FROM hourly_analytics live
JOIN historical_revenue_iceberg hist
ON live.shipping_country = hist.shipping_country
AND hist.date = CURRENT_DATE - INTERVAL '7 DAYS'
ORDER BY wow_growth_pct DESC;
Query Engine Comparison for Iceberg
| Query Engine | Streaming Joins | Incremental MV | Sub-second Latency | SQL Standard | Open Source |
| --- | --- | --- | --- | --- | --- |
| RisingWave | Yes | Yes | Yes (in-memory) | Yes | Yes |
| Apache Flink | Yes | Yes | Yes | Limited | Yes |
| Trino | No | No | No | Yes | Yes |
| Amazon Athena | No | No | No | Yes | No |
| Apache Spark | Limited (micro-batch) | No | No | Yes | Yes |
| DuckDB | No | No | Near (local) | Yes | Yes |
RisingWave is uniquely positioned as both a streaming SQL engine and an Iceberg query engine — supporting continuous materialized views on top of Iceberg sources.
Performance Tips for Iceberg Queries in RisingWave
- Predicate pushdown — Always include partition key filters in your WHERE clauses. RisingWave pushes predicates down to Iceberg's manifest scan, avoiding reading irrelevant files.
- Column projection — Only SELECT the columns you need. RisingWave projects columns at the Parquet file level, reducing I/O.
- Limit initial scan scope — When building a materialized view on a large Iceberg table, add a date filter to avoid scanning years of history on startup.
- Snapshot polling interval — RisingWave checks for new Iceberg snapshots periodically. For near-real-time freshness, configure a short polling interval.
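The first three tips can be combined in a single view definition. A hedged sketch over the ML features table from earlier (the cutoff date is illustrative, and this assumes last_model_update aligns with the table's partitioning):

-- Project only the needed columns and constrain the partition key
-- so the initial backfill doesn't scan years of history
CREATE MATERIALIZED VIEW recent_feature_snapshot AS
SELECT
    user_id,
    risk_score
FROM batch_processed_features
WHERE last_model_update >= '2026-01-01';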
Common Use Cases
| Use Case | Iceberg Source | Live Source | Pattern |
| --- | --- | --- | --- |
| Feature enrichment for ML | User features (Iceberg) | Live events (Kafka) | Join |
| Anomaly detection | Historical baselines (Iceberg) | Live metrics (Kafka) | Join + filter |
| Trend comparison | Historical data (Iceberg) | Current data (MV) | Join |
| Backfill processing | Historical raw events (Iceberg) | None | Source scan |
| Cross-system reporting | Multiple Iceberg tables | Kafka streams | Multi-join |
FAQ
Q: Can RisingWave write to and read from the same Iceberg table simultaneously? Yes. You can have a sink writing to an Iceberg table while a separate source reads from it. RisingWave reads from committed snapshots, so there's no interference between readers and writers.
Q: How does RisingWave detect new snapshots in an Iceberg table written by Spark? RisingWave polls the Iceberg catalog for new snapshots. When Spark commits a new snapshot, RisingWave detects it within the polling interval and reads the new data files.
Q: Are there any Iceberg features not supported by RisingWave's source connector? RisingWave reads standard Parquet-format Iceberg tables. Row-level delete files (positional deletes) are applied during read. ORC and Avro data files may have limited support depending on your RisingWave version.
Q: Can I use schema evolution with RisingWave's Iceberg source? Yes. When you evolve the Iceberg table schema (add columns), RisingWave will see the new columns on its next scan. Your downstream materialized views will need to be updated to reference the new columns.
Q: What catalogs does RisingWave's Iceberg source support? The Iceberg source connector supports REST catalogs (the recommended approach), AWS Glue, and Hive Metastore — the same catalogs supported by the Iceberg sink.
Start Querying Your Iceberg Lake
RisingWave's Iceberg source support transforms your data lake from a passive storage layer into an active component of your streaming analytics architecture. Historical data and live streams can be queried together, enriched together, and analyzed together.
Get started with the RisingWave documentation and explore the Iceberg source connector. Join the community on Slack to share what you're building.

