Streaming Lakehouse Architecture: Real-Time + Historical Analytics
A streaming lakehouse combines real-time serving (sub-second queries) with historical analytics (petabyte-scale scans) in a single architecture. The pattern: a streaming database (RisingWave) serves real-time queries from materialized views and also sinks the same data to Apache Iceberg for long-term analytical queries.
Architecture
Sources (Kafka, CDC) ──→ RisingWave
                             │
                 ┌───────────┴───────────┐
                 ↓                       ↓
        Materialized Views         Iceberg Sink
        (real-time serving)    (historical storage)
                 │                       │
                 ↓                       ↓
           Applications            Trino / Spark
            (sub-100ms)        (analytical queries)
Why Both?
| Need | Streaming MVs | Iceberg |
|---|---|---|
| Latest 5-min metrics | ✅ Sub-100ms | ❌ Delayed |
| Last 30 days trend | ⚠️ Expensive to maintain | ✅ Efficient |
| Ad-hoc exploration | ⚠️ Pre-defined queries only | ✅ Flexible |
| ML training data | ❌ Wrong tool | ✅ Perfect |
Streaming MVs are optimal for known, high-frequency queries with strict freshness requirements. Iceberg is optimal for flexible, historical analysis over large datasets.
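The rule of thumb above can be written down as a small routing function. This is a minimal sketch, not part of RisingWave or Iceberg: the `QueryProfile` type, the `choose_layer` function, and both thresholds (`MV_MAX_LOOKBACK`, `ICEBERG_COMMIT_DELAY`) are hypothetical values chosen for illustration and would need tuning for a real workload.

```python
from dataclasses import dataclass
from datetime import timedelta

# Illustrative thresholds (assumptions, not product defaults).
MV_MAX_LOOKBACK = timedelta(hours=1)         # longer windows get costly to keep as MVs
ICEBERG_COMMIT_DELAY = timedelta(minutes=1)  # assumed lag before sunk data is queryable


@dataclass(frozen=True)
class QueryProfile:
    lookback: timedelta       # how far back the query reads
    max_staleness: timedelta  # how old the answer is allowed to be
    predefined: bool          # True if the query shape is known in advance


def choose_layer(q: QueryProfile) -> str:
    """Pick a serving layer for a query, following the table above."""
    fits_mv = q.predefined and q.lookback <= MV_MAX_LOOKBACK
    if fits_mv:
        return "materialized_view"
    if q.max_staleness < ICEBERG_COMMIT_DELAY:
        # Too fresh for Iceberg, but no suitable MV exists yet.
        return "define_new_materialized_view"
    return "iceberg"


live = QueryProfile(timedelta(minutes=5), timedelta(seconds=1), predefined=True)
trend = QueryProfile(timedelta(days=30), timedelta(hours=1), predefined=True)
adhoc = QueryProfile(timedelta(days=7), timedelta(days=1), predefined=False)

print(choose_layer(live), choose_layer(trend), choose_layer(adhoc))
```

The third outcome makes the trade-off explicit: a fresh but ad-hoc query fits neither layer as-is, so it has to be turned into a materialized view first.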
Implementation
-- Real-time serving layer: per-region order count and revenue over the last 5 minutes
CREATE MATERIALIZED VIEW live_metrics AS
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
FROM orders_stream
WHERE order_time > NOW() - INTERVAL '5 minutes'
GROUP BY region;

-- Historical analytics layer (same data, different destination)
CREATE SINK orders_to_iceberg AS
SELECT * FROM orders_stream
WITH (connector = 'iceberg', type = 'append-only', ...);
The materialized view and the sink are fed from the same streaming source. The view serves real-time queries; the sink persists the full history to Iceberg.
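To make the fan-out concrete, here is a minimal in-memory sketch in plain Python. It only imitates the two SQL statements above: `live_metrics` recomputes the 5-minute aggregate from scratch (a streaming database maintains it incrementally), and the `history` list stands in for the append-only Iceberg table. The function names and the event dictionary layout are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def live_metrics(events, now):
    """Per-region order count and revenue over the trailing 5 minutes."""
    out = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for e in events:
        if e["order_time"] > now - WINDOW:  # same filter as the MV's WHERE clause
            out[e["region"]]["orders"] += 1
            out[e["region"]]["revenue"] += e["amount"]
    return dict(out)


def fan_out(events, now):
    """Feed one event stream to both layers and return (live view, history)."""
    history = list(events)  # append-only copy: every event is kept
    return live_metrics(events, now), history


now = datetime(2024, 1, 1, 12, 0)
events = [
    {"region": "eu", "amount": 20.0, "order_time": datetime(2024, 1, 1, 11, 50)},
    {"region": "eu", "amount": 10.0, "order_time": datetime(2024, 1, 1, 11, 58)},
    {"region": "us", "amount": 5.0, "order_time": datetime(2024, 1, 1, 11, 59)},
]
live, history = fan_out(events, now)
print(live)
print(len(history))
```

The 11:50 order falls outside the window, so it is missing from the live view but still present in the history. That is the split the architecture relies on.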
Frequently Asked Questions
Do I need both streaming MVs and Iceberg?
Not always. If you only need real-time metrics, MVs alone are sufficient. If you only need historical analytics, Iceberg alone works. The streaming lakehouse pattern is for workloads requiring both — which is increasingly common.
How much does a streaming lakehouse cost?
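Compute: a RisingWave cluster, roughly $100-500/month for moderate workloads. Storage: S3 at about $0.023/GB/month (S3 Standard list price; it varies by region and storage tier). Query engines: Trino or DuckDB, both open source, so the cost is the infrastructure they run on. Depending on the workload, the total can be 5-10x cheaper than a traditional data warehouse.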
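A back-of-the-envelope estimate using the figures above might look like the following sketch. The `monthly_cost` helper and the example inputs (a $300/month cluster, 2 TB stored) are hypothetical. The estimate ignores S3 request charges, data transfer, and query-engine infrastructure.

```python
S3_USD_PER_GB_MONTH = 0.023  # storage rate quoted above


def monthly_cost(compute_usd, stored_gb, s3_rate=S3_USD_PER_GB_MONTH):
    """Rough monthly cost: streaming compute plus object storage."""
    storage = round(stored_gb * s3_rate, 2)
    return {
        "compute": compute_usd,
        "storage": storage,
        "total": round(compute_usd + storage, 2),
    }


# Example: a $300/month cluster with 2,000 GB of history in Iceberg.
estimate = monthly_cost(compute_usd=300, stored_gb=2000)
print(estimate)
```

At these rates storage is a small share of the bill (2,000 GB × $0.023 = $46), so compute sizing dominates the estimate.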

