A medallion architecture with RisingWave and Iceberg uses three tiers: Bronze (raw data landed in Iceberg via RisingWave sinks), Silver (cleaned and joined data in Iceberg, computed by RisingWave materialized views), and Gold (business-ready aggregations in Iceberg for BI and analytics). Each tier runs continuously, with seconds of latency between source events and Gold layer availability.
What Is a Medallion Architecture?
The medallion architecture (also called the multi-hop architecture) organizes data into progressively refined layers:
- Bronze — Raw, unmodified data from source systems. The landing zone. Immutable once written.
- Silver — Cleaned, validated, and joined data. Conformed to consistent schemas. Still at event or entity grain.
- Gold — Business-level aggregations and metrics. Optimized for consumption by BI tools, dashboards, and ML models.
Originally popularized by Databricks for batch pipelines, the medallion pattern maps naturally to streaming architectures. RisingWave's materialized views act as the transformation engine between tiers, while Iceberg on S3 provides the durable, queryable storage at each layer.
Why Streaming Medallion Beats Batch Medallion
Traditional batch-based medallion pipelines run on schedules: hourly or daily jobs that read Bronze, produce Silver, then produce Gold. This introduces multi-hour latency and complex dependency management (if Bronze is delayed, Silver and Gold cascade).
A streaming medallion pipeline using RisingWave is different:
- Continuous — Transformations run as events arrive, not on a schedule
- Incremental — RisingWave only processes changed data, not full table scans
- Low-latency — Events flow from Bronze to Gold in seconds, not hours
- Self-healing — If a source recovers from an outage, RisingWave automatically catches up
Setting Up Each Layer
Bronze Layer: Raw Data Landing
The Bronze layer ingests raw events directly from Kafka and lands them unmodified into Iceberg. Preserve everything — even malformed records — to enable reprocessing.
```sql
-- Bronze: Raw Kafka events → Iceberg
CREATE SOURCE raw_orders_kafka (
    order_id VARCHAR,
    customer_id VARCHAR,
    items JSONB,
    raw_total VARCHAR,
    shipping_address JSONB,
    status VARCHAR,
    source_system VARCHAR,
    kafka_timestamp TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'raw-orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

CREATE SINK bronze_orders_sink AS
SELECT * FROM raw_orders_kafka
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://data-lake/warehouse',
    s3.region = 'us-east-1',
    database.name = 'bronze',
    table.name = 'orders'
);
```
Use append-only for Bronze — raw events are immutable history. Never overwrite Bronze data.
Silver Layer: Cleaned and Conformed Data
The Silver layer reads from Bronze sources (via Kafka, not Iceberg directly), applies data quality rules, parses nested fields, and joins with reference data:
```sql
-- Reference table for customer data (from CDC).
-- CDC tables in RisingWave are created as tables with an explicit
-- schema and primary key, so the changelog can be materialized.
CREATE TABLE customers_postgres (
    customer_id BIGINT PRIMARY KEY,
    customer_name VARCHAR,
    customer_segment VARCHAR,
    email_domain VARCHAR
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres-primary',
    port = '5432',
    username = 'replicator',
    password = 'replicator-password',
    database.name = 'commerce',
    schema.name = 'public',
    table.name = 'customers'
);
```
```sql
-- Silver: Cleaned and enriched orders
CREATE MATERIALIZED VIEW silver_orders AS
SELECT
    o.order_id,
    o.customer_id::BIGINT AS customer_id,
    -- Parse total safely with fallback
    CASE
        WHEN o.raw_total ~ '^\d+(\.\d+)?$' THEN o.raw_total::NUMERIC
        ELSE NULL
    END AS order_total,
    COALESCE(o.shipping_address->>'country', 'UNKNOWN') AS shipping_country,
    o.status,
    o.source_system,
    o.kafka_timestamp AS order_time,
    c.customer_name,
    c.customer_segment,
    c.email_domain
FROM raw_orders_kafka o
LEFT JOIN customers_postgres FOR SYSTEM_TIME AS OF PROCTIME() c
    ON o.customer_id::BIGINT = c.customer_id
WHERE o.order_id IS NOT NULL; -- Filter out malformed records
```
```sql
CREATE SINK silver_orders_sink AS
SELECT * FROM silver_orders
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id', -- required for upsert sinks
    catalog.type = 'rest',
    catalog.uri = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://data-lake/warehouse',
    s3.region = 'us-east-1',
    database.name = 'silver',
    table.name = 'orders'
);
```
Gold Layer: Business Metrics
The Gold layer produces the aggregated, business-ready metrics that analysts and dashboards consume:
```sql
CREATE MATERIALIZED VIEW gold_daily_revenue AS
SELECT
    shipping_country,
    customer_segment,
    window_start::DATE AS date,
    COUNT(*) AS order_count,
    SUM(order_total) AS total_revenue,
    AVG(order_total) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM TUMBLE(silver_orders, order_time, INTERVAL '1 DAY')
WHERE order_total IS NOT NULL
GROUP BY shipping_country, customer_segment, window_start;
```
```sql
CREATE SINK gold_revenue_sink AS
SELECT * FROM gold_daily_revenue
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'shipping_country,customer_segment,date', -- required for upsert sinks
    catalog.type = 'rest',
    catalog.uri = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://data-lake/warehouse',
    s3.region = 'us-east-1',
    database.name = 'gold',
    table.name = 'daily_revenue'
);
```
Architecture Overview
| Layer | Storage | Transformation | Sink Type | Latency |
|---|---|---|---|---|
| Bronze | Iceberg (bronze DB) | None (raw) | append-only | < 30s |
| Silver | Iceberg (silver DB) | Clean, parse, join | upsert | < 60s |
| Gold | Iceberg (gold DB) | Aggregate, window | upsert | < 90s |
Querying Across Layers
In RisingWave v2.8+, you can query Iceberg tables directly, enabling cross-layer analysis:
```sql
-- Query Gold layer Iceberg table from RisingWave
CREATE SOURCE gold_revenue_iceberg
WITH (
    connector = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://iceberg-catalog:8181',
    warehouse.path = 's3://data-lake/warehouse',
    s3.region = 'us-east-1',
    database.name = 'gold',
    table.name = 'daily_revenue'
);
```
```sql
-- Compare today's live data with historical Gold data
SELECT
    live.shipping_country,
    live.total_revenue AS todays_revenue,
    hist.total_revenue AS last_week_revenue,
    ROUND(100.0 * (live.total_revenue - hist.total_revenue)
          / NULLIF(hist.total_revenue, 0), 1) AS pct_change
FROM gold_daily_revenue live
JOIN gold_revenue_iceberg hist
    ON live.shipping_country = hist.shipping_country
    AND live.customer_segment = hist.customer_segment
    AND hist.date = CURRENT_DATE - INTERVAL '7 DAYS'
WHERE live.date = CURRENT_DATE -- restrict the live side to today's row
ORDER BY pct_change DESC;
```
Data Quality Patterns
Bronze-to-Silver Quality Rules
Apply quality checks in the Silver materialized view using SQL filters and CASE expressions:
- Null handling — Use `COALESCE` to replace nulls with defaults
- Type coercion — Use regex checks before casting to avoid runtime errors
- Deduplication — Use `MAX(event_time)` grouped by entity ID to keep the latest state
- Range validation — Use `WHERE amount BETWEEN 0 AND 1000000` to reject outliers
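One way to express the deduplication rule is RisingWave's `ROW_NUMBER` pattern, sketched below. The stream `bronze_events` and the columns `entity_id`, `payload`, and `event_time` are illustrative names, not objects defined elsewhere in this pipeline:

```sql
-- Sketch: keep only the latest record per entity (names are illustrative)
CREATE MATERIALIZED VIEW deduped_events AS
SELECT entity_id, payload, event_time
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY entity_id
            ORDER BY event_time DESC
        ) AS rn
    FROM bronze_events
)
WHERE rn = 1; -- the rank filter is what makes this a streaming dedup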
Silver-to-Gold Quality Rules
- Referential integrity — Only aggregate rows with valid foreign keys (use `INNER JOIN` instead of `LEFT JOIN` in Gold views)
- Completeness thresholds — Alert if Gold row counts drop more than 10% from the previous period
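The completeness threshold can itself be a materialized view over the Gold table defined earlier, which an external alerting job polls. The view name `gold_completeness_check` is illustrative:

```sql
-- Sketch: day-over-day row-count drop in the Gold layer
CREATE MATERIALIZED VIEW gold_completeness_check AS
WITH daily AS (
    SELECT date, COUNT(*) AS row_count
    FROM gold_daily_revenue
    GROUP BY date
)
SELECT
    today.date,
    today.row_count AS todays_rows,
    prior.row_count AS prior_rows,
    -- positive pct_drop means today has fewer rows than the day before
    100.0 * (prior.row_count - today.row_count)
        / NULLIF(prior.row_count, 0) AS pct_drop
FROM daily today
JOIN daily prior
    ON prior.date = (today.date - INTERVAL '1 DAY')::DATE;
```

An alert fires when `pct_drop > 10` for the latest date.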
Medallion vs. Other Data Organization Patterns
| Pattern | Layers | Latency | Complexity | Best For |
|---|---|---|---|---|
| Medallion (Bronze/Silver/Gold) | 3 | Seconds (streaming) | Medium | General-purpose analytics |
| Lambda (batch + stream) | 2 parallel | Minutes (batch) | Very High | Legacy systems |
| Single-layer raw | 1 | Hours (batch) | Low | Simple reporting |
| Domain-oriented (data mesh) | Variable | Variable | High | Large organizations |
| Kappa (stream only) | 1 | Seconds | Low | Event-driven systems |
| Kappa (stream only) | 1 | Seconds | Low | Event-driven systems |
Operational Considerations
Compaction scheduling — Each Iceberg layer accumulates small files from streaming commits. Schedule compaction jobs for each layer (Bronze: daily, Silver: every 4 hours, Gold: every hour) since Gold receives the most queries.
Snapshot retention — Bronze retains snapshots for 90+ days (for reprocessing). Silver: 30 days. Gold: 7 days (data is regeneratable from Silver).
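If you run this maintenance with Spark, Iceberg ships built-in procedures for both jobs. In the sketch below, the catalog name `my_catalog` and the cutoff timestamp are placeholders:

```sql
-- Compact small files written by streaming commits (run per layer)
CALL my_catalog.system.rewrite_data_files(table => 'silver.orders');

-- Expire old snapshots per the retention policy above
CALL my_catalog.system.expire_snapshots(
    table => 'bronze.orders',
    older_than => TIMESTAMP '2025-01-01 00:00:00'  -- placeholder: now minus 90 days
);
```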
Monitoring — Track record counts at each layer boundary. Significant drops indicate upstream issues. Track schema drift at Bronze to catch unexpected source changes early.
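A minimal version of the record-count check, assuming the views defined earlier, is a materialized view that RisingWave keeps continuously up to date:

```sql
-- Sketch: continuously maintained row counts per layer boundary
CREATE MATERIALIZED VIEW layer_row_counts AS
SELECT 'silver' AS layer, COUNT(*) AS row_count FROM silver_orders
UNION ALL
SELECT 'gold' AS layer, COUNT(*) AS row_count FROM gold_daily_revenue;
```

An external monitor can poll `SELECT * FROM layer_row_counts` and alert when a count drops.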
FAQ
Q: Can I rebuild Silver from Bronze if I need to fix a bug? Yes. This is one of the main benefits of the medallion pattern. Delete the Silver Iceberg table, fix your Silver materialized view, and replay from the Bronze source in Kafka or reprocess from Bronze Iceberg files.
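A sketch of that rebuild workflow, using the objects defined above (the corrected SELECT is elided):

```sql
-- 1. Tear down the buggy Silver objects (Bronze stays intact)
DROP SINK silver_orders_sink;
DROP MATERIALIZED VIEW silver_orders;

-- 2. Drop or truncate the silver.orders Iceberg table via your catalog/engine

-- 3. Recreate the view with corrected logic; because raw_orders_kafka was
--    declared with scan.startup.mode = 'earliest', the new view backfills
--    from the start of the Kafka topic, then recreate the sink
CREATE MATERIALIZED VIEW silver_orders AS
SELECT ...;  -- corrected transformation here
```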
Q: Should Gold tables be served directly to BI tools? Yes. Gold tables in Iceberg are ideal for BI tools. Connect Tableau, Looker, or Power BI via Athena, Trino, or another Iceberg-compatible query engine. The aggregated data is already in the shape analysts need.
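As an example, a Trino query serving the Gold table to a dashboard might look like this, assuming an Iceberg catalog named `iceberg` is configured in Trino:

```sql
-- Trino: serve Gold metrics to a BI tool (catalog name is an assumption)
SELECT shipping_country, customer_segment, date, total_revenue
FROM iceberg.gold.daily_revenue
WHERE date >= current_date - INTERVAL '30' DAY
ORDER BY date DESC, total_revenue DESC;
```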
Q: How many Gold tables should I create? Create one Gold table per business domain or reporting area. Avoid creating a single monolithic Gold table — it becomes a bottleneck and mixes concerns. Typical setups have 5–20 Gold tables.
Q: Can I use RisingWave for all three layers, or do I need Spark? RisingWave handles all three layers entirely through SQL — no Spark required for the pipeline. You may still want Spark for periodic compaction of Iceberg files, but this is operational maintenance, not part of the pipeline itself.
Q: What if a Silver transformation fails? RisingWave applies backpressure upstream and buffers data until the issue is resolved. Bronze data in Kafka is retained (based on your Kafka retention policy), ensuring no data is lost.
Build Your Streaming Medallion Architecture
The streaming medallion architecture with RisingWave and Iceberg gives you fresh, reliable, multi-layer data — without the operational overhead of scheduled batch jobs or the complexity of Lambda-style dual pipelines.
Get started with the RisingWave documentation and join the RisingWave Slack community to see real-world medallion architecture examples from the community.

