The Medallion architecture — Bronze for raw ingest, Silver for clean wide tables, Gold for serving-ready aggregates — is the most widely adopted data lakehouse pattern in e-commerce. With RisingWave, a PostgreSQL-compatible streaming database, you can implement all three layers in streaming SQL, keeping every layer continuously up to date as orders and events flow through Kafka.
The Problem: Static Medallion Architectures Miss the Real-Time Window
The classic Medallion architecture was designed for batch and micro-batch processing. Bronze is an S3 data lake. Silver is a dbt transformation run every 30 minutes. Gold is a BI-ready table refreshed nightly. This works for historical reporting but breaks down for operational use cases:
- Merchant dashboards need GMV and order counts that are current to the minute, not the last batch run.
- Operations teams need to detect inventory stockouts within seconds of them occurring, not hours.
- Finance teams need real-time payment method breakdowns during high-volume events like 11/11 or Black Friday.
SHOPLINE — Asia's leading commerce enablement platform — faced exactly this problem. Their original Lambda architecture (dual batch+streaming pipelines) created consistency windows and duplicated business logic. After adopting RisingWave, they achieved a 76.7% reduction in API response times by making all three Medallion layers streaming.
"By adopting RisingWave, we moved to a cost-effective Kappa architecture. It provides the low-latency real-time analytics we need and delivers a much smoother experience for our merchants." — Henry Chi, Manager, Data Team, SHOPLINE
How Streaming SQL Solves This
RisingWave implements the Medallion layers as materialized views that are incrementally maintained. When an order event arrives in Kafka, RisingWave propagates the update through Bronze → Silver → Gold in a single processing pass. The Gold layer is always consistent with the Bronze layer — there is no batch lag, no reconciliation window.
The practical implication: your serving layer (application databases, BI tools, alerting systems) always reads from pre-computed, continuously updated results.
Building the Streaming Medallion Architecture
Step 1: Bronze Layer — Data Source Setup
The Bronze layer connects directly to Kafka. Raw events land here without transformation — the goal is lossless ingest:
```sql
-- Raw order events from all channels
CREATE SOURCE orders (
    order_id VARCHAR,
    merchant_id VARCHAR,
    customer_id VARCHAR,
    channel VARCHAR,         -- 'web', 'mobile', 'pos', 'marketplace'
    payment_method VARCHAR,  -- 'card', 'alipay', 'paypal', 'cod'
    subtotal NUMERIC,
    discount NUMERIC,
    total NUMERIC,
    status VARCHAR,
    created_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ecommerce.orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Product catalog updates
CREATE SOURCE product_updates (
    sku_id VARCHAR,
    merchant_id VARCHAR,
    product_name VARCHAR,
    category VARCHAR,
    list_price NUMERIC,
    updated_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ecommerce.products',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```
For object storage durability, Kafka's own retention policy (or a Kafka-to-S3 connector like Kafka Connect S3 Sink) handles the Bronze archival layer independently.
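The Silver layer below joins orders against a merchants dimension, which is not defined above. A minimal sketch of that relation, assuming merchant records arrive as upserts keyed by merchant_id (the schema is illustrative; the column names mirror those used in the join):

```sql
-- Merchant dimension (hypothetical schema; populate via CDC or application upserts)
CREATE TABLE merchants (
    merchant_id VARCHAR PRIMARY KEY,
    store_name VARCHAR,
    region VARCHAR,
    tier VARCHAR,
    country_code VARCHAR
);
```

In practice this table is often fed from the operational database through a CDC source, so merchant updates flow into the Silver join the same way order events do.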
Step 2: Silver Layer — Core Materialized View
The Silver layer cleans, validates, and enriches raw events. The key transformation here is joining orders to merchant metadata to produce wide, analysis-ready rows (a join to product metadata would follow the same pattern):
```sql
CREATE MATERIALIZED VIEW silver_orders AS
SELECT
    o.order_id,
    o.merchant_id,
    o.customer_id,
    o.channel,
    o.payment_method,
    o.subtotal,
    o.discount,
    o.total,
    o.status,
    o.created_at,
    -- Enrichment from merchant dimension
    m.store_name,
    m.region,
    m.tier,
    m.country_code,
    -- Computed fields
    ROUND(o.discount / NULLIF(o.subtotal, 0) * 100, 2) AS discount_pct,
    CASE
        WHEN o.total >= 500 THEN 'high_value'
        WHEN o.total >= 100 THEN 'mid_value'
        ELSE 'low_value'
    END AS order_tier,
    DATE_TRUNC('hour', o.created_at) AS hour_bucket
FROM orders o
JOIN merchants m ON o.merchant_id = m.merchant_id
WHERE o.status NOT IN ('test', 'system_generated');
```
This is a streaming join — RisingWave maintains the join result incrementally as new orders arrive and as the merchants table updates.
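Because silver_orders is an ordinary relation, you can sanity-check the enrichment with ad hoc SQL. A sketch, using a hypothetical merchant id:

```sql
-- Inspect the most recent enriched rows for one merchant (the id is illustrative)
SELECT order_id, store_name, order_tier, discount_pct
FROM silver_orders
WHERE merchant_id = 'm_1001'
ORDER BY created_at DESC
LIMIT 10;
```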
Step 3: Gold Layer — Aggregations
The Gold layer pre-computes the aggregates that downstream consumers actually need. Each view is a separate serving artifact for a specific use case:
```sql
-- GMV and order metrics by merchant, channel, and hour
CREATE MATERIALIZED VIEW gold_merchant_hourly AS
SELECT
    merchant_id,
    store_name,
    region,
    channel,
    window_start AS hour_start,
    COUNT(*) AS orders,
    SUM(total) AS gmv,
    ROUND(AVG(total), 2) AS aov,
    COUNT(DISTINCT customer_id) AS unique_buyers,
    SUM(discount) AS discount_amount,
    COUNT(CASE WHEN status = 'refunded' THEN 1 END) AS refunds
FROM TUMBLE(silver_orders, created_at, INTERVAL '1 HOUR')
GROUP BY merchant_id, store_name, region, channel, window_start;

-- Payment method breakdown (near-real-time)
CREATE MATERIALIZED VIEW gold_payment_breakdown AS
SELECT
    merchant_id,
    payment_method,
    window_start AS hour_start,
    COUNT(*) AS transaction_count,
    SUM(total) AS payment_volume,
    ROUND(AVG(total), 2) AS avg_transaction_value
FROM TUMBLE(silver_orders, created_at, INTERVAL '1 HOUR')
WHERE status NOT IN ('cancelled', 'failed')
GROUP BY merchant_id, payment_method, window_start;

-- Platform-level daily summary
CREATE MATERIALIZED VIEW gold_platform_daily AS
SELECT
    DATE_TRUNC('day', hour_start) AS day,
    SUM(orders) AS total_orders,
    SUM(gmv) AS total_gmv,
    COUNT(DISTINCT merchant_id) AS active_merchants,
    -- Weight AOV by order volume rather than averaging hourly averages
    ROUND(SUM(gmv) / NULLIF(SUM(orders), 0), 2) AS platform_aov
FROM gold_merchant_hourly
GROUP BY DATE_TRUNC('day', hour_start);
```
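A dashboard backend can read these views directly with plain SQL. A hypothetical query for a "top merchants" widget over the trailing 24 hours:

```sql
-- Top merchants by GMV over the last 24 hours (illustrative dashboard query)
SELECT merchant_id, store_name, SUM(gmv) AS gmv_24h
FROM gold_merchant_hourly
WHERE hour_start >= NOW() - INTERVAL '24 hours'
GROUP BY merchant_id, store_name
ORDER BY gmv_24h DESC
LIMIT 20;
```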
Step 4: Serving Layer — Downstream Sinks
The Gold layer feeds into application databases, the data warehouse, and alerting systems via sinks:
```sql
-- Merchant dashboard application DB
CREATE SINK sink_merchant_dashboard
FROM gold_merchant_hourly
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://app-db:5432/dashboard',
    table.name = 'merchant_hourly_metrics',
    type = 'upsert',
    primary_key = 'merchant_id,channel,hour_start'
);

-- Data warehouse for historical analysis
CREATE SINK sink_to_warehouse
FROM gold_platform_daily
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://warehouse:5432/reporting',
    table.name = 'platform_daily_summary',
    type = 'upsert',
    primary_key = 'day'
);
```
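For the alerting path, a Kafka sink follows the same pattern. A sketch, assuming a hypothetical ecommerce.alerts.payment topic; changes to the aggregate are emitted as upserts keyed by the grouping columns:

```sql
-- Hypothetical alerting feed: publish Gold-layer changes back to Kafka
CREATE SINK sink_alert_feed
FROM gold_payment_breakdown
WITH (
    connector = 'kafka',
    topic = 'ecommerce.alerts.payment',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'merchant_id,payment_method,hour_start'
) FORMAT UPSERT ENCODE JSON;
```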
Architecture Comparison
| Aspect | Batch Medallion (dbt) | Streaming Medallion (RisingWave) |
| --- | --- | --- |
| Bronze | S3 / raw Kafka | Kafka source (same) |
| Silver | dbt model, runs every 30 min | Streaming materialized view, always current |
| Gold | dbt model + BI tool aggregation | Streaming materialized view, pre-aggregated |
| Serving | Query warehouse at query time | Read pre-computed results from app DB |
| Latency | Minutes to hours | Sub-second to seconds |
| Consistency | Stale between runs | Continuously consistent |
| Backfill | Re-run dbt job | Replay Kafka offset |
| Language | SQL (dbt) | SQL (PostgreSQL-compatible) |
FAQ
Q: Can I use RisingWave alongside an existing dbt + data warehouse setup?
Yes — and this is the recommended migration path. Keep dbt + warehouse for historical analysis and ad hoc queries. Add RisingWave's streaming Medallion layers for real-time serving. The Gold-layer sinks feed both your application database (for dashboards) and your warehouse (for history).
Q: How do I handle schema changes in Silver layer views?
Materialized view definitions cannot be modified in place. For additive upstream changes, you can evolve the source schema with ALTER SOURCE ... ADD COLUMN and then recreate the views that should expose the new column. For breaking changes, you drop and recreate the view, and RisingWave reprocesses from the configured Kafka offset. Because Silver and Gold views are defined in SQL, the change is a small SQL edit rather than a pipeline deployment.
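Additive changes typically start at the Bronze source. A sketch, assuming a new currency field begins appearing in the Kafka payload (the column name is hypothetical):

```sql
-- Expose a newly introduced payload field on the Bronze source
ALTER SOURCE orders ADD COLUMN currency VARCHAR;
-- Then recreate any Silver/Gold views that should carry the new column
```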
Q: What is the operational overhead of running three layers of materialized views?
Lower than running two separate pipelines. RisingWave handles checkpointing, fault recovery, and backpressure internally. The Silver and Gold views share the same Kafka consumption — data is not read multiple times from the broker.
Key Takeaways
- The Medallion architecture works best when all three layers are streaming — batch Silver and Gold layers introduce the same lag that dashboards are trying to eliminate.
- RisingWave implements each Medallion layer as a materialized view: Bronze (Kafka source), Silver (streaming join + enrichment), Gold (windowed aggregation).
- SHOPLINE's 76.7% API latency reduction came from making their Gold layer streaming — dashboards read pre-computed aggregates instead of querying the warehouse at request time.
- All three layers are defined in standard PostgreSQL-compatible SQL — no Flink, no Spark, no separate streaming language.
- Sinks at the Gold layer feed multiple downstream consumers (application DB, warehouse, alerting) from a single streaming source.

