Supply chain visibility built on nightly batch ETL shows you where things were yesterday. Streaming SQL shows you where they are right now. This guide explains the architectural differences, when the transition makes sense, and exactly how to replace a batch pipeline with RisingWave — a PostgreSQL-compatible streaming database — without rewriting your entire data stack.
## Why Batch ETL Falls Short for Supply Chains
Batch ETL pipelines were the right tool for the job when supply chains moved slowly and data volumes were manageable. A nightly job that extracted orders, inventory, and shipment records, transformed them, and loaded them into a data warehouse served decision-makers who checked reports in the morning.
Modern supply chains are different. Same-day delivery expectations mean operational decisions happen in hours, not days. A disruption at a port, a stockout at a distribution center, or a surge in returns requires a response within minutes. Batch pipelines simply cannot provide the data freshness that these situations demand.
The specific pain points are predictable:
- Stale inventory: purchasing sees last night's stock levels, leading to over-ordering or missed replenishment triggers
- Delayed exception detection: a shipment missed its transit checkpoint 6 hours ago, but the alert fires after the next batch run
- Disconnected operational and analytics views: the operations team's live TMS data and the analytics team's warehouse data are hours out of sync
Real-time streaming addresses all three by making the data pipeline continuous rather than scheduled.
## How Streaming Replaces Batch ETL
In a batch ETL architecture, data flows: Source → Extract → Transform → Load → Query. Each arrow represents a scheduled job with latency.
In a streaming architecture with RisingWave, data flows: Source → Stream → Materialized View → Query. The materialized view is the transformation, and it runs continuously. Queries read from views that are always current.
The key shift is from push-on-schedule to push-on-change. RisingWave ingests events from Kafka (for application events) or via CDC connectors (for database changes) and incrementally maintains the views that downstream consumers query.
## Step-by-Step Tutorial

### Step 1: Set Up the Data Source

Replace batch extracts from source systems with continuous feeds. Here, three source systems — an order management system (OMS), a transportation management system (TMS), and a warehouse management system (WMS) — are connected via Kafka topics and CDC:
```sql
-- Orders from OMS via Kafka
CREATE SOURCE oms_orders (
    order_id VARCHAR,
    customer_id VARCHAR,
    status VARCHAR,
    origin_dc VARCHAR,
    destination VARCHAR,
    priority VARCHAR,
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ,
    total_value DECIMAL,
    line_count INTEGER
)
WITH (
    connector = 'kafka',
    topic = 'oms-order-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```
```sql
-- Shipments from TMS via Kafka
CREATE SOURCE tms_shipments (
    shipment_id VARCHAR,
    order_id VARCHAR,
    carrier VARCHAR,
    tracking_number VARCHAR,
    status VARCHAR,
    origin VARCHAR,
    destination VARCHAR,
    departed_at TIMESTAMPTZ,
    expected_arrival TIMESTAMPTZ,
    actual_arrival TIMESTAMPTZ,
    weight_kg DOUBLE PRECISION
)
WITH (
    connector = 'kafka',
    topic = 'tms-shipment-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```
```sql
-- Inventory levels from WMS via CDC.
-- A CDC-backed table requires an explicit column list and primary key;
-- the columns below are illustrative and must match the upstream
-- wms.public.inventory_positions table.
CREATE TABLE wms_inventory (
    sku VARCHAR,
    dc_id VARCHAR,
    qty_on_hand INTEGER,
    updated_at TIMESTAMPTZ,
    PRIMARY KEY (sku, dc_id)
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'wms-db.internal',
    port = '5432',
    username = 'risingwave_reader',
    password = 'secret',
    database.name = 'wms',
    schema.name = 'public',
    table.name = 'inventory_positions',
    slot.name = 'rw_inventory_slot'
);
```
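Before building views on top, it's worth a quick check that events are actually flowing. One simple smoke test is a counting view (the view name `oms_event_count` is illustrative):

```sql
-- Smoke test: continuously count events arriving from the OMS topic
CREATE MATERIALIZED VIEW oms_event_count AS
SELECT COUNT(*) AS events_seen
FROM oms_orders;

-- The count should grow as new order events land on the topic
SELECT * FROM oms_event_count;
```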
### Step 2: Build the Core Materialized View

Create the unified supply chain visibility view — the primary replacement for the batch-generated supply chain dashboard:
```sql
CREATE MATERIALIZED VIEW supply_chain_visibility AS
SELECT
    o.order_id,
    o.customer_id,
    o.priority,
    o.status AS order_status,
    o.created_at AS order_created,
    o.total_value,
    s.shipment_id,
    s.carrier,
    s.tracking_number,
    s.status AS shipment_status,
    s.departed_at,
    s.expected_arrival,
    s.actual_arrival,
    -- Delay in hours for arrived shipments. Overdue in-transit
    -- shipments are flagged in at_risk_shipments below, because
    -- streaming materialized views restrict NOW() to temporal
    -- filters in the WHERE clause.
    CASE
        WHEN s.actual_arrival IS NOT NULL
             AND s.actual_arrival > s.expected_arrival
        THEN EXTRACT(EPOCH FROM (s.actual_arrival - s.expected_arrival)) / 3600
        ELSE 0
    END AS delay_hours,
    -- Pipeline stage
    CASE
        WHEN o.status = 'PENDING' THEN 'Order Processing'
        WHEN o.status = 'ALLOCATED' THEN 'Warehouse Picking'
        WHEN s.status = 'IN_TRANSIT' THEN 'In Transit'
        WHEN s.status = 'OUT_FOR_DELIVERY' THEN 'Out for Delivery'
        WHEN s.status = 'DELIVERED' THEN 'Delivered'
        ELSE 'Exception'
    END AS pipeline_stage
FROM (
    -- Latest event per order, using the ROW_NUMBER deduplication
    -- pattern supported in streaming queries
    SELECT order_id, customer_id, priority, status,
           created_at, total_value, origin_dc
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id ORDER BY updated_at DESC
        ) AS rn
        FROM oms_orders
    ) ranked_orders
    WHERE rn = 1
) o
LEFT JOIN (
    -- Latest shipment event per order
    SELECT order_id, shipment_id, carrier, tracking_number, status,
           departed_at, expected_arrival, actual_arrival
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id ORDER BY departed_at DESC NULLS LAST
        ) AS rn
        FROM tms_shipments
    ) ranked_shipments
    WHERE rn = 1
) s ON o.order_id = s.order_id;
```
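Any PostgreSQL client or BI tool can now query the view directly; reads are served from continuously maintained state rather than recomputed. For example (the order ID is illustrative):

```sql
-- Always-current status for a single order
SELECT order_id, pipeline_stage, shipment_status, delay_hours
FROM supply_chain_visibility
WHERE order_id = 'ORD-10042';
```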
### Step 3: Add Aggregations for Executive Dashboards

Compute the KPIs that batch jobs previously generated on a schedule, now maintained continuously:
```sql
-- Order fulfillment funnel: live counts by pipeline stage
CREATE MATERIALIZED VIEW fulfillment_funnel AS
SELECT
    pipeline_stage,
    priority,
    COUNT(*) AS order_count,
    SUM(total_value) AS total_value,
    AVG(delay_hours) FILTER (WHERE delay_hours > 0) AS avg_delay_hours,
    COUNT(*) FILTER (WHERE delay_hours > 2) AS delayed_order_count
FROM supply_chain_visibility
GROUP BY pipeline_stage, priority;
```
```sql
-- In-transit shipments at risk of missing delivery SLA.
-- The NOW() comparison is a temporal filter, which streaming
-- materialized views do support in the WHERE clause: a shipment
-- enters this view the moment it becomes overdue.
CREATE MATERIALIZED VIEW at_risk_shipments AS
SELECT
    order_id,
    shipment_id,
    carrier,
    tracking_number,
    expected_arrival,
    priority,
    customer_id,
    total_value
FROM supply_chain_visibility
WHERE shipment_status = 'IN_TRANSIT'
  AND actual_arrival IS NULL
  AND expected_arrival < NOW();
```
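Since hours overdue changes continuously, it is best computed at read time, where NOW() has no restrictions. A sketch of the serving query a dashboard or alerting job might run:

```sql
-- Ad-hoc reads can use NOW() freely and sort the results
SELECT
    order_id,
    shipment_id,
    carrier,
    EXTRACT(EPOCH FROM (NOW() - expected_arrival)) / 3600 AS hours_overdue
FROM at_risk_shipments
ORDER BY hours_overdue DESC;
```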
```sql
-- Daily order volume trend: one row per day via a tumbling window
CREATE MATERIALIZED VIEW daily_order_volume AS
SELECT
    window_start,
    window_end,
    COUNT(*) AS orders_created,
    SUM(total_value) AS gmv,
    COUNT(*) FILTER (WHERE priority = 'EXPRESS') AS express_orders
FROM TUMBLE(oms_orders, created_at, INTERVAL '1 day')
GROUP BY window_start, window_end;
```
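If a true rolling seven-day aggregate is also needed, a HOP (sliding) window emits overlapping windows that advance daily. A sketch (the view name is illustrative):

```sql
-- Rolling 7-day order volume, advancing one day at a time
CREATE MATERIALIZED VIEW rolling_7d_order_volume AS
SELECT
    window_start,
    window_end,
    COUNT(*) AS orders_created,
    SUM(total_value) AS gmv
FROM HOP(oms_orders, created_at, INTERVAL '1 day', INTERVAL '7 days')
GROUP BY window_start, window_end;
```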
## How This Compares to Traditional Approaches
| Dimension | Nightly Batch ETL | Micro-batch (every 5 min) | RisingWave Streaming |
| --- | --- | --- | --- |
| Data freshness | 8–24 hours | 5–10 minutes | Sub-second |
| Exception detection latency | Hours | Minutes | Milliseconds |
| Infrastructure complexity | Scheduler + ETL tool + DWH | Same + streaming layer | Single streaming DB |
| Backfill / replay | Supported | Varies | Supported (from Kafka offset) |
| SQL interface | DWH SQL | Custom APIs often required | PostgreSQL-compatible |
| Cost at scale | High DWH compute costs | High | Low (incremental computation only) |
| Schema change handling | Manual ETL update | Manual | CDC connectors can propagate automatically |
The transition from batch to streaming is not all-or-nothing. A common migration path is to run RisingWave streaming views alongside the existing batch pipeline initially, validate that results match, then sunset the batch jobs one by one.
## FAQ

### Do I need to replace my data warehouse to use RisingWave?
No. RisingWave can serve as the real-time operational layer while your data warehouse remains for historical analytics. Many teams use RisingWave for sub-minute visibility and write aggregated results to an Iceberg sink that feeds the data warehouse, giving both layers the data they need.
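A sketch of that pattern, assuming an S3-backed Iceberg table with a storage catalog; the bucket path, credentials, region, and names below are placeholders, and the exact parameters depend on your catalog setup:

```sql
CREATE SINK funnel_to_iceberg
FROM fulfillment_funnel
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'pipeline_stage,priority',
    catalog.type = 'storage',                           -- placeholder catalog choice
    warehouse.path = 's3a://analytics-lake/warehouse',  -- placeholder
    s3.region = 'us-east-1',                            -- placeholder
    s3.access.key = 'xxx',                              -- placeholder
    s3.secret.key = 'xxx',                              -- placeholder
    database.name = 'supply_chain',
    table.name = 'fulfillment_funnel'
);
```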
### How do I handle late-arriving data in the streaming pipeline?
RisingWave's time-window functions (TUMBLE and HOP) work together with watermarks to handle late events. For supply chain use cases where a shipment scan might arrive minutes after the physical scan due to connectivity issues, you can define the watermark with enough lag to tolerate expected delays before window results are finalized.
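A watermark is declared directly in the source schema. A minimal sketch, assuming a hypothetical scan-event topic and a 10-minute lateness tolerance:

```sql
CREATE SOURCE tms_scans (
    shipment_id VARCHAR,
    checkpoint VARCHAR,
    scanned_at TIMESTAMPTZ,
    -- Tolerate events arriving up to 10 minutes late
    WATERMARK FOR scanned_at AS scanned_at - INTERVAL '10 minutes'
)
WITH (
    connector = 'kafka',
    topic = 'tms-scan-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```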
### What does the migration from batch to streaming look like operationally?
A practical approach: (1) deploy RisingWave alongside existing systems, (2) create CDC and Kafka sources pointing at the same databases your ETL reads from, (3) build materialized views that replicate the output of your batch transformations, (4) run both systems in parallel for a validation period, (5) switch dashboards and alerts to RisingWave views, (6) decommission batch jobs.
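Step (4) can itself be plain SQL. A sketch of a parity check, assuming last night's batch output has been loaded into a comparison table named `batch_funnel_snapshot` (hypothetical):

```sql
-- Rows where the streaming view and the batch snapshot disagree
SELECT
    s.pipeline_stage,
    s.priority,
    s.order_count AS streaming_count,
    b.order_count AS batch_count
FROM fulfillment_funnel AS s
FULL OUTER JOIN batch_funnel_snapshot AS b
    ON s.pipeline_stage = b.pipeline_stage
   AND s.priority = b.priority
WHERE s.order_count IS DISTINCT FROM b.order_count;
```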
## Key Takeaways
- Migrating supply chain visibility from batch ETL to RisingWave streaming reduces data freshness from hours to sub-second, enabling proactive exception management instead of reactive reporting.
- The unified `supply_chain_visibility` materialized view joins order and shipment data continuously, replacing the batch-generated supply chain dashboard with a view that is always current.
- RisingWave's CDC connectors and Kafka sources mean no changes are needed to source systems — you connect to the existing databases and event streams the WMS, OMS, and TMS already produce.
Ready to try this yourself? Get started with RisingWave in minutes. Join our Slack community.

