Supply chain visibility built on nightly batch ETL shows you where things were yesterday. Streaming SQL shows you where they are right now. This guide explains the architectural differences, when the transition makes sense, and exactly how to replace a batch pipeline with RisingWave — a PostgreSQL-compatible streaming database — without rewriting your entire data stack.
## Why Batch ETL Falls Short for Supply Chains
Batch ETL pipelines were the right tool for the job when supply chains moved slowly and data volumes were manageable. A nightly job that extracted orders, inventory, and shipment records, transformed them, and loaded them into a data warehouse served decision-makers who checked reports in the morning.
Modern supply chains are different. Same-day delivery expectations mean operational decisions happen in hours, not days. A disruption at a port, a stockout at a distribution center, or a surge in returns requires a response within minutes. Batch pipelines simply cannot provide the data freshness that these situations demand.
The specific pain points are predictable:
- Stale inventory: purchasing sees last night's stock levels, leading to over-ordering or missed replenishment triggers
- Delayed exception detection: a shipment missed its transit checkpoint 6 hours ago, but the alert fires after the next batch run
- Disconnected operational and analytics views: the operations team's live TMS data and the analytics team's warehouse data are hours out of sync
Real-time streaming addresses all three by making the data pipeline continuous rather than scheduled.
## How Streaming Replaces Batch ETL
In a batch ETL architecture, data flows: Source → Extract → Transform → Load → Query. Each arrow represents a scheduled job with latency.
In a streaming architecture with RisingWave, data flows: Source → Stream → Materialized View → Query. The materialized view is the transformation, and it runs continuously. Queries read from views that are always current.
The key shift is from push-on-schedule to push-on-change. RisingWave ingests events from Kafka (for application events) or via CDC connectors (for database changes) and incrementally maintains the views that downstream consumers query.
## Step-by-Step Tutorial

### Step 1: Set Up the Data Source

Replace batch extracts from source systems with continuous feeds. Here, three source systems — an order management system (OMS), a transportation management system (TMS), and a warehouse management system (WMS) — are connected via Kafka topics and CDC:
```sql
-- Orders from OMS via Kafka
CREATE SOURCE oms_orders (
    order_id VARCHAR,
    customer_id VARCHAR,
    status VARCHAR,
    origin_dc VARCHAR,
    destination VARCHAR,
    priority VARCHAR,
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ,
    total_value DECIMAL,
    line_count INTEGER
)
WITH (
    connector = 'kafka',
    topic = 'oms-order-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```
```sql
-- Shipments from TMS via Kafka
CREATE SOURCE tms_shipments (
    shipment_id VARCHAR,
    order_id VARCHAR,
    carrier VARCHAR,
    tracking_number VARCHAR,
    status VARCHAR,
    origin VARCHAR,
    destination VARCHAR,
    departed_at TIMESTAMPTZ,
    expected_arrival TIMESTAMPTZ,
    actual_arrival TIMESTAMPTZ,
    weight_kg DOUBLE PRECISION
)
WITH (
    connector = 'kafka',
    topic = 'tms-shipment-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```
```sql
-- Inventory levels from WMS via CDC.
-- A CDC-backed table requires an explicit column list and primary key;
-- the columns below are illustrative and must match the upstream
-- wms.public.inventory_positions table.
CREATE TABLE wms_inventory (
    sku VARCHAR,
    dc_id VARCHAR,
    qty_on_hand INTEGER,
    updated_at TIMESTAMPTZ,
    PRIMARY KEY (sku, dc_id)
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'wms-db.internal',
    port = '5432',
    username = 'risingwave_reader',
    password = 'secret',
    database.name = 'wms',
    schema.name = 'public',
    table.name = 'inventory_positions',
    slot.name = 'rw_inventory_slot'
);
```
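Before building views on top, it's worth a quick check that events are actually flowing. One simple smoke test is a counting view (the view name `oms_event_count` is illustrative):

```sql
-- Smoke test: continuously count events arriving from the OMS topic
CREATE MATERIALIZED VIEW oms_event_count AS
SELECT COUNT(*) AS events_seen
FROM oms_orders;

-- The count should grow as new order events land on the topic
SELECT * FROM oms_event_count;
```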
### Step 2: Build the Core Materialized View

Create the unified supply chain visibility view — the primary replacement for the batch-generated supply chain dashboard:
```sql
CREATE MATERIALIZED VIEW supply_chain_visibility AS
SELECT
    o.order_id,
    o.customer_id,
    o.priority,
    o.status AS order_status,
    o.created_at AS order_created,
    o.total_value,
    s.shipment_id,
    s.carrier,
    s.tracking_number,
    s.status AS shipment_status,
    s.departed_at,
    s.expected_arrival,
    s.actual_arrival,
    -- Delay in hours for arrived shipments. Overdue in-transit
    -- shipments are flagged in at_risk_shipments below, because
    -- streaming materialized views restrict NOW() to temporal
    -- filters in the WHERE clause.
    CASE
        WHEN s.actual_arrival IS NOT NULL
             AND s.actual_arrival > s.expected_arrival
        THEN EXTRACT(EPOCH FROM (s.actual_arrival - s.expected_arrival)) / 3600
        ELSE 0
    END AS delay_hours,
    -- Pipeline stage
    CASE
        WHEN o.status = 'PENDING' THEN 'Order Processing'
        WHEN o.status = 'ALLOCATED' THEN 'Warehouse Picking'
        WHEN s.status = 'IN_TRANSIT' THEN 'In Transit'
        WHEN s.status = 'OUT_FOR_DELIVERY' THEN 'Out for Delivery'
        WHEN s.status = 'DELIVERED' THEN 'Delivered'
        ELSE 'Exception'
    END AS pipeline_stage
FROM (
    -- Latest event per order, using the ROW_NUMBER deduplication
    -- pattern supported in streaming queries
    SELECT order_id, customer_id, priority, status,
           created_at, total_value, origin_dc
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id ORDER BY updated_at DESC
        ) AS rn
        FROM oms_orders
    ) ranked_orders
    WHERE rn = 1
) o
LEFT JOIN (
    -- Latest shipment event per order
    SELECT order_id, shipment_id, carrier, tracking_number, status,
           departed_at, expected_arrival, actual_arrival
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id ORDER BY departed_at DESC NULLS LAST
        ) AS rn
        FROM tms_shipments
    ) ranked_shipments
    WHERE rn = 1
) s ON o.order_id = s.order_id;
```
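Any PostgreSQL client or BI tool can now query the view directly; reads are served from continuously maintained state rather than recomputed. For example (the order ID is illustrative):

```sql
-- Always-current status for a single order
SELECT order_id, pipeline_stage, shipment_status, delay_hours
FROM supply_chain_visibility
WHERE order_id = 'ORD-10042';
```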
### Step 3: Add Aggregations for Executive Dashboards

Compute the KPIs that batch jobs previously generated on a schedule, now maintained continuously:
```sql
-- Order fulfillment funnel: live counts by pipeline stage
CREATE MATERIALIZED VIEW fulfillment_funnel AS
SELECT
    pipeline_stage,
    priority,
    COUNT(*) AS order_count,
    SUM(total_value) AS total_value,
    AVG(delay_hours) FILTER (WHERE delay_hours > 0) AS avg_delay_hours,
    COUNT(*) FILTER (WHERE delay_hours > 2) AS delayed_order_count
FROM supply_chain_visibility
GROUP BY pipeline_stage, priority;
```
```sql
-- In-transit shipments at risk of missing delivery SLA.
-- The NOW() comparison is a temporal filter, which streaming
-- materialized views do support in the WHERE clause: a shipment
-- enters this view the moment it becomes overdue.
CREATE MATERIALIZED VIEW at_risk_shipments AS
SELECT
    order_id,
    shipment_id,
    carrier,
    tracking_number,
    expected_arrival,
    priority,
    customer_id,
    total_value
FROM supply_chain_visibility
WHERE shipment_status = 'IN_TRANSIT'
  AND actual_arrival IS NULL
  AND expected_arrival < NOW();
```
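Since hours overdue changes continuously, it is best computed at read time, where NOW() has no restrictions. A sketch of the serving query a dashboard or alerting job might run:

```sql
-- Ad-hoc reads can use NOW() freely and sort the results
SELECT
    order_id,
    shipment_id,
    carrier,
    EXTRACT(EPOCH FROM (NOW() - expected_arrival)) / 3600 AS hours_overdue
FROM at_risk_shipments
ORDER BY hours_overdue DESC;
```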
```sql
-- Daily order volume trend: one row per day via a tumbling window
CREATE MATERIALIZED VIEW daily_order_volume AS
SELECT
    window_start,
    window_end,
    COUNT(*) AS orders_created,
    SUM(total_value) AS gmv,
    COUNT(*) FILTER (WHERE priority = 'EXPRESS') AS express_orders
FROM TUMBLE(oms_orders, created_at, INTERVAL '1 day')
GROUP BY window_start, window_end;
```
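If a true rolling seven-day aggregate is also needed, a HOP (sliding) window emits overlapping windows that advance daily. A sketch (the view name is illustrative):

```sql
-- Rolling 7-day order volume, advancing one day at a time
CREATE MATERIALIZED VIEW rolling_7d_order_volume AS
SELECT
    window_start,
    window_end,
    COUNT(*) AS orders_created,
    SUM(total_value) AS gmv
FROM HOP(oms_orders, created_at, INTERVAL '1 day', INTERVAL '7 days')
GROUP BY window_start, window_end;
```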
## How This Compares to Traditional Approaches
| Dimension | Nightly Batch ETL | Micro-batch (every 5 min) | RisingWave Streaming |
| --- | --- | --- | --- |
| Data freshness | 8–24 hours | 5–10 minutes | Sub-second |
| Exception detection latency | Hours | Minutes | Milliseconds |
| Infrastructure complexity | Scheduler + ETL tool + DWH | Same + streaming layer | Single streaming DB |
| Backfill / replay | Supported | Varies | Supported (from Kafka offset) |
| SQL interface | DWH SQL | Custom APIs often required | PostgreSQL-compatible |
| Cost at scale | High DWH compute costs | High | Low (incremental computation only) |
| Schema change handling | Manual ETL update | Manual | CDC connectors can propagate automatically |
The transition from batch to streaming is not all-or-nothing. A common migration path is to run RisingWave streaming views alongside the existing batch pipeline initially, validate that results match, then sunset the batch jobs one by one.
## FAQ

### Do I need to replace my data warehouse to use RisingWave?
No. RisingWave can serve as the real-time operational layer while your data warehouse remains for historical analytics. Many teams use RisingWave for sub-minute visibility and write aggregated results to an Iceberg sink that feeds the data warehouse, giving both layers the data they need.
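A sketch of that pattern, assuming an S3-backed Iceberg table with a storage catalog; the bucket path, credentials, region, and names below are placeholders, and the exact parameters depend on your catalog setup:

```sql
CREATE SINK funnel_to_iceberg
FROM fulfillment_funnel
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'pipeline_stage,priority',
    catalog.type = 'storage',                           -- placeholder catalog choice
    warehouse.path = 's3a://analytics-lake/warehouse',  -- placeholder
    s3.region = 'us-east-1',                            -- placeholder
    s3.access.key = 'xxx',                              -- placeholder
    s3.secret.key = 'xxx',                              -- placeholder
    database.name = 'supply_chain',
    table.name = 'fulfillment_funnel'
);
```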
### How do I handle late-arriving data in the streaming pipeline?
RisingWave's time-window functions (TUMBLE and HOP) work together with watermarks to handle late events. For supply chain use cases where a shipment scan might arrive minutes after the physical scan due to connectivity issues, you can define the watermark with enough lag to tolerate expected delays before window results are finalized.
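A watermark is declared directly in the source schema. A minimal sketch, assuming a hypothetical scan-event topic and a 10-minute lateness tolerance:

```sql
CREATE SOURCE tms_scans (
    shipment_id VARCHAR,
    checkpoint VARCHAR,
    scanned_at TIMESTAMPTZ,
    -- Tolerate events arriving up to 10 minutes late
    WATERMARK FOR scanned_at AS scanned_at - INTERVAL '10 minutes'
)
WITH (
    connector = 'kafka',
    topic = 'tms-scan-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
```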
### What does the migration from batch to streaming look like operationally?
A practical approach: (1) deploy RisingWave alongside existing systems, (2) create CDC and Kafka sources pointing at the same databases your ETL reads from, (3) build materialized views that replicate the output of your batch transformations, (4) run both systems in parallel for a validation period, (5) switch dashboards and alerts to RisingWave views, (6) decommission batch jobs.
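Step (4) can itself be plain SQL. A sketch of a parity check, assuming last night's batch output has been loaded into a comparison table named `batch_funnel_snapshot` (hypothetical):

```sql
-- Rows where the streaming view and the batch snapshot disagree
SELECT
    s.pipeline_stage,
    s.priority,
    s.order_count AS streaming_count,
    b.order_count AS batch_count
FROM fulfillment_funnel AS s
FULL OUTER JOIN batch_funnel_snapshot AS b
    ON s.pipeline_stage = b.pipeline_stage
   AND s.priority = b.priority
WHERE s.order_count IS DISTINCT FROM b.order_count;
```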
## Key Takeaways
- Migrating supply chain visibility from batch ETL to RisingWave streaming reduces data freshness from hours to sub-second, enabling proactive exception management instead of reactive reporting.
- The unified `supply_chain_visibility` materialized view joins order and shipment data continuously, replacing the batch-generated supply chain dashboard with a view that is always current.
- RisingWave's CDC connectors and Kafka sources mean no changes are needed to source systems — you connect to the existing databases and event streams the WMS, OMS, and TMS already produce.
Ready to try this yourself? Get started with RisingWave in minutes. Join our Slack community.

