Migrating from Batch ETL to Streaming ETL: A Step-by-Step Guide

Migrating from Batch ETL to Streaming ETL: A Step-by-Step Guide

How to Build a Real-Time Notification System with Streaming SQL

Migrating from batch ETL (dbt + Airflow) to streaming ETL (RisingWave) can be done incrementally — one pipeline at a time. Here's a step-by-step approach.

Migration Strategy

Phase 1: Identify High-Value Pipelines

Pick pipelines where freshness matters most:

  • Real-time dashboards currently refreshed hourly
  • CDC replication with nightly batch loads
  • Alerting based on stale data

Phase 2: Set Up Streaming Source

-- Replace scheduled full-table extract with CDC
CREATE SOURCE pg_source WITH (connector='postgres-cdc', ...);
CREATE TABLE orders (...) FROM pg_source TABLE 'public.orders';

Phase 3: Convert dbt Models to Materialized Views

-- dbt model (batch, hourly):
-- SELECT region, SUM(amount) FROM orders GROUP BY region

-- RisingWave equivalent (streaming, real-time):
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT region, SUM(amount) as revenue FROM orders GROUP BY region;

Phase 4: Sink to Same Destination

-- Sink to the same warehouse/lakehouse your dbt models target
CREATE SINK to_iceberg AS SELECT * FROM revenue_by_region
WITH (connector='iceberg', ...);

Phase 5: Decommission Batch Pipeline

Once the streaming pipeline is stable, disable the corresponding Airflow DAG and dbt model.

What Changes, What Stays

AspectBatch (Before)Streaming (After)
OrchestrationAirflow DAGsNot needed (always-on)
Transformationdbt SQLRisingWave SQL (very similar)
FreshnessHourlySub-second
MonitoringDAG success/failurePipeline health views

Frequently Asked Questions

Can I run batch and streaming in parallel during migration?

Yes. Run both pipelines simultaneously and compare results. Once the streaming pipeline matches the batch output, decommission the batch pipeline.

How similar is dbt SQL to RisingWave SQL?

Very similar. Both use standard SQL. The main difference: dbt models are run on schedule; RisingWave materialized views run continuously. Most dbt SQL translates directly to RisingWave with minimal changes.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.