Streaming ETL
Replace batch ETL with continuous, real-time data pipelines. RisingWave transforms streaming data using SQL — ingesting from Kafka, databases, and APIs, then delivering to data lakes, warehouses, and applications with exactly-once semantics.
Why Streaming
Batch ETL introduces hours of latency between data creation and data availability. Modern applications demand real-time feature serving, instant fraud detection, and live operational dashboards. Batch pipelines also create brittle dependency chains that are expensive to monitor, debug, and maintain at scale.
| Factor | Batch ETL | Streaming ETL (RisingWave) |
|---|---|---|
| Data Freshness | Hours | Sub-second |
| Pipeline Model | Scheduled DAGs | Continuous SQL queries |
| Failure Recovery | Restart from scratch | Exactly-once checkpoint resume |
| Language | Python / Spark | Standard SQL |
| Scaling | Job-level scaling | Automatic parallelism |
| Late Data Handling | Manual reprocessing | Watermark-based automation |
How It Works
RisingWave lets you define entire ETL pipelines in SQL. Create sources to ingest from Kafka or CDC, write transformations as materialized views, and create sinks to deliver results to downstream systems. No DAG orchestration, no custom code, no infrastructure management.
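As a sketch, a complete pipeline is three statements: a source, a materialized view, and a sink. The topic names, column names, and broker address below are hypothetical, and exact connector options can vary by RisingWave version:

```sql
-- Ingest an order stream from Kafka (hypothetical topic and broker)
CREATE SOURCE orders (
    order_id   BIGINT,
    user_id    BIGINT,
    amount     DECIMAL,
    order_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Transform: continuously maintained hourly revenue
CREATE MATERIALIZED VIEW hourly_revenue AS
SELECT window_start, SUM(amount) AS revenue
FROM TUMBLE(orders, order_time, INTERVAL '1 hour')
GROUP BY window_start;

-- Deliver results downstream (upsert semantics, keyed on the window)
CREATE SINK hourly_revenue_sink FROM hourly_revenue
WITH (
    connector = 'kafka',
    topic = 'hourly-revenue',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'window_start'
) FORMAT UPSERT ENCODE JSON;
```

The materialized view updates incrementally as events arrive, so the sink emits fresh results continuously rather than on a schedule.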
Define sources, transforms, and sinks entirely in SQL. No Python, no Spark, no Airflow DAGs.
Ingest from Kafka, Pulsar, Kinesis, PostgreSQL CDC, MySQL CDC, and S3 with CREATE SOURCE.
Barrier-based checkpointing ensures no data loss or duplication across the entire pipeline.
Deliver to Kafka, Iceberg, Delta Lake, PostgreSQL, ClickHouse, Elasticsearch, Redis, and more.
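For CDC ingestion, a source is declared as a table so RisingWave can materialize the initial snapshot and then apply the change stream. The hostname, credentials, and table names below are placeholders:

```sql
-- Replicate a PostgreSQL table via CDC (hypothetical connection details)
CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    email   VARCHAR,
    region  VARCHAR
) WITH (
    connector = 'postgres-cdc',
    hostname = 'db.example.com',
    port = '5432',
    username = 'rw_user',
    password = 'secret',
    database.name = 'app',
    schema.name = 'public',
    table.name = 'users'
);
```

Once created, the table stays in sync with the upstream database and can be queried or joined like any other relation.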
Patterns
RisingWave supports the most common streaming ETL patterns out of the box, including CDC replication, stream enrichment with dimension joins, real-time aggregation pipelines, and multi-destination fan-out — all defined in standard SQL with automatic exactly-once guarantees.
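For example, stream enrichment and fan-out reduce to a join and multiple sinks over the same materialized view. Assuming an `orders` Kafka source and a CDC-replicated `users` table as sketched above (all names hypothetical):

```sql
-- Enrich the order stream with user dimension data
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT o.order_id, o.amount, o.order_time, u.region
FROM orders o
JOIN users u ON o.user_id = u.user_id;

-- Fan out the same enriched view to two destinations
CREATE SINK orders_to_iceberg FROM enriched_orders
WITH (connector = 'iceberg' /* warehouse options elided */);

CREATE SINK orders_to_redis FROM enriched_orders
WITH (connector = 'redis' /* connection options elided */);
```

Because both sinks read from one continuously maintained view, the enrichment logic is computed once and delivered everywhere with the same exactly-once guarantees.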
Replace batch DAGs with continuous SQL pipelines in minutes.
Build Streaming ETL Pipelines Free