MySQL CDC: Binlog Streaming for Real-Time Data Pipelines
CDC (change data capture) to a data warehouse replaces nightly batch ETL with continuous streaming, keeping the warehouse minutes-fresh instead of hours-stale. Instead of scheduled full-table SELECT * dumps, CDC reads the database's change log (the MySQL binlog) and streams only the rows that changed.
Batch ETL vs CDC Streaming
| Aspect | Batch ETL (nightly) | CDC Streaming |
| --- | --- | --- |
| Freshness | Hours (daily load) | Minutes (continuous) |
| Source impact | High (full table scans) | Low (binlog reading) |
| Data volume | Full tables each run | Changes only |
| Handles deletes | Difficult (requires soft deletes) | ✅ Native |
| Schema changes | Manual intervention | Automatic (with Iceberg) |
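To make the "binlog reading, changes only" row concrete, here is a sketch of a RisingWave CDC source for a single MySQL table. All connection details, credentials, and table/column names are placeholders; check the RisingWave mysql-cdc connector documentation for the exact option names supported by your version.

```sql
-- Hedged sketch: hostnames, credentials, and schema are placeholders.
-- RisingWave connects as a replication client and tails the binlog;
-- no SELECT * scans run against the production tables.
CREATE TABLE orders_cdc (
  order_id    BIGINT PRIMARY KEY,
  customer_id BIGINT,
  amount      DECIMAL,
  updated_at  TIMESTAMP
) WITH (
  connector     = 'mysql-cdc',
  hostname      = 'mysql.prod.internal',  -- placeholder host
  port          = '3306',
  username      = 'cdc_user',  -- needs REPLICATION SLAVE / REPLICATION CLIENT grants
  password      = '...',
  database.name = 'shop',
  table.name    = 'orders'
);
```

Because the source reads the binlog rather than querying tables, inserts, updates, and deletes all arrive as change events, which is what makes the "Handles deletes" row native rather than a soft-delete workaround.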
CDC-to-Warehouse Architecture
```
Production DB ──CDC──→ RisingWave ──Iceberg Sink──→ Warehouse/Lakehouse
                           │                               │
                    SQL Transforms                 Trino / Snowflake
                    (clean, enrich)                  (BI queries)
```
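The middle two hops of the diagram, the in-flight SQL transform and the Iceberg sink, might look like the following sketch. The view logic, sink options, and the S3 path are illustrative assumptions; consult the RisingWave Iceberg sink documentation for the exact catalog and storage parameters your deployment needs.

```sql
-- Hedged sketch: assumes a CDC table named orders_cdc already exists.
-- The transform is a continuously maintained materialized view.
CREATE MATERIALIZED VIEW orders_clean AS
SELECT order_id, customer_id, amount, updated_at
FROM orders_cdc
WHERE amount IS NOT NULL;  -- example clean/enrich step

-- Stream the transformed changes into an Iceberg table.
CREATE SINK orders_to_lake FROM orders_clean
WITH (
  connector      = 'iceberg',
  type           = 'upsert',          -- propagate updates and deletes
  primary_key    = 'order_id',
  warehouse.path = 's3://lake/warehouse',  -- placeholder path
  database.name  = 'analytics',
  table.name     = 'orders'
);
```

An upsert-type sink is what lets downstream engines like Trino or Snowflake see row-level updates and deletes, rather than an append-only event log.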
Frequently Asked Questions
Can CDC replace my nightly ETL jobs?
For operational data replication, yes. CDC provides continuous, change-only data delivery to your warehouse. For complex transformations that join multiple sources, you may still want a batch transformation layer such as dbt running on top of the CDC-delivered tables.
How fresh will my warehouse data be?
With CDC streaming to Iceberg, data freshness is typically 1-5 minutes. With a streaming database like RisingWave serving directly, freshness is sub-second.
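The two freshness tiers correspond to two query paths. A hedged sketch, assuming a materialized view named orders_clean maintained by RisingWave and a mirrored Iceberg table (both names are hypothetical):

```sql
-- Sub-second path: query the streaming database's materialized view
-- directly; results reflect the latest binlog events already processed.
SELECT customer_id, SUM(amount) AS total_spend
FROM orders_clean
GROUP BY customer_id;

-- Minutes path: query the Iceberg table from the warehouse engine
-- (e.g., Trino). Freshness here is bounded by how often the sink
-- commits Iceberg snapshots, not by the CDC capture itself.
-- SELECT customer_id, SUM(amount) FROM iceberg.analytics.orders GROUP BY 1;
```

In RisingWave the Iceberg commit cadence is tunable via the sink's checkpoint-commit interval settings, so the 1-5 minute figure is a typical default rather than a hard limit.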

