# Database Replication with CDC and Streaming SQL
Zero-ETL promises to eliminate data movement pipelines entirely — your analytical queries run directly on operational data. AWS, Snowflake, and Databricks all offer "zero-ETL" features. But the reality is nuanced: zero-ETL replaces some pipelines but not all, and streaming ETL remains necessary for real-time workloads.
## What "Zero-ETL" Actually Means
| Vendor | Zero-ETL Feature | What It Does |
|---|---|---|
| AWS | Aurora → Redshift zero-ETL | Automatic CDC replication, no pipeline management |
| Snowflake | Dynamic Tables | Incremental materialized views refreshed on schedule |
| Databricks | Delta Live Tables | Declarative ETL with incremental processing |
These are essentially managed CDC + incremental processing — the "ETL" still happens, it's just abstracted away.
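Snowflake's Dynamic Tables make this abstraction visible in SQL: you declare the transformation and a freshness target, and Snowflake manages the incremental refresh. A minimal sketch (the table, warehouse, and column names here are hypothetical):

```sql
-- Hypothetical example: an incrementally refreshed Dynamic Table.
-- The transform-and-load still happens; Snowflake schedules it for you.
CREATE DYNAMIC TABLE daily_order_totals
  TARGET_LAG = '5 minutes'   -- freshness goal: minutes, not sub-second
  WAREHOUSE  = analytics_wh  -- compute used for refreshes
AS
SELECT order_date, SUM(amount) AS total_amount
FROM raw_orders
GROUP BY order_date;
```

Note the `TARGET_LAG`: it is a scheduling hint, which is why zero-ETL freshness lands in the minutes range rather than sub-second.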
## Zero-ETL vs Streaming ETL
| Aspect | Zero-ETL | Streaming ETL (RisingWave) |
|---|---|---|
| Freshness | Minutes (scheduled) | Sub-second |
| Vendor lock-in | High (AWS→Redshift, etc.) | Low (open source) |
| Custom transforms | Limited | Full SQL |
| Multi-source joins | ❌ | ✅ |
| Serving | Via warehouse only | Built-in (PG protocol) |
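For contrast, here is a minimal RisingWave sketch of the streaming side: ingest Postgres changes via CDC, maintain a continuously updated materialized view, and serve it over the Postgres wire protocol. Connection parameters are placeholders and the schema is an assumption:

```sql
-- Hypothetical sketch: a CDC-backed table in RisingWave.
CREATE TABLE orders (
  order_id   BIGINT PRIMARY KEY,
  user_id    BIGINT,
  amount     NUMERIC,
  created_at TIMESTAMP
) WITH (
  connector     = 'postgres-cdc',
  hostname      = '...',      -- upstream Postgres host
  port          = '5432',
  username      = '...',
  password      = '...',
  database.name = 'shop',
  schema.name   = 'public',
  table.name    = 'orders'
);

-- Continuously maintained as changes arrive -- no scheduled refresh.
CREATE MATERIALIZED VIEW revenue_per_user AS
SELECT user_id, SUM(amount) AS revenue
FROM orders
GROUP BY user_id;

-- Served directly over the Postgres protocol, e.g.:
-- SELECT revenue FROM revenue_per_user WHERE user_id = 42;
```

The materialized view is updated incrementally on every change event, which is what closes the gap from minutes to sub-second freshness.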
## When You Still Need Streaming
Zero-ETL handles simple, single-source replication well. But for workloads that need:
- Sub-second latency requirements
- Complex multi-source transformations
- Real-time serving (APIs, agents)
- Non-vendor-locked architecture
...streaming ETL with RisingWave provides more capability and flexibility.
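Multi-source joins, the ❌ row in the comparison table, are a good illustration: in RisingWave a CDC table and a Kafka stream can feed one continuously maintained view. All connector settings, topic names, and schemas below are assumptions:

```sql
-- Hypothetical: join operational data (CDC) with a clickstream (Kafka)
-- in a single continuously maintained view.
CREATE SOURCE clicks (
  user_id BIGINT,
  url     VARCHAR,
  ts      TIMESTAMP
) WITH (
  connector = 'kafka',
  topic     = 'clicks',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

CREATE MATERIALIZED VIEW user_activity AS
SELECT o.user_id,
       SUM(o.amount) AS total_spend,
       COUNT(c.url)  AS click_count
FROM orders o                      -- assumes a CDC-backed 'orders' table
LEFT JOIN clicks c ON o.user_id = c.user_id
GROUP BY o.user_id;
```

Zero-ETL offerings replicate one operational database into one warehouse; cross-source views like this are where they stop and streaming SQL takes over.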
## Frequently Asked Questions
### Is zero-ETL really zero?
No. The ETL still happens — CDC captures changes, data is transformed and loaded. "Zero-ETL" means zero pipeline management, not zero data movement. The vendor handles it automatically.
### Should I use zero-ETL or streaming ETL?
Use zero-ETL if you're locked into a single cloud ecosystem and minutes-level freshness is sufficient. Use streaming ETL if you need sub-second freshness, multi-source joins, or vendor independence.

