# Database Replication with CDC and Streaming SQL
Zero-ETL promises to eliminate data movement pipelines entirely — your analytical queries run directly on operational data. AWS, Snowflake, and Databricks all offer "zero-ETL" features. But the reality is nuanced: zero-ETL replaces some pipelines but not all, and streaming ETL remains necessary for real-time workloads.
## What "Zero-ETL" Actually Means
| Vendor | Zero-ETL Feature | What It Does |
|---|---|---|
| AWS | Aurora → Redshift zero-ETL | Automatic CDC replication, no pipeline management |
| Snowflake | Dynamic Tables | Incremental materialized views refreshed on schedule |
| Databricks | Delta Live Tables | Declarative ETL with incremental processing |
These are essentially managed CDC + incremental processing — the "ETL" still happens, it's just abstracted away.
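Snowflake's Dynamic Tables make this abstraction visible in SQL: you declare the transformation and a freshness target, and Snowflake manages the incremental refresh. A minimal sketch (the table, warehouse, and column names here are hypothetical):

```sql
-- Hypothetical example: an incrementally refreshed Dynamic Table.
-- The transform-and-load still happens; Snowflake schedules it for you.
CREATE DYNAMIC TABLE daily_order_totals
  TARGET_LAG = '5 minutes'   -- freshness goal: minutes, not sub-second
  WAREHOUSE  = analytics_wh  -- compute used for refreshes
AS
SELECT order_date, SUM(amount) AS total_amount
FROM raw_orders
GROUP BY order_date;
```

Note the `TARGET_LAG`: it is a scheduling hint, which is why zero-ETL freshness lands in the minutes range rather than sub-second.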
## Zero-ETL vs Streaming ETL
| Aspect | Zero-ETL | Streaming ETL (RisingWave) |
|---|---|---|
| Freshness | Minutes (scheduled) | Sub-second |
| Vendor lock-in | High (AWS→Redshift, etc.) | Low (open source) |
| Custom transforms | Limited | Full SQL |
| Multi-source joins | ❌ | ✅ |
| Serving | Via warehouse only | Built-in (PG protocol) |
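For contrast, here is a minimal RisingWave sketch of the streaming side: ingest Postgres changes via CDC, maintain a continuously updated materialized view, and serve it over the Postgres wire protocol. Connection parameters are placeholders and the schema is an assumption:

```sql
-- Hypothetical sketch: a CDC-backed table in RisingWave.
CREATE TABLE orders (
  order_id   BIGINT PRIMARY KEY,
  user_id    BIGINT,
  amount     NUMERIC,
  created_at TIMESTAMP
) WITH (
  connector     = 'postgres-cdc',
  hostname      = '...',      -- upstream Postgres host
  port          = '5432',
  username      = '...',
  password      = '...',
  database.name = 'shop',
  schema.name   = 'public',
  table.name    = 'orders'
);

-- Continuously maintained as changes arrive -- no scheduled refresh.
CREATE MATERIALIZED VIEW revenue_per_user AS
SELECT user_id, SUM(amount) AS revenue
FROM orders
GROUP BY user_id;

-- Served directly over the Postgres protocol, e.g.:
-- SELECT revenue FROM revenue_per_user WHERE user_id = 42;
```

The materialized view is updated incrementally on every change event, which is what closes the gap from minutes to sub-second freshness.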
## When You Still Need Streaming
Zero-ETL handles simple, single-source replication well. But for workloads that need:
- Sub-second latency requirements
- Complex multi-source transformations
- Real-time serving (APIs, agents)
- Non-vendor-locked architecture
...streaming ETL with RisingWave provides more capability and flexibility.
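Multi-source joins, the ❌ row in the comparison table, are a good illustration: in RisingWave a CDC table and a Kafka stream can feed one continuously maintained view. All connector settings, topic names, and schemas below are assumptions:

```sql
-- Hypothetical: join operational data (CDC) with a clickstream (Kafka)
-- in a single continuously maintained view.
CREATE SOURCE clicks (
  user_id BIGINT,
  url     VARCHAR,
  ts      TIMESTAMP
) WITH (
  connector = 'kafka',
  topic     = 'clicks',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

CREATE MATERIALIZED VIEW user_activity AS
SELECT o.user_id,
       SUM(o.amount) AS total_spend,
       COUNT(c.url)  AS click_count
FROM orders o                      -- assumes a CDC-backed 'orders' table
LEFT JOIN clicks c ON o.user_id = c.user_id
GROUP BY o.user_id;
```

Zero-ETL offerings replicate one operational database into one warehouse; cross-source views like this are where they stop and streaming SQL takes over.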
## Frequently Asked Questions
### Is zero-ETL really zero?
No. The ETL still happens — CDC captures changes, data is transformed and loaded. "Zero-ETL" means zero pipeline management, not zero data movement. The vendor handles it automatically.
### Should I use zero-ETL or streaming ETL?
Use zero-ETL if you're locked into a single cloud ecosystem and minutes-level freshness is sufficient. Use streaming ETL if you need sub-second freshness, multi-source joins, or vendor independence.

