CDC to Data Warehouse: Real-Time Alternative to Batch ETL
Database replication with CDC and streaming SQL creates always-in-sync copies of your operational data for analytics, disaster recovery, or multi-region serving — without batch ETL jobs or manual synchronization.
Replication Patterns
1. Operational DB → Analytics Replica
-- RisingWave: replicate + transform
CREATE TABLE orders (...) FROM pg_source TABLE 'public.orders';
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT DATE(order_time) as day, SUM(amount) as revenue
FROM orders WHERE status='completed' GROUP BY DATE(order_time);
2. Multi-Database Consolidation
-- Combine data from PostgreSQL and MySQL into unified views
CREATE TABLE pg_orders (...) FROM pg_source TABLE 'public.orders';
CREATE TABLE mysql_products (...) FROM mysql_source TABLE 'shop.products';
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT o.*, p.name, p.category FROM pg_orders o JOIN mysql_products p ON o.product_id = p.id;
3. Streaming to Iceberg for Long-Term Storage
CREATE SINK replicated_orders AS SELECT * FROM orders
WITH (connector='iceberg', type='upsert', primary_key='order_id', ...);
Frequently Asked Questions
How does CDC replication compare to logical replication?
CDC replication with a streaming database adds transformation capability (SQL joins, aggregations, filtering) and multi-destination support (Iceberg, Kafka, PostgreSQL). Logical replication is simpler but limited to database-to-database copying without transformation.

