Kafka Connect vs Native CDC: Which Approach Is Better?
Real-time data pipelines continuously move, transform, and deliver data from sources to destinations with sub-second latency. In 2026, the standard patterns involve CDC for database sources, Kafka for event streams, streaming SQL for transformation, and Iceberg for lakehouse storage.
Architecture Patterns
Pattern 1: CDC → Streaming SQL → Lakehouse
Database → RisingWave (CDC + SQL) → Iceberg → Trino/DuckDB
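A minimal sketch of Pattern 1 in RisingWave SQL, assuming a PostgreSQL table `public.orders`; the hostnames, credentials, column list, and bucket path are placeholders, and Iceberg catalog options are elided for brevity:

```sql
-- Connect to the upstream database once, as a shared CDC source.
CREATE SOURCE pg_cdc WITH (
    connector = 'postgres-cdc',
    hostname = 'db.example.com',
    port = '5432',
    username = 'rw_user',
    password = '...',
    database.name = 'shop'
);

-- Materialize one upstream table from that CDC stream.
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    amount NUMERIC,
    updated_at TIMESTAMP
) FROM pg_cdc TABLE 'public.orders';

-- Deliver changes to Iceberg for lakehouse queries (Trino/DuckDB).
CREATE SINK orders_to_iceberg FROM orders WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    warehouse.path = 's3://lake/warehouse',
    database.name = 'analytics',
    table.name = 'orders'
);
```

With these three statements the same engine handles capture, storage of the change stream, and delivery; no separate Debezium or Kafka Connect deployment is required for this path.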
Pattern 2: Event Stream → Enrichment → Multiple Sinks
Kafka → RisingWave (join + aggregate) → Iceberg (analytics)
                                      → Kafka (downstream)
                                      → PostgreSQL (serving)
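Pattern 2 can be sketched as one materialized view fanned out to several sinks. This is illustrative RisingWave SQL; the topic names, schema, broker address, and JDBC URL are all placeholders:

```sql
-- Ingest raw events from Kafka.
CREATE SOURCE clicks (
    user_id INT,
    url VARCHAR,
    ts TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'clicks',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

-- Aggregate incrementally; the view stays current as events arrive.
CREATE MATERIALIZED VIEW clicks_per_user AS
SELECT user_id, COUNT(*) AS clicks, MAX(ts) AS last_seen
FROM clicks
GROUP BY user_id;

-- Fan the same view out: back to Kafka for downstream consumers…
CREATE SINK agg_to_kafka FROM clicks_per_user WITH (
    connector = 'kafka',
    topic = 'clicks-agg',
    properties.bootstrap.server = 'broker:9092',
    primary_key = 'user_id'
) FORMAT UPSERT ENCODE JSON;

-- …and to PostgreSQL for low-latency serving.
CREATE SINK agg_to_pg FROM clicks_per_user WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://serving-db:5432/app?user=rw&password=...',
    table.name = 'clicks_per_user',
    type = 'upsert',
    primary_key = 'user_id'
);
```

The key design point is that the aggregation is computed once; each sink is just another subscriber to the same materialized view.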
Pattern 3: Multi-Source Real-Time View
Database A (CDC) ──→ RisingWave ──→ Unified materialized views
Database B (CDC) ──→ (SQL joins) (queryable via PG protocol)
Kafka events ─────→
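The unified view in Pattern 3 is an ordinary SQL join across sources. A sketch, assuming CDC tables `users` (database A) and `orders` (database B) plus a Kafka source `payments` have already been declared as in the earlier patterns:

```sql
-- One materialized view joining two CDC tables and a Kafka stream.
CREATE MATERIALIZED VIEW order_status AS
SELECT u.user_id, u.email, o.order_id, o.amount, p.status
FROM users u
JOIN orders o ON o.user_id = u.user_id
JOIN payments p ON p.order_id = o.order_id;
```

Because RisingWave speaks the PostgreSQL wire protocol, any Postgres client can query the result, e.g. `psql -h <risingwave-host> -p 4566 -c 'SELECT * FROM order_status LIMIT 10;'` (4566 is RisingWave's default port).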
Tools for Real-Time Pipelines
| Tool | Role | Best For |
|------|------|----------|
| RisingWave | Ingestion + processing + serving | SQL-native, end-to-end |
| Apache Flink | Processing | Complex stateful logic |
| Kafka | Event transport | Central event bus |
| Debezium | CDC capture | Broad database support |
| Iceberg | Storage | Lakehouse analytics |
Frequently Asked Questions
What tools do I need for a real-time data pipeline?
At minimum: a source connector (CDC or Kafka consumer), a processing engine (RisingWave or Flink), and a destination (Iceberg, database, or Kafka). RisingWave can serve as all three for PostgreSQL/MySQL CDC sources.
How do I choose between Flink and RisingWave for pipelines?
Choose RisingWave when the pipeline can be expressed entirely in SQL, you want simpler operations, and built-in serving of results matters. Choose Flink for complex event processing, custom Java/Scala logic, or its broader connector ecosystem.

