RisingWave — a PostgreSQL-compatible streaming database — supports two distinct approaches to change data capture: routing changes through Debezium and Kafka, or using RisingWave's built-in CDC connectors that connect directly to the source database. Choosing the right approach depends on your fan-out requirements, operational complexity tolerance, and existing infrastructure.
The Core Trade-Off
Both approaches deliver real-time database change events into RisingWave, but they differ in architecture:
Debezium + Kafka: source database → Debezium connector → Kafka topic → RisingWave Kafka source. Changes flow through Kafka, which acts as a durable, replayable buffer, so any number of consumers can read the same topic independently.
Native CDC (RisingWave built-in): source database → RisingWave CDC connector. RisingWave reads the database's replication stream directly. Simpler setup and fewer moving parts, but the change stream is consumed only by RisingWave.
How Native CDC Works in RisingWave
RisingWave ships with native CDC connectors for PostgreSQL and MySQL:
-- PostgreSQL native CDC (no Kafka needed)
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL,
  status VARCHAR,
  created_at TIMESTAMPTZ
) WITH (
  connector = 'postgres-cdc',
  hostname = 'postgres.internal',
  port = '5432',
  username = 'rwuser',
  password = 'secret',
  database.name = 'production',
  schema.name = 'public',
  table.name = 'orders'
);
-- MySQL native CDC (no Kafka needed)
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL,
  status VARCHAR,
  created_at DATETIME
) WITH (
  connector = 'mysql-cdc',
  hostname = 'mysql.internal',
  port = '3306',
  username = 'rwuser',
  password = 'secret',
  database.name = 'shop',
  table.name = 'orders'
);
How Debezium + Kafka Works with RisingWave
With Debezium, changes flow through Kafka before reaching RisingWave:
-- RisingWave reads Debezium events from Kafka. Debezium-formatted
-- changelogs carry updates and deletes, so they are ingested into a
-- table with a primary key rather than an append-only source.
CREATE TABLE orders_cdc (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL,
  status VARCHAR,
  created_at TIMESTAMPTZ
) WITH (
  connector = 'kafka',
  topic = 'dbserver1.public.orders',
  properties.bootstrap.server = 'kafka:9092',
  scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
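Downstream, Debezium-ingested data is used exactly like natively ingested data. As a sketch, an incrementally maintained aggregate over orders_cdc (the view name is illustrative):

```sql
-- Live count of orders per status, maintained from the Debezium changelog
CREATE MATERIALIZED VIEW orders_by_status AS
SELECT status, COUNT(*) AS order_count
FROM orders_cdc
GROUP BY status;
```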
Step-by-Step: Native CDC Setup
Step 1: Create a Replication User in PostgreSQL
CREATE ROLE rwcdc REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON TABLE public.orders TO rwcdc;
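Native CDC also requires logical decoding on the PostgreSQL server and a publication covering the captured table. RisingWave can create a publication itself if the user has sufficient privileges, but creating it explicitly is common; a hedged sketch, with an illustrative publication name:

```sql
-- Requires wal_level = logical; changing it needs a server restart
SHOW wal_level;
ALTER SYSTEM SET wal_level = 'logical';

-- Publication covering the captured table (name is illustrative)
CREATE PUBLICATION rw_publication FOR TABLE public.orders;
```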
Step 2: Create the CDC Table in RisingWave
CREATE TABLE orders_live (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL,
  status VARCHAR,
  created_at TIMESTAMPTZ
) WITH (
  connector = 'postgres-cdc',
  hostname = 'postgres.internal',
  port = '5432',
  username = 'rwcdc',
  password = 'secret',
  database.name = 'production',
  schema.name = 'public',
  table.name = 'orders'
);
Step 3: Build a Materialized View
CREATE MATERIALIZED VIEW hourly_revenue AS
SELECT
  DATE_TRUNC('hour', created_at) AS hour,
  SUM(amount) AS revenue,
  COUNT(*) AS order_count
FROM orders_live
WHERE status = 'completed'
GROUP BY 1;
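The view is incrementally maintained as changes arrive, so it can be queried with ordinary SQL at any time, for example:

```sql
-- Latest 24 hourly buckets
SELECT hour, revenue, order_count
FROM hourly_revenue
ORDER BY hour DESC
LIMIT 24;
```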
Step 4: Sink Downstream
CREATE SINK revenue_to_kafka
FROM hourly_revenue
WITH (
  connector = 'kafka',
  topic = 'hourly-revenue',
  properties.bootstrap.server = 'kafka:9092',
  primary_key = 'hour'
) FORMAT UPSERT ENCODE JSON;
Comparison Table
| Dimension | Native CDC (RisingWave) | Debezium + Kafka |
| --- | --- | --- |
| Setup complexity | Low — single SQL statement | Medium — Kafka Connect + connector JSON + Kafka |
| Infrastructure required | Database + RisingWave | Database + Kafka + Kafka Connect + RisingWave |
| Fan-out (multiple consumers) | Limited to RisingWave | Any Kafka consumer can share the same topic |
| Replay old events | Limited (snapshot only) | Yes — replay from any Kafka offset |
| Offset management | Managed by RisingWave | Managed by Debezium in Kafka |
| Schema history | Managed internally | Stored in Kafka schema history topic |
| Dead letter queue | Not built-in | Supported via Kafka Connect DLQ |
| Operational overhead | Low | Medium-high |
| Best for | Single-consumer, simpler pipelines | Multi-consumer, existing Kafka infrastructure |
When to Choose Debezium + Kafka
- You already operate Kafka and want to consolidate change events in one place
- Multiple downstream systems (data warehouse, search, cache, RisingWave) need to consume the same change stream
- You need event replay capabilities — reading historical changes from any point in the Kafka retention window
- You want to apply Kafka Connect SMTs (Single Message Transforms) to filter or enrich events before they reach consumers
- You need connectors for databases RisingWave doesn't support natively (Oracle, SQL Server, MongoDB)
When to Choose Native CDC
- You want the simplest possible setup with the fewest moving parts
- RisingWave is the only consumer of the database change stream
- You don't have Kafka in your infrastructure and don't want to add it
- You're prototyping or running a smaller workload where operational simplicity outweighs flexibility
FAQ
Can I use both approaches in the same RisingWave cluster? Yes. You can have some tables using native CDC connectors and others using Kafka-based Debezium sources. Each source is independent.
Is native CDC less reliable than Debezium? Not necessarily. Both approaches use the database's native replication mechanism. The reliability difference is in the buffer: Kafka provides a durable, replayable log between the database and consumers. Without Kafka, if RisingWave is unavailable, the database must retain WAL/binlog until RisingWave catches up (via the replication slot or binlog retention).
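On the PostgreSQL side, the WAL retained for a replication slot can be monitored so that a long RisingWave outage does not fill the disk; one way to check, run on the source database:

```sql
-- WAL retained per replication slot (grows while a consumer is down)
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```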
Can RisingWave's native CDC scale to high-throughput databases? Yes. RisingWave's native CDC connectors are designed for production workloads. For extremely high-volume databases with many downstream consumers, Debezium + Kafka provides better horizontal scalability.
Key Takeaways
- RisingWave supports both native CDC (postgres-cdc, mysql-cdc) and Debezium-via-Kafka CDC
- Native CDC is simpler (no Kafka required) but limits the change stream to RisingWave as the sole consumer
- Debezium + Kafka enables fan-out and event replay but adds operational complexity
- Choose native CDC for simplicity; choose Debezium + Kafka for multi-consumer architectures or existing Kafka infrastructure