Debezium vs Native CDC: Choosing the Right Approach for RisingWave

RisingWave — a PostgreSQL-compatible streaming database — supports two distinct approaches to change data capture: routing changes through Debezium and Kafka, or using RisingWave's built-in CDC connectors that connect directly to the source database. Choosing the right approach depends on your fan-out requirements, operational complexity tolerance, and existing infrastructure.

The Core Trade-Off

Both approaches deliver real-time database change events into RisingWave, but they differ in architecture:

Debezium + Kafka: source database → Debezium connector → Kafka topic → RisingWave Kafka source. Changes flow through Kafka, which acts as a durable, replayable buffer. Any number of consumers can read from the same Kafka topic independently.

Native CDC (RisingWave built-in): source database → RisingWave CDC connector. RisingWave connects directly to the database's replication stream. Setup is simpler and there are fewer moving parts, but the change stream is consumed only by RisingWave.

How Native CDC Works in RisingWave

RisingWave ships with native CDC connectors for PostgreSQL and MySQL:

-- PostgreSQL native CDC (no Kafka needed)
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres.internal',
    port = '5432',
    username = 'rwuser',
    password = 'secret',
    database.name = 'production',
    schema.name = 'public',
    table.name = 'orders'
);

-- MySQL native CDC (no Kafka needed)
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at DATETIME
) WITH (
    connector = 'mysql-cdc',
    hostname = 'mysql.internal',
    port = '3306',
    username = 'rwuser',
    password = 'secret',
    database.name = 'shop',
    table.name = 'orders'
);
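
The MySQL connector reads the binlog, so row-based binary logging must be enabled on the source. These checks (the usual prerequisites for MySQL CDC) can be run on the MySQL server before creating the table:

-- Verify binlog prerequisites on the MySQL source
SHOW VARIABLES LIKE 'log_bin';        -- should be ON
SHOW VARIABLES LIKE 'binlog_format';  -- should be ROW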

How Debezium + Kafka Works with RisingWave

With Debezium, changes flow through Kafka before reaching RisingWave:

-- RisingWave reads Debezium events from Kafka
CREATE SOURCE orders_cdc
WITH (
    connector = 'kafka',
    topic = 'dbserver1.public.orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
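
For reference, the Debezium side is configured through Kafka Connect. A minimal PostgreSQL connector registration might look like the following sketch; the connector name, credentials, and the dbserver1 topic prefix are illustrative assumptions chosen to match the topic name above:

{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "production",
    "topic.prefix": "dbserver1",
    "table.include.list": "public.orders"
  }
}

With this configuration, Debezium writes change events for public.orders to the topic dbserver1.public.orders, which the CREATE SOURCE statement above consumes.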

Step-by-Step: Native CDC Setup

Step 1: Create a Replication User in PostgreSQL

CREATE ROLE rwcdc REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON TABLE public.orders TO rwcdc;
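
Before creating the CDC table, confirm that logical replication is enabled on the source; changing wal_level requires a PostgreSQL restart. Pre-creating a publication is optional (RisingWave can create one itself given sufficient privileges), and the publication name here is just an example:

-- Logical decoding must be enabled (required by postgres-cdc)
SHOW wal_level;  -- must return 'logical'

-- Optional: pre-create a publication for the table
CREATE PUBLICATION rw_publication FOR TABLE public.orders;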

Step 2: Create the CDC Table in RisingWave

CREATE TABLE orders_live (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres.internal',
    port = '5432',
    username = 'rwcdc',
    password = 'secret',
    database.name = 'production',
    schema.name = 'public',
    table.name = 'orders'
);

Step 3: Build a Materialized View

CREATE MATERIALIZED VIEW hourly_revenue AS
SELECT
    DATE_TRUNC('hour', created_at) AS hour,
    SUM(amount) AS revenue,
    COUNT(*) AS order_count
FROM orders_live
WHERE status = 'completed'
GROUP BY 1;
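
Once created, the materialized view stays continuously up to date as changes arrive; an ordinary ad-hoc query reads its latest state:

-- Inspect the latest hourly aggregates
SELECT hour, revenue, order_count
FROM hourly_revenue
ORDER BY hour DESC
LIMIT 24;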

Step 4: Sink Downstream

CREATE SINK revenue_to_kafka
FROM hourly_revenue
WITH (
    connector = 'kafka',
    topic = 'hourly-revenue',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'hour'
) FORMAT UPSERT ENCODE JSON;

Comparison Table

Dimension                    | Native CDC (RisingWave)            | Debezium + Kafka
-----------------------------|------------------------------------|------------------------------------------------
Setup complexity             | Low (single SQL statement)         | Medium (Kafka Connect + connector JSON + Kafka)
Infrastructure required      | Database + RisingWave              | Database + Kafka + Kafka Connect + RisingWave
Fan-out (multiple consumers) | Limited to RisingWave              | Any Kafka consumer can share the same topic
Replay old events            | Limited (snapshot only)            | Yes, replay from any Kafka offset
Offset management            | Managed by RisingWave              | Managed by Debezium in Kafka
Schema history               | Managed internally                 | Stored in Kafka schema history topic
Dead letter queue            | Not built-in                       | Supported via Kafka Connect DLQ
Operational overhead         | Low                                | Medium to high
Best for                     | Single-consumer, simpler pipelines | Multi-consumer, existing Kafka infrastructure

When to Choose Debezium + Kafka

  • You already operate Kafka and want to consolidate change events in one place
  • Multiple downstream systems (data warehouse, search, cache, RisingWave) need to consume the same change stream
  • You need event replay capabilities — reading historical changes from any point in the Kafka retention window
  • You want to apply Kafka Connect SMTs (Single Message Transforms) to filter or enrich events before they reach consumers
  • You need connectors for databases RisingWave doesn't support natively (Oracle, SQL Server, MongoDB)

When to Choose Native CDC

  • You want the simplest possible setup with the fewest moving parts
  • RisingWave is the only consumer of the database change stream
  • You don't have Kafka in your infrastructure and don't want to add it
  • You're prototyping or running a smaller workload where operational simplicity outweighs flexibility

FAQ

Can I use both approaches in the same RisingWave cluster? Yes. You can have some tables using native CDC connectors and others using Kafka-based Debezium sources. Each source is independent.

Is native CDC less reliable than Debezium? Not necessarily. Both approaches use the database's native replication mechanism. The reliability difference is in the buffer: Kafka provides a durable, replayable log between the database and consumers. Without Kafka, if RisingWave is unavailable, the database must retain WAL/binlog until RisingWave catches up (via the replication slot or binlog retention).
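
On the PostgreSQL side, the WAL retained on behalf of a replication slot can be watched with a standard catalog query (slot names depend on your setup); an inactive slot with ever-growing retained WAL is the warning sign to look for:

-- WAL retained on behalf of each replication slot
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;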

Can RisingWave's native CDC scale to high-throughput databases? Yes. RisingWave's native CDC connectors are designed for production workloads. For extremely high-volume databases with many downstream consumers, Debezium + Kafka provides better horizontal scalability.

Key Takeaways

  • RisingWave supports both native CDC (postgres-cdc, mysql-cdc) and Debezium-via-Kafka CDC
  • Native CDC is simpler — no Kafka required — but limits the change stream to RisingWave as the sole consumer
  • Debezium + Kafka enables fan-out and event replay but adds operational complexity
  • Choose native CDC for simplicity; choose Debezium + Kafka for multi-consumer architectures or existing Kafka infrastructure
