Debezium vs Native CDC: Choosing the Right Approach for RisingWave

RisingWave — a PostgreSQL-compatible streaming database — supports two distinct approaches to change data capture: routing changes through Debezium and Kafka, or using RisingWave's built-in CDC connectors that connect directly to the source database. Choosing the right approach depends on your fan-out requirements, operational complexity tolerance, and existing infrastructure.

The Core Trade-Off

Both approaches deliver real-time database change events into RisingWave, but they differ in architecture:

Debezium + Kafka: source database → Debezium connector → Kafka topic → RisingWave Kafka source. Changes flow through Kafka, which acts as a durable, replayable buffer. Any number of consumers can read from the same Kafka topic independently.

Native CDC (RisingWave built-in): source database → RisingWave CDC connector. RisingWave connects directly to the database's replication stream. Setup is simpler and there are fewer moving parts, but the change stream is consumed only by RisingWave.

How Native CDC Works in RisingWave

RisingWave ships with native CDC connectors for PostgreSQL and MySQL:

-- PostgreSQL native CDC (no Kafka needed)
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres.internal',
    port = '5432',
    username = 'rwuser',
    password = 'secret',
    database.name = 'production',
    schema.name = 'public',
    table.name = 'orders'
);

-- MySQL native CDC (no Kafka needed)
CREATE TABLE orders (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at DATETIME
) WITH (
    connector = 'mysql-cdc',
    hostname = 'mysql.internal',
    port = '3306',
    username = 'rwuser',
    password = 'secret',
    database.name = 'shop',
    table.name = 'orders'
);
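
The MySQL connector reads the binlog, so row-based binary logging must be enabled on the source. These checks (the usual prerequisites for MySQL CDC) can be run on the MySQL server before creating the table:

-- Verify binlog prerequisites on the MySQL source
SHOW VARIABLES LIKE 'log_bin';        -- should be ON
SHOW VARIABLES LIKE 'binlog_format';  -- should be ROW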

How Debezium + Kafka Works with RisingWave

With Debezium, changes flow through Kafka before reaching RisingWave:

-- RisingWave reads Debezium events from Kafka
CREATE SOURCE orders_cdc
WITH (
    connector = 'kafka',
    topic = 'dbserver1.public.orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
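
For reference, the Debezium side is configured through Kafka Connect. A minimal PostgreSQL connector registration might look like the following sketch; the connector name, credentials, and the dbserver1 topic prefix are illustrative assumptions chosen to match the topic name above:

{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "production",
    "topic.prefix": "dbserver1",
    "table.include.list": "public.orders"
  }
}

With this configuration, Debezium writes change events for public.orders to the topic dbserver1.public.orders, which the CREATE SOURCE statement above consumes.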

Step-by-Step: Native CDC Setup

Step 1: Create a Replication User in PostgreSQL

CREATE ROLE rwcdc REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON TABLE public.orders TO rwcdc;
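
Before creating the CDC table, confirm that logical replication is enabled on the source; changing wal_level requires a PostgreSQL restart. Pre-creating a publication is optional (RisingWave can create one itself given sufficient privileges), and the publication name here is just an example:

-- Logical decoding must be enabled (required by postgres-cdc)
SHOW wal_level;  -- must return 'logical'

-- Optional: pre-create a publication for the table
CREATE PUBLICATION rw_publication FOR TABLE public.orders;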

Step 2: Create the CDC Table in RisingWave

CREATE TABLE orders_live (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount DECIMAL,
    status VARCHAR,
    created_at TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres.internal',
    port = '5432',
    username = 'rwcdc',
    password = 'secret',
    database.name = 'production',
    schema.name = 'public',
    table.name = 'orders'
);

Step 3: Build a Materialized View

CREATE MATERIALIZED VIEW hourly_revenue AS
SELECT
    DATE_TRUNC('hour', created_at) AS hour,
    SUM(amount) AS revenue,
    COUNT(*) AS order_count
FROM orders_live
WHERE status = 'completed'
GROUP BY 1;
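
Once created, the materialized view stays continuously up to date as changes arrive; an ordinary ad-hoc query reads its latest state:

-- Inspect the latest hourly aggregates
SELECT hour, revenue, order_count
FROM hourly_revenue
ORDER BY hour DESC
LIMIT 24;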

Step 4: Sink Downstream

CREATE SINK revenue_to_kafka
FROM hourly_revenue
WITH (
    connector = 'kafka',
    topic = 'hourly-revenue',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'hour'
) FORMAT UPSERT ENCODE JSON;

Comparison Table

Dimension                    | Native CDC (RisingWave)            | Debezium + Kafka
-----------------------------|------------------------------------|------------------------------------------------
Setup complexity             | Low (single SQL statement)         | Medium (Kafka Connect + connector JSON + Kafka)
Infrastructure required      | Database + RisingWave              | Database + Kafka + Kafka Connect + RisingWave
Fan-out (multiple consumers) | Limited to RisingWave              | Any Kafka consumer can share the same topic
Replay old events            | Limited (snapshot only)            | Yes, replay from any Kafka offset
Offset management            | Managed by RisingWave              | Managed by Debezium in Kafka
Schema history               | Managed internally                 | Stored in Kafka schema history topic
Dead letter queue            | Not built-in                       | Supported via Kafka Connect DLQ
Operational overhead         | Low                                | Medium to high
Best for                     | Single-consumer, simpler pipelines | Multi-consumer, existing Kafka infrastructure

When to Choose Debezium + Kafka

  • You already operate Kafka and want to consolidate change events in one place
  • Multiple downstream systems (data warehouse, search, cache, RisingWave) need to consume the same change stream
  • You need event replay capabilities — reading historical changes from any point in the Kafka retention window
  • You want to apply Kafka Connect SMTs (Single Message Transforms) to filter or enrich events before they reach consumers
  • You need connectors for databases RisingWave doesn't support natively (Oracle, SQL Server, MongoDB)

When to Choose Native CDC

  • You want the simplest possible setup with the fewest moving parts
  • RisingWave is the only consumer of the database change stream
  • You don't have Kafka in your infrastructure and don't want to add it
  • You're prototyping or running a smaller workload where operational simplicity outweighs flexibility

FAQ

Can I use both approaches in the same RisingWave cluster? Yes. You can have some tables using native CDC connectors and others using Kafka-based Debezium sources. Each source is independent.

Is native CDC less reliable than Debezium? Not necessarily. Both approaches use the database's native replication mechanism. The reliability difference is in the buffer: Kafka provides a durable, replayable log between the database and consumers. Without Kafka, if RisingWave is unavailable, the database must retain WAL/binlog until RisingWave catches up (via the replication slot or binlog retention).
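
On the PostgreSQL side, the WAL retained on behalf of a replication slot can be watched with a standard catalog query (slot names depend on your setup); an inactive slot with ever-growing retained WAL is the warning sign to look for:

-- WAL retained on behalf of each replication slot
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;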

Can RisingWave's native CDC scale to high-throughput databases? Yes. RisingWave's native CDC connectors are designed for production workloads. For extremely high-volume databases with many downstream consumers, Debezium + Kafka provides better horizontal scalability.

Key Takeaways

  • RisingWave supports both native CDC (postgres-cdc, mysql-cdc) and Debezium-via-Kafka CDC
  • Native CDC is simpler — no Kafka required — but limits the change stream to RisingWave as the sole consumer
  • Debezium + Kafka enables fan-out and event replay but adds operational complexity
  • Choose native CDC for simplicity; choose Debezium + Kafka for multi-consumer architectures or existing Kafka infrastructure
