CDC Exactly-Once Semantics: Debezium vs RisingWave Guarantees

Exactly-once semantics is one of the hardest problems in CDC. Debezium provides at-least-once delivery into Kafka; with Kafka transactions and idempotent consumers, you can achieve effectively-once end-to-end, but it requires careful configuration across every layer. RisingWave provides exactly-once semantics for CDC materialized views through its own checkpointing, without requiring Kafka at all.

What "Exactly-Once" Actually Means

Exactly-once is frequently misused. It is worth being precise about what the term covers at each layer of a CDC pipeline.

At the source: The database WAL is a total order of changes, and each change record exists exactly once in the log. The log itself introduces no duplicates; the question is whether the connector can resume from a checkpoint without re-emitting records it already sent.

At the transport: A Kafka producer that retries on failure gives at-least-once delivery. With idempotent producers (enable.idempotence=true, the default in Kafka 3.0 and later) and transactions, Kafka can achieve exactly-once for the writes a single producer makes to its topics. But this guarantee covers only the producer-to-broker write, not the stages before or after it.

At the consumer: Downstream consumers must commit their consumed offsets atomically with their output writes. Without this, a consumer crash between processing a record and committing the offset causes re-processing on restart.

End-to-end: True end-to-end exactly-once means a change event from the source database results in exactly one corresponding update in the destination system, even across connector restarts, broker failures, and consumer crashes.

Debezium's Delivery Guarantee: At-Least-Once

Debezium runs on Kafka Connect, which records the connector's source offset only after the corresponding events have been written to Kafka topics. If the connector crashes after writing events but before the offset is committed, those events will be re-emitted on restart. This is by design: it protects against event loss at the cost of potential duplicates.

The Kafka Connect framework controls offset commit behavior:

# In the Connect worker config
offset.flush.interval.ms=60000
offset.flush.timeout.ms=5000

Every 60 seconds (default), Connect flushes offsets. Any events written in the window between two flushes can be re-emitted on crash. Reducing offset.flush.interval.ms reduces the duplicate window but increases Kafka write amplification.
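The duplicate window can be illustrated with a toy model. The sketch below is not Kafka Connect code; the class `OffsetGapDemo`, its method, and its parameters are hypothetical, and it counts the flush interval in events rather than milliseconds to stay deterministic. It writes events to an in-memory "topic", persists the offset every few events, crashes once, and restarts from the last persisted offset.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a source connector's offset-flush gap (hypothetical, not Connect code).
class OffsetGapDemo {
    /**
     * Emits events [0, totalEvents) into an in-memory topic, persisting the
     * offset every flushEvery events. Simulates a single crash after
     * crashAfter events were written, then restarts from the persisted offset.
     */
    static List<Integer> run(int totalEvents, int flushEvery, int crashAfter) {
        List<Integer> topic = new ArrayList<>();
        int committedOffset = 0; // next position to read after a restart

        // First run: write events until the crash point.
        for (int pos = 0; pos < crashAfter; pos++) {
            topic.add(pos);
            if ((pos + 1) % flushEvery == 0) {
                committedOffset = pos + 1; // periodic offset flush
            }
        }

        // Restart: resume from the last flushed offset, not from the crash point.
        for (int pos = committedOffset; pos < totalEvents; pos++) {
            topic.add(pos);
        }
        return topic;
    }

    public static void main(String[] args) {
        List<Integer> topic = run(10, 4, 7);
        System.out.println(topic);
        System.out.println("duplicates: " + (topic.size() - 10));
    }
}
```

With a flush every 4 events and a crash after 7, events 4 through 6 appear twice. Flushing after every event (flushEvery = 1) removes the duplicates in this model, which mirrors the trade-off between a shorter flush interval and more offset writes.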

The Kafka Transactions Path to Effectively-Once

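With Kafka's transaction API, a producer can write to multiple partitions and commit offsets atomically. The first building block is an idempotent producer, which can be enabled per connector through producer overrides (provided the worker's connector.client.config.override.policy allows them):

{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "producer.override.enable.idempotence": "true",
  "producer.override.acks": "all",
  "producer.override.max.in.flight.requests.per.connection": "1"
}

With these settings, Debezium uses an idempotent producer. The broker deduplicates retries using the producer ID, epoch, and per-partition sequence number. This prevents duplicates from in-flight retries but does not prevent the offset-commit-gap duplicate described above. Adding a transactional.id through producer.override does not close that gap either, because the Connect runtime, not the connector configuration, decides whether records and offsets are written inside a transaction. Closing it requires Kafka Connect's exactly-once source support (KIP-618, Kafka 3.3 and later) together with a Debezium version and connector that support it: set exactly.once.source.support=enabled on the workers and exactly.once.support=required on the connector, so that records and source offsets are committed in one Kafka transaction. This is separate from the outbox pattern, which concerns how events are produced at the source database.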
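The sequence-number deduplication can be modeled in a few lines. This is a simplified sketch, not Kafka's broker implementation: the class `IdempotentLogDemo` and its methods are hypothetical, it tracks a single partition, and it keeps only the highest sequence number per producer ID and epoch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of broker-side deduplication for one partition (not Kafka's actual code).
class IdempotentLogDemo {
    private final List<String> log = new ArrayList<>();
    // Highest sequence number appended so far, per "producerId:epoch".
    private final Map<String, Integer> lastSeq = new HashMap<>();

    /** Returns true if the record was appended, false if it was a duplicate retry. */
    boolean append(long producerId, int epoch, int seq, String value) {
        String key = producerId + ":" + epoch;
        int last = lastSeq.getOrDefault(key, -1);
        if (seq <= last) {
            return false; // retry of a record the log already holds
        }
        if (seq != last + 1) {
            throw new IllegalStateException("out-of-order sequence " + seq);
        }
        log.add(value);
        lastSeq.put(key, seq);
        return true;
    }

    List<String> contents() {
        return log;
    }

    public static void main(String[] args) {
        IdempotentLogDemo partition = new IdempotentLogDemo();
        partition.append(42L, 0, 0, "order-1 created");
        partition.append(42L, 0, 1, "order-1 paid");
        // The ack for seq 1 was lost, so the producer retries the same record.
        boolean retried = partition.append(42L, 0, 1, "order-1 paid");
        // After a connector restart the producer has a new identity (43 here),
        // so the re-read event is appended again: the offset-commit gap.
        boolean replayed = partition.append(43L, 0, 0, "order-1 paid");
        System.out.println(retried + " " + replayed + " " + partition.contents());
    }
}
```

The retry is rejected, but the replay from a restarted producer is accepted, because the deduplication state is keyed on the producer's identity rather than on the content of the event.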

For the downstream consumer to achieve effectively-once, it must read with isolation.level=read_committed and write output atomically with offset commits:

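// Requires enable.auto.commit=false and isolation.level=read_committed on the consumer
consumer.subscribe(Collections.singletonList("shop.public.orders"));
while (true) {
    consumer.poll(Duration.ofMillis(100)).forEach(record -> {
        // Write to the destination first; the upsert is keyed on the primary key,
        // so replaying the same record leaves the row unchanged
        database.beginTransaction();
        database.upsert(record);
        database.commit();
        // Commit the offset only after the destination commit succeeds;
        // a crash between the two steps replays this record on restart
        consumer.commitSync(Map.of(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1)
        ));
    });
}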

This read-process-commit loop achieves effectively-once only if the destination write is idempotent or the offset is stored atomically with the write, for example in the same destination transaction. Most databases support the former via upsert semantics or deduplication tables.
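One way to make the destination write idempotent is to store the source log position next to each row and skip any change that is not newer. The following is a minimal in-memory sketch under that assumption; the class `IdempotentSinkDemo`, its methods, and the use of a numeric LSN per key are illustrative and not part of Debezium or any database API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory destination with a per-row log-position guard (hypothetical names).
class IdempotentSinkDemo {
    private final Map<String, String> rows = new HashMap<>();
    // Log position of the last change applied to each primary key.
    private final Map<String, Long> appliedLsn = new HashMap<>();

    /** Applies the change unless the row already reflects this or a later position. */
    boolean apply(String key, long lsn, String value) {
        Long last = appliedLsn.get(key);
        if (last != null && last >= lsn) {
            return false; // duplicate or stale replay: leave the row untouched
        }
        rows.put(key, value);
        appliedLsn.put(key, lsn);
        return true;
    }

    String get(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        IdempotentSinkDemo sink = new IdempotentSinkDemo();
        sink.apply("order-1", 100L, "pending");
        sink.apply("order-1", 200L, "paid");
        // After a consumer crash, both events are delivered again.
        boolean first = sink.apply("order-1", 100L, "pending");
        boolean second = sink.apply("order-1", 200L, "paid");
        System.out.println(first + " " + second + " " + sink.get("order-1"));
    }
}
```

A plain upsert without the position check would also converge once the replay finishes, but the row would briefly revert to "pending" while the older event is re-applied. The guard avoids that intermediate state.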

Where Debezium + Kafka Falls Short

Even with all the above configured correctly, there are gaps:

Multi-destination fan-out. If you have two downstream consumers writing to different systems, you cannot atomically commit to both. One may succeed and one may fail, leaving them inconsistent.

Snapshot phase. During the initial snapshot, Debezium reads the table using a consistent read transaction but emits events without Kafka transaction semantics. If the connector crashes mid-snapshot, it restarts the snapshot from the beginning, causing duplicates.

Schema changes. A connector restart forced by a schema change re-emits events from the last committed offset, regardless of transaction settings.

RisingWave's Exactly-Once Guarantee

RisingWave provides exactly-once semantics for CDC materialized views through epoch-based checkpointing. This is a different architectural approach, not just a configuration choice.

How it works:

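1. RisingWave reads CDC events from the source database using the Debezium Embedded Engine.
2. Events are processed through a streaming dataflow graph (filters, joins, aggregations).
3. Periodically, RisingWave injects a barrier into the stream, and every checkpoint_frequency-th barrier is a checkpoint barrier. With barrier_interval_ms=1000 and checkpoint_frequency=1, the defaults in recent releases, that is roughly one checkpoint per second.
4. When an operator receives the barrier, it flushes its state changes for that epoch to the state store, which persists them to object storage such as S3, and forwards the barrier downstream.
5. Once all operators in the graph have acknowledged the barrier and their state is durable, the checkpoint is complete.
6. The CDC source position (LSN or binlog offset) is committed atomically with the checkpoint.

-- View system parameters, including barrier_interval_ms and checkpoint_frequency
SHOW PARAMETERS;

-- Take a checkpoint every 10 barriers (the value counts barriers, not seconds)
ALTER SYSTEM SET checkpoint_frequency TO 10;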

If RisingWave crashes mid-processing, it restores from the last complete checkpoint. The CDC source replays from the corresponding WAL position. Events between the last checkpoint and the crash are reprocessed — but because all state (including output materialized views) is restored to the checkpoint, the reprocessed events produce exactly the same output. The result is exactly-once semantics for materialized view contents.
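The recovery argument can be checked with a small simulation. This is a sketch of the principle only, not RisingWave's implementation: the class `CheckpointDemo` and its methods are hypothetical, the "operator" is a single per-key counter, and the checkpoint is an in-memory copy rather than a write to object storage.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of checkpointing for one operator: a per-key running count.
class CheckpointDemo {
    private Map<String, Integer> counts = new HashMap<>(); // live operator state
    private int sourcePos = 0;                              // next event to read

    // Last complete checkpoint: state and source position captured together.
    private Map<String, Integer> ckptCounts = new HashMap<>();
    private int ckptPos = 0;

    /** Processes events from the current position up to (not including) until. */
    void process(List<String> log, int until) {
        while (sourcePos < until) {
            counts.merge(log.get(sourcePos), 1, Integer::sum);
            sourcePos++;
        }
    }

    /** Snapshots state and source position as one unit. */
    void checkpoint() {
        ckptCounts = new HashMap<>(counts);
        ckptPos = sourcePos;
    }

    /** Simulates crash recovery: discard live state and rewind to the checkpoint. */
    void restore() {
        counts = new HashMap<>(ckptCounts);
        sourcePos = ckptPos;
    }

    Map<String, Integer> view() {
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "a", "c", "a", "b");
        CheckpointDemo job = new CheckpointDemo();
        job.process(log, 3);
        job.checkpoint();             // state and position 3 saved together
        job.process(log, 5);          // two more events, not yet checkpointed
        job.restore();                // crash: state and position both roll back
        job.process(log, log.size()); // replay from position 3 to the end
        System.out.println(job.view());
    }
}
```

Events at positions 3 and 4 are processed twice, yet the final counts match a run with no crash, because the state they first modified was discarded on restore. Restoring only the position, or only the state, would break this and produce over- or under-counted results.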

This is the same principle as Apache Flink's asynchronous barrier snapshotting, a variant of the Chandy-Lamport distributed snapshot algorithm. RisingWave implements this natively for CDC without requiring the user to manage Kafka transactions or idempotent consumers.

A Direct Comparison

Dimension	Debezium + Kafka	RisingWave CDC
Delivery to Kafka	At-least-once	Not applicable (no Kafka)
Delivery to output	Effectively-once with transactions	Exactly-once via checkpointing
Duplicate prevention	Idempotent producer + consumer	Checkpoint + state restore
Snapshot duplicates	Possible on crash	Handled by checkpoint
Multi-destination consistency	Not guaranteed	Within RisingWave, consistent
Configuration complexity	High (producer + consumer tuning)	Low (checkpoint interval only)
Best for	Multi-consumer fan-out, existing Kafka infra	Single-destination analytics, operational views

What "Exactly-Once" Does Not Cover in Either System

Database-level guarantees. If your PostgreSQL source crashes before a transaction commits, its row changes may already be in the WAL, but without a commit record logical decoding never emits them. CDC captures committed transactions only; this is a feature, not a gap.

Loss or corruption of the source log. If the database WAL is corrupted or truncated by a catastrophic failure before the CDC connector has read it, data can be lost regardless of exactly-once settings.

Cross-stream ordering. Exactly-once says nothing about ordering guarantees across tables or sources. If you join two CDC streams that have advanced to different source positions, you may momentarily see a combination of rows that never coexisted in the source.

When Debezium's At-Least-Once Is Acceptable

For many pipelines, at-least-once with idempotent consumers is sufficient. If your destination supports upserts keyed on the primary key, duplicate events result in idempotent writes. The pipeline is effectively-once from the user's perspective.

Debezium's model is explicitly the right choice when you have multiple downstream consumers needing the same change stream — analytics, search indexing, cache invalidation, audit logging. Kafka's replayable log is the right abstraction for fan-out. RisingWave does not replace this use case.

FAQ

Does Debezium guarantee ordering within a table? Yes, per Kafka partition. Within a single partition, events are ordered by WAL position, and because Debezium keys records by primary key by default, all changes to a given row land in the same partition. Across tables, or across partitions of the same topic, global ordering is not guaranteed. Enable provide.transaction.metadata to reconstruct transaction boundaries downstream, or use a single partition per topic if strict ordering is required.

Does RisingWave's exactly-once extend to external sinks? RisingWave's internal state and materialized views have exactly-once guarantees. For external sinks (writing to S3, MySQL, Elasticsearch), exactly-once depends on whether the sink supports idempotent writes or transactional commits. RisingWave's sink framework uses at-least-once delivery to external systems by default, with idempotent upserts where the target supports them.

Can I trust Debezium offsets as a consistency boundary? Not for cross-connector consistency. Two connectors reading two tables will have independent offset stores. If you need a consistent view of a JOIN across two tables, you need a system (like RisingWave or Flink) that manages a unified checkpoint across all sources.

What is the outbox pattern and does it help with exactly-once? The outbox pattern writes domain events to an "outbox" table in the same transaction as the business logic update. Debezium reads the outbox table, ensuring that events are emitted only for committed transactions. This ties at-least-once delivery to the source transaction boundary, but does not eliminate downstream duplicates; the consumer still needs idempotent processing.

How does Flink compare to RisingWave for exactly-once CDC? Both use barrier-based checkpointing derived from the Chandy-Lamport algorithm. Flink's CDC connectors (the Flink CDC project, formerly flink-cdc-connectors) provide comparable guarantees for streaming joins and aggregations. RisingWave's advantage is SQL-native ergonomics and PostgreSQL compatibility, making it accessible without a Java/Scala deployment.