Exactly-Once Semantics

Exactly-Once Semantics is the strongest processing guarantee in distributed systems, particularly in stream processing. It ensures that each input event (or message) is processed and its effects are reflected in the system's state and output exactly one time, even in the presence of failures (like node crashes, network issues, or restarts).

This guarantee is crucial for applications where duplicate processing or data loss would lead to incorrect results or inconsistent state, such as financial transactions, critical alerting, or maintaining accurate counts and aggregations.

Processing Semantics Compared

At-Most-Once: Each event is processed zero or one time. If a failure occurs after an event is received but before its processing is completed and acknowledged, the event might be lost. This offers the lowest overhead but is prone to data loss.
At-Least-Once: Each event is guaranteed to be processed one or more times. This avoids data loss by retrying processing upon failure, but it can lead to duplicate results if an acknowledgment is lost and processing is retried for an event whose effects were already committed.
Exactly-Once: Each event is processed as if it happened precisely one time. Failures are handled such that events are neither lost nor processed multiple times to affect the final state or output.

Challenges in Achieving Exactly-Once

Achieving true end-to-end exactly-once semantics (from source, through the processor, to the sink) is complex due to potential failures at various stages:

Source Failures/Replay: Sources need to reliably replay messages if the processor fails after consuming but before checkpointing.
Processor Failures: The stream processor itself must reliably manage its internal state during failures and recovery.
Sink Failures/Retries: Writing results to external sinks (databases, message queues) can fail. Simple retries can lead to duplicates in the sink if the failure occurred after the write but before the acknowledgment.

Achieving Exactly-Once Semantics

Exactly-once processing within the stream processor itself is typically achieved through robust Checkpointing or transaction mechanisms:

State Checkpointing: The processor periodically snapshots its internal state (operator state, source offsets) consistently. Upon recovery, it restores from the last successful checkpoint and resumes processing, ensuring internal consistency.
Coordination with Sinks: The key challenge lies in coordinating state commits within the processor with writes to external sinks. Common techniques include:
- Transactional Sinks (Two-Phase Commit - 2PC): Sinks that support transactions can participate in a two-phase commit protocol coordinated by the stream processor. The processor starts a transaction, writes data, completes its internal checkpoint, and then commits the transaction. If any part fails, the transaction is aborted. This provides strong guarantees but requires specific sink capabilities and adds latency.
- Idempotent Sinks: Sinks designed to handle duplicate writes safely (e.g., using unique IDs per message or update). The processor can safely retry writes upon failure, knowing the sink will ignore duplicates. This requires careful sink implementation or specific database features (like UPSERT with primary keys).

Exactly-Once in RisingWave

RisingWave provides exactly-once semantics for its internal state management through its distributed, consistent Checkpointing mechanism built on the Hummock state store.

Internal State: Upon recovery from failure, RisingWave restores its operator states (aggregations, joins, window results) from the latest successful checkpoint, ensuring that internal calculations reflect each input message exactly once relative to that state.
Sink Guarantees: Achieving end-to-end exactly-once depends on the capabilities of the chosen Sink connector and the target system:
- At-Least-Once Sinks: For many standard sinks (like a basic Kafka sink without transactional IDs), RisingWave typically guarantees at-least-once delivery. If a failure occurs during the sink write/checkpoint commit phase, retries might lead to duplicates in the target system.
- Exactly-Once Sinks: RisingWave offers specific sink connectors designed for exactly-once, often leveraging idempotency or transactional capabilities of the target system:
  - JDBC Sinks: Can often achieve exactly-once using UPSERT operations based on primary keys derived from the changelog stream.
  - Kafka Sinks: Can leverage Kafka's idempotent producer or transactional capabilities.
  - Iceberg/Delta Lake Sinks: These sinks manage commits atomically as part of the table format's transaction log, coordinated with RisingWave's checkpointing, providing exactly-once writes into the lakehouse table.

Therefore, while RisingWave guarantees exactly-once for its internal state, users must select and configure appropriate sink connectors to achieve end-to-end exactly-once semantics for their specific pipeline.

Exactly-Once Semantics

Processing Semantics Compared

Challenges in Achieving Exactly-Once

Achieving Exactly-Once Semantics

Exactly-Once in RisingWave

Related Blog Posts

Related Glossary Terms