Exactly-Once Semantics in Stream Processing: How It Works

Exactly-once semantics means every event in a stream is processed exactly one time — no duplicates, no data loss — even when failures occur. This is the strongest processing guarantee in stream processing and is critical for financial transactions, billing systems, and any workload where duplicate or missing events cause business impact. Apache Flink, RisingWave, and Kafka Streams all support exactly-once semantics through different mechanisms.

The Three Processing Guarantees

| Guarantee     | Description               | Duplicates? | Data Loss? | Complexity |
|---------------|---------------------------|-------------|------------|------------|
| At-most-once  | Fire and forget           | No          | Possible   | Lowest     |
| At-least-once | Retry on failure          | Possible    | No         | Medium     |
| Exactly-once  | Each event processed once | No          | No         | Highest    |

How Exactly-Once Works

Checkpoint-Based (Flink, RisingWave)

The system periodically snapshots all operator state. On failure, it restores from the last checkpoint and replays events from the source:

  1. Checkpoint: Save a consistent snapshot of all state to durable storage (e.g., S3)
  2. Failure: Node crashes, losing in-memory state
  3. Recovery: Restore state from checkpoint, reset source offsets to checkpoint position
  4. Replay: Reprocess events from checkpoint offset — but state ensures results are identical

  • Flink: Uses Chandy-Lamport-style distributed snapshots with two-phase commit for end-to-end exactly-once
  • RisingWave: Barrier-based checkpointing with 1-second intervals, state stored on S3
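The checkpoint-restore-replay loop above can be sketched in a few lines of Python. This is a toy simulation, not any engine's actual API: the key idea is that operator state and the source offset are snapshotted together, so after a crash both are restored and replay produces results identical to a failure-free run.

```python
# Toy sketch of checkpoint-based exactly-once recovery (illustrative,
# not any engine's real API). State and source offset are snapshotted
# together; on failure we restore both and replay from the offset.

import copy

events = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

def run_with_failure(crash_at):
    state, i = {}, 0                       # operator state + source offset
    checkpoint = (copy.deepcopy(state), i) # last durable snapshot
    processed = 0
    while i < len(events):
        if crash_at is not None and processed == crash_at:
            # Failure: in-memory state is lost. Restore the last
            # checkpoint and reset the source offset to its position.
            state, i = copy.deepcopy(checkpoint[0]), checkpoint[1]
            crash_at = None                # only crash once
            continue
        key, amount = events[i]
        state[key] = state.get(key, 0) + amount
        i += 1
        processed += 1
        if i % 2 == 0:                     # "periodic" checkpoint every 2 events
            checkpoint = (copy.deepcopy(state), i)
    return state

# Replaying from the checkpointed offset yields exactly the same result
# as a failure-free run: nothing lost, nothing double-counted.
assert run_with_failure(crash_at=3) == run_with_failure(crash_at=None)
```

The crucial detail is that the state snapshot and the source offset are saved atomically as one unit; snapshotting them separately would reintroduce duplicates or loss.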

Changelog-Based (Kafka Streams)

Every state mutation is written to a Kafka changelog topic. On failure, replay the changelog:

  1. Processing: Each state change written to both local RocksDB and Kafka changelog
  2. Failure: Node crashes
  3. Recovery: New node replays changelog from last committed offset
  4. Result: State rebuilt identically

Combined with Kafka's transactional producers (processing.guarantee=exactly_once_v2), this provides end-to-end exactly-once within the Kafka ecosystem.
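The changelog pattern can be sketched as follows. This is a toy model: a dict stands in for the local RocksDB store and a list stands in for the Kafka changelog topic.

```python
# Toy sketch of changelog-based state recovery (illustrative; real
# Kafka Streams writes to a compacted changelog topic, not a list).

local_state = {}   # stands in for the local RocksDB store
changelog = []     # stands in for the Kafka changelog topic

def put(key, value):
    # Every state mutation goes to BOTH the local store and the changelog.
    local_state[key] = value
    changelog.append((key, value))

put("orders:42", 1)
put("orders:42", 2)   # later versions overwrite earlier ones
put("orders:99", 7)

local_state = None    # node crashes: local state is gone

# Recovery: a new node replays the changelog from the beginning
# (in practice, from the last committed offset) to rebuild state.
recovered = {}
for key, value in changelog:
    recovered[key] = value
```

Because the changelog is an ordered log of every mutation, replaying it deterministically rebuilds the exact state the failed node held.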

The Cost of Exactly-Once

Exactly-once isn't free:

  • Latency overhead: Checkpoint coordination adds milliseconds to processing
  • Storage cost: Checkpoints/changelogs consume storage
  • Throughput impact: Transactional writes reduce peak throughput by 10-30%

For many workloads (logging, metrics, clickstream), at-least-once with downstream deduplication is sufficient and cheaper.
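One common shape for that downstream deduplication is sketched below. The event IDs and the in-memory seen-set are illustrative; production systems bound the set (e.g., by time window) or push the check into the sink itself.

```python
# Toy sketch of downstream deduplication under at-least-once delivery:
# each event carries a unique ID, and the consumer skips IDs it has
# already processed, so retried redeliveries have no effect.

seen = set()
total = 0

def consume(event_id, value):
    global total
    if event_id in seen:
        return            # duplicate redelivery: ignore
    seen.add(event_id)
    total += value

# At-least-once delivery may redeliver event 2 after a retry.
for event_id, value in [(1, 10), (2, 20), (2, 20), (3, 5)]:
    consume(event_id, value)

# Duplicates did not inflate the sum: 10 + 20 + 5 = 35.
```

This trades the coordination cost of exactly-once for a small amount of per-consumer bookkeeping, which is often the cheaper deal for high-volume, low-stakes streams.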

When You Need Exactly-Once

  • Financial transactions (payments, trades)
  • Billing and metering
  • Inventory management
  • Any system where duplicates cause monetary impact

When At-Least-Once Is Fine

  • Log aggregation (duplicate log lines are harmless)
  • Metrics collection (idempotent counters handle duplicates)
  • Clickstream analytics (slight overcounting is acceptable)

Frequently Asked Questions

Is exactly-once semantics really possible?

Yes, within the scope of the processing system. Flink, RisingWave, and Kafka Streams achieve exactly-once through coordinated checkpointing and transactional writes. End-to-end exactly-once (including external sinks) requires the sink to support transactions or idempotent writes.

Which stream processor has the best exactly-once support?

Apache Flink has the most mature end-to-end exactly-once implementation, using two-phase commit across sources and sinks. RisingWave provides exactly-once with 1-second checkpoints and S3 state. Kafka Streams supports exactly-once within the Kafka ecosystem.
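The two-phase commit pattern can be sketched as follows (illustrative Python, not Flink's actual API; Flink's real analogue is its transactional sink interface): writes buffer in an open transaction, the transaction is prepared when the checkpoint barrier arrives, and the final commit runs only once the checkpoint is durable everywhere.

```python
# Toy sketch of two-phase commit for a transactional sink (illustrative;
# names like write/pre_commit/commit are hypothetical, not Flink's API).

committed, pending = [], []

def write(record):
    pending.append(record)     # phase 0: buffer inside an open transaction

def pre_commit():
    # Phase 1: prepare/flush the transaction so it can survive a failure.
    return list(pending)

def commit(prepared):
    # Phase 2: runs only after the checkpoint completes everywhere.
    committed.extend(prepared)
    pending.clear()

write("a")
write("b")
txn = pre_commit()             # checkpoint barrier reaches the sink
commit(txn)                    # checkpoint-complete notification arrives
```

If a failure happens between pre-commit and commit, the prepared transaction is either committed or aborted during recovery in line with the checkpoint outcome, which is what makes the sink-side result exactly-once.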

Does exactly-once affect performance?

Yes. Exactly-once adds overhead from checkpoint coordination, transactional writes, and state snapshots. Throughput typically decreases 10-30% compared to at-least-once. For most workloads, this trade-off is worth the correctness guarantee.
