# Exactly-Once Semantics in Stream Processing: How It Works
Exactly-once semantics means every event in a stream is processed exactly one time — no duplicates, no data loss — even when failures occur. This is the strongest processing guarantee in stream processing and is critical for financial transactions, billing systems, and any workload where duplicate or missing events cause business impact. Apache Flink, RisingWave, and Kafka Streams all support exactly-once semantics through different mechanisms.
## The Three Processing Guarantees
| Guarantee | Description | Duplicates? | Data loss? | Complexity |
| --- | --- | --- | --- | --- |
| At-most-once | Fire and forget | No | Possible | Lowest |
| At-least-once | Retry on failure | Possible | No | Medium |
| Exactly-once | Each event processed once | No | No | Highest |
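The duplicates column comes down to how the sender reacts to failure. The toy loops below illustrate this with a made-up `FlakyChannel` that models one failure mode, a lost acknowledgment: the send itself lands, but the sender never learns it. Everything here (`FlakyChannel`, `at_most_once`, `at_least_once`) is illustrative, not a real messaging client.

```python
class FlakyChannel:
    """Toy channel whose acks can be lost: the send succeeds,
    but the sender sees an error and cannot tell the difference."""

    def __init__(self, lost_acks):
        self.attempt = 0
        self.lost_acks = lost_acks     # attempt numbers whose ack never arrives
        self.delivered = []            # what the receiver actually got

    def send(self, event):
        self.attempt += 1
        self.delivered.append(event)             # the send itself lands...
        if self.attempt in self.lost_acks:
            raise ConnectionError("ack lost")    # ...but the sender never learns it

def at_most_once(events, channel):
    for e in events:
        try:
            channel.send(e)            # fire and forget
        except ConnectionError:
            pass                       # no retry: never duplicates, may lose data

def at_least_once(events, channel, max_retries=5):
    for e in events:
        for _ in range(max_retries):
            try:
                channel.send(e)
                break                  # stop once a send is acknowledged
            except ConnectionError:
                continue               # retry: duplicates when only the ack was lost

ch = FlakyChannel(lost_acks={1})
at_least_once(["a", "b"], ch)
print(ch.delivered)                    # the receiver saw "a" twice
```

At-least-once redelivers `"a"` because the first ack was lost, which is exactly the duplicate the table warns about; exactly-once needs the extra machinery described below to absorb that resend.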
## How Exactly-Once Works
### Checkpoint-Based (Flink, RisingWave)
The system periodically snapshots all operator state. On failure, it restores from the last checkpoint and replays events from the source:
- Checkpoint: Save a consistent snapshot of all state to durable storage (e.g., S3)
- Failure: Node crashes, losing in-memory state
- Recovery: Restore state from checkpoint, reset source offsets to checkpoint position
- Replay: Reprocess events from checkpoint offset — but state ensures results are identical
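The checkpoint/recovery/replay cycle can be sketched with a toy word-count operator. The `Checkpoint` class and the `durable` dict (standing in for S3) are illustrative inventions, not a real Flink or RisingWave API; the point is that restoring both the state snapshot and the matching source offset makes replay produce the same result a failure-free run would have.

```python
import copy

class Checkpoint:
    def __init__(self, state, offset):
        self.state = copy.deepcopy(state)   # consistent snapshot of operator state
        self.offset = offset                # source position the snapshot covers

durable = {"ckpt": Checkpoint({}, 0)}       # stands in for durable storage (S3)

def run(source, durable, checkpoint_every, crash_at=None):
    ckpt = durable["ckpt"]
    state = copy.deepcopy(ckpt.state)       # recovery: restore state from checkpoint
    for offset in range(ckpt.offset, len(source)):  # reset source to ckpt offset
        if offset == crash_at:
            raise RuntimeError("node crashed")      # in-memory state is lost
        word = source[offset]
        state[word] = state.get(word, 0) + 1        # operator state mutation
        if (offset + 1) % checkpoint_every == 0:
            durable["ckpt"] = Checkpoint(state, offset + 1)  # snapshot to storage
    return state

source = ["a", "b", "a", "c", "a", "b"]
try:
    run(source, durable, checkpoint_every=2, crash_at=5)   # crash mid-stream
except RuntimeError:
    pass
final = run(source, durable, checkpoint_every=2)  # restore + replay from offset 4
assert final == {"a": 3, "b": 2, "c": 1}          # identical to a failure-free run
```

The crash loses the in-memory counts, but the last checkpoint pins both state and offset together, so replaying from that offset neither skips nor double-counts any event.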
- Flink: Uses Chandy-Lamport-style distributed snapshots, with two-phase commit for end-to-end exactly-once
- RisingWave: Barrier-based checkpointing with 1-second intervals, with state on S3
### Changelog-Based (Kafka Streams)
Every state mutation is written to a Kafka changelog topic. On failure, replay the changelog:
- Processing: Each state change written to both local RocksDB and Kafka changelog
- Failure: Node crashes
- Recovery: New node replays changelog from last committed offset
- Result: State rebuilt identically
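The write-then-replay cycle above can be sketched in a few lines. The `changelog` list stands in for a Kafka changelog topic, and `put`/`restore` are illustrative helpers, not the Kafka Streams API; the essential property is that replaying the ordered log of mutations reconstructs the state byte-for-byte.

```python
changelog = []                        # durable, ordered log of (key, value) puts

def put(local_state, key, value):
    local_state[key] = value          # write to the local store (RocksDB in practice)
    changelog.append((key, value))    # and append the mutation to the changelog

def restore(changelog):
    state = {}
    for key, value in changelog:      # replay from the start (or a compacted
        state[key] = value            # snapshot) up to the last committed offset
    return state

node_a = {}
put(node_a, "user:1", 5)
put(node_a, "user:2", 3)
put(node_a, "user:1", 6)              # later write overwrites the earlier one
# node_a crashes; a fresh node rebuilds identical state from the changelog
node_b = restore(changelog)
assert node_b == node_a
```

In practice Kafka compacts the changelog topic so only the latest value per key must be replayed, which keeps recovery time bounded by state size rather than history length.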
Combined with Kafka's transactional producers (`processing.guarantee=exactly_once_v2`), this provides end-to-end exactly-once within the Kafka ecosystem.
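Why transactions close the loop: the output records and the new input offset commit atomically, so a crash before commit leaves nothing published and replay from the old offset is safe. The `Txn` class below is a made-up stand-in for a Kafka transaction, not a real client API; it only models the atomicity.

```python
class Txn:
    """Toy transaction: staged output and the consumer offset
    become visible together, or not at all."""

    def __init__(self, output_topic, offsets):
        self.output_topic = output_topic   # committed results live here
        self.offsets = offsets             # committed consumer offsets
        self.staged = []

    def send(self, record):
        self.staged.append(record)         # buffered, invisible to consumers

    def commit(self, new_offset):
        self.output_topic.extend(self.staged)  # publish output...
        self.offsets["pos"] = new_offset       # ...and advance the offset atomically
        self.staged = []

    def abort(self):
        self.staged = []                   # staged output is discarded

def process(events, output_topic, offsets, crash_before_commit=False):
    txn = Txn(output_topic, offsets)
    for e in events[offsets["pos"]:]:      # resume from the committed offset
        txn.send(e * 2)
    if crash_before_commit:
        txn.abort()                        # nothing was published, so replaying
        return                             # from the old offset cannot duplicate
    txn.commit(len(events))

output, offsets = [], {"pos": 0}
process([1, 2, 3], output, offsets, crash_before_commit=True)
process([1, 2, 3], output, offsets)        # retry replays from offset 0
assert output == [2, 4, 6] and offsets["pos"] == 3
```

If output and offset were committed separately, a crash between the two writes would leave either duplicated output or a skipped input range; the atomic pair is what rules both out.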
## The Cost of Exactly-Once
Exactly-once isn't free:
- Latency overhead: Checkpoint coordination adds milliseconds to processing
- Storage cost: Checkpoints/changelogs consume storage
- Throughput impact: Transactional writes reduce peak throughput by 10-30%
For many workloads (logging, metrics, clickstream), at-least-once with downstream deduplication is sufficient and cheaper.
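Downstream deduplication is often just a keyed filter, assuming each event carries a stable unique id (e.g., assigned by the producer). A minimal sketch, with illustrative names:

```python
def dedupe(events, seen_ids):
    """Filter an at-least-once stream down to unique events by id."""
    for event in events:
        if event["id"] in seen_ids:     # redelivered after a failure: drop it
            continue
        seen_ids.add(event["id"])       # in practice a TTL'd key-value store,
        yield event                     # not an unbounded in-memory set

redelivered = [{"id": "e1", "v": 1}, {"id": "e1", "v": 1}, {"id": "e2", "v": 2}]
unique = list(dedupe(redelivered, set()))
assert [e["id"] for e in unique] == ["e1", "e2"]
```

The cost shifts from the stream processor to the consumer: a lookup per event plus storage for recent ids, which is usually far cheaper than transactional writes across the whole pipeline.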
## When You Need Exactly-Once
- Financial transactions (payments, trades)
- Billing and metering
- Inventory management
- Any system where duplicates cause monetary impact
## When At-Least-Once Is Fine
- Log aggregation (duplicate log lines are harmless)
- Metrics collection (idempotent counters handle duplicates)
- Clickstream analytics (slight overcounting is acceptable)
## Frequently Asked Questions
### Is exactly-once semantics really possible?
Yes, within the scope of the processing system. Flink, RisingWave, and Kafka Streams achieve exactly-once through coordinated checkpointing and transactional writes. End-to-end exactly-once (including external sinks) requires the sink to support transactions or idempotent writes.
### Which stream processor has the best exactly-once support?
Apache Flink has the most mature end-to-end exactly-once implementation, using two-phase commit across sources and sinks. RisingWave provides exactly-once with 1-second checkpoints and S3 state. Kafka Streams supports exactly-once within the Kafka ecosystem.
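The two-phase-commit sink pattern behind Flink's end-to-end guarantee can be sketched as follows. This toy `TwoPhaseSink` models the pattern only (stage during the checkpoint interval, publish on checkpoint completion); it is not Flink's actual sink API.

```python
class TwoPhaseSink:
    def __init__(self):
        self.pending = []            # written this interval, not yet visible
        self.visible = []            # committed output external readers can see

    def write(self, record):
        self.pending.append(record)

    def pre_commit(self):
        """Phase 1, at the checkpoint barrier: durably stage the output."""
        staged, self.pending = self.pending, []
        return staged

    def commit(self, staged):
        """Phase 2, on checkpoint-complete: publish atomically."""
        self.visible.extend(staged)

    def abort(self, staged):
        pass                         # checkpoint failed: staged output is dropped

sink = TwoPhaseSink()
for record in ["x", "y"]:
    sink.write(record)
staged = sink.pre_commit()           # all operators pre-commit at the barrier
sink.commit(staged)                  # coordinator confirms, sink publishes
assert sink.visible == ["x", "y"]
```

Because output only becomes visible after the checkpoint that produced it is complete, a replay from that checkpoint can never publish the same records twice.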
### Does exactly-once affect performance?
Yes. Exactly-once adds overhead from checkpoint coordination, transactional writes, and state snapshots. Throughput typically decreases 10-30% compared to at-least-once. For most workloads, this trade-off is worth the correctness guarantee.

