What Is Kafka Connect? Connectors, Transforms, and Converters
RocksDB is an embeddable key-value store used as the state backend in Apache Flink and Kafka Streams. It stores streaming operator state (aggregation results, join buffers, window data) on local disk, enabling processing of state larger than memory.
How RocksDB Works in Streaming
Streaming Operator
↓ State reads/writes
RocksDB (LSM Tree)
↓ Compaction
Local SSD / Disk
↓ Checkpoint
Remote Storage (S3)
LSM Tree Architecture
RocksDB uses a Log-Structured Merge Tree:
- Memtable: In-memory write buffer (fast writes)
- SSTable: Immutable sorted files on disk (fast reads with bloom filters)
- Compaction: Background process merging SSTables to reduce read amplification
RocksDB Challenges in Streaming
| Challenge | Impact | Mitigation |
| Tuning complexity | 30+ configuration parameters | Expert knowledge required |
| Local disk dependency | Disk failure = state loss | Checkpoints to S3 |
| Recovery time | Download checkpoint → rebuild | Minutes for large state |
| Memory management | Block cache competes with JVM heap | Careful sizing |
The S3 Alternative
RisingWave's Hummock and Flink 2.0's ForSt replace local RocksDB with S3-based state:
- No local disk failures
- Seconds-level recovery (state already in S3)
- No RocksDB tuning required
- Elastic scaling without state migration
Frequently Asked Questions
Why is RocksDB used in stream processing?
RocksDB provides fast key-value access with support for state larger than memory (spills to disk). Its LSM tree architecture is optimized for write-heavy workloads common in streaming.
Is RocksDB being replaced?
For cloud-native streaming, yes. Flink 2.0 (ForSt) and RisingWave (Hummock) store state on S3 instead of local RocksDB. This eliminates tuning complexity and enables faster recovery.

