Disaggregated Storage in Stream Processing: Why S3 Changes Everything
Disaggregated storage separates compute from state in stream processing — storing all streaming state (aggregations, join buffers, window data) in object storage like S3 instead of local disks. This architectural pattern, adopted by RisingWave and Flink 2.0, fundamentally changes how streaming systems scale, recover from failures, and manage costs.
Traditional vs Disaggregated Architecture
Traditional (Coupled Compute + State)
Compute Node 1: [Processing] + [RocksDB State on Local Disk]
Compute Node 2: [Processing] + [RocksDB State on Local Disk]
Compute Node 3: [Processing] + [RocksDB State on Local Disk]
State lives on each node's local disk. If a node dies, its state is lost and must be restored from a remote checkpoint.
Disaggregated (Separated)
Compute Node 1: [Processing] + [In-Memory Cache]
Compute Node 2: [Processing] + [In-Memory Cache]
Compute Node 3: [Processing] + [In-Memory Cache]
↕ Read/Write ↕
[S3 — All State Stored Here]
State lives in S3. Compute nodes are relatively stateless. If a node dies, a replacement node reads state from S3 immediately.
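The recovery story above can be sketched in a few lines. This is an illustrative model, not any real system's API: a dict stands in for S3, and compute nodes keep only a volatile cache, so a replacement node can attach to the shared store and serve reads at once.

```python
# Hypothetical sketch: compute nodes hold no durable state; all state lives
# in a shared object store, so a replacement node recovers instantly.

class ObjectStore:
    """Stand-in for S3: a shared, durable key-value store."""
    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


class ComputeNode:
    """Stateless worker: only a volatile cache; durable state is in the store."""
    def __init__(self, store):
        self.store = store
        self.cache = {}                      # lost when the node dies

    def update(self, key, value):
        self.cache[key] = value
        self.store.put(key, value)           # async write in a real system

    def read(self, key):
        if key not in self.cache:            # cache miss: fall back to the store
            self.cache[key] = self.store.get(key)
        return self.cache[key]


store = ObjectStore()
node1 = ComputeNode(store)
node1.update("window:42", {"count": 7})

# node1 "dies"; a replacement attaches to the same store and reads immediately
node2 = ComputeNode(store)
assert node2.read("window:42") == {"count": 7}
```

The key property is that `ComputeNode` owns nothing durable: killing it and constructing a new one against the same store loses no state.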
Why Disaggregated Storage Matters
| Aspect | Traditional | Disaggregated |
| --- | --- | --- |
| Recovery time | Minutes to hours (download checkpoint) | Seconds (state already in S3) |
| Scaling | Redistribute state across nodes (slow) | Add compute, point at same S3 (fast) |
| Disk failure | State lost, must restore | No impact (S3 offers 11 nines of durability) |
| Cost | Over-provision local disk for peak state | S3 at $0.023/GB/month |
| Checkpoint time | Proportional to state size | Near-constant (~3-4 seconds) |
| State size limit | Limited by local disk | Effectively unlimited (S3 is elastic) |
RisingWave: Built on S3 from Day One
RisingWave's Hummock storage engine is an LSM-tree-based store designed specifically for streaming workloads on S3:
- Mem-tables buffer writes in memory
- SSTables (Sorted String Tables) are flushed to S3
- Local cache serves reads (S3 stays off the read path for hot data during queries)
- Compactor nodes merge SSTables in the background
- 1-second checkpoints are feasible because state is already durable in S3
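The memtable-and-SSTable flow described above can be sketched as follows. This is an illustrative model of the general LSM pattern, not Hummock's actual API: a dict stands in for S3, writes buffer in a memtable, and a flush writes an immutable sorted run to the object store.

```python
# Illustrative LSM sketch (not Hummock's real interface): buffer writes in a
# memtable, flush sorted immutable "SSTables" to the object store, and read
# from the memtable first, then from the newest SSTable that has the key.

class LsmStore:
    def __init__(self, object_store, flush_threshold=2):
        self.object_store = object_store     # dict standing in for S3
        self.memtable = {}
        self.flush_threshold = flush_threshold
        self.sstable_ids = []                # newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        sst_id = f"sst-{len(self.sstable_ids)}"
        # write an immutable, sorted run to the object store
        self.object_store[sst_id] = dict(sorted(self.memtable.items()))
        self.sstable_ids.append(sst_id)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst_id in reversed(self.sstable_ids):   # newest SSTable wins
            sst = self.object_store[sst_id]
            if key in sst:
                return sst[key]
        return None


s3 = {}
lsm = LsmStore(s3)
lsm.put("a", 1)
lsm.put("b", 2)                  # reaches the threshold and triggers a flush
assert lsm.get("a") == 1 and "sst-0" in s3
```

Because every flushed SSTable is already durable in the object store, a checkpoint only needs to record which SSTables exist, which is why sub-second checkpoint intervals become feasible.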
Flink 2.0: ForSt State Backend
Flink 2.0 introduced the ForSt ("For Streaming") state backend for disaggregated state:
- State stored in S3/HDFS instead of local RocksDB
- 40x faster recovery compared to Flink 1.x
- Checkpoint duration: 3-4 seconds regardless of state size (vs minutes in 1.x)
- Backward compatible with existing Flink applications
Performance Impact
Common concern: "Doesn't reading from S3 add latency?"
In practice, no — because of caching:
- Hot state is cached in memory on compute nodes
- Warm state is cached on local SSD
- Cold state is fetched from S3 (rare, only during recovery or cold start)
During normal processing, all reads hit local cache. S3 is only involved during writes (async) and recovery.
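The hot/warm/cold lookup order above can be made concrete with a small sketch. The tier names and promotion logic are illustrative assumptions, not a description of any specific system's cache implementation.

```python
# Hedged sketch of tiered state reads: in-process memory first, then a local
# SSD cache, then the object store (the rare cold path). Values fetched from
# a colder tier are promoted upward so repeat reads stay local.

def read_state(key, memory_cache, ssd_cache, fetch_from_s3):
    """Return (value, tier) for a state key, checking caches hot-to-cold."""
    if key in memory_cache:                  # hot: in-process memory
        return memory_cache[key], "memory"
    if key in ssd_cache:                     # warm: local SSD cache
        value = ssd_cache[key]
        memory_cache[key] = value            # promote to memory
        return value, "ssd"
    value = fetch_from_s3(key)               # cold: e.g. recovery or cold start
    ssd_cache[key] = value
    memory_cache[key] = value
    return value, "s3"


mem, ssd = {}, {}
s3 = {"k": 99}
value, tier = read_state("k", mem, ssd, s3.get)
assert (value, tier) == (99, "s3")           # first read takes the cold path
value, tier = read_state("k", mem, ssd, s3.get)
assert tier == "memory"                      # repeat read is served locally
```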
When Disaggregated Storage Matters Most
- Large state (>10 GB): Local disk becomes a bottleneck
- Elastic workloads: Scale compute up/down without state migration
- High availability: Seconds-level recovery vs minutes
- Cost-sensitive: S3 storage is 10-100x cheaper than SSD
- Multi-tenant: Share storage across workloads
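The cost claim can be checked with back-of-the-envelope arithmetic. The S3 price comes from the table above; the SSD price and 3x replication factor are assumed example figures for a coupled-storage deployment, not quotes.

```python
# Rough cost comparison for 500 GB of streaming state.
# Assumptions (illustrative, not quotes): provisioned SSD at $0.08/GB/month,
# and 3x replication since local-disk systems replicate state for durability.

state_gb = 500
s3_price = 0.023          # $/GB/month (S3 Standard, from the table above)
ssd_price = 0.08          # $/GB/month, assumed provisioned-SSD figure
replication = 3           # assumed replication factor for local state

s3_monthly = state_gb * s3_price                    # single durable copy
ssd_monthly = state_gb * replication * ssd_price    # replicated local copies

assert ssd_monthly / s3_monthly > 10   # an order-of-magnitude gap, as claimed
```

Under these assumptions the gap is roughly 10x; higher SSD prices, over-provisioning headroom, or larger replication factors push it toward the 100x end of the range.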
Frequently Asked Questions
Does S3-based state add latency to stream processing?
Not during normal processing. Hot state is cached in compute node memory. S3 is only involved during async checkpoint writes and recovery. RisingWave achieves sub-100ms end-to-end processing latency despite storing all state on S3.
Is disaggregated storage the future of stream processing?
Yes. Both RisingWave (S3-native from day one) and Flink 2.0 (ForSt backend) have adopted disaggregated storage. The benefits — elastic scaling, fast recovery, lower cost, unlimited state — outweigh the added complexity of remote storage management.
How does disaggregated storage affect checkpointing?
Dramatically. In traditional systems, checkpoints must copy state from local disk to remote storage — time proportional to state size. In disaggregated systems, state is already in S3, so checkpoints just commit metadata — completing in 3-4 seconds regardless of state size.
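The contrast above can be sketched in code. This is an illustrative model, not any engine's actual checkpoint protocol: the function names and manifest format are assumptions made for the example.

```python
# Sketch of why disaggregated checkpoints run in near-constant time: the state
# files already live in the object store, so a checkpoint only commits a small
# manifest naming the objects in the snapshot, instead of copying the state.

import json

def checkpoint_traditional(local_state_bytes, upload):
    """Copy all local state to remote storage: cost grows with state size."""
    upload(local_state_bytes)                # dominates for large state

def checkpoint_disaggregated(sstable_ids, commit_metadata):
    """Commit a manifest of objects already durable in S3: metadata-sized."""
    manifest = json.dumps({"epoch": 1, "sstables": sstable_ids})
    commit_metadata(manifest)                # tiny, regardless of state size


committed = []
checkpoint_disaggregated(["sst-0", "sst-1"], committed.append)
assert "sst-1" in committed[0]               # only a manifest was written
```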

