State management is the part of stream processing that separates the interesting systems from the operationally painful ones. Any stream processor can filter records or reformat JSON. But the moment you need a running count, a windowed average, a join between two streams, or a leaderboard that updates in real time, you need state. And how a system stores, checkpoints, and recovers that state determines almost everything about how it behaves in production.
This deep dive covers the mechanics of state management in stream processing: what state is, how different types of state are stored, how checkpointing works, and how Apache Flink's RocksDB-based approach compares with RisingWave's disaggregated, S3-native architecture. Along the way, real SQL examples verified against RisingWave 2.8.0 demonstrate stateful operations, together with the operational implications of each design choice.
What Is State in Stream Processing?
State, in the context of stream processing, is any information a processing node must retain across events to produce correct results.
A pure stateless operation needs only the current event: filtering records where amount > 1000, transforming a timestamp format, or extracting a field from JSON. Each event is processed independently, with no memory of what came before.
A stateful operation needs context beyond the current event. Computing the average purchase value per user requires remembering every previous purchase for each user. Detecting whether a user has made three failed logins in five minutes requires tracking timestamps across events for each user. Joining a stream of orders with a stream of shipment updates requires buffering events from both sides until matches arrive.
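The distinction is easy to state in SQL. Below is a minimal sketch in RisingWave syntax (the payments table is hypothetical, not part of the verified examples later in this article): the first query is stateless and can be evaluated one event at a time, while the second must retain a per-user running aggregate across events.
-- Hypothetical schema, for illustration only
CREATE TABLE payments (payment_id BIGINT PRIMARY KEY, user_id VARCHAR, amount DOUBLE PRECISION);
-- Stateless: each event is checked in isolation
SELECT payment_id, user_id, amount
FROM payments
WHERE amount > 1000;
-- Stateful: the running SUM per user must survive across events
CREATE MATERIALIZED VIEW user_totals AS
SELECT user_id, SUM(amount) AS total_spent
FROM payments
GROUP BY user_id;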
State management addresses three fundamental questions:
- Where is state stored (local memory, local disk, remote storage)?
- How is state made durable so it survives failures (checkpointing, write-ahead logging)?
- How is state recovered after a failure without reprocessing the entire event history?
The answers to these questions have profound implications for throughput, latency, scalability, cost, and operational burden.
Types of State in Stream Processing
Before comparing specific implementations, it helps to understand the categories of state that any stream processor must handle.
Keyed State
Keyed state (also called partitioned state) associates a data structure with a particular key. In a user analytics pipeline, keyed state might store the running purchase total indexed by user_id. In a fraud detection system, it might store the last 10 transactions indexed by card_id.
Keyed state is the most common category. Whenever you write a GROUP BY user_id aggregation or a keyed join, you are working with keyed state.
Keyed state scales horizontally by distributing keys across processing nodes. A user with ID user_001 always lands on the same node, which means all state updates for that user are colocated and require no cross-node coordination.
Operator State (Broadcast State)
Operator state is not partitioned by key. Instead, each parallel operator instance holds its own slice of state, independent of the keys in the data. The classic example is a source connector's offset tracking: a Kafka source instance records the read offsets of its assigned partitions, which have nothing to do with any key in the records.
Broadcast state is a variant where the same state is replicated to every parallel instance of an operator. This is used for lookup tables or configuration that every operator needs: for example, broadcasting a list of blocked user IDs to every fraud detection operator so each can check incoming events locally without network calls.
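In Flink, broadcast state is a DataStream API construct; in a SQL engine like RisingWave, the equivalent pattern is usually a join against a small, slowly changing table that every incoming event is checked against. A minimal sketch, with hypothetical logins and blocked_users tables:
-- Hypothetical schema, for illustration only
CREATE TABLE logins (login_id BIGINT PRIMARY KEY, user_id VARCHAR, ts TIMESTAMPTZ);
CREATE TABLE blocked_users (user_id VARCHAR PRIMARY KEY);
-- Each login is checked against the block list; the join state is the
-- small block list plus the matched login rows
CREATE MATERIALIZED VIEW flagged_logins AS
SELECT l.login_id, l.user_id, l.ts
FROM logins l
JOIN blocked_users b ON l.user_id = b.user_id;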
Windowed State
Windowed state is keyed state with a time boundary. A tumbling window aggregation computes a sum over a fixed interval (say, every 5 minutes) and then discards the state once the window closes. A sliding window retains state across overlapping intervals.
Windowed state introduces a challenge: state must be associated not just with a key but with a specific time window. The system must know when a window is complete so it can emit results and garbage-collect the associated state.
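Example 2 below shows a tumbling window; for sliding windows, RisingWave provides HOP, where each event belongs to several overlapping windows and the system keeps a partial aggregate per open window. A minimal sketch with a hypothetical clicks table:
-- Hypothetical schema, for illustration only
CREATE TABLE clicks (click_id BIGINT PRIMARY KEY, user_id VARCHAR, ts TIMESTAMPTZ);
-- Sliding window: 1-hour windows advancing every 5 minutes, so each
-- click contributes to 12 overlapping windows
CREATE MATERIALIZED VIEW clicks_per_hour AS
SELECT window_start, COUNT(*) AS click_count
FROM HOP(clicks, ts, INTERVAL '5 minutes', INTERVAL '1 hour')
GROUP BY window_start;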
How Flink Manages State: RocksDB on Local Disks
Apache Flink stores operator state in a state backend. The default production choice has historically been the RocksDB state backend, where state is written to local disk on each TaskManager (Flink's worker node).
The RocksDB State Backend
RocksDB is an embedded key-value store originally developed at Facebook. It uses a log-structured merge (LSM) tree, which means writes go first to an in-memory buffer (MemTable), get flushed to disk as sorted string tables (SSTables), and are periodically compacted into larger files.
In Flink, each parallel operator instance maintains its own embedded RocksDB database. For a word count job running at parallelism 8, there are 8 separate RocksDB instances, each managing a partition of the key space.
The advantages of this design:
- State can exceed available memory because RocksDB spills to disk.
- RocksDB's LSM structure handles high write throughput efficiently.
- State is local, so reads are fast without network round trips.
The disadvantages compound at scale:
- Disk capacity planning: Every TaskManager must have enough local SSD for its share of state. If state grows 10x unexpectedly, you need to rescale the cluster and redistribute state.
- Checkpoint amplification: To make state durable, Flink periodically snapshots the RocksDB state to remote storage (S3, HDFS, or GCS). Incremental checkpoints reduce transfer size, but for terabytes of state, checkpoints can still take minutes and create significant I/O pressure on TaskManagers during the snapshot phase.
- Recovery time: When a TaskManager fails, the replacement must download the full RocksDB state from the checkpoint location and rebuild the local database before it can resume processing. For large-state jobs, this takes minutes to tens of minutes.
- RocksDB tuning: Optimal performance requires tuning dozens of parameters: block cache size, write buffer count, compaction style, compression codec, level ratios. This tuning is specific to the workload and must be revisited when data characteristics change.
Flink Checkpointing Mechanics
Flink uses a variant of the Chandy-Lamport distributed snapshot algorithm. Periodically, the JobManager injects checkpoint barriers into the event stream. These barriers flow through the operator DAG alongside data events.
When a barrier arrives at an operator:
1. The operator takes a snapshot of its current state and writes it through the state backend. For RocksDB, this means flushing the MemTable and creating a checkpoint of the SSTable files.
2. The operator signals to the JobManager that its part of the checkpoint is complete.
3. When all operators confirm, the JobManager records this as a complete checkpoint.
The synchronous phase (flushing in-memory state to disk) happens while the operator pauses processing of new events. For heavy write workloads, this pause can introduce measurable latency spikes at every checkpoint interval.
Recovery restores state from the latest complete checkpoint. Sources rewind their Kafka offsets to the positions recorded in that checkpoint and replay events from there, bringing state forward from the snapshot until processing catches up and continues normally.
Flink 2.0 and ForSt: Disaggregating State
Flink 2.0, released in March 2025, introduced ForSt ("For Streaming"): a disaggregated state backend, derived from RocksDB, that streams state changes directly to remote object storage. ForSt uses a shared-storage architecture in which SSTable files are written to S3 rather than local disk.
This addresses the most painful operational problems of the classic RocksDB backend:
- No local disk capacity planning: State lives in S3, which scales infinitely.
- Faster recovery: Instead of downloading gigabytes of state, a replacement TaskManager points to the existing S3 files and reads them on demand.
- Reduced checkpoint cost: Checkpointing becomes a metadata update rather than a bulk data transfer.
ForSt is a significant improvement, but it is an opt-in feature layered onto Flink's existing runtime. Production deployments are still validating its behavior at scale, and many teams remain on the classic RocksDB backend while ForSt matures.
How RisingWave Manages State: S3-Native from Day One
RisingWave approaches state management from a completely different starting point. Rather than adding remote storage support to a compute-local architecture, RisingWave was designed from the ground up with object storage as the primary state store.
The Hummock Storage Engine
RisingWave's state storage layer is called Hummock. It is a purpose-built LSM-tree storage engine written in Rust that uses S3 (or compatible object storage: GCS, Azure Blob, or local MinIO) as its primary store.
The key distinction from Flink's RocksDB is that Hummock's LSM tree spans S3 directly. There is no local primary store that gets periodically copied to S3. Local memory and NVMe SSD on compute nodes serve only as a hot cache (the Hummock cache), holding recently accessed state to avoid repeated S3 reads.
This means:
- State size is unlimited: S3 capacity is effectively infinite. A job can accumulate terabytes of state without any disk capacity planning.
- No state migration on scaling: Adding compute nodes does not require redistributing state because the state is not on the nodes. New compute nodes simply start reading from S3.
- Checkpoints are metadata operations: A checkpoint in RisingWave records which SSTable files in S3 constitute a consistent snapshot of the entire pipeline. Since the data is already in S3, checkpointing requires writing only a small metadata file, not transferring gigabytes of state data. The default checkpoint interval is 1 second.
- Recovery is independent of state size: Recovering from a failure means spinning up a new compute node, loading the checkpoint metadata, and starting to read state from the S3 files that were already there. Recovery typically completes in seconds, regardless of how many terabytes of state the pipeline has accumulated.
Architecture: Compute-Storage Separation
RisingWave separates its components into independent scaling units:
- Frontend nodes: Accept SQL connections (PostgreSQL protocol), parse queries, and plan execution.
- Compute nodes: Execute streaming operators and serve batch queries against materialized views.
- Compactor nodes: Run background compaction of Hummock SSTable files, preventing S3 from accumulating too many small files.
- Meta service: Coordinates scheduling, checkpointing, and metadata management.
- Object storage (S3): Stores all state and materialized view data persistently.
Each layer scales independently. If processing throughput is the bottleneck, add compute nodes. If compaction is falling behind, add compactor nodes. Object storage scales automatically. There is no local state to redistribute when nodes are added or removed.
┌──────────────────────────────────────────────────────────┐
│                      Frontend Nodes                      │
│          (PostgreSQL protocol, query planning)           │
└─────────────────────────┬────────────────────────────────┘
                          │
┌─────────────────────────▼────────────────────────────────┐
│                      Compute Nodes                       │
│   (streaming operators, query execution, local cache)    │
└──────────┬─────────────────────────────┬─────────────────┘
           │                             │
┌──────────▼──────────┐     ┌────────────▼────────────────┐
│   Compactor Nodes   │     │        Meta Service         │
│ (SSTable compaction,│     │ (scheduling, checkpointing, │
│  garbage collect.)  │     │    cluster coordination)    │
└──────────┬──────────┘     └────────────┬────────────────┘
           │                             │
┌──────────▼─────────────────────────────▼─────────────────┐
│          Object Storage (S3 / GCS / Azure Blob)          │
│     (all state, SSTable files, checkpoint metadata)      │
└──────────────────────────────────────────────────────────┘
State Management in Practice: SQL Examples
The theoretical differences between RocksDB-local and S3-native state management become concrete when you write streaming SQL. RisingWave exposes state implicitly through materialized views. Every CREATE MATERIALIZED VIEW statement instructs RisingWave to maintain the query results incrementally as new data arrives, persisting the state needed for that computation in Hummock.
The following examples are verified against RisingWave 2.8.0.
Example 1: Keyed State - Per-User Purchase Statistics
This is the canonical keyed state example. For each user, maintain running totals and statistics across all their purchase events.
-- Source table (in production, this would be a Kafka source)
CREATE TABLE state_events (
    event_id BIGINT PRIMARY KEY,
    user_id VARCHAR,
    event_type VARCHAR,
    amount DOUBLE PRECISION,
    ts TIMESTAMPTZ
);
INSERT INTO state_events VALUES
    (1, 'user_001', 'purchase', 120.50, '2024-01-15 10:00:00+00'),
    (2, 'user_002', 'purchase', 45.00, '2024-01-15 10:01:00+00'),
    (3, 'user_001', 'purchase', 200.00, '2024-01-15 10:05:00+00'),
    (4, 'user_003', 'purchase', 89.99, '2024-01-15 10:10:00+00'),
    (5, 'user_002', 'purchase', 150.00, '2024-01-15 10:15:00+00'),
    (6, 'user_001', 'purchase', 30.00, '2024-01-15 11:00:00+00'),
    (7, 'user_003', 'purchase', 420.00, '2024-01-15 11:05:00+00'),
    (8, 'user_002', 'purchase', 75.00, '2024-01-15 11:10:00+00'),
    (9, 'user_001', 'purchase', 300.00, '2024-01-15 11:30:00+00'),
    (10, 'user_003', 'purchase', 60.00, '2024-01-15 11:45:00+00');
-- Materialized view maintains keyed state per user_id
CREATE MATERIALIZED VIEW state_user_purchase_stats AS
SELECT
    user_id,
    COUNT(*) AS total_purchases,
    SUM(amount) AS lifetime_value,
    AVG(amount) AS avg_purchase,
    MIN(ts) AS first_seen,
    MAX(ts) AS last_seen
FROM state_events
GROUP BY user_id;
SELECT user_id, total_purchases, lifetime_value, ROUND(avg_purchase::numeric, 2) AS avg_purchase
FROM state_user_purchase_stats
ORDER BY lifetime_value DESC;
Output:
 user_id  | total_purchases | lifetime_value | avg_purchase
----------+-----------------+----------------+--------------
 user_001 |               4 |          650.5 |       162.63
 user_003 |               3 |         569.99 |       190.00
 user_002 |               3 |            270 |        90.00
The state stored for this materialized view is the per-user aggregate: count, sum, and the min/max timestamps for each user_id. When a new event arrives for user_001, RisingWave reads the existing state for user_001 from Hummock (from cache if warm, from S3 if not), updates the aggregates, and writes the updated state back. No full recomputation is needed.
Example 2: Windowed State - Hourly Revenue Aggregation
Windowed aggregations create state that is bounded by time. The window accumulates events within its interval and is discarded once the window closes.
-- Tumbling window: aggregate events into 1-hour buckets
CREATE MATERIALIZED VIEW state_hourly_stats AS
SELECT
    window_start,
    window_end,
    COUNT(*) AS event_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM TUMBLE(state_events, ts, INTERVAL '1 hour')
GROUP BY window_start, window_end;
SELECT window_start, window_end, event_count, total_amount, ROUND(avg_amount::numeric, 2) AS avg_amount
FROM state_hourly_stats
ORDER BY window_start;
Output:
       window_start        |        window_end         | event_count | total_amount | avg_amount
---------------------------+---------------------------+-------------+--------------+------------
 2024-01-15 10:00:00+00:00 | 2024-01-15 11:00:00+00:00 |           5 |       605.49 |     121.10
 2024-01-15 11:00:00+00:00 | 2024-01-15 12:00:00+00:00 |           5 |          885 |     177.00
The state here consists of the partial aggregates for each open window. When the source defines an event-time watermark, RisingWave can finalize a window once the watermark passes its end: the results are emitted and the window's state is garbage-collected, with Hummock's compaction handling the cleanup asynchronously in the background. (The table in this example defines no watermark, so its windows remain live and simply keep updating.)
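Eager cleanup relies on that watermark, which tells RisingWave when a window can no longer receive late events. A sketch of the pattern, using a hypothetical append-only sensor_events table (syntax per RisingWave's watermark and EMIT ON WINDOW CLOSE features; verify against your version):
-- Hypothetical append-only table; events more than 5 seconds behind
-- the maximum observed ts are treated as late
CREATE TABLE sensor_events (
    sensor_id VARCHAR,
    reading DOUBLE PRECISION,
    ts TIMESTAMPTZ,
    WATERMARK FOR ts AS ts - INTERVAL '5 seconds'
) APPEND ONLY;
-- EMIT ON WINDOW CLOSE: each window's result is emitted once the
-- watermark passes window_end, after which its state can be dropped
CREATE MATERIALIZED VIEW sensor_minutely AS
SELECT window_start, sensor_id, AVG(reading) AS avg_reading
FROM TUMBLE(sensor_events, ts, INTERVAL '1 minute')
GROUP BY window_start, sensor_id
EMIT ON WINDOW CLOSE;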
Example 3: Cascading State - Multi-Stage Pipelines
RisingWave supports cascading materialized views, where one materialized view reads from another. Each view maintains its own state layer, enabling complex multi-stage pipelines in pure SQL.
-- Second-stage MV reads from the first-stage MV
-- state_user_purchase_stats is defined above
CREATE MATERIALIZED VIEW state_top_spenders AS
SELECT user_id, lifetime_value, total_purchases, rank
FROM (
    SELECT
        user_id,
        lifetime_value,
        total_purchases,
        RANK() OVER (PARTITION BY 1::int ORDER BY lifetime_value DESC) AS rank
    FROM state_user_purchase_stats
) ranked
WHERE rank <= 3;
SELECT user_id, lifetime_value, total_purchases, rank
FROM state_top_spenders
ORDER BY rank;
Output:
 user_id  | lifetime_value | total_purchases | rank
----------+----------------+-----------------+------
 user_001 |          650.5 |               4 |    1
 user_003 |         569.99 |               3 |    2
 user_002 |            270 |               3 |    3
Each stage maintains independent state in Hummock. When a new purchase arrives, the update propagates incrementally: the base table state updates, which triggers an incremental update to state_user_purchase_stats, which triggers an incremental update to state_top_spenders. Only changed rows propagate through the chain, not full recomputations.
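To see the propagation concretely, you can insert one more event and re-query the top-spenders view. This sketch extends the verified examples above; the output shown is what the arithmetic implies (a 500.00 purchase lifts user_002's total to 770, ahead of user_001) rather than captured output, and note that the insert also shifts the earlier per-user and hourly results.
INSERT INTO state_events VALUES
    (11, 'user_002', 'purchase', 500.00, '2024-01-15 12:05:00+00');
SELECT user_id, lifetime_value, total_purchases, rank
FROM state_top_spenders
ORDER BY rank;
Expected output:
 user_id  | lifetime_value | total_purchases | rank
----------+----------------+-----------------+------
 user_002 |            770 |               4 |    1
 user_001 |          650.5 |               4 |    2
 user_003 |         569.99 |               3 |    3
RisingWave updates only user_002's aggregate row in state_user_purchase_stats, then incrementally recomputes the affected ranks in state_top_spenders.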
Checkpointing: What Happens Under the Hood
Flink Checkpoint Flow
In Flink, a checkpoint proceeds as follows:
1. The JobManager injects checkpoint barriers into all source partitions simultaneously.
2. Barriers flow through the operator DAG alongside data events.
3. When an operator has received barriers from all of its input partitions (barrier alignment), it briefly pauses processing and takes a synchronous local snapshot. For RocksDB, this means flushing the MemTable to disk.
4. The operator resumes processing while modified SSTable files are uploaded asynchronously from local disk to S3 or HDFS; it acknowledges the checkpoint to the JobManager once the upload completes.
5. When all operators have acknowledged, the JobManager records the checkpoint as complete and stores the metadata (source offsets, state file locations) in a persistent location.
Barrier alignment can cause head-of-line blocking: if one input partition delivers its barrier early and another is slow, the operator buffers events from the fast partition while waiting for the slow one. This is a known latency source for jobs with skewed input throughput.
For large state (hundreds of gigabytes of RocksDB data), the asynchronous upload phase can take minutes. During this time, the TaskManager is simultaneously processing new events and uploading state snapshot files, creating I/O contention. Checkpoints can fail if this process takes longer than the configured checkpoint timeout.
RisingWave Checkpoint Flow
RisingWave's checkpoint flow is structurally simpler because state is already in S3:
1. The meta service triggers a checkpoint at the configured interval (default: 1 second).
2. Each compute node flushes its in-memory state (MemTable) to S3 as new SSTable files.
3. The meta service records the current set of SSTable files as a consistent snapshot. This metadata write is the checkpoint record.
4. The checkpoint is complete. No bulk data transfer is required, because the state data is already where it needs to be.
The checkpoint captures the current committed state across the entire pipeline. Because writes go to S3 first (through Hummock), the checkpoint is always complete and consistent. There is no phase where state exists only on local disk and has not yet been uploaded.
This design enables the 1-second default checkpoint interval. A Flink job with terabytes of state cannot feasibly checkpoint every second because the upload would still be running when the next checkpoint starts. RisingWave's checkpoint at 1 second is a metadata operation that completes in milliseconds regardless of total state size.
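The interval itself is exposed as a system parameter. A hedged sketch of inspecting and adjusting it (parameter names follow RisingWave's documented system parameters, barrier_interval_ms and checkpoint_frequency, but verify against your version):
-- List system parameters, including the checkpointing knobs
SHOW PARAMETERS;
-- Inject barriers every 2 seconds instead of the 1-second default;
-- checkpoints stay cheap either way because they are metadata writes
ALTER SYSTEM SET barrier_interval_ms = 2000;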
Recovery: Where the Differences Are Most Visible
The recovery path after a node failure is where the architectural gap between local-state and disaggregated-state systems becomes most visible in production.
Flink Recovery
1. The JobManager detects the failed TaskManager.
2. Kubernetes (or another resource manager) allocates a replacement TaskManager.
3. The replacement TaskManager downloads the RocksDB state from the last checkpoint in S3 - this is the bottleneck for large-state jobs.
4. RocksDB is rebuilt from the downloaded files on local disk.
5. The operator rewinds its Kafka offsets to the positions stored in the checkpoint and begins replaying events.
Steps 3 and 4 dominate recovery time for large-state jobs. Downloading and restoring 100 GB of RocksDB state over a typical cloud network takes several minutes. Flink 2.0's ForSt backend eliminates the download step by reading state directly from S3, reducing recovery time to seconds even for large-state jobs - but only when ForSt is used rather than the classic RocksDB backend.
RisingWave Recovery
1. The meta service detects the failed compute node.
2. Kubernetes allocates a replacement compute node.
3. The replacement node loads the latest checkpoint metadata from the meta service, which tells it which S3 files constitute the current consistent state.
4. The replacement node begins processing new events. State is read from S3 on demand (cache misses) and cached locally; the Hummock cache warms up as queries arrive.
There is no bulk state download. Recovery time is dominated by node startup and Hummock cache warming, not by state transfer. This is consistently in the seconds range regardless of state size - whether the pipeline has 10 GB or 10 TB of accumulated state.
Operational Implications
Disk Management
With Flink's RocksDB backend, every TaskManager needs local disk capacity proportional to its share of state. For a 1 TB state job with 10 TaskManagers, each needs at least 100 GB of fast local SSD - and ideally more to handle compaction, temporary files, and growth headroom. When state grows unexpectedly, you may need to rescale the cluster and redistribute key partitions, which typically requires a job restart.
RisingWave compute nodes need no local disk for persistent state. The Hummock cache uses local NVMe for hot data, but it is a bounded cache, not a primary store. Cache misses result in S3 reads rather than job failure.
Scaling
Scaling a Flink job involves stopping the job, taking a savepoint (a full snapshot of all state), adjusting parallelism, and restarting from the savepoint. For large-state jobs, saving and restoring from a savepoint can take tens of minutes. Flink 2.0 improves this for ForSt users, but the process still involves a job restart.
RisingWave compute nodes can be added or removed without restoring state. The state is in S3 and is not owned by any specific compute node. Adding a compute node reshards processing load across more nodes incrementally.
Cost Structure
The cost difference comes from storage. Provisioned block storage or local SSD on cloud VMs (gp3, io2, or instance-local NVMe) costs roughly $0.08-0.20 per GB per month. Amazon S3 costs about $0.023 per GB per month. For a pipeline with 1 TB of state, that is $80-200/month for local SSD versus $23/month for S3 - roughly a 3-8x storage cost multiplier for local-state systems, before counting the duplicate cost of checkpoint storage in S3 on top of the local storage.
For very large state (10+ TB), this difference compounds significantly and is often the primary cost driver after compute.
Debugging and Observability
Flink's state can be inspected using the Flink web UI's task metrics and via the state backend's file layout on local disk. Diagnosing issues with RocksDB state (compaction lag, write stalls, cache pressure) requires familiarity with RocksDB internals.
RisingWave's state is observable through standard SQL. You can query any materialized view directly, inspect its current state, and understand what the pipeline has computed. The RisingWave monitoring guide covers Prometheus metrics for Hummock (SSTable counts, cache hit rates, compaction throughput) that map directly to familiar storage concepts without requiring RocksDB expertise.
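For example, the rw_catalog schema exposes the pipeline's objects as queryable tables (a sketch; catalog names follow RisingWave's rw_catalog convention, but check your version):
-- List materialized views and their SQL definitions
SELECT name, definition FROM rw_catalog.rw_materialized_views;
-- Any MV's current state is directly queryable, like an ordinary table
SELECT * FROM state_user_purchase_stats LIMIT 10;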
Side-by-Side Comparison
| Dimension | Flink (RocksDB) | Flink (ForSt, 2.0+) | RisingWave (Hummock) |
| --- | --- | --- | --- |
| Primary state store | Local disk (RocksDB) | S3 (SSTable files) | S3 (SSTable files) |
| Local cache | RocksDB block cache | SSTable cache | Hummock cache (mem + NVMe) |
| Checkpoint mechanism | Barrier-based; uploads state to S3 | Barrier-based; state already in S3 | Epoch-based; metadata write only |
| Default checkpoint interval | 30s-5min (state-size dependent) | 30s-5min | 1 second |
| Checkpoint cost at 1 TB state | Minutes (upload delta files) | Seconds (metadata update) | Milliseconds (metadata update) |
| Recovery time at 1 TB state | 5-30 minutes (download + rebuild) | Seconds (read S3 on demand) | Seconds (read S3 on demand) |
| Disk capacity planning | Required per node | Not required | Not required |
| RocksDB tuning required | Yes | No | No |
| Scale-out requires restart | Yes (savepoint cycle) | Yes (savepoint cycle) | No |
| State size limit | Local disk capacity | S3 capacity | S3 capacity |
| Storage cost (1 TB state) | $80-200/month (SSD) + S3 for checkpoints | $23/month (S3 only) | $23/month (S3 only) |
| Production maturity | Very mature | New (Flink 2.0, 2025) | Production since 2022 |
Choosing the Right Architecture for Your Workload
When Local-State (RocksDB) Makes Sense
If you are running Flink with the classic RocksDB backend, local-state is the right choice when:
- State is small enough to fit comfortably on local disk with headroom (under ~50 GB per node).
- You need the DataStream API or CEP (complex event processing) with MATCH_RECOGNIZE.
- Your team has existing RocksDB expertise and Flink operations tooling.
- Checkpoint latency is acceptable at your state size and throughput.
When Disaggregated State (RisingWave / ForSt) Makes Sense
Disaggregated state is the better choice when:
- State is large or unpredictable. If you cannot accurately predict state growth, being bounded by local disk is a risk you do not want.
- Recovery time matters. Sub-minute recovery regardless of state size is an architectural guarantee, not a tuning exercise.
- Your team prefers SQL over Java/Scala. RisingWave exposes state implicitly through materialized views, eliminating the need to reason about state backends at all.
- Operational simplicity is a priority. No RocksDB tuning, no checkpoint size management, no savepoint cycles for scaling.
- You need built-in serving. RisingWave's materialized views are queryable via the PostgreSQL protocol, eliminating the need for a separate serving database like Redis or PostgreSQL downstream.
For teams building new streaming systems in 2026, the default should be disaggregated state. It removes the most common operational pain points without sacrificing correctness or performance. For teams with existing Flink investments, migrating to RisingWave or upgrading to Flink 2.0 with ForSt are both viable paths toward disaggregated state.
FAQ
What is state management in stream processing?
State management in stream processing refers to how a system stores, updates, and recovers the intermediate data structures needed to compute results over continuous event streams. Without state, a stream processor can only perform stateless transformations. With state, it can compute running aggregates, join streams over time, detect patterns across multiple events, and maintain leaderboards - all of which require remembering information across events.
How does Flink's RocksDB state backend work?
Flink's RocksDB state backend stores operator state on local disk using RocksDB, an embedded LSM-tree key-value store. Each parallel operator instance maintains its own local RocksDB database. State is made durable through periodic checkpoints that snapshot the RocksDB files and upload them to remote storage (S3 or HDFS). Recovery involves downloading the state from the checkpoint and rebuilding the local RocksDB database before resuming processing. Flink 2.0 introduced ForSt, a new backend that stores state directly in S3, eliminating the local disk dependency.
What is disaggregated state management and why does it matter?
Disaggregated state management means the state storage layer is separated from the compute layer. State is stored in a remote, scalable store (like S3) rather than on local disks attached to compute nodes. This matters for three reasons: (1) state size is limited only by object storage capacity, not local disk; (2) compute nodes can be added or removed without redistributing state; (3) recovery after failure is fast regardless of state size, because new nodes read state from S3 rather than downloading it from a checkpoint. RisingWave's Hummock engine and Flink 2.0's ForSt backend are both examples of disaggregated state management.
How does RisingWave handle state in materialized views?
RisingWave manages state implicitly through its Hummock storage engine. When you create a materialized view with CREATE MATERIALIZED VIEW, RisingWave automatically determines what state is needed (aggregate values, join buffers, window accumulators) and stores it as SSTable files in object storage (S3 or compatible). As new events arrive, only the affected state entries are updated, not the full result set. You never interact with state directly - you write SQL and RisingWave handles the rest. See the RisingWave materialized view documentation for the full syntax reference.
Conclusion
State management is not a detail - it is the core of what makes stream processing hard and what differentiates stream processing systems from each other.
The key takeaways from this deep dive:
- State in stream processing covers keyed state, operator state, and windowed state. Most SQL-based streaming workloads use keyed and windowed state.
- Flink's RocksDB backend is battle-tested and efficient for moderate state sizes, but couples state to local disks in ways that create operational complexity at scale.
- Flink 2.0's ForSt backend moves state to S3, solving the core scaling problems - but it is a new feature layered onto a mature runtime.
- RisingWave's Hummock engine stores all state in S3 from the start, enabling 1-second checkpoint intervals, second-level recovery regardless of state size, and compute scaling without state redistribution.
- The practical cost difference for large-state workloads is 3-8x on storage alone, because S3 is dramatically cheaper than provisioned SSD.
- For teams that work in SQL, RisingWave exposes state implicitly through materialized views, removing the need to configure, tune, or reason about state backends at all.
Ready to try RisingWave yourself? Get started in five minutes with the Quickstart.
Join our Slack community to ask questions and connect with other stream processing developers.