State Store

The State Store in RisingWave is a crucial internal component responsible for persistently managing and providing access to the state required for all stateful stream processing operations. This includes the intermediate results of aggregations, the data held for joins between streams, the current values in materialized views, and other internal states needed for continuous query execution and fault tolerance.

RisingWave's State Store is designed to be cloud-native, scalable, and resilient. It is built primarily on the Hummock storage engine.

Core Purpose and Responsibilities

  • Persistent State Management: Durably stores all data that needs to be maintained across events and over time for streaming computations. This ensures that if a compute node fails, the state is not lost.
  • Efficient State Access: Provides low-latency read and write access to state for the Compute Nodes executing streaming operators. This is critical for high-performance stream processing.
  • Scalability: Designed to handle large volumes of state, scaling independently from compute resources. This is achieved by its disaggregated architecture, with Hummock typically using cloud object storage (like AWS S3) as its primary persistence layer.
  • Fault Tolerance and Recovery: Underpins RisingWave's fault tolerance. Through mechanisms like checkpointing, the State Store allows the system to recover from failures by restoring a consistent version of the state and resuming processing.
  • Versioning and Consistency: Manages versions of state, identified by epochs, to ensure data consistency, especially during recovery. Versioning also underlies features like time travel (though time travel on materialized views is a future concept).
  • Supporting Materialized Views: The data that backs RisingWave's Materialized Views is held and managed within the State Store, allowing these views to be updated incrementally and queried with low latency.
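
The checkpoint and recovery behavior described above can be illustrated with a minimal sketch. It is not RisingWave's implementation: the class `EpochVersionedState` and its methods are hypothetical names, and an in-memory dictionary stands in for durable storage. The idea shown is only that writes become part of the recoverable state once a checkpoint commits their epoch.

```python
from typing import Dict, Optional


class EpochVersionedState:
    """Toy key-value state with epoch-based checkpoints (illustrative only)."""

    def __init__(self) -> None:
        # State as of the last committed checkpoint (assumed durable).
        self.committed: Dict[str, int] = {}
        self.committed_epoch = 0
        # Writes made since the last checkpoint; lost if the node fails.
        self.pending: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        self.pending[key] = value

    def get(self, key: str) -> Optional[int]:
        # Uncommitted writes shadow the committed version.
        if key in self.pending:
            return self.pending[key]
        return self.committed.get(key)

    def checkpoint(self, epoch: int) -> None:
        # Commit all pending writes as one consistent version.
        if epoch <= self.committed_epoch:
            raise ValueError("epochs must increase")
        self.committed.update(self.pending)
        self.pending.clear()
        self.committed_epoch = epoch

    def recover(self) -> int:
        # Discard uncommitted writes and return the epoch to resume from.
        self.pending.clear()
        return self.committed_epoch


state = EpochVersionedState()
state.put("clicks", 10)
state.checkpoint(epoch=1)
state.put("clicks", 25)        # written after the checkpoint, never committed
resume_from = state.recover()  # simulate a compute node failure
```

After the simulated failure, `state.get("clicks")` returns the value from the checkpoint at epoch 1, and processing would resume from that epoch.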

Key Components and Architecture (Hummock)

The State Store in RisingWave is primarily implemented by Hummock, a purpose-built, distributed storage engine based on a Log-Structured Merge-tree (LSM-tree):

  • Cloud Object Storage Backend: Hummock is designed to use cloud object stores (e.g., AWS S3, Google Cloud Storage, Azure Blob Storage) as its durable, scalable, and cost-effective backend for storing state data (SSTables).
  • Meta Service: A Meta service (part of RisingWave's control plane) manages metadata about the state stored in Hummock, including version information, compaction status, and locations of data files.
  • Compute Node Interaction: Compute Nodes interact with Hummock to read the state they need for processing and to write updates to the state. They may cache frequently accessed state locally for performance.
  • Compaction: Hummock performs background compaction operations to merge and organize data in object storage, optimizing read performance and managing storage space efficiently.
  • Checkpointing: Compute Nodes periodically flush their local state changes, which are then committed as part of a consistent global checkpoint coordinated by the Meta service and persisted by Hummock.
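
To make the LSM-tree ideas above concrete, here is a small sketch of the general technique, not of Hummock itself. The class `ToyLsmTree` and the `memtable_limit` parameter are hypothetical; sorted tuples stand in for SSTables in object storage, and compaction here simply merges every table into one, which is far simpler than a production compaction strategy.

```python
from typing import Dict, List, Optional, Tuple

# An immutable run of (key, value) pairs sorted by key, standing in for an SSTable.
SSTable = Tuple[Tuple[str, str], ...]


class ToyLsmTree:
    """Toy LSM-tree: buffered writes, immutable sorted runs, merge compaction."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        self.sstables: List[SSTable] = []  # ordered newest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the buffered updates out as a new immutable sorted run.
        if self.memtable:
            self.sstables.insert(0, tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Search newest to oldest so the latest version of a key wins.
        for table in self.sstables:
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self) -> None:
        # Merge oldest to newest so newer values overwrite older ones.
        merged: Dict[str, str] = {}
        for table in reversed(self.sstables):
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


tree = ToyLsmTree(memtable_limit=2)
tree.put("a", "1")
tree.put("b", "2")   # triggers a flush
tree.put("a", "3")
tree.put("c", "4")   # triggers a second flush
tables_before = len(tree.sstables)
tree.compact()
```

Before compaction, a read of `"a"` may have to look through two tables; afterwards a single table holds only the latest value of each key, which is the read-performance and space benefit the Compaction bullet refers to.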

Why is it Important for RisingWave?

  • Enables Stateful Streaming: Without a robust State Store, complex stateful operations like windowed aggregations over long periods or joins between high-volume streams would not be feasible or reliable.
  • Foundation for Incremental Computation: The ability to efficiently read and update specific parts of the state is key to RisingWave's incremental computation model for materialized views.
  • Cloud-Native Design: The State Store's (Hummock's) architecture, leveraging object storage, aligns with cloud-native principles of disaggregation, scalability, and elasticity, making RisingWave well-suited for cloud deployments.
  • Resilience: Ensures that the valuable state computed by streaming jobs is not lost during failures, allowing for consistent and reliable real-time analytics.
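
As a rough illustration of why keyed state access matters for incremental computation, the sketch below maintains a per-key count by applying only the changes that arrive, rather than recomputing from all input. The function `apply_changes` and the `(key, delta)` change format are assumptions made for this example and do not reflect RisingWave's internal representation.

```python
from typing import Dict, Iterable, Tuple


def apply_changes(view: Dict[str, int], changes: Iterable[Tuple[str, int]]) -> None:
    """Apply (key, delta) changes to a per-key count, touching only affected keys."""
    for key, delta in changes:
        new_count = view.get(key, 0) + delta
        if new_count == 0:
            # Drop keys whose count returns to zero, as a deleted row would.
            view.pop(key, None)
        else:
            view[key] = new_count


orders_per_user: Dict[str, int] = {}
apply_changes(orders_per_user, [("alice", +1), ("bob", +1), ("alice", +1)])
apply_changes(orders_per_user, [("bob", -1)])  # a retraction for bob's order
```

Each batch reads and writes only the keys it mentions, so the cost of an update depends on the size of the change, not on the total amount of state.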

In essence, the State Store, powered by Hummock, is the backbone of RisingWave's ability to perform complex, stateful stream processing reliably and at scale in a cloud environment.
