High Availability in Stream Processing: Active-Active and Failover Patterns

High availability (HA) in stream processing keeps a pipeline running through node failures, network partitions, and planned maintenance. The three key patterns are active-standby (failover), active-active (parallel processing), and cross-region (disaster recovery).

HA Patterns

Pattern          Downtime             Data Loss             Cost             Complexity
Active-standby   Seconds (failover)   Checkpoint interval   2x compute       Low
Active-active    Zero                 Near-zero             2x+ compute      High
Cross-region     Minutes              Cross-region lag      2x+ everything   High
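
To make the active-standby row concrete, here is a minimal sketch of heartbeat-based failover. It is an illustration only, not how any particular engine implements it: the names Node and FailoverMonitor and the 3-second timeout are hypothetical, and time is passed in explicitly so the example is deterministic. The downtime in this pattern is roughly the detection timeout plus the time the standby needs to take over.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time (seconds) of the most recent heartbeat


class FailoverMonitor:
    """Promotes the standby when the active node misses heartbeats."""

    def __init__(self, active: Node, standby: Node, timeout_s: float):
        self.active = active
        self.standby = standby
        self.timeout_s = timeout_s  # hypothetical detection timeout

    def heartbeat(self, name: str, now: float) -> None:
        # Record a heartbeat from whichever node reported in.
        for node in (self.active, self.standby):
            if node.name == name:
                node.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Return True if a failover happened at this check."""
        if now - self.active.last_heartbeat <= self.timeout_s:
            return False  # active node is still healthy
        # Only fail over to a standby that is itself alive.
        if now - self.standby.last_heartbeat > self.timeout_s:
            return False
        self.active, self.standby = self.standby, self.active
        return True


monitor = FailoverMonitor(Node("a", 0.0), Node("b", 0.0), timeout_s=3.0)
monitor.heartbeat("a", 1.0)
monitor.heartbeat("b", 1.0)
early = monitor.check(now=2.0)      # both nodes healthy, no failover
monitor.heartbeat("b", 4.0)         # "a" has stopped reporting
promoted = monitor.check(now=5.0)   # 5.0 - 1.0 > 3.0, so "b" takes over
```

A shorter timeout reduces downtime but raises the risk of failing over during a brief network hiccup, which is one reason active-standby downtime is measured in seconds rather than milliseconds.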

RisingWave HA

RisingWave's disaggregated architecture provides built-in HA:

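  • State on S3: Amazon S3 is designed for 99.999999999% (11 nines) durability, so a node failure does not lose state.
  • 1-second checkpoints: At most about 1 second of work since the last checkpoint is lost on failure.
  • Seconds-level recovery: Replacement nodes read state directly from S3 and resume within seconds.
  • Elastic scaling: Add or remove compute nodes without migrating state.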
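
The checkpoint-and-restore idea behind the points above can be sketched in a few lines. This is a toy model under stated assumptions, not RisingWave's implementation: DurableStore is a hypothetical in-memory stand-in for an object store such as S3, run is a hypothetical processing loop that counts events per key, and checkpoints are taken every N events rather than every second. It shows that a replacement node only redoes the work done since the last checkpoint.

```python
import copy


class DurableStore:
    """In-memory stand-in for an object store such as S3 (assumption)."""

    def __init__(self):
        self._snapshot = None

    def save(self, offset: int, state: dict) -> None:
        # Store the state together with the input offset it reflects.
        self._snapshot = (offset, copy.deepcopy(state))

    def load(self):
        if self._snapshot is None:
            return 0, {}
        offset, state = self._snapshot
        return offset, copy.deepcopy(state)


def run(events, store, checkpoint_every, crash_at=None):
    """Count events per key, resuming from the last checkpoint.

    Returns (state, next_offset). If crash_at is set, processing stops
    before that offset and uncheckpointed in-memory state is discarded.
    """
    offset, state = store.load()
    while offset < len(events):
        if crash_at is not None and offset == crash_at:
            return None, offset  # simulated node failure
        key = events[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            store.save(offset, state)
    return state, offset


events = ["a", "b", "a", "c", "a", "b", "c", "a"]
store = DurableStore()

# The first node fails after processing 5 events; the last checkpoint is at offset 4.
run(events, store, checkpoint_every=2, crash_at=5)
resume_offset, _ = store.load()

# A replacement node restores the checkpoint and replays events from offset 4.
final_state, _ = run(events, store, checkpoint_every=2)
```

Because the compute node holds no state that is not also in the store, the replacement node needs no state migration: it loads the snapshot and continues, redoing only the single event processed after the last checkpoint.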

Frequently Asked Questions

What is the fastest recovery time for streaming systems?

RisingWave and Flink 2.0 (ForSt) recover in seconds regardless of state size, because their state is disaggregated onto S3 rather than tied to the failed node. Kafka Streams with standby replicas also recovers in seconds, but keeping those replicas running requires roughly 2x resources.
