Memory Management in Stream Processing Engines

Memory Management in Stream Processing Engines

Memory Management in Stream Processing Engines

Stream processing engines must balance memory between operator state, buffers, caches, and the JVM heap (for Java-based systems). RisingWave's Rust-based architecture avoids JVM garbage collection issues, while its S3-based state eliminates local memory pressure from large state.

Overview

Stream processing engines must balance memory between operator state, buffers, caches, and the JVM heap (for Java-based systems). RisingWave's Rust-based architecture avoids JVM garbage collection issues, while its S3-based state eliminates local memory pressure from large state.

Key Points

  • Streaming databases like RisingWave provide sub-second data freshness with SQL
  • PostgreSQL compatibility means existing tools and skills transfer directly
  • S3-based state storage provides durability, elastic scaling, and cost efficiency
  • Native CDC eliminates middleware for database-to-streaming pipelines
  • Iceberg integration bridges real-time serving with historical analytics

Architecture

The core pattern in all streaming architectures is:

Sources (CDC, Kafka) → Streaming Database (RisingWave) → Serving (PG protocol) + Storage (Iceberg)

This provides both real-time views (sub-second, via materialized views) and historical analytics (via Iceberg + Trino/DuckDB).

Implementation

-- The universal streaming pattern
CREATE SOURCE data_stream (...) WITH (connector='kafka', ...);
CREATE MATERIALIZED VIEW real_time_view AS SELECT ... FROM data_stream ...;
-- Query: SELECT * FROM real_time_view; (always fresh, sub-100ms)

Frequently Asked Questions

How does this relate to RisingWave?

RisingWave is a PostgreSQL-compatible streaming database that handles the processing and serving layers. It's open source (Apache 2.0), stores state on S3, and supports native CDC — making it the simplest path to production streaming.

Do I need Kafka for this?

Not always. RisingWave supports native CDC from PostgreSQL and MySQL without Kafka. For event-driven architectures with multiple consumers, Kafka remains valuable as a central event bus.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.