Monitoring Stream Processing: Metrics, Dashboards, and Alerting

Monitoring a stream processing system means tracking throughput (events per second), latency (processing delay), state size (memory and disk usage), checkpoint duration, and consumer lag (how far the pipeline is behind its source). Together, these metrics indicate whether your streaming pipeline is healthy.

Key Metrics

Metric	What to Monitor	Alert When
Throughput	Events processed per second	Drops >50% below baseline
Latency	End-to-end processing time	Exceeds SLA (e.g., >1 second)
State size	Total state across operators	Growing unbounded
Checkpoint duration	Time to complete checkpoint	Exceeds checkpoint interval
Consumer lag	Events behind source	Growing continuously
Backpressure	Operators slowing upstream	Any sustained backpressure
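
The alert conditions in the table can be written as simple predicates over sampled metrics. The sketch below is a minimal illustration in Python, not part of RisingWave or Grafana. The `MetricSample` fields, the function names, and the way the baseline and checkpoint interval are passed in are all assumptions made for the example; the 1-second SLA default is taken from the table's own example. Only four of the six rules are covered; the state size rule could reuse `is_growing` in the same way as consumer lag.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MetricSample:
    """One scrape of pipeline metrics (hypothetical field names)."""
    throughput_eps: float         # events processed per second
    latency_s: float              # end-to-end processing time, in seconds
    checkpoint_duration_s: float  # time to complete the last checkpoint
    consumer_lag: int             # events behind the source


def is_growing(values: Sequence[float]) -> bool:
    """True if every value is strictly larger than the one before it."""
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))


def evaluate_alerts(
    samples: List[MetricSample],
    baseline_eps: float,
    checkpoint_interval_s: float,
    sla_latency_s: float = 1.0,
) -> List[str]:
    """Return the names of the alert rules that fire.

    `samples` is ordered oldest to newest. Threshold rules look at the
    newest sample; the lag rule looks at the whole window.
    """
    latest = samples[-1]
    alerts = []
    if latest.throughput_eps < 0.5 * baseline_eps:
        alerts.append("throughput")           # dropped >50% below baseline
    if latest.latency_s > sla_latency_s:
        alerts.append("latency")              # exceeds SLA
    if latest.checkpoint_duration_s > checkpoint_interval_s:
        alerts.append("checkpoint_duration")  # exceeds checkpoint interval
    if is_growing([s.consumer_lag for s in samples]):
        alerts.append("consumer_lag")         # growing continuously
    return alerts
```

In practice the same conditions would more likely live in Grafana or Prometheus alert rules; the point here is only to make the thresholds concrete.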

Grafana Dashboard for RisingWave

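Because RisingWave is compatible with the PostgreSQL wire protocol, you can connect Grafana to it through Grafana's PostgreSQL data source and query it directly: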

-- Pipeline health dashboard query: per-source event count and staleness over the last 5 minutes
SELECT
  source_name,
  COUNT(*) AS events_5min,
  MAX(ts) AS last_event,
  NOW() - MAX(ts) AS staleness
FROM pipeline_events
WHERE ts > NOW() - INTERVAL '5 minutes'
GROUP BY source_name;
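
One caveat with the query above: a source that has emitted nothing in the last five minutes produces no row at all, so it disappears from the panel instead of showing up as stale. The following is a hedged sketch of a check that accounts for this. It assumes the rows have already been fetched with some PostgreSQL client as `(source_name, events_5min, last_event, staleness)` tuples, with `staleness` as a `timedelta`. The function name, the `expected_sources` list, and the 60-second threshold are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# One result row of the dashboard query (assumed shape).
Row = Tuple[str, int, datetime, timedelta]


def find_unhealthy_sources(
    rows: Iterable[Row],
    expected_sources: List[str],
    max_staleness: timedelta = timedelta(seconds=60),
) -> Dict[str, str]:
    """Map each unhealthy source name to a short reason string."""
    seen = {row[0]: row for row in rows}
    problems = {}
    for name in expected_sources:
        if name not in seen:
            # The query returned no row: zero events in the 5-minute window.
            problems[name] = "no events in window"
        elif seen[name][3] > max_staleness:
            # Events exist in the window, but the newest one is too old.
            problems[name] = "stale"
    return problems
```

Keeping the list of expected sources outside the query is a design choice for this sketch; it could equally come from a reference table joined in SQL.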

Frequently Asked Questions

What is backpressure in stream processing?

Backpressure occurs when a downstream operator cannot keep up with the rate of incoming events. The slowdown propagates upstream and throttles the entire pipeline. Common causes are insufficient compute, expensive operations, and sudden traffic spikes.
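
To make the mechanism concrete, here is a toy simulation of a producer and a consumer joined by a bounded buffer. It is a simplified model written for illustration, not how RisingWave or any particular engine implements backpressure; the function name and its parameters are hypothetical.

```python
from typing import List, Tuple


def simulate_backpressure(
    produce_per_tick: int,
    consume_per_tick: int,
    buffer_capacity: int,
    ticks: int,
) -> List[Tuple[int, int]]:
    """Simulate a producer and consumer joined by a bounded buffer.

    Returns one (buffer_depth, held_back) pair per tick, where held_back is
    the number of events the producer wanted to emit but could not because
    the buffer was full.
    """
    depth = 0
    history = []
    for _ in range(ticks):
        # The consumer drains first, limited by its own processing rate.
        depth -= min(depth, consume_per_tick)
        # The producer fills the remaining room; the rest is held back upstream.
        accepted = min(produce_per_tick, buffer_capacity - depth)
        depth += accepted
        history.append((depth, produce_per_tick - accepted))
    return history
```

With `simulate_backpressure(10, 6, 20, 5)`, the buffer fills on the fourth tick, and from the fifth tick on the producer is held back by 4 events per tick, which is exactly the gap between the producer and consumer rates. That throttling of the upstream side is what shows up as backpressure in monitoring.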

How do I monitor RisingWave?

RisingWave exposes Prometheus metrics and supports Grafana dashboards. You can also create materialized views that track your own pipeline health metrics.
