Cost Optimization for Streaming Workloads: S3, Compute, and Right-Sizing
Monitoring a stream processing system means tracking throughput (events/sec), latency (processing delay), state size (memory/disk usage), checkpoint duration, consumer lag (how far behind the source the pipeline is), and backpressure. Together, these metrics indicate whether your streaming pipeline is healthy.
Key Metrics
| Metric | What to Monitor | Alert When |
|---|---|---|
| Throughput | Events processed per second | Drops >50% below baseline |
| Latency | End-to-end processing time | Exceeds SLA (e.g., >1 second) |
| State size | Total state across operators | Growing unbounded |
| Checkpoint duration | Time to complete checkpoint | Exceeds checkpoint interval |
| Consumer lag | Events behind source | Growing continuously |
| Backpressure | Operators slowing upstream | Any sustained backpressure |
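The "Alert When" column can be turned into simple checks. The following is a minimal sketch, not a definitive implementation: the `PipelineSnapshot` fields, the `evaluate_alerts` function, and the rule that "growing continuously" means every recent lag sample is larger than the previous one are all assumptions made for illustration. It covers four of the six rows; state size could reuse `is_growing`, and backpressure detection depends on what your engine reports.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PipelineSnapshot:
    """One reading of the table's metrics (all field names are hypothetical)."""
    throughput_eps: float         # events processed per second
    baseline_eps: float           # expected throughput for this time window
    latency_s: float              # end-to-end processing time, in seconds
    latency_sla_s: float          # agreed latency limit, e.g. 1.0
    checkpoint_duration_s: float  # how long the last checkpoint took
    checkpoint_interval_s: float  # configured time between checkpoints
    lag_samples: Sequence[int]    # recent consumer-lag readings, oldest first


def is_growing(samples: Sequence[int]) -> bool:
    """True if every sample is strictly larger than the one before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def evaluate_alerts(s: PipelineSnapshot) -> List[str]:
    """Return the names of the metrics whose alert condition is met."""
    alerts = []
    if s.throughput_eps < 0.5 * s.baseline_eps:  # drops >50% below baseline
        alerts.append("throughput")
    if s.latency_s > s.latency_sla_s:  # exceeds SLA
        alerts.append("latency")
    if s.checkpoint_duration_s > s.checkpoint_interval_s:
        alerts.append("checkpoint_duration")
    if is_growing(s.lag_samples):  # lag growing continuously
        alerts.append("consumer_lag")
    return alerts


snapshot = PipelineSnapshot(
    throughput_eps=400.0,
    baseline_eps=1000.0,
    latency_s=0.8,
    latency_sla_s=1.0,
    checkpoint_duration_s=12.0,
    checkpoint_interval_s=10.0,
    lag_samples=[100, 250, 600],
)
print(evaluate_alerts(snapshot))
# ['throughput', 'checkpoint_duration', 'consumer_lag']
```

In practice you would likely express these rules in your alerting system rather than in application code; the sketch only makes the thresholds concrete.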
Grafana Dashboard for RisingWave
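RisingWave is compatible with the PostgreSQL wire protocol, so you can connect Grafana to it through Grafana's PostgreSQL data source and back a panel with a query such as: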
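-- Pipeline health dashboard query: per-source event count and staleness
-- over the last 5 minutes. A source with no events in the window returns no row.
SELECT
    source_name,
    COUNT(*) AS events_5min,
    MAX(ts) AS last_event,
    NOW() - MAX(ts) AS staleness
FROM pipeline_events
WHERE ts > NOW() - INTERVAL '5 minutes'
GROUP BY source_name;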
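One caveat with this query is that a source that has been silent for the whole five-minute window produces no row at all, so it cannot show up as stale. The sketch below shows one way to post-process the result, assuming the rows arrive as tuples in the query's column order with `staleness` as a `timedelta`. The `find_unhealthy_sources` function, the list of expected sources, and the one-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Row shape mirrors the query: (source_name, events_5min, last_event, staleness).
Row = Tuple[str, int, datetime, timedelta]


def find_unhealthy_sources(
    rows: Iterable[Row],
    expected_sources: Iterable[str],
    max_staleness: timedelta = timedelta(minutes=1),  # assumed threshold
) -> Dict[str, str]:
    """Map each unhealthy source name to a short reason."""
    problems: Dict[str, str] = {}
    seen = set()
    for source_name, _events_5min, _last_event, staleness in rows:
        seen.add(source_name)
        if staleness > max_staleness:
            problems[source_name] = "stale"
    # Sources with no events in the window are missing from the result entirely.
    for name in expected_sources:
        if name not in seen:
            problems[name] = "silent"
    return problems


now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    ("orders", 1200, now - timedelta(seconds=3), timedelta(seconds=3)),
    ("clicks", 40, now - timedelta(minutes=4), timedelta(minutes=4)),
]
print(find_unhealthy_sources(rows, ["orders", "clicks", "payments"]))
# {'clicks': 'stale', 'payments': 'silent'}
```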
Frequently Asked Questions
What is backpressure in stream processing?
Backpressure occurs when a downstream operator can't keep up with the rate of incoming events. Once its input buffers fill, the slowdown propagates upstream and throttles the entire pipeline. Common causes are insufficient compute, expensive operations, and sudden traffic spikes.
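A toy model can make the propagation concrete. The sketch below is a simplified tick-based simulation, not how any particular engine implements flow control: one producer feeds one consumer through a bounded buffer, and the producer may only emit what fits in the remaining space. The function name and parameters are made up for illustration.

```python
from typing import Tuple


def simulate_backpressure(
    produce_rate: int, consume_rate: int, buffer_capacity: int, ticks: int
) -> Tuple[int, int, int]:
    """Return (events_accepted, events_processed, blocked_ticks)."""
    buffered = 0
    accepted = 0
    processed = 0
    blocked_ticks = 0
    for _ in range(ticks):
        # Downstream drains as much as it can handle this tick.
        drained = min(buffered, consume_rate)
        buffered -= drained
        processed += drained
        # Upstream may emit only what fits into the free buffer space.
        free = buffer_capacity - buffered
        emitted = min(produce_rate, free)
        if emitted < produce_rate:
            blocked_ticks += 1  # the producer was throttled this tick
        buffered += emitted
        accepted += emitted
    return accepted, processed, blocked_ticks


# Consumer slower than producer: the buffer fills, then the producer is throttled.
print(simulate_backpressure(produce_rate=10, consume_rate=6, buffer_capacity=20, ticks=10))
# (74, 54, 7)

# Consumer faster than producer: no throttling.
print(simulate_backpressure(produce_rate=5, consume_rate=6, buffer_capacity=20, ticks=10))
# (50, 45, 0)
```

In the first run the producer wants to emit 100 events but only 74 are accepted: once the buffer is full, it is held to the consumer's rate of 6 events per tick. That is the behavior the "any sustained backpressure" alert is meant to catch.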
How do I monitor RisingWave?
RisingWave exposes Prometheus metrics and supports Grafana dashboards. In addition, you can create monitoring materialized views that track your own pipeline health metrics.

