How to Debug Streaming SQL Queries

How to Debug Streaming SQL Queries

How to Debug Streaming SQL Queries

Debugging streaming SQL is different from debugging batch SQL — you can't just re-run the query. Common issues include watermark problems, join state growth, incorrect aggregations, and performance bottlenecks. Here's a systematic approach to diagnosing and fixing streaming query problems.

Common Streaming SQL Issues

SymptomLikely CauseFix
No outputWatermark too aggressiveIncrease watermark delay
Stale resultsSource not producingCheck source connectivity
Memory growingUnbounded state (missing time bound)Add WHERE ts > NOW() - INTERVAL
Slow queriesToo many materialized viewsConsolidate or use cascading MVs
Wrong countsDuplicate eventsAdd deduplication logic

Debugging Steps

1. Check Source Health

-- Are events arriving?
SELECT COUNT(*) FROM my_source WHERE ts > NOW() - INTERVAL '5 minutes';

2. Check Materialized View State

-- Is the view producing output?
SELECT * FROM my_materialized_view LIMIT 10;

3. Check for Unbounded State

-- This will grow forever (BAD):
CREATE MATERIALIZED VIEW bad AS SELECT user_id, COUNT(*) FROM events GROUP BY user_id;

-- This is bounded (GOOD):
CREATE MATERIALIZED VIEW good AS SELECT user_id, COUNT(*) FROM events
WHERE ts > NOW() - INTERVAL '24 hours' GROUP BY user_id;

4. Monitor System Metrics

RisingWave exposes Prometheus metrics for barrier latency, throughput, and memory usage.

Frequently Asked Questions

Why is my streaming query producing no output?

Most common cause: watermark is too aggressive (events arrive after the watermark has passed). Increase the watermark delay. Also check that the source is actually receiving data.

How do I prevent memory growth in streaming queries?

Always add time bounds to aggregations: WHERE ts > NOW() - INTERVAL '24 hours'. Without bounds, state grows indefinitely as new distinct keys appear.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.