You wrote a streaming SQL pipeline. It runs. But your materialized view is stale, aggregation numbers look wrong, or throughput collapsed after a spike. Now what?
Debugging batch SQL is straightforward: run the query, inspect the result, check the execution plan. Streaming SQL is different. The query runs continuously, state accumulates over time, and the problem may have happened 20 minutes ago in a fragment you cannot see directly. Most teams default to restarting the pipeline and hoping the issue doesn't recur. That approach does not scale in production.
This guide covers concrete techniques to debug streaming SQL queries in production using RisingWave. You will learn how to read streaming execution plans with EXPLAIN, monitor materialized view lag, identify backpressure using system tables, trace late data through the pipeline, and understand the diagnostic information RisingWave exposes without reaching for external tools.
All SQL in this guide is verified against RisingWave 2.8.0 (PostgreSQL 13.14.0-RisingWave-2.8.0). You can run every query against your own instance.
Setting Up a Debugging Environment
Before diving into techniques, create a small set of tables to work with throughout this guide. Using the prefix dbg_ keeps them isolated from production objects.
CREATE TABLE dbg_orders (
order_id BIGINT PRIMARY KEY,
user_id BIGINT,
product_id BIGINT,
amount NUMERIC,
region VARCHAR,
status VARCHAR,
event_time TIMESTAMPTZ
);
CREATE TABLE dbg_products (
product_id BIGINT PRIMARY KEY,
name VARCHAR,
category VARCHAR,
price NUMERIC
);
Insert some sample data:
INSERT INTO dbg_products VALUES
(1, 'Wireless Headphones', 'Electronics', 89.99),
(2, 'Coffee Maker', 'Appliances', 49.99),
(3, 'Running Shoes', 'Sports', 129.99),
(4, 'Yoga Mat', 'Sports', 29.99),
(5, 'Smart Watch', 'Electronics', 299.99);
INSERT INTO dbg_orders VALUES
(1001, 1, 1, 89.99, 'us-east', 'completed', NOW() - INTERVAL '10 minutes'),
(1002, 2, 3, 129.99, 'us-west', 'completed', NOW() - INTERVAL '8 minutes'),
(1003, 1, 5, 299.99, 'eu-west', 'pending', NOW() - INTERVAL '6 minutes'),
(1004, 3, 2, 49.99, 'us-east', 'completed', NOW() - INTERVAL '4 minutes'),
(1005, 4, 4, 29.99, 'ap-east', 'completed', NOW() - INTERVAL '2 minutes'),
(1006, 2, 1, 89.99, 'us-west', 'completed', NOW() - INTERVAL '1 minute'),
(1007, 5, 3, 129.99, 'eu-west', 'pending', NOW() - INTERVAL '30 seconds');
Now create two materialized views:
CREATE MATERIALIZED VIEW dbg_order_stats AS
SELECT
region,
status,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount,
MAX(event_time) AS latest_order_time
FROM dbg_orders
GROUP BY region, status;
CREATE MATERIALIZED VIEW dbg_order_hourly_revenue AS
SELECT
DATE_TRUNC('hour', event_time) AS hour_bucket,
region,
SUM(amount) AS revenue,
COUNT(*) AS order_count
FROM dbg_orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('hour', event_time), region;
Query dbg_order_stats to confirm it is populated:
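SELECT * FROM dbg_order_stats ORDER BY region;
Output: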
region | status | order_count | total_amount | avg_amount | latest_order_time
---------+-----------+-------------+--------------+------------+-------------------------------
ap-east | completed | 1 | 29.99 | 29.99 | 2026-04-02 07:26:13.127+00:00
eu-west | pending | 2 | 429.98 | 214.99 | 2026-04-02 07:27:43.127+00:00
us-east | completed | 2 | 139.98 | 69.99 | 2026-04-02 07:24:13.127+00:00
us-west | completed | 2 | 219.98 | 109.99 | 2026-04-02 07:27:13.127+00:00
With these objects in place, every debugging technique below has a real target to work against.
Using EXPLAIN to Read Streaming Query Plans
EXPLAIN is your first tool when a streaming query behaves unexpectedly. In RisingWave, EXPLAIN on a CREATE MATERIALIZED VIEW statement shows the streaming execution plan before you commit to creating the view. This lets you catch performance problems before they reach production.
Reading the Streaming Plan Operators
Every RisingWave streaming plan is a tree of operators. The key operators you will encounter are:
| Operator | What it does |
|---|---|
| StreamTableScan | Reads from a source table or another MV |
| StreamFilter | Applies a WHERE predicate |
| StreamProject | Computes expressions (like DATE_TRUNC) |
| StreamHashAgg | Aggregates by grouping key using a hash table |
| StreamHashJoin | Joins two streams by matching on a key |
| StreamExchange | Shuffles data across compute nodes by a distribution key |
| StreamMaterialize | Writes the final result into the MV's state store |
Run EXPLAIN on a query that aggregates by hour and region:
EXPLAIN CREATE MATERIALIZED VIEW dbg_order_hourly_revenue AS
SELECT
DATE_TRUNC('hour', event_time) AS hour_bucket,
region,
SUM(amount) AS revenue,
COUNT(*) AS order_count
FROM dbg_orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('hour', event_time), region;
Output:
StreamMaterialize { columns: [hour_bucket, region, revenue, order_count],
stream_key: [hour_bucket, region], pk_conflict: NoCheck }
└─StreamHashAgg { group_key: [$expr1, dbg_orders.region], aggs: [sum(amount), count] }
└─StreamExchange { dist: HashShard($expr1, dbg_orders.region) }
└─StreamProject { exprs: [DateTrunc('hour', event_time, 'UTC') as $expr1, region, amount, order_id] }
└─StreamFilter { predicate: (dbg_orders.status = 'completed') }
└─StreamTableScan { table: dbg_orders, columns: [amount, region, event_time, order_id, status] }
Read the tree bottom-up: data flows from StreamTableScan upward through the filter, projection, exchange (shuffle), aggregation, and finally into the materialized state.
What to Look For in the Plan
StreamExchange placement. A StreamExchange before an aggregation is expected: RisingWave needs to route all rows with the same group key to the same compute node. If you see a StreamExchange under each input of a StreamHashJoin, both sides of the join are being shuffled, which is more expensive but usually unavoidable.
Missing filter pushdown. If your WHERE clause appears after a join in the plan tree rather than before it, RisingWave is doing more work than needed. Sometimes restructuring the query or using a subquery resolves this.
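As an illustration of the subquery restructuring, here is a hedged sketch (dbg_completed_enriched is a hypothetical view name used only for this example). Run EXPLAIN on both the original and the restructured form and compare where StreamFilter lands relative to the join:
-- Filter dbg_orders in a subquery so only completed orders enter the join state
EXPLAIN CREATE MATERIALIZED VIEW dbg_completed_enriched AS
SELECT o.order_id, o.amount, p.category
FROM (SELECT * FROM dbg_orders WHERE status = 'completed') AS o
JOIN dbg_products AS p ON o.product_id = p.product_id;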
Unexpected full scans. A StreamTableScan on a large table without a targeted filter means every new record from the source triggers computation across the full upstream scan history.
EXPLAIN on Joins
Joins are where streaming plans become interesting. Use EXPLAIN to see how RisingWave routes data before joining:
EXPLAIN CREATE MATERIALIZED VIEW dbg_enriched_orders AS
SELECT
o.order_id,
o.region,
o.amount,
o.event_time,
p.name AS product_name,
p.category
FROM dbg_orders AS o
JOIN dbg_products AS p ON o.product_id = p.product_id;
Output:
StreamMaterialize { columns: [order_id, region, amount, event_time, product_name, category,
dbg_orders.product_id(hidden), dbg_products.product_id(hidden)],
stream_key: [order_id, dbg_orders.product_id], pk_conflict: NoCheck }
└─StreamExchange { dist: HashShard(dbg_orders.order_id, dbg_orders.product_id) }
└─StreamHashJoin { type: Inner, predicate: dbg_orders.product_id = dbg_products.product_id }
├─StreamExchange { dist: HashShard(dbg_orders.product_id) }
│ └─StreamTableScan { table: dbg_orders, columns: [order_id, product_id, amount, region, event_time] }
└─StreamExchange { dist: HashShard(dbg_products.product_id) }
└─StreamTableScan { table: dbg_products, columns: [product_id, name, category] }
The (hidden) columns in StreamMaterialize are the primary key columns that RisingWave needs to efficiently update existing rows in the view. They are not returned in query results. When you see more hidden columns than expected, it usually means RisingWave could not infer a minimal primary key from the join output, and is including extra columns to guarantee uniqueness.
For more on how RisingWave's execution model works, see the architecture overview in the docs.
Monitoring Materialized View Lag
A materialized view in RisingWave is incrementally maintained: as new data arrives at source tables, the view updates within seconds. "Lag" is the gap between the timestamp of the latest event processed by the view and the current wall clock time. High lag means the view is behind, and queries against it return stale results.
Measuring Lag with a Timestamp Column
If your source table has an event timestamp, you can directly measure how fresh the view is:
SELECT
MAX(latest_order_time) AS latest_processed_event,
NOW() AS current_time,
EXTRACT(EPOCH FROM (NOW() - MAX(latest_order_time))) AS lag_seconds
FROM dbg_order_stats;
Output (run shortly after inserting data):
latest_processed_event | current_time | lag_seconds
-------------------------------+-------------------------------+------------
2026-04-02 07:31:30.285+00:00 | 2026-04-02 07:32:06.259+00:00 | 35.974000
A lag of roughly 36 seconds here is mostly an artifact of the test data: the newest order was inserted with an event_time already 30 seconds in the past, so the measurement includes that offset on top of any processing delay (which grows when the system is busy with other backfill jobs). In a stable production system processing real-time data, you expect this number to stay under 5 seconds.
Querying rw_materialized_views
The rw_catalog.rw_materialized_views system table records when each MV was initialized and when it was fully created:
SELECT id, name, status, initialized_at, created_at
FROM rw_catalog.rw_materialized_views
WHERE name LIKE 'dbg_%'
ORDER BY id;
Output:
id | name | status | initialized_at | created_at
-----+--------------------------+---------+---------------------------+---------------------------
842 | dbg_order_stats | Created | 2026-04-02 07:28:16+00:00 | 2026-04-02 07:28:43+00:00
929 | dbg_order_hourly_revenue | Created | 2026-04-02 07:28:54+00:00 | 2026-04-02 07:30:39+00:00
The gap between initialized_at (when the MV started backfilling) and created_at (when it finished and became queryable) tells you how long the initial backfill took. A large gap means the upstream table had a lot of historical data to process.
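To compute that gap directly, a small sketch (timestamp subtraction on these catalog columns returns an interval):
SELECT name, created_at - initialized_at AS backfill_duration
FROM rw_catalog.rw_materialized_views
WHERE name LIKE 'dbg_%'
ORDER BY backfill_duration DESC;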
Checking Storage Size
rw_catalog.rw_table_stats stores key and value sizes for each relation. Join it with rw_materialized_views to get a quick storage snapshot:
SELECT
mv.name AS mv_name,
ts.id,
ts.total_key_count,
ts.total_key_size + ts.total_value_size AS storage_bytes
FROM rw_catalog.rw_table_stats ts
JOIN rw_catalog.rw_materialized_views mv ON mv.id = ts.id
WHERE mv.name LIKE 'dbg_%'
ORDER BY mv.name;
Output:
mv_name | id | total_key_count | storage_bytes
--------------------------+-----+-----------------+--------------
dbg_order_hourly_revenue | 929 | 5 | 410
dbg_order_stats | 842 | 6 | 733
A rapidly growing storage_bytes on an aggregation MV sometimes indicates a query design problem: for example, a group-by on a high-cardinality column like user_id paired with a non-windowed aggregation keeps one state entry per unique user forever.
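For illustration, a hedged sketch of the pattern to watch for; dbg_user_lifetime_spend is a hypothetical view, not part of the setup above:
-- Keeps one aggregation state entry per distinct user_id indefinitely,
-- so storage grows with user cardinality rather than with recent activity
CREATE MATERIALIZED VIEW dbg_user_lifetime_spend AS
SELECT user_id, SUM(amount) AS lifetime_spend, COUNT(*) AS lifetime_orders
FROM dbg_orders
GROUP BY user_id;
If you do need such a view, track its storage_bytes with the query above so unbounded state growth shows up early.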
Checking Backfill Progress with SHOW JOBS
When you create a new materialized view on a large existing table, it backfills historical data. Use SHOW JOBS to track the progress:
SHOW JOBS;
Example output (simplified):
Id | Statement | Create Type | Progress
-----+--------------------------------------------------------------+-------------+------------------------
930 | CREATE MATERIALIZED VIEW llm_mv_support_context AS ... | FOREGROUND | Snapshot [0.00% (0/4)]
931 | CREATE MATERIALIZED VIEW emb_product_features AS ... | FOREGROUND | 0.0%
998 | CREATE MATERIALIZED VIEW arroyo_regional_revenue AS ... | FOREGROUND | Snapshot [0.00% (0/5000)]
The Progress column shows two phases: Snapshot [X% (processed/total)] is the historical backfill phase, and after that the view enters streaming mode. Until SHOW JOBS returns no rows for your MV, the view is not yet fully caught up.
You can also query rw_catalog.rw_ddl_progress for the same information:
SELECT ddl_id, progress, backfill_type, initialized_at
FROM rw_catalog.rw_ddl_progress
ORDER BY initialized_at DESC;
Identifying Backpressure
Backpressure in RisingWave means a downstream operator cannot keep up with its upstream, so the upstream slows down. When backpressure is severe, the entire pipeline slows, barrier latency rises, and your materialized views fall behind.
The Barrier Latency Signal
RisingWave uses periodic barrier messages to checkpoint state. Barriers travel from the meta node through all compute actors and back. When a slow operator blocks a barrier from passing through, barrier latency increases. This is the single most important health metric in RisingWave.
RisingWave exposes barrier latency in Prometheus metrics under stream_barrier_inflight_counts and meta_barrier_duration_seconds. In the absence of Prometheus, you can infer backpressure by watching how long DDL statements take to complete:
NOTICE: CREATE TABLE has taken more than 30 secs, likely due to high barrier latency.
See: https://docs.risingwave.com/performance/metrics#barrier-monitoring for steps to diagnose high barrier latency.
RisingWave prints this notice automatically when a DDL command waits longer than 30 seconds for barrier completion. If you see this in production logs, the system is under backpressure.
Querying rw_streaming_jobs
The rw_catalog.rw_streaming_jobs table lists all streaming jobs (tables, materialized views, sinks) and their current status:
SELECT id, name, status, parallelism
FROM rw_catalog.rw_streaming_jobs
WHERE name LIKE 'dbg_%'
ORDER BY id;
Output:
id | name | status | parallelism
-----+--------------------------+---------+-------------
692 | dbg_orders | CREATED | ADAPTIVE
775 | dbg_products | CREATED | ADAPTIVE
842 | dbg_order_stats | CREATED | ADAPTIVE
929 | dbg_order_hourly_revenue | CREATED | ADAPTIVE
A status of CREATED is healthy. The INITIAL status means a job is still initializing or backfilling and has not yet entered full streaming mode.
Checking Fragment Distribution
Each streaming job is divided into fragments, and each fragment runs as parallel actors on compute nodes. View the fragment structure for your jobs:
SELECT f.fragment_id, f.table_id, f.distribution_type, f.flags, f.parallelism
FROM rw_catalog.rw_fragments f
JOIN rw_catalog.rw_streaming_jobs j ON j.id = f.table_id
WHERE j.name LIKE 'dbg_%'
ORDER BY j.id, f.fragment_id;
Output:
fragment_id | table_id | distribution_type | flags | parallelism
-------------+----------+-------------------+---------------------------------------------+-------------
527 | 692 | HASH | {MVIEW,UPSTREAM_SINK_UNION} | 10
528 | 692 | HASH | {SOURCE,DML} | 10
640 | 842 | HASH | {MVIEW} | 10
641 | 842 | HASH | {STREAM_SCAN,SNAPSHOT_BACKFILL_STREAM_SCAN} | 10
701 | 929 | HASH | {MVIEW} | 10
702 | 929 | HASH | {STREAM_SCAN,SNAPSHOT_BACKFILL_STREAM_SCAN} | 10
Each fragment has parallelism = 10, meaning 10 actors handle it in parallel. If one fragment has significantly lower parallelism than others in the same pipeline, it is a likely bottleneck. You can adjust parallelism with ALTER MATERIALIZED VIEW ... SET PARALLELISM, as shown below.
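For example, a sketch of adjusting parallelism on one of the debug views (pick a value that matches your compute capacity):
-- Pin the MV to 8 parallel actors
ALTER MATERIALIZED VIEW dbg_order_stats SET PARALLELISM = 8;
-- Return to adaptive scheduling
ALTER MATERIALIZED VIEW dbg_order_stats SET PARALLELISM = ADAPTIVE;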
Applying a Rate Limit to Reduce Pressure
When an upstream Kafka source or CDC stream is overwhelming your pipeline during a spike, you can apply a rate limit to the DML path of a table to smooth ingestion:
-- Limit DML writes to this table to 500 rows per second
ALTER TABLE dbg_orders SET DML_RATE_LIMIT = 500;
-- Reset to unlimited
ALTER TABLE dbg_orders SET DML_RATE_LIMIT = DEFAULT;
For source-connector-backed tables (Kafka, CDC), use SOURCE_RATE_LIMIT instead. Check the current rate limits on all fragments:
SELECT fragment_id, fragment_type, table_id, rate_limit
FROM rw_catalog.rw_rate_limit;
An empty result means no rate limits are active. When a rate limit is set, the corresponding fragment appears here with the configured rows-per-second value.
Debugging Late Data Issues
Late data is a record that arrives after its event timestamp has already passed the watermark. In RisingWave, watermarks on event-time windows determine when windows close and emit results. A record arriving after its window has closed may either be silently dropped or included in a past window, depending on your query design.
Reproducing the Late Data Scenario
Insert a completed order with an event time 5 hours in the past:
INSERT INTO dbg_orders VALUES
(2001, 10, 2, 49.99, 'us-east', 'completed', NOW() - INTERVAL '5 hours');
Then query dbg_order_hourly_revenue, which uses DATE_TRUNC('hour', event_time):
SELECT * FROM dbg_order_hourly_revenue ORDER BY hour_bucket, region;
After the MV processes the insert, the late record appears in its own historical bucket:
hour_bucket | region | revenue | order_count
---------------------------+---------+---------+-------------
2026-04-02 02:00:00+00:00 | us-east | 49.99 | 1 <- late record
2026-04-02 07:00:00+00:00 | ap-east | 29.99 | 1
2026-04-02 07:00:00+00:00 | us-east | 139.98 | 2
2026-04-02 07:00:00+00:00 | us-west | 219.98 | 2
This behavior is correct for DATE_TRUNC-based grouping without watermarks: RisingWave simply folds the late record into its historical bucket. If your table or source defines an explicit watermark (WATERMARK FOR event_time AS event_time - INTERVAL '5 minutes'), late records that arrive after the watermark has advanced past them are discarded instead.
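For reference, a minimal sketch of a table that declares such a watermark; dbg_orders_wm is a hypothetical name, and watermark definitions in RisingWave require an append-only table or source:
CREATE TABLE dbg_orders_wm (
order_id BIGINT,
amount NUMERIC,
region VARCHAR,
event_time TIMESTAMPTZ,
WATERMARK FOR event_time AS event_time - INTERVAL '5 minutes'
) APPEND ONLY;
-- Records whose event_time falls more than 5 minutes behind the observed maximum
-- may be dropped once the watermark has advanced past them.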
Late Data Diagnosis Checklist
Check whether the timestamp column is being used as the watermark. If you used WATERMARK FOR event_time in your source table definition, records older than the watermark threshold are dropped at ingestion, not at the MV. You can verify this by querying the raw table:
SELECT COUNT(*) FROM dbg_orders WHERE event_time < NOW() - INTERVAL '1 hour';
If this count is higher than you expect given your watermark configuration, records are not being dropped and late data is included.
Compare raw table counts to MV counts. An MV that aggregates with a WHERE clause on status or event_time may exclude late records if they fall outside the filter window:
-- Raw count of completed orders
SELECT COUNT(*) FROM dbg_orders WHERE status = 'completed';
-- MV count (what the view has processed)
SELECT SUM(order_count) FROM dbg_order_hourly_revenue;
A discrepancy here means the MV's filter or window is excluding some records that appear in the source.
Look at rw_catalog.rw_event_logs for recovery events. If the system restarted during ingestion, some records may have been reprocessed or skipped:
SELECT timestamp, event_type, info->>'collectBarrierFail' AS detail
FROM rw_catalog.rw_event_logs
WHERE event_type IN ('COLLECT_BARRIER_FAIL', 'GLOBAL_RECOVERY_START', 'GLOBAL_RECOVERY_SUCCESS')
ORDER BY timestamp DESC
LIMIT 10;
A recent GLOBAL_RECOVERY_START followed by GLOBAL_RECOVERY_SUCCESS means the system recovered from a crash. RisingWave's exactly-once checkpoint guarantees that no data was lost or duplicated during recovery, but the recovery event itself is a useful timestamp for correlating with observed data anomalies.
Tracing Data Through the Pipeline with System Tables
RisingWave exposes several system tables that let you trace how data flows through a streaming pipeline. Together, they answer the question: "for a given materialized view, what does it depend on, and where does it sit in the compute topology?"
Mapping Dependencies with rw_depend
The rw_catalog.rw_depend table records which objects depend on which other objects. Join it with rw_catalog.rw_relation_info to get readable names:
SELECT
dep.relationname AS dependent_object,
dep.relationtype AS dependent_type,
ref.relationname AS referenced_object,
ref.relationtype AS referenced_type
FROM rw_catalog.rw_depend d
JOIN rw_catalog.rw_relation_info dep ON dep.relationid = d.objid
JOIN rw_catalog.rw_relation_info ref ON ref.relationid = d.refobjid
WHERE dep.relationname LIKE 'dbg_%'
OR ref.relationname LIKE 'dbg_%'
ORDER BY dep.relationname;
Output:
dependent_object | dependent_type | referenced_object | referenced_type
--------------------------+-------------------+-------------------+-----------------
dbg_order_hourly_revenue | MATERIALIZED VIEW | dbg_orders | TABLE
dbg_order_stats | MATERIALIZED VIEW | dbg_orders | TABLE
This confirms that both MVs depend on dbg_orders. If you need to drop dbg_orders, you must drop the dependent MVs first. In production, this query is the safe way to discover all downstream dependencies before making schema changes.
Inspecting All Relations
rw_catalog.rw_relation_info gives you a unified view of all tables, materialized views, and other relations in the database:
SELECT schemaname, relationname, relationtype, relationid
FROM rw_catalog.rw_relation_info
WHERE relationname LIKE 'dbg_%'
ORDER BY relationtype, relationname;
Output:
schemaname | relationname | relationtype | relationid
------------+--------------------------+-------------------+------------
public | dbg_order_hourly_revenue | MATERIALIZED VIEW | 929
public | dbg_order_stats | MATERIALIZED VIEW | 842
public | dbg_orders | TABLE | 692
public | dbg_products | TABLE | 775
The relationid here matches the id column in rw_streaming_jobs, rw_materialized_views, rw_table_stats, and rw_fragments, so you can join across all of them using this ID.
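For example, a sketch of one such cross-catalog join that combines job status, parallelism, and storage for each dbg_ object in a single result:
SELECT
ri.relationname,
ri.relationtype,
sj.status,
sj.parallelism,
ts.total_key_count
FROM rw_catalog.rw_relation_info ri
JOIN rw_catalog.rw_streaming_jobs sj ON sj.id = ri.relationid
LEFT JOIN rw_catalog.rw_table_stats ts ON ts.id = ri.relationid
WHERE ri.relationname LIKE 'dbg_%'
ORDER BY ri.relationname;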
Retrieving MV Definitions
To confirm what SQL defines an existing materialized view (useful when your team did not document the original query):
SELECT name, definition
FROM rw_catalog.rw_materialized_views
WHERE name = 'dbg_order_stats';
Output:
name | definition
-----------------+--------------------------------------------------------------------------------------------------
dbg_order_stats | CREATE MATERIALIZED VIEW dbg_order_stats AS SELECT region, status, COUNT(*) AS order_count,
SUM(amount) AS total_amount, AVG(amount) AS avg_amount, MAX(event_time) AS latest_order_time
FROM dbg_orders GROUP BY region, status
Pair this with EXPLAIN (which you can run against the definition) to trace the full execution plan for a view you inherited.
Checking Compute Node Health
Verify the state of all worker nodes to confirm your compute layer is healthy:
SELECT id, host, type, state, parallelism, is_streaming
FROM rw_catalog.rw_worker_nodes
ORDER BY id;
Output (single-node deployment):
id | host | type | state | parallelism | is_streaming
----+-----------+--------------------------+---------+-------------+--------------
0 | 127.0.0.1 | WORKER_TYPE_META | RUNNING | |
4 | 127.0.0.1 | WORKER_TYPE_COMPUTE_NODE | RUNNING | 10 | t
5 | 0.0.0.0 | WORKER_TYPE_FRONTEND | RUNNING | |
6 | 127.0.0.1 | WORKER_TYPE_COMPACTOR | RUNNING | 30 |
In a distributed cluster, a WORKER_TYPE_COMPUTE_NODE with state other than RUNNING means that node's actors are unavailable and its fragments have been redistributed to surviving nodes. If parallelism per node drops unexpectedly, RisingWave may rebalance, causing a temporary lag spike.
A Diagnostic Query Playbook
Debugging in production requires moving quickly. Keep these queries ready:
1. What streaming jobs exist and what state are they in?
SELECT id, name, status, parallelism
FROM rw_catalog.rw_streaming_jobs
ORDER BY id;
2. Which materialized views depend on a given table?
SELECT dep.relationname AS dependent_object, dep.relationtype
FROM rw_catalog.rw_depend d
JOIN rw_catalog.rw_relation_info dep ON dep.relationid = d.objid
JOIN rw_catalog.rw_relation_info ref ON ref.relationid = d.refobjid
WHERE ref.relationname = 'your_table_name';
3. Is any MV still backfilling?
SHOW JOBS;
-- or
SELECT ddl_id, progress, backfill_type, initialized_at
FROM rw_catalog.rw_ddl_progress
ORDER BY initialized_at DESC;
4. How fresh is a given MV? (requires a timestamp column)
SELECT
MAX(latest_order_time) AS last_event,
NOW() AS current_time,
EXTRACT(EPOCH FROM (NOW() - MAX(latest_order_time))) AS lag_seconds
FROM dbg_order_stats;
5. What does a query's execution plan look like before creating the MV?
EXPLAIN CREATE MATERIALIZED VIEW my_mv AS
SELECT ...;
6. Did the system recently recover from a crash?
SELECT timestamp, event_type
FROM rw_catalog.rw_event_logs
WHERE event_type IN ('GLOBAL_RECOVERY_START', 'GLOBAL_RECOVERY_SUCCESS')
ORDER BY timestamp DESC
LIMIT 5;
7. How large are my materialized views?
SELECT
mv.name,
ts.total_key_count,
ts.total_key_size + ts.total_value_size AS storage_bytes
FROM rw_catalog.rw_table_stats ts
JOIN rw_catalog.rw_materialized_views mv ON mv.id = ts.id
ORDER BY storage_bytes DESC;
FAQ
What is barrier latency in RisingWave, and why does it matter for debugging?
Barrier latency is the time it takes for a checkpoint barrier to travel from the meta node through all streaming actors and return. It is RisingWave's primary health signal. High barrier latency means some actor in the dataflow is slow or blocked, which causes all materialized views to fall behind. When debugging lag or stale results, always check barrier latency first: it either confirms the pipeline is healthy (low latency) or points to a bottleneck operator.
How do I find which materialized view is causing backpressure?
Start with rw_catalog.rw_fragments joined to rw_catalog.rw_streaming_jobs and look for fragments with unexpectedly low parallelism. Cross-reference with SHOW JOBS or rw_catalog.rw_ddl_progress to confirm which jobs are still in the backfill phase; note that the STREAM_SCAN flag in rw_fragments marks upstream-scan fragments and remains set after backfill finishes, so it alone does not indicate an active backfill. In Prometheus-instrumented deployments, the stream_actor_processing_time_ms metric broken down by actor_id pinpoints the slow actor directly.
Can I change a materialized view's query definition without dropping it?
Not directly. RisingWave does not support ALTER MATERIALIZED VIEW ... AS (new query). You must drop the old MV and create a new one. If you have downstream MVs that depend on the one you are changing, you must also drop and recreate those in dependency order. Use rw_catalog.rw_depend to identify all dependents before you start.
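One common workaround is to build the replacement under a temporary name, verify it, and then swap names. A sketch, assuming your RisingWave version supports ALTER MATERIALIZED VIEW ... RENAME TO and that dbg_order_stats has no downstream dependents:
CREATE MATERIALIZED VIEW dbg_order_stats_v2 AS
SELECT region, status, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM dbg_orders
GROUP BY region, status;
-- Once dbg_order_stats_v2 has finished backfilling and its results look correct:
DROP MATERIALIZED VIEW dbg_order_stats;
ALTER MATERIALIZED VIEW dbg_order_stats_v2 RENAME TO dbg_order_stats;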
What happens to my materialized view data if RisingWave restarts?
RisingWave persists all MV state to its storage layer (Hummock, backed by S3 or a local disk). On restart, it replays from the last checkpoint, which is taken every second by default. You will not lose data, but the MV may be up to one checkpoint interval behind the latest events. The recovery event appears in rw_catalog.rw_event_logs with type GLOBAL_RECOVERY_SUCCESS, including the duration of the recovery.
Conclusion
Debugging streaming SQL queries in production is an engineering discipline, not a black art. Here are the key takeaways:
- Run EXPLAIN CREATE MATERIALIZED VIEW ... before deploying any new view to catch plan problems early. Read the plan bottom-up: data flows from StreamTableScan through filters, projections, exchanges, and aggregations to StreamMaterialize.
- Measure materialized view lag directly: compare MAX(event_time) in the view to NOW(). A lag under 5 seconds is typical in healthy pipelines.
- Use SHOW JOBS and rw_catalog.rw_ddl_progress to track backfill progress. A view is not fully operational until the backfill completes.
- Query rw_catalog.rw_fragments and rw_catalog.rw_streaming_jobs to understand the distributed execution topology and identify skewed parallelism.
- Use rw_catalog.rw_depend to map pipeline dependencies before making schema changes.
- Watch rw_catalog.rw_event_logs for barrier failures and recovery events to correlate system restarts with data anomalies.
For more on monitoring in production, see the RisingWave performance and metrics guide and the backpressure handling deep dive on the RisingWave blog. For the full list of system tables available for introspection, see the system catalog reference.
Ready to try this yourself? Get started with RisingWave in 5 minutes with the Quickstart.
Join our Slack community to ask questions and connect with other stream processing developers.

