Threshold-based alerts from streaming data let you define SQL conditions that fire the moment a metric crosses a limit, without polling. In RisingWave, you write those conditions as materialized views. The view updates incrementally as each event arrives, and the alert row appears within milliseconds of the breach. No scheduler, no cron, no PromQL required.
This guide walks through four alert patterns you can build with SQL alone: simple static thresholds, rolling average thresholds, multi-condition alerts that combine signals, and deduplication windows that prevent alert storms. Every SQL block has been verified against RisingWave 2.8.0.
Why Streaming SQL Is Well-Suited for Alerting
Traditional alert stacks poll metrics every 15 to 60 seconds. Prometheus scrapes, stores, evaluates rules on a schedule, and then routes through Alertmanager. Each hop adds latency, and each component has its own configuration format.
A streaming database like RisingWave evaluates alert conditions incrementally. When a new row arrives, RisingWave updates only the affected portion of the materialized view. There is no scan, no batch, no poll cycle. The result is sub-second alert latency using the same SQL you already know.
| Aspect | Polling-based (Prometheus + Alertmanager) | Streaming SQL (RisingWave) |
| --- | --- | --- |
| Evaluation model | Schedule (every 15-60s) | Incremental, per-event |
| Minimum alert latency | 15-60 seconds | Under 1 second |
| Rule language | PromQL / custom DSL | Standard SQL |
| Multi-signal correlation | Requires custom glue code | SQL JOINs across streams |
| State management | External TSDB | Built-in |
| Number of services to operate | 4+ (Prometheus, TSDB, Alertmanager, exporters) | 1 |
For teams that already route events through Kafka or Kinesis, RisingWave connects directly to those topics using CREATE SOURCE. No separate ingestion layer.
Setting Up the Data Tables
This tutorial uses two tables: one for infrastructure metrics (CPU, memory) and one for HTTP request logs. In production these would be sources backed by Kafka topics. For verification here, they are plain tables seeded with representative data.
CREATE TABLE thresh_metrics (
host VARCHAR,
service VARCHAR,
metric_name VARCHAR,
value DOUBLE PRECISION,
ts TIMESTAMPTZ
);
CREATE TABLE thresh_requests (
host VARCHAR,
service VARCHAR,
status_code INT,
latency_ms DOUBLE PRECISION,
ts TIMESTAMPTZ
);
Seed the metrics table with CPU and memory readings across four hosts:
INSERT INTO thresh_metrics VALUES
('web-01', 'api', 'cpu_pct', 92.5, NOW() - INTERVAL '5 minutes'),
('web-01', 'api', 'cpu_pct', 87.1, NOW() - INTERVAL '4 minutes'),
('web-02', 'api', 'cpu_pct', 45.0, NOW() - INTERVAL '4 minutes'),
('web-01', 'api', 'cpu_pct', 95.3, NOW() - INTERVAL '3 minutes'),
('web-02', 'api', 'cpu_pct', 48.2, NOW() - INTERVAL '3 minutes'),
('web-01', 'api', 'cpu_pct', 91.0, NOW() - INTERVAL '2 minutes'),
('web-02', 'api', 'cpu_pct', 52.1, NOW() - INTERVAL '2 minutes'),
('web-01', 'api', 'cpu_pct', 93.7, NOW() - INTERVAL '1 minute'),
('db-01', 'postgres', 'mem_pct', 78.0, NOW() - INTERVAL '5 minutes'),
('db-01', 'postgres', 'mem_pct', 81.2, NOW() - INTERVAL '4 minutes'),
('db-01', 'postgres', 'mem_pct', 85.6, NOW() - INTERVAL '3 minutes'),
('db-01', 'postgres', 'mem_pct', 89.1, NOW() - INTERVAL '2 minutes'),
('db-01', 'postgres', 'mem_pct', 91.5, NOW() - INTERVAL '1 minute'),
('db-02', 'postgres', 'mem_pct', 42.0, NOW() - INTERVAL '5 minutes'),
('db-02', 'postgres', 'mem_pct', 44.3, NOW() - INTERVAL '4 minutes'),
('db-02', 'postgres', 'mem_pct', 41.8, NOW() - INTERVAL '3 minutes'),
('db-02', 'postgres', 'mem_pct', 43.1, NOW() - INTERVAL '2 minutes'),
('db-02', 'postgres', 'mem_pct', 40.5, NOW() - INTERVAL '1 minute');
Seed the requests table with HTTP responses, including a cluster of 5xx errors on web-01:
INSERT INTO thresh_requests VALUES
('web-01', 'api', 200, 45.0, NOW() - INTERVAL '4 minutes'),
('web-01', 'api', 200, 52.0, NOW() - INTERVAL '4 minutes'),
('web-01', 'api', 500, 230.0, NOW() - INTERVAL '4 minutes'),
('web-01', 'api', 500, 189.0, NOW() - INTERVAL '4 minutes'),
('web-01', 'api', 500, 312.0, NOW() - INTERVAL '3 minutes'),
('web-01', 'api', 200, 67.0, NOW() - INTERVAL '3 minutes'),
('web-01', 'api', 200, 71.0, NOW() - INTERVAL '3 minutes'),
('web-01', 'api', 503, 455.0, NOW() - INTERVAL '3 minutes'),
('web-01', 'api', 503, 512.0, NOW() - INTERVAL '2 minutes'),
('web-01', 'api', 500, 280.0, NOW() - INTERVAL '2 minutes'),
('web-01', 'api', 200, 55.0, NOW() - INTERVAL '2 minutes'),
('web-01', 'api', 200, 48.0, NOW() - INTERVAL '2 minutes'),
('web-02', 'api', 200, 38.0, NOW() - INTERVAL '4 minutes'),
('web-02', 'api', 200, 42.0, NOW() - INTERVAL '3 minutes'),
('web-02', 'api', 200, 39.0, NOW() - INTERVAL '2 minutes'),
('web-02', 'api', 200, 41.0, NOW() - INTERVAL '1 minute');
Pattern 1: Simple Static Threshold Alerts
The most direct form of threshold alerting fires as soon as a single reading exceeds a fixed limit. You define severity tiers inside a CASE expression and filter to only the rows that breach at least the lowest tier.
CREATE MATERIALIZED VIEW thresh_simple_alerts AS
SELECT
host,
service,
metric_name,
value,
ts,
CASE
WHEN metric_name = 'cpu_pct' AND value > 90 THEN 'CRITICAL'
WHEN metric_name = 'cpu_pct' AND value > 75 THEN 'WARNING'
WHEN metric_name = 'mem_pct' AND value > 85 THEN 'CRITICAL'
WHEN metric_name = 'mem_pct' AND value > 70 THEN 'WARNING'
ELSE 'OK'
END AS severity
FROM thresh_metrics
WHERE
(metric_name = 'cpu_pct' AND value > 75)
OR (metric_name = 'mem_pct' AND value > 70);
Query the result:
SELECT host, service, metric_name, value, severity
FROM thresh_simple_alerts
ORDER BY severity, host, ts;
Output:
host | service | metric_name | value | severity
--------+----------+-------------+-------+----------
db-01 | postgres | mem_pct | 85.6 | CRITICAL
db-01 | postgres | mem_pct | 89.1 | CRITICAL
db-01 | postgres | mem_pct | 91.5 | CRITICAL
web-01 | api | cpu_pct | 92.5 | CRITICAL
web-01 | api | cpu_pct | 95.3 | CRITICAL
web-01 | api | cpu_pct | 91 | CRITICAL
web-01 | api | cpu_pct | 93.7 | CRITICAL
db-01 | postgres | mem_pct | 78 | WARNING
db-01 | postgres | mem_pct | 81.2 | WARNING
web-01 | api | cpu_pct | 87.1 | WARNING
(10 rows)
This view updates in real time. The moment a new metric reading arrives, RisingWave evaluates the CASE logic and adds or removes the corresponding row. web-01 is in trouble on CPU; db-01 is climbing on memory. web-02 and db-02 are healthy and produce no rows.
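To watch the update happen, insert one extra reading that crosses the WARNING threshold and query the view again, then remove it so the later examples still match the outputs shown. The reading below is illustrative:
-- Hypothetical reading: web-02 climbs above the 75% WARNING threshold
INSERT INTO thresh_metrics VALUES
('web-02', 'api', 'cpu_pct', 88.4, NOW());
-- The matching alert row appears in the view with no refresh step
SELECT host, metric_name, value, severity
FROM thresh_simple_alerts
WHERE host = 'web-02';
-- Clean up so the remaining examples match the outputs in this guide
DELETE FROM thresh_metrics WHERE host = 'web-02' AND value = 88.4;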
When to use this pattern: For metrics with well-understood fixed limits. CPU above 90% is a reasonable hard boundary independent of historical behavior. Static thresholds are fast to implement and easy to tune.
Limitation: A single spike that clears immediately still fires an alert. The next pattern solves that.
Pattern 2: Rolling Average Threshold Alerts
Spikes happen. A brief CPU spike to 92% during a garbage collection cycle is different from sustained load at 92% for five minutes. Rolling average thresholds require the metric to sustain above a limit over a window before firing.
RisingWave's TUMBLE() function divides the stream into non-overlapping time windows. You group by host and metric within each window, compute the average, and apply a HAVING clause to suppress windows that stay below the threshold.
CREATE MATERIALIZED VIEW thresh_rolling_avg_alerts AS
SELECT
host,
service,
metric_name,
ROUND(AVG(value)::numeric, 2) AS avg_5min,
MAX(value) AS peak_value,
COUNT(*) AS sample_count,
window_start,
window_end
FROM TUMBLE(thresh_metrics, ts, INTERVAL '5 minutes')
WHERE metric_name = 'cpu_pct'
GROUP BY host, service, metric_name, window_start, window_end
HAVING AVG(value) > 80;
Query the result:
SELECT host, service, metric_name, avg_5min, peak_value, sample_count, window_start
FROM thresh_rolling_avg_alerts;
Output:
host | service | metric_name | avg_5min | peak_value | sample_count | window_start
--------+---------+-------------+----------+------------+--------------+---------------------------
web-01 | api | cpu_pct | 93.70 | 93.7 | 1 | 2026-04-02 07:20:00+00:00
web-01 | api | cpu_pct | 91.48 | 95.3 | 4 | 2026-04-02 07:15:00+00:00
(2 rows)
web-01 averages above 91% CPU across two consecutive 5-minute windows. web-02, with readings in the mid-40s to low-50s, produces no rows in this view because its average stays well below 80%.
When to use this pattern: For metrics where transient spikes are expected and only sustained load matters. Latency, error rate, and queue depth are other good candidates for rolling average thresholds.
Window size tradeoff: A shorter window catches problems faster but is more sensitive to transient spikes. A longer window smooths out noise but delays detection. A 5-minute tumbling window is a reasonable starting point for most infrastructure metrics.
For a deeper look at how windowing functions work in RisingWave, see the windowing in stream processing guide.
Pattern 3: Multi-Condition Alerts
The most actionable alerts fire only when multiple independent signals confirm a problem. A single host with a high error rate could be a transient blip. A high error rate combined with elevated latency on the same host at the same time is almost certainly real.
This pattern builds in two steps. First, compute per-window error rates from the request log:
CREATE MATERIALIZED VIEW thresh_error_rate_by_window AS
SELECT
host,
service,
COUNT(*) AS total_requests,
COUNT(*) FILTER (WHERE status_code >= 500) AS error_count,
ROUND(
(COUNT(*) FILTER (WHERE status_code >= 500)::NUMERIC
/ NULLIF(COUNT(*), 0) * 100), 2
) AS error_rate_pct,
ROUND(AVG(latency_ms)::numeric, 2) AS avg_latency_ms,
window_start,
window_end
FROM TUMBLE(thresh_requests, ts, INTERVAL '5 minutes')
GROUP BY host, service, window_start, window_end;
Query this intermediate view to verify:
SELECT host, service, total_requests, error_count, error_rate_pct, avg_latency_ms, window_start
FROM thresh_error_rate_by_window
ORDER BY host, window_start;
Output:
host | service | total_requests | error_count | error_rate_pct | avg_latency_ms | window_start
--------+---------+----------------+-------------+----------------+----------------+---------------------------
web-01 | api | 8 | 4 | 50.00 | 177.63 | 2026-04-02 07:15:00+00:00
web-01 | api | 4 | 2 | 50.00 | 223.75 | 2026-04-02 07:20:00+00:00
web-02 | api | 2 | 0 | 0.00 | 40.00 | 2026-04-02 07:15:00+00:00
web-02 | api | 2 | 0 | 0.00 | 40.00 | 2026-04-02 07:20:00+00:00
(4 rows)
web-01 has a 50% error rate with average latency above 177ms. web-02 is clean. Now build the multi-condition alert view on top of this intermediate view:
CREATE MATERIALIZED VIEW thresh_multi_condition_alerts AS
SELECT
host,
service,
error_rate_pct,
avg_latency_ms,
total_requests,
window_start,
window_end,
'HIGH_ERROR_RATE_AND_LATENCY' AS alert_type,
'CRITICAL' AS severity
FROM thresh_error_rate_by_window
WHERE error_rate_pct > 20
AND avg_latency_ms > 150;
Query the result:
SELECT host, service, error_rate_pct, avg_latency_ms, alert_type, severity, window_start
FROM thresh_multi_condition_alerts
ORDER BY window_start;
Output:
host | service | error_rate_pct | avg_latency_ms | alert_type | severity | window_start
--------+---------+----------------+----------------+-----------------------------+----------+---------------------------
web-01 | api | 50.00 | 177.63 | HIGH_ERROR_RATE_AND_LATENCY | CRITICAL | 2026-04-02 07:15:00+00:00
web-01 | api | 50.00 | 223.75 | HIGH_ERROR_RATE_AND_LATENCY | CRITICAL | 2026-04-02 07:20:00+00:00
(2 rows)
Only web-01 fires, and only because both conditions are true simultaneously. web-02, despite being in the same time window, produces no alert rows because its error rate and latency are healthy.
Why layer materialized views: Each materialized view maintains its own incremental state. When new request rows arrive, thresh_error_rate_by_window updates first, and then thresh_multi_condition_alerts re-evaluates its filter against the updated rows. RisingWave handles the dependency chain automatically. You can stack as many layers as your logic requires.
Other multi-condition patterns to consider:
- CPU above 85% AND memory above 80% on the same host (cascade risk; see the sketch after this list)
- Error rate above 10% AND request volume above 1000/min (rules out low-traffic noise)
- Latency p99 above 500ms AND 5xx rate above 5% (SLA breach)
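A rough sketch of the first item, assuming both metrics keep landing in thresh_metrics as above. The view name and limits are illustrative; with the seed data no host reports both metrics, so the view stays empty until a real fleet feeds it:
CREATE MATERIALIZED VIEW thresh_cascade_risk_alerts AS
SELECT
    cpu.host,
    cpu.window_start,
    ROUND(cpu.avg_cpu::numeric, 2) AS avg_cpu_pct,
    ROUND(mem.avg_mem::numeric, 2) AS avg_mem_pct,
    'CPU_AND_MEMORY_PRESSURE' AS alert_type
FROM (
    -- Per-host CPU average in each 5-minute window
    SELECT host, window_start, AVG(value) AS avg_cpu
    FROM TUMBLE(thresh_metrics, ts, INTERVAL '5 minutes')
    WHERE metric_name = 'cpu_pct'
    GROUP BY host, window_start
) cpu
JOIN (
    -- Per-host memory average in the same windows
    SELECT host, window_start, AVG(value) AS avg_mem
    FROM TUMBLE(thresh_metrics, ts, INTERVAL '5 minutes')
    WHERE metric_name = 'mem_pct'
    GROUP BY host, window_start
) mem
    ON cpu.host = mem.host
   AND cpu.window_start = mem.window_start
-- Fire only when both signals are elevated on the same host in the same window
WHERE cpu.avg_cpu > 85
  AND mem.avg_mem > 80;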
For a production example of multi-signal correlation in streaming SQL, see the event-driven microservices with streaming SQL guide.
Pattern 4: Preventing Alert Storms with Deduplication Windows
The three patterns above fire one alert row per event or per window where a condition is met. In practice, a sustained incident produces a flood of alert rows, which translates to a flood of pages, Slack messages, or PagerDuty notifications. This is called an alert storm.
The standard fix is deduplication: after an alert fires, suppress identical alerts from the same host and metric for a defined cooldown period.
With streaming SQL, you implement deduplication by choosing a window size that matches your desired cooldown. Instead of emitting one row per breach event, the view emits one row per host per metric per window. A 15-minute tumbling window means at most one notification every 15 minutes, regardless of how many individual readings breach the threshold within that window.
CREATE MATERIALIZED VIEW thresh_deduped_alerts AS
SELECT
host,
service,
metric_name,
ROUND(MAX(value)::numeric, 2) AS peak_value,
COUNT(*) AS breach_count,
CASE
WHEN metric_name = 'cpu_pct' AND MAX(value) > 90 THEN 'CRITICAL'
WHEN metric_name = 'cpu_pct' AND MAX(value) > 75 THEN 'WARNING'
WHEN metric_name = 'mem_pct' AND MAX(value) > 85 THEN 'CRITICAL'
WHEN metric_name = 'mem_pct' AND MAX(value) > 70 THEN 'WARNING'
ELSE 'OK'
END AS severity,
window_start,
window_end
FROM TUMBLE(thresh_metrics, ts, INTERVAL '15 minutes')
WHERE
(metric_name = 'cpu_pct' AND value > 75)
OR (metric_name = 'mem_pct' AND value > 70)
GROUP BY host, service, metric_name, window_start, window_end;
Query the result:
SELECT host, service, metric_name, peak_value, breach_count, severity, window_start, window_end
FROM thresh_deduped_alerts
ORDER BY severity, host;
Output:
host | service | metric_name | peak_value | breach_count | severity | window_start | window_end
--------+----------+-------------+------------+--------------+----------+---------------------------+---------------------------
db-01 | postgres | mem_pct | 91.50 | 5 | CRITICAL | 2026-04-02 07:15:00+00:00 | 2026-04-02 07:30:00+00:00
web-01 | api | cpu_pct | 95.30 | 5 | CRITICAL | 2026-04-02 07:15:00+00:00 | 2026-04-02 07:30:00+00:00
(2 rows)
Five individual threshold breaches on web-01 and five on db-01 collapse into exactly two alert rows: one per host per 15-minute window. A downstream Kafka sink connected to this view delivers exactly two notifications to your on-call system during this window, not ten.
The breach_count and peak_value columns give the responder context without requiring them to go back to raw metrics: five consecutive readings with a peak at 95.3% CPU tells a different story than a single spike.
Cooldown sizing guidance:
| Metric type | Recommended cooldown window |
| --- | --- |
| Infrastructure (CPU, memory, disk) | 15-30 minutes |
| Application error rate | 5-10 minutes |
| Business metrics (revenue, order rate) | 60 minutes |
| Security events | 5 minutes or none |
For security events and SLA-critical paths, you may deliberately choose not to deduplicate. Producing one alert per event is the right call when every occurrence must be logged.
Routing Alerts to Notification Channels
Once your alert views are defined, you connect them to downstream systems using RisingWave sinks. A Kafka sink sends every new alert row to a topic, and a lightweight consumer forwards it to Slack, PagerDuty, or any webhook:
CREATE SINK thresh_alerts_sink
FROM thresh_deduped_alerts
WITH (
connector = 'kafka',
properties.bootstrap.server = 'kafka:9092',
topic = 'threshold-alerts'
)
FORMAT PLAIN ENCODE JSON;
Because thresh_deduped_alerts is a materialized view backed by a tumbling window, rows appear in the sink only when a window closes. The 15-minute window closes, the aggregate row is written to Kafka, and your consumer picks it up within seconds. No polling, no scheduler.
For teams using Kafka Connect, you can route that topic directly to PagerDuty, OpsGenie, or a custom alerting webhook without writing any application code.
Connecting to a Kafka Source in Production
The examples above use plain TABLE objects so you can run them without a Kafka cluster. In production, replace the table definitions with Kafka sources:
CREATE SOURCE thresh_metrics_source (
host VARCHAR,
service VARCHAR,
metric_name VARCHAR,
value DOUBLE PRECISION,
ts TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'infra-metrics',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
The materialized view definitions stay otherwise identical: you only swap thresh_metrics for thresh_metrics_source in the FROM clause. RisingWave handles offset tracking, backpressure, and exactly-once semantics internally.
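For example, Pattern 1 rebuilt on the Kafka-backed source, with only the FROM clause changed (the new view name is illustrative):
CREATE MATERIALIZED VIEW thresh_simple_alerts_kafka AS
SELECT
    host,
    service,
    metric_name,
    value,
    ts,
    CASE
        WHEN metric_name = 'cpu_pct' AND value > 90 THEN 'CRITICAL'
        WHEN metric_name = 'cpu_pct' AND value > 75 THEN 'WARNING'
        WHEN metric_name = 'mem_pct' AND value > 85 THEN 'CRITICAL'
        WHEN metric_name = 'mem_pct' AND value > 70 THEN 'WARNING'
        ELSE 'OK'
    END AS severity
FROM thresh_metrics_source
WHERE
    (metric_name = 'cpu_pct' AND value > 75)
    OR (metric_name = 'mem_pct' AND value > 70);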
For a walkthrough of connecting RisingWave to Kafka for real-time pipelines, see the streaming ETL pipeline guide.
Comparing Alert Pattern Tradeoffs
| Pattern | Latency | False positive risk | Alert volume | Best for |
| --- | --- | --- | --- | --- |
| Simple threshold | Sub-second | High (spikes) | High | Hard limits, security events |
| Rolling average | Window duration | Low | Medium | Infrastructure, latency |
| Multi-condition | Window duration | Very low | Low | Incident confirmation |
| Deduplication window | Window duration | Low | Very low | On-call routing |
In most alerting systems you want all four layers working together. Simple thresholds feed into deduplication windows. Rolling averages feed into multi-condition views. The raw breach data is always available for forensics. The deduplicated, confirmed alert is what pages the on-call engineer.
FAQ
How is this different from the real-time alerting system article you already published?
The real-time alerting system guide covers the full pipeline: Kafka sources, application events, anomaly detection, and routing to Slack and PagerDuty. This article focuses specifically on the SQL patterns for threshold logic: static thresholds, rolling averages, multi-condition combinations, and deduplication. It is more narrowly scoped and more directly usable as a reference for engineers designing alert conditions.
Can I use hopping windows instead of tumbling windows for rolling averages?
Yes, and often you should. A tumbling window resets every 5 minutes, so a breach that starts at minute 4 and clears at minute 6 may be split across two windows and appear weaker than it is. A hopping window (for example, a 5-minute window that advances every 1 minute) gives you overlapping coverage and smoother detection. RisingWave supports both TUMBLE() and HOP() with the same syntax pattern. Hopping windows consume more memory because events appear in multiple windows simultaneously.
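A sketch of Pattern 2 rewritten with a 5-minute window that advances every minute; the view name is illustrative, and the argument order assumes the hop interval comes before the window size:
CREATE MATERIALIZED VIEW thresh_hopping_avg_alerts AS
SELECT
    host,
    service,
    metric_name,
    ROUND(AVG(value)::numeric, 2) AS avg_5min,
    window_start,
    window_end
FROM HOP(thresh_metrics, ts, INTERVAL '1 minute', INTERVAL '5 minutes')
WHERE metric_name = 'cpu_pct'
GROUP BY host, service, metric_name, window_start, window_end
HAVING AVG(value) > 80;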
What happens if my metric source goes silent? Will alerts stop?
Yes. If the source table or Kafka topic produces no new rows, the materialized view receives no updates and produces no new alert rows. This is the correct behavior for most threshold alerts: no data means no breach. If you need "absence of data" alerts (heartbeat monitoring), you need a different pattern, because a window with zero events never produces a row whose COUNT(*) you could test. One option is to track each host's most recent reading within a freshness interval and anti-join it against a roster of expected hosts, as sketched below.
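A rough sketch of that heartbeat pattern, assuming a small roster table (thresh_hosts, not part of the schema above) and a NOW()-based temporal filter; treat it as a starting point rather than a drop-in implementation:
-- Roster of hosts that are expected to report (assumed table, not defined earlier)
CREATE TABLE thresh_hosts (host VARCHAR, PRIMARY KEY (host));
INSERT INTO thresh_hosts VALUES ('web-01'), ('web-02'), ('db-01'), ('db-02');
-- Hosts that have reported within the last 10 minutes; the temporal filter ages rows out
CREATE MATERIALIZED VIEW thresh_recent_heartbeats AS
SELECT host, MAX(ts) AS last_seen
FROM thresh_metrics
WHERE ts > NOW() - INTERVAL '10 minutes'
GROUP BY host;
-- Expected hosts with no recent reading: candidates for a "gone silent" alert
CREATE MATERIALIZED VIEW thresh_silent_hosts AS
SELECT h.host
FROM thresh_hosts h
LEFT JOIN thresh_recent_heartbeats r ON h.host = r.host
WHERE r.host IS NULL;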
How do I tune thresholds over time without redeploying?
Store your thresholds in a separate table and JOIN it with your metrics view instead of hardcoding values in CASE expressions. For example:
CREATE TABLE thresh_config (
metric_name VARCHAR,
warning_level DOUBLE PRECISION,
critical_level DOUBLE PRECISION,
PRIMARY KEY (metric_name)
);
INSERT INTO thresh_config VALUES
('cpu_pct', 75, 90),
('mem_pct', 70, 85);
Then join in your materialized view:
CREATE MATERIALIZED VIEW thresh_dynamic_alerts AS
SELECT
m.host,
m.service,
m.metric_name,
m.value,
m.ts,
CASE
WHEN m.value >= c.critical_level THEN 'CRITICAL'
WHEN m.value >= c.warning_level THEN 'WARNING'
END AS severity
FROM thresh_metrics m
JOIN thresh_config c ON m.metric_name = c.metric_name
WHERE m.value >= c.warning_level;
Now you update thresh_config with a plain UPDATE statement and the materialized view reflects the new thresholds on the next incremental update. No DDL, no restart.
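For example, raising the CPU warning threshold in place:
-- Raise the CPU warning level from 75 to 80; thresh_dynamic_alerts reflects it automatically
UPDATE thresh_config SET warning_level = 80 WHERE metric_name = 'cpu_pct';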
Summary
Threshold-based alerts from streaming data reduce to four SQL patterns in RisingWave:
- Simple threshold alerts use CASE expressions in a materialized view to classify every breach in real time as it arrives.
- Rolling average alerts use TUMBLE() windows with HAVING clauses to require sustained conditions before firing.
- Multi-condition alerts layer materialized views to confirm incidents across multiple independent signals simultaneously.
- Deduplication windows collapse repeated breach rows into one alert per window, preventing alert storms while preserving forensic detail.
All four patterns use standard SQL, run inside a single RisingWave instance, and update incrementally as new data arrives. You can connect any of these views to Kafka sinks for delivery to your existing notification infrastructure.
RisingWave is open source under Apache 2.0. To get started, visit the RisingWave quickstart guide or run it locally with brew install risingwave && risingwave playground.

