Real-Time Anomaly Detection with Streaming SQL

Your on-call engineer gets paged at 3 AM. CPU usage on the auth service has been climbing for 20 minutes, error rates on the payment service spiked 15 minutes ago, and API latency is through the roof. The batch monitoring job that runs every 5 minutes caught the CPU issue, but by then the cascade had already started.

This is the gap between batch monitoring and real-time anomaly detection. Batch systems poll metrics at intervals and compare against fixed thresholds. Streaming systems evaluate every data point the moment it arrives, applying statistical methods that adapt to changing baselines. The difference is minutes of downtime versus seconds of awareness.

In this tutorial, you will build a complete anomaly detection pipeline using streaming SQL in RisingWave. No Python scripts, no ML frameworks, no Spark jobs. Just SQL that runs continuously, evaluates every metric as it flows in, and flags anomalies in real time using materialized views that stay fresh automatically.

Why Does Batch Monitoring Miss Anomalies?

Traditional monitoring stacks collect metrics into a time-series database, then run scheduled queries or static threshold alerts. This approach has three blind spots:

  • Delayed detection. A query that runs every 5 minutes can miss a 2-minute latency spike entirely. By the time you aggregate the data, the incident may have already cascaded.
  • Static thresholds break. A fixed rule like "alert when CPU > 80%" triggers false positives during expected traffic peaks and misses slow-burn anomalies that creep up from 30% to 70% over an hour.
  • No cross-metric correlation. When latency, error rate, and CPU spike simultaneously, batch systems fire three separate alerts. A streaming system can correlate these signals in a single query.

Streaming anomaly detection solves these problems by evaluating every data point against a rolling statistical baseline, the moment it arrives. A streaming database like RisingWave makes this practical because materialized views are incrementally maintained: each new metric updates the anomaly scores without recomputing the entire dataset.

How Do You Set Up the Metrics Pipeline?

Before writing detection logic, you need a table to receive your metrics stream. In production, this would typically be a Kafka source ingesting metrics from Prometheus, Datadog agents, or custom collectors. For this tutorial, we use a table with direct inserts to keep the focus on detection techniques.

Create the metrics table

CREATE TABLE service_metrics (
    service_name VARCHAR,
    host_id VARCHAR,
    metric_type VARCHAR,
    metric_value DOUBLE PRECISION,
    recorded_at TIMESTAMPTZ
);

This table holds metrics for multiple services, hosts, and metric types (latency, error rate, CPU usage) in a single normalized schema. Each row is a single observation at a point in time.

Load sample data

Insert a mix of normal readings and anomalous spikes across three services:

INSERT INTO service_metrics VALUES
-- Normal latency for api-gateway (~42-49ms)
('api-gateway', 'host-01', 'latency_ms', 45.2, '2025-03-15 10:00:00+00'),
('api-gateway', 'host-01', 'latency_ms', 42.8, '2025-03-15 10:01:00+00'),
('api-gateway', 'host-01', 'latency_ms', 48.1, '2025-03-15 10:02:00+00'),
('api-gateway', 'host-01', 'latency_ms', 44.5, '2025-03-15 10:03:00+00'),
('api-gateway', 'host-01', 'latency_ms', 46.3, '2025-03-15 10:04:00+00'),
('api-gateway', 'host-01', 'latency_ms', 43.9, '2025-03-15 10:05:00+00'),
('api-gateway', 'host-01', 'latency_ms', 47.0, '2025-03-15 10:06:00+00'),
('api-gateway', 'host-01', 'latency_ms', 41.2, '2025-03-15 10:07:00+00'),
('api-gateway', 'host-01', 'latency_ms', 49.5, '2025-03-15 10:08:00+00'),
('api-gateway', 'host-01', 'latency_ms', 44.0, '2025-03-15 10:09:00+00'),
-- Anomalous latency spike
('api-gateway', 'host-01', 'latency_ms', 320.5, '2025-03-15 10:10:00+00'),
('api-gateway', 'host-01', 'latency_ms', 285.0, '2025-03-15 10:11:00+00'),
-- Recovery
('api-gateway', 'host-01', 'latency_ms', 50.1, '2025-03-15 10:12:00+00'),
('api-gateway', 'host-01', 'latency_ms', 46.7, '2025-03-15 10:13:00+00'),

-- Normal error rates for payment-service (~1-3%)
('payment-service', 'host-02', 'error_rate', 0.02, '2025-03-15 10:00:00+00'),
('payment-service', 'host-02', 'error_rate', 0.01, '2025-03-15 10:01:00+00'),
('payment-service', 'host-02', 'error_rate', 0.03, '2025-03-15 10:02:00+00'),
('payment-service', 'host-02', 'error_rate', 0.02, '2025-03-15 10:03:00+00'),
('payment-service', 'host-02', 'error_rate', 0.01, '2025-03-15 10:04:00+00'),
('payment-service', 'host-02', 'error_rate', 0.02, '2025-03-15 10:05:00+00'),
('payment-service', 'host-02', 'error_rate', 0.015, '2025-03-15 10:06:00+00'),
('payment-service', 'host-02', 'error_rate', 0.025, '2025-03-15 10:07:00+00'),
-- Anomalous error spike
('payment-service', 'host-02', 'error_rate', 0.35, '2025-03-15 10:08:00+00'),
('payment-service', 'host-02', 'error_rate', 0.42, '2025-03-15 10:09:00+00'),
('payment-service', 'host-02', 'error_rate', 0.38, '2025-03-15 10:10:00+00'),
-- Recovery
('payment-service', 'host-02', 'error_rate', 0.03, '2025-03-15 10:11:00+00'),
('payment-service', 'host-02', 'error_rate', 0.02, '2025-03-15 10:12:00+00'),

-- Normal CPU for auth-service (~35-39%)
('auth-service', 'host-03', 'cpu_percent', 35.0, '2025-03-15 10:00:00+00'),
('auth-service', 'host-03', 'cpu_percent', 38.2, '2025-03-15 10:01:00+00'),
('auth-service', 'host-03', 'cpu_percent', 36.5, '2025-03-15 10:02:00+00'),
('auth-service', 'host-03', 'cpu_percent', 37.1, '2025-03-15 10:03:00+00'),
('auth-service', 'host-03', 'cpu_percent', 39.0, '2025-03-15 10:04:00+00'),
('auth-service', 'host-03', 'cpu_percent', 34.8, '2025-03-15 10:05:00+00'),
('auth-service', 'host-03', 'cpu_percent', 36.9, '2025-03-15 10:06:00+00'),
('auth-service', 'host-03', 'cpu_percent', 37.5, '2025-03-15 10:07:00+00'),
-- Gradual CPU ramp-up (slow-burn anomaly)
('auth-service', 'host-03', 'cpu_percent', 55.0, '2025-03-15 10:08:00+00'),
('auth-service', 'host-03', 'cpu_percent', 68.0, '2025-03-15 10:09:00+00'),
('auth-service', 'host-03', 'cpu_percent', 82.0, '2025-03-15 10:10:00+00'),
('auth-service', 'host-03', 'cpu_percent', 91.0, '2025-03-15 10:11:00+00'),
('auth-service', 'host-03', 'cpu_percent', 95.0, '2025-03-15 10:12:00+00');

The dataset includes three distinct anomaly patterns: a sudden latency spike (api-gateway), an error rate surge (payment-service), and a gradual CPU ramp-up (auth-service). Each technique we build below catches different types of anomalies.

What Are the Core Anomaly Detection Techniques in SQL?

Real-time anomaly detection is the process of identifying data points that deviate significantly from expected patterns as they arrive in a data stream. You don't need machine learning for most infrastructure monitoring anomalies. Four SQL-based techniques cover the majority of real-world scenarios.

Technique 1: Threshold-based alerts

The simplest approach: flag any metric that crosses a predefined boundary. This works well for known limits (disk > 90%, error rate > 10%) where the acceptable range is clear.

CREATE MATERIALIZED VIEW threshold_alerts AS
SELECT
    service_name,
    host_id,
    metric_type,
    metric_value,
    recorded_at,
    CASE
        WHEN metric_type = 'latency_ms' AND metric_value > 200 THEN 'HIGH_LATENCY'
        WHEN metric_type = 'error_rate' AND metric_value > 0.10 THEN 'HIGH_ERROR_RATE'
        WHEN metric_type = 'cpu_percent' AND metric_value > 80 THEN 'HIGH_CPU'
        ELSE 'NORMAL'
    END AS alert_level
FROM service_metrics
WHERE
    (metric_type = 'latency_ms' AND metric_value > 200)
    OR (metric_type = 'error_rate' AND metric_value > 0.10)
    OR (metric_type = 'cpu_percent' AND metric_value > 80);

Because this is a materialized view in RisingWave, it updates incrementally. When a new row with latency_ms = 320.5 arrives, RisingWave evaluates the CASE expression and adds it to the result set without re-scanning the entire table.

Query the alerts:

SELECT service_name, metric_type, metric_value, alert_level, recorded_at
FROM threshold_alerts
ORDER BY recorded_at;

Output (verified on RisingWave 2.8.0):

  service_name   | metric_type | metric_value |   alert_level   |        recorded_at
-----------------+-------------+--------------+-----------------+---------------------------
 payment-service | error_rate  |         0.35 | HIGH_ERROR_RATE | 2025-03-15 10:08:00+00:00
 payment-service | error_rate  |         0.42 | HIGH_ERROR_RATE | 2025-03-15 10:09:00+00:00
 payment-service | error_rate  |         0.38 | HIGH_ERROR_RATE | 2025-03-15 10:10:00+00:00
 auth-service    | cpu_percent |           82 | HIGH_CPU        | 2025-03-15 10:10:00+00:00
 api-gateway     | latency_ms  |        320.5 | HIGH_LATENCY    | 2025-03-15 10:10:00+00:00
 api-gateway     | latency_ms  |          285 | HIGH_LATENCY    | 2025-03-15 10:11:00+00:00
 auth-service    | cpu_percent |           91 | HIGH_CPU        | 2025-03-15 10:11:00+00:00
 auth-service    | cpu_percent |           95 | HIGH_CPU        | 2025-03-15 10:12:00+00:00

Threshold alerts catch the obvious cases, but they miss the initial CPU climb from 37% to 55% to 68%, because those values are below the 80% boundary. That is where statistical methods come in.
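The CASE expression above is just a static lookup. As an illustrative sketch (mirroring the view's thresholds, not RisingWave-specific code), the same classification in Python:

```python
# Static thresholds, mirroring the CASE expression in threshold_alerts.
THRESHOLDS = {
    "latency_ms":  (200,  "HIGH_LATENCY"),
    "error_rate":  (0.10, "HIGH_ERROR_RATE"),
    "cpu_percent": (80,   "HIGH_CPU"),
}

def classify(metric_type: str, value: float) -> str:
    """Return the alert level for a single observation."""
    limit, alert = THRESHOLDS.get(metric_type, (float("inf"), "NORMAL"))
    return alert if value > limit else "NORMAL"
```

classify("latency_ms", 320.5) returns "HIGH_LATENCY", while classify("cpu_percent", 68.0) stays "NORMAL": exactly the blind spot the statistical techniques address.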

Technique 2: Z-score anomaly detection

A z-score measures how many standard deviations a value sits from the mean. A z-score above 3 (or below -3) indicates that the data point is statistically unusual relative to recent history. This technique adapts to the data: a service with typically high latency has a different baseline than one with low latency.

The key is computing the baseline from preceding rows only, so the anomalous value itself doesn't contaminate the statistics. We also require a minimum of 5 prior data points before computing z-scores to avoid noisy results during warm-up.

CREATE MATERIALIZED VIEW latency_zscore AS
SELECT
    service_name,
    host_id,
    metric_value AS current_value,
    recorded_at,
    ROUND((AVG(metric_value) OVER w)::numeric, 1) AS baseline_avg,
    ROUND((STDDEV_POP(metric_value) OVER w)::numeric, 1) AS baseline_std,
    COUNT(*) OVER w AS sample_count,
    CASE
        WHEN STDDEV_POP(metric_value) OVER w > 0 AND COUNT(*) OVER w >= 5 THEN
            ROUND(((metric_value - AVG(metric_value) OVER w)
                / STDDEV_POP(metric_value) OVER w)::numeric, 2)
        ELSE NULL
    END AS z_score
FROM service_metrics
WHERE metric_type = 'latency_ms'
WINDOW w AS (
    PARTITION BY service_name, host_id
    ORDER BY recorded_at
    ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING
);

The ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING window looks at up to 9 previous data points but excludes the current row. This ensures that a spike doesn't inflate its own baseline. RisingWave supports window functions in materialized views, so this z-score computation runs continuously.

Query the results and classify anomalies:

SELECT
    service_name,
    current_value,
    baseline_avg,
    baseline_std,
    z_score,
    CASE
        WHEN z_score IS NULL THEN 'warming_up'
        WHEN ABS(z_score) > 3 THEN 'ANOMALY'
        ELSE 'normal'
    END AS status,
    recorded_at
FROM latency_zscore
ORDER BY recorded_at;

Output (verified on RisingWave 2.8.0):

 service_name | current_value | baseline_avg | baseline_std | z_score |   status
--------------+---------------+--------------+--------------+---------+------------
 api-gateway  |          45.2 |              |              |         | warming_up
 api-gateway  |          42.8 |         45.2 |            0 |         | warming_up
 api-gateway  |          48.1 |           44 |          1.2 |         | warming_up
 api-gateway  |          44.5 |         45.4 |          2.2 |         | warming_up
 api-gateway  |          46.3 |         45.2 |          1.9 |         | warming_up
 api-gateway  |          43.9 |         45.4 |          1.8 |   -0.83 | normal
 api-gateway  |            47 |         45.1 |          1.7 |    1.09 | normal
 api-gateway  |          41.2 |         45.4 |          1.7 |   -2.45 | normal
 api-gateway  |          49.5 |         44.9 |          2.1 |    2.18 | normal
 api-gateway  |            44 |         45.4 |          2.5 |   -0.56 | normal
 api-gateway  |         320.5 |         45.3 |          2.5 |  109.66 | ANOMALY
 api-gateway  |           285 |         76.1 |         86.4 |    2.42 | normal
 api-gateway  |          50.1 |        102.4 |        107.4 |   -0.49 | normal
 api-gateway  |          46.7 |        103.1 |        107.1 |   -0.53 | normal

The spike to 320.5ms produces a z-score of 109.66, a massive deviation from the baseline average of 45.3ms. Notice that the second spike (285ms) is not flagged: the 320.5 reading has shifted the baseline upward. In production, you would filter or cap outliers before they enter the baseline window. Still, the z-score approach catches the initial deviation with precision.
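To see the arithmetic behind that 109.66, here is an illustrative Python check using the nine readings that precede the spike (the same window as the view: up to 9 preceding rows, current row excluded):

```python
from statistics import mean, pstdev  # pstdev matches SQL's STDDEV_POP

# The nine latency readings immediately before the 320.5 ms spike
# (ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING excludes the current row).
baseline = [42.8, 48.1, 44.5, 46.3, 43.9, 47.0, 41.2, 49.5, 44.0]
current = 320.5

z = (current - mean(baseline)) / pstdev(baseline)
print(round(z, 2))  # 109.66, matching the materialized view output
```

Using the population standard deviation (pstdev) rather than the sample standard deviation is what keeps this consistent with STDDEV_POP in the SQL.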

Technique 3: Moving average ratio

For metrics like error rates where absolute values vary across services, comparing the current value against a moving average of recent history can be more intuitive. If the current value is 5x the recent average, something is wrong, regardless of the absolute number.

CREATE MATERIALIZED VIEW error_rate_anomalies AS
SELECT
    service_name,
    host_id,
    metric_value AS current_error_rate,
    recorded_at,
    AVG(metric_value) OVER w AS moving_avg,
    CASE
        WHEN AVG(metric_value) OVER w > 0 AND COUNT(*) OVER w >= 3 THEN
            metric_value / AVG(metric_value) OVER w
        ELSE NULL
    END AS ratio_to_baseline
FROM service_metrics
WHERE metric_type = 'error_rate'
WINDOW w AS (
    PARTITION BY service_name, host_id
    ORDER BY recorded_at
    ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
);

Query the anomalies:

SELECT
    service_name,
    current_error_rate,
    ROUND(moving_avg::numeric, 4) AS moving_avg,
    ROUND(ratio_to_baseline::numeric, 1) AS ratio,
    CASE
        WHEN ratio_to_baseline IS NULL THEN 'warming_up'
        WHEN ratio_to_baseline > 5 THEN 'ANOMALY'
        ELSE 'normal'
    END AS status,
    recorded_at
FROM error_rate_anomalies
ORDER BY recorded_at;

Output (verified on RisingWave 2.8.0):

  service_name   | current_error_rate | moving_avg | ratio |   status
-----------------+--------------------+------------+-------+------------
 payment-service |               0.02 |            |       | warming_up
 payment-service |               0.01 |       0.02 |       | warming_up
 payment-service |               0.03 |      0.015 |       | warming_up
 payment-service |               0.02 |       0.02 |   1.0 | normal
 payment-service |               0.01 |       0.02 |   0.5 | normal
 payment-service |               0.02 |      0.018 |   1.1 | normal
 payment-service |              0.015 |      0.018 |   0.8 | normal
 payment-service |              0.025 |      0.019 |   1.3 | normal
 payment-service |               0.35 |      0.018 |  19.4 | ANOMALY
 payment-service |               0.42 |      0.084 |   5.0 | normal
 payment-service |               0.38 |      0.166 |   2.3 | normal
 payment-service |               0.03 |      0.238 |   0.1 | normal
 payment-service |               0.02 |      0.241 |   0.1 | normal

The error rate jump to 0.35 (35%) is 19.4x the recent moving average of 0.018 (1.8%), clearly flagged as anomalous. This ratio-based approach works well when different services have different normal error rates, since the comparison is relative, not absolute.
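As a quick sanity check, the same ratio can be reproduced in Python from the five readings that precede the spike:

```python
from statistics import mean

# The five error-rate readings before the 0.35 spike
# (ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING).
baseline = [0.02, 0.01, 0.02, 0.015, 0.025]
current = 0.35

ratio = current / mean(baseline)
print(round(ratio, 1))  # 19.4, matching the view output
```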

Technique 4: Rate-of-change detection

The previous techniques detect sudden spikes. But what about gradual anomalies, like a memory leak that slowly pushes CPU from 35% to 95% over several minutes? Each individual reading might look normal compared to the one before it, but the trend is clearly pathological.

Rate-of-change detection uses the LAG window function to compare each value against its predecessor:

CREATE MATERIALIZED VIEW rate_of_change AS
SELECT
    service_name,
    host_id,
    metric_value AS current_value,
    recorded_at,
    LAG(metric_value) OVER w AS previous_value,
    CASE
        WHEN LAG(metric_value) OVER w > 0 THEN
            ROUND(((metric_value - LAG(metric_value) OVER w)
                / LAG(metric_value) OVER w * 100)::numeric, 1)
        ELSE NULL
    END AS pct_change
FROM service_metrics
WHERE metric_type = 'cpu_percent'
WINDOW w AS (
    PARTITION BY service_name, host_id
    ORDER BY recorded_at
);

Query the results:

SELECT
    service_name,
    current_value,
    previous_value,
    pct_change,
    CASE
        WHEN pct_change IS NULL THEN 'no_baseline'
        WHEN ABS(pct_change) > 30 THEN 'RAPID_CHANGE'
        ELSE 'normal'
    END AS status,
    recorded_at
FROM rate_of_change
ORDER BY recorded_at;

Output (verified on RisingWave 2.8.0):

 service_name | current_value | previous_value | pct_change |    status
--------------+---------------+----------------+------------+--------------
 auth-service |            35 |                |            | no_baseline
 auth-service |          38.2 |             35 |        9.1 | normal
 auth-service |          36.5 |           38.2 |       -4.5 | normal
 auth-service |          37.1 |           36.5 |        1.6 | normal
 auth-service |            39 |           37.1 |        5.1 | normal
 auth-service |          34.8 |             39 |      -10.8 | normal
 auth-service |          36.9 |           34.8 |        6.0 | normal
 auth-service |          37.5 |           36.9 |        1.6 | normal
 auth-service |            55 |           37.5 |       46.7 | RAPID_CHANGE
 auth-service |            68 |             55 |       23.6 | normal
 auth-service |            82 |             68 |       20.6 | normal
 auth-service |            91 |             82 |       11.0 | normal
 auth-service |            95 |             91 |        4.4 | normal

The jump from 37.5% to 55% (a 46.7% increase) triggers the RAPID_CHANGE alert. Subsequent increases (55 to 68, 68 to 82) are smaller percentage changes individually, but the threshold alert from Technique 1 catches those when they cross 80%. This is why combining multiple detection methods is important: each one covers a different blind spot.
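The LAG comparison reduces to simple arithmetic. An illustrative Python version of the percentage-change check:

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change between consecutive readings, as in the LAG query."""
    return (current - previous) / previous * 100

print(round(pct_change(37.5, 55.0), 1))  # 46.7 -> RAPID_CHANGE (> 30)
print(round(pct_change(55.0, 68.0), 1))  # 23.6 -> below the 30% threshold
```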

How Do You Build a Combined Alerting View?

In production, you want a single queryable view that consolidates anomaly signals, making it straightforward to sink alerts to downstream systems like Slack, PagerDuty, or a PostgreSQL table that powers a Grafana dashboard. For brevity, the view below wraps only the threshold alerts; the statistical views can be folded in with the same pattern (for example, via UNION ALL).

CREATE MATERIALIZED VIEW anomaly_alerts AS
SELECT
    service_name,
    host_id,
    metric_type,
    metric_value,
    alert_level AS anomaly_type,
    recorded_at
FROM threshold_alerts
WHERE alert_level != 'NORMAL';

Query all active anomalies:

SELECT * FROM anomaly_alerts ORDER BY recorded_at;

Output (verified on RisingWave 2.8.0):

  service_name   | host_id | metric_type | metric_value |  anomaly_type   |        recorded_at
-----------------+---------+-------------+--------------+-----------------+---------------------------
 payment-service | host-02 | error_rate  |         0.35 | HIGH_ERROR_RATE | 2025-03-15 10:08:00+00:00
 payment-service | host-02 | error_rate  |         0.42 | HIGH_ERROR_RATE | 2025-03-15 10:09:00+00:00
 payment-service | host-02 | error_rate  |         0.38 | HIGH_ERROR_RATE | 2025-03-15 10:10:00+00:00
 auth-service    | host-03 | cpu_percent |           82 | HIGH_CPU        | 2025-03-15 10:10:00+00:00
 api-gateway     | host-01 | latency_ms  |        320.5 | HIGH_LATENCY    | 2025-03-15 10:10:00+00:00
 api-gateway     | host-01 | latency_ms  |          285 | HIGH_LATENCY    | 2025-03-15 10:11:00+00:00
 auth-service    | host-03 | cpu_percent |           91 | HIGH_CPU        | 2025-03-15 10:11:00+00:00
 auth-service    | host-03 | cpu_percent |           95 | HIGH_CPU        | 2025-03-15 10:12:00+00:00

This consolidated view can serve as the source for a RisingWave sink that pushes alerts to Kafka, a webhook, or directly to a PostgreSQL table. Since the materialized view updates incrementally, new anomalies appear within seconds of the triggering metric arriving.

When Should You Use Each Technique?

Each detection method has different strengths. Choosing the right one depends on the type of anomaly you expect:

 Technique            | Best for                        | Catches                                            | Misses
----------------------+---------------------------------+----------------------------------------------------+-------------------------------------------------
 Threshold alerts     | Known limits (disk, memory)     | Values exceeding hard boundaries                   | Gradual drifts below the threshold
 Z-score              | Sudden spikes in stable metrics | Statistical outliers relative to recent history    | Slow trends where each step is small
 Moving average ratio | Rate/percentage metrics         | Relative deviations from baseline                  | Anomalies in metrics with high natural variance
 Rate of change       | Gradual ramp-ups, trends        | Step-function changes between consecutive readings | Sustained high values that are stable

In practice, production systems use multiple techniques in parallel. Threshold alerts serve as a safety net for critical boundaries. Z-scores catch sudden spikes that adaptive thresholds might miss. Rate-of-change detection picks up gradual degradation.
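To make "multiple techniques in parallel" concrete, here is a hypothetical Python evaluator (the function name, window sizes, and thresholds are illustrative, chosen to match the tutorial's values) that runs all four checks on one new reading and returns every signal that fires:

```python
from statistics import mean, pstdev

def detect(history: list[float], current: float,
           hard_limit: float = 80.0) -> list[str]:
    """Run all four checks against one new reading; return fired signals."""
    signals = []
    if current > hard_limit:                                     # Technique 1
        signals.append("THRESHOLD")
    if len(history) >= 5 and pstdev(history) > 0:
        if abs(current - mean(history)) / pstdev(history) > 3:   # Technique 2
            signals.append("Z_SCORE")
    if len(history) >= 3 and mean(history) > 0:
        if current / mean(history) > 5:                          # Technique 3
            signals.append("MOVING_AVG_RATIO")
    if history and history[-1] > 0:
        if abs(current - history[-1]) / history[-1] * 100 > 30:  # Technique 4
            signals.append("RATE_OF_CHANGE")
    return signals
```

For the CPU jump in the sample data, detect([35.0, 38.2, 36.5, 37.1, 39.0, 34.8, 36.9, 37.5], 55.0) returns ['Z_SCORE', 'RATE_OF_CHANGE']: the hard threshold stays quiet while the statistical checks fire, which is the complementary coverage described above.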

Comparison with other approaches

Apache Flink can implement similar detection logic with its DataStream API or Flink SQL, but it requires managing separate JVM-based infrastructure. Cloud-native options like Azure Stream Analytics offer built-in functions such as AnomalyDetection_SpikeAndDip, but lock you into a specific vendor.

RisingWave takes a different approach: standard PostgreSQL-compatible SQL with window functions and materialized views handle the detection logic, while the streaming engine handles the incremental computation. You write familiar SQL, and the system keeps results up-to-date continuously.

How Do You Connect This to Production Infrastructure?

The examples above use a table with direct inserts. In production, you would replace the table with a Kafka source to ingest metrics from your monitoring agents. Here is the pattern:

CREATE SOURCE metrics_stream (
    service_name VARCHAR,
    host_id VARCHAR,
    metric_type VARCHAR,
    metric_value DOUBLE PRECISION,
    recorded_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'infrastructure.metrics',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

Then point your materialized views at metrics_stream instead of service_metrics. The detection logic remains identical. RisingWave handles the Kafka consumption, checkpointing, and exactly-once processing under the hood.

For alerting output, create a sink to push anomalies to a downstream system:

CREATE SINK anomaly_sink FROM anomaly_alerts
WITH (
    connector = 'kafka',
    topic = 'alerts.anomalies',
    properties.bootstrap.server = 'kafka:9092',
    type = 'append-only'
) FORMAT PLAIN ENCODE JSON;

From the alerts.anomalies Kafka topic, you can fan out to PagerDuty, Slack, or any alerting tool that consumes Kafka messages.

FAQ

What is real-time anomaly detection in streaming SQL?

Real-time anomaly detection in streaming SQL is the practice of identifying data points that deviate from expected patterns using continuously maintained SQL queries. Instead of running batch jobs on stored data, a streaming database like RisingWave evaluates every incoming metric against rolling statistical baselines using window functions and materialized views. Anomalies are flagged within seconds of the data arriving.

How does z-score anomaly detection work in SQL?

Z-score anomaly detection computes the number of standard deviations a value sits from the rolling mean of recent data. In SQL, you implement this using AVG and STDDEV_POP window functions over a preceding row window. Values with a z-score above 3 or below -3 are typically classified as anomalies. The approach adapts automatically to different metrics since the baseline is computed from each metric's own history.

Can I do anomaly detection without machine learning?

Yes. For infrastructure and application monitoring, SQL-based statistical methods (thresholds, z-scores, moving average ratios, rate-of-change detection) handle the vast majority of anomaly detection use cases. Machine learning approaches like isolation forests or autoencoders are better suited for high-dimensional data or complex behavioral patterns, but are overkill for detecting latency spikes, error rate surges, and CPU ramp-ups.

How does RisingWave compare to Apache Flink for streaming anomaly detection?

Both systems can implement streaming anomaly detection. Apache Flink offers the DataStream API and Flink SQL, while RisingWave provides PostgreSQL-compatible SQL with incrementally maintained materialized views. The key difference is operational: RisingWave does not require JVM tuning, custom serializers, or checkpoint configuration. You write SQL, and the system manages state, recovery, and incremental computation automatically. For teams that already know SQL and want production anomaly detection without a distributed systems learning curve, RisingWave is the faster path.

Conclusion

Real-time anomaly detection does not require complex ML pipelines or specialized tooling. Four SQL techniques cover most monitoring scenarios:

  • Threshold alerts catch metrics that cross known boundaries.
  • Z-score detection flags statistical outliers relative to rolling baselines.
  • Moving average ratios identify relative spikes in rate-based metrics.
  • Rate-of-change detection catches gradual ramp-ups that other methods miss.

RisingWave makes these techniques practical by incrementally maintaining materialized views. Each new data point updates the anomaly scores without full recomputation. Combined with Kafka sources for ingestion and sinks for alert delivery, you get a complete anomaly detection pipeline written entirely in SQL.


Ready to build your own anomaly detection pipeline? Try RisingWave Cloud free, no credit card required. Sign up here.

Join our Slack community to ask questions and connect with other stream processing developers.
