How to Monitor Power Grid Anomalies in Real Time

Real-time power grid anomaly monitoring uses a PostgreSQL-compatible streaming database to evaluate sensor telemetry continuously against baseline thresholds and statistical patterns—surfacing voltage deviations, frequency excursions, and load anomalies within seconds of occurrence, before they cascade into outages.

Why This Matters for Energy Operators

Grid anomalies are time-sensitive by nature. A voltage sag that persists for more than 200 milliseconds can trip industrial equipment. A frequency deviation beyond ±0.5 Hz triggers automatic protection relays. An undetected line overload can cause transformer damage within minutes.

Traditional monitoring architectures rely on SCADA polling at 2–4 second intervals, followed by batch aggregation into dashboards that refresh every minute or more. By the time an alert appears on an operator's screen, the anomaly may already have caused downstream protection actions.

The gap between event and awareness is where outages become uncontrollable. Streaming SQL closes that gap by making anomaly detection part of the data ingestion pipeline itself—not a downstream reporting step.

How Streaming SQL Works for Energy Data

RisingWave, a PostgreSQL-compatible streaming database, ingests raw telemetry from Kafka topics and evaluates SQL expressions continuously against the live data stream. The result is a set of always-current materialized views that any monitoring tool can query over a standard Postgres connection.

Key capabilities for anomaly detection:

  • Windowed statistical aggregations: compute rolling mean and standard deviation over the last N minutes
  • Change detection: compare current values against baseline windows to detect step changes
  • Pattern matching: identify sequences of events (e.g., three consecutive voltage readings above threshold)
  • Multi-stream correlation: join data from multiple feeders to detect zone-wide vs. localized anomalies
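The pattern-matching case is not shown in the steps below, so here is a minimal sketch of the "three consecutive readings above threshold" idea using LAG window functions over the raw stream. The 242 kV cutoff and the view name are illustrative, and this assumes the `pmu_telemetry` source defined in Step 1:

```sql
-- Sketch: flag devices whose last three voltage readings all exceed a threshold.
CREATE MATERIALIZED VIEW sustained_overvoltage AS
SELECT device_id, ts, voltage_kv
FROM (
    SELECT
        device_id, ts, voltage_kv,
        LAG(voltage_kv, 1) OVER (PARTITION BY device_id ORDER BY ts) AS prev1,
        LAG(voltage_kv, 2) OVER (PARTITION BY device_id ORDER BY ts) AS prev2
    FROM pmu_telemetry
) t
WHERE voltage_kv > 242 AND prev1 > 242 AND prev2 > 242;
```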

Building the System: Step by Step

Step 1: Connect the Data Source

Ingest high-frequency telemetry from RTUs and PMUs via Kafka:

CREATE SOURCE pmu_telemetry (
    device_id     VARCHAR,
    substation_id VARCHAR,
    voltage_kv    DOUBLE PRECISION,
    current_a     DOUBLE PRECISION,
    frequency_hz  DOUBLE PRECISION,
    power_mw      DOUBLE PRECISION,
    power_factor  DOUBLE PRECISION,
    ts            TIMESTAMPTZ
) WITH (
    connector     = 'kafka',
    topic         = 'grid.pmu.raw',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
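With `ENCODE JSON`, each Kafka message must be a flat JSON object whose keys match the source columns. An illustrative message on `grid.pmu.raw` (all values invented):

```json
{
  "device_id": "PMU-0042",
  "substation_id": "SUB-NORTH-7",
  "voltage_kv": 231.4,
  "current_a": 412.7,
  "frequency_hz": 50.01,
  "power_mw": 84.3,
  "power_factor": 0.97,
  "ts": "2024-05-01T12:00:00.000Z"
}
```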

Step 2: Build Real-Time Aggregations

Compute rolling baselines and current statistics in parallel:

-- Rolling 10-minute baseline per device
CREATE MATERIALIZED VIEW device_baseline AS
SELECT
    device_id,
    window_start,
    window_end,
    AVG(voltage_kv)   AS avg_voltage,
    STDDEV(voltage_kv) AS stddev_voltage,
    AVG(frequency_hz) AS avg_frequency,
    AVG(power_mw)     AS avg_power
FROM TUMBLE(pmu_telemetry, ts, INTERVAL '10' MINUTE)
GROUP BY device_id, window_start, window_end;

-- 30-second rolling snapshot for real-time comparison
CREATE MATERIALIZED VIEW device_snapshot AS
SELECT
    device_id,
    window_end                     AS snapshot_time,
    AVG(voltage_kv)                AS current_voltage,
    AVG(frequency_hz)              AS current_frequency,
    MAX(power_mw)                  AS peak_power,
    COUNT(*)                       AS sample_count
FROM TUMBLE(pmu_telemetry, ts, INTERVAL '30' SECOND)
GROUP BY device_id, window_end;
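Both views can be queried like ordinary Postgres tables at any time. For example, a spot check of the freshest snapshots (an ad hoc query, not part of the pipeline):

```sql
-- Inspect the most recent 30-second windows; a low sample_count can
-- indicate a sensor that is dropping messages.
SELECT device_id, snapshot_time, current_voltage, sample_count
FROM device_snapshot
ORDER BY snapshot_time DESC
LIMIT 10;
```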

Step 3: Detect Anomalies and Generate Alerts

Detect voltage deviations beyond two standard deviations of the rolling baseline, frequency excursions outside the 49.8–50.2 Hz band (a 50 Hz grid is assumed here), and feeder overload above 95% of rated capacity, all in a single view:

CREATE MATERIALIZED VIEW grid_anomalies AS
WITH latest_baseline AS (
    SELECT DISTINCT ON (device_id)
        device_id, avg_voltage, stddev_voltage, avg_frequency
    FROM device_baseline
    ORDER BY device_id, window_end DESC
)
SELECT
    s.device_id,
    s.snapshot_time,
    s.current_voltage,
    s.current_frequency,
    b.avg_voltage           AS baseline_voltage,
    b.stddev_voltage        AS baseline_stddev,
    ABS(s.current_voltage - b.avg_voltage) / NULLIF(b.stddev_voltage, 0) AS voltage_z_score,
    CASE
        WHEN ABS(s.current_voltage - b.avg_voltage) > 2 * b.stddev_voltage THEN 'VOLTAGE_DEVIATION'
        WHEN s.current_frequency < 49.8 OR s.current_frequency > 50.2    THEN 'FREQUENCY_EXCURSION'
        WHEN s.peak_power > 0.95 * sub.rated_mw                           THEN 'OVERLOAD'
        ELSE NULL
    END AS anomaly_type
FROM device_snapshot s
JOIN latest_baseline b ON s.device_id = b.device_id
JOIN substation_config sub ON s.device_id = sub.device_id
WHERE
    ABS(s.current_voltage - b.avg_voltage) > 2 * b.stddev_voltage
    OR s.current_frequency < 49.8 OR s.current_frequency > 50.2
    OR s.peak_power > 0.95 * sub.rated_mw;
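The view above joins `substation_config`, a reference table that is not defined earlier in this guide. A minimal sketch of what it might contain (column names beyond `device_id` and `rated_mw` are assumptions):

```sql
-- Per-device rated capacity, maintained as an ordinary table.
CREATE TABLE substation_config (
    device_id     VARCHAR PRIMARY KEY,
    substation_id VARCHAR,
    rated_mw      DOUBLE PRECISION
);

INSERT INTO substation_config VALUES ('PMU-0042', 'SUB-NORTH-7', 120.0);
```

Because the anomaly view joins this table live, updating a device's `rated_mw` changes the overload threshold without touching the pipeline.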

Step 4: Integrate with SCADA/EMS Downstream

Write anomaly events to a Kafka topic for consumption by SCADA alarm managers and NOC ticketing systems:

CREATE SINK anomaly_alert_sink
FROM grid_anomalies
WITH (
    connector = 'kafka',
    topic     = 'grid.anomalies.alerts',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

For Grafana or PagerDuty integration, query the grid_anomalies view directly via the Postgres wire protocol. No additional middleware is required.
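As a sketch, a Grafana panel pointed at the database would run a plain SQL query like the following (the one-hour lookback is illustrative):

```sql
-- Recent anomalies for a dashboard panel, newest first.
SELECT snapshot_time, device_id, anomaly_type, voltage_z_score
FROM grid_anomalies
WHERE snapshot_time > NOW() - INTERVAL '1 hour'
ORDER BY snapshot_time DESC;
```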

Comparison: Batch vs Streaming

Capability                | Batch Monitoring            | Streaming SQL
--------------------------|-----------------------------|-------------------------------------
Detection latency         | 1–5 minutes                 | Under 5 seconds
Baseline computation      | Nightly recalculation       | Continuously updated rolling windows
False positive tuning     | Redeploy pipeline           | Alter materialized view definition
Multi-stream correlation  | Complex join jobs           | Single SQL JOIN across live views
Operator alert freshness  | Stale by pipeline interval  | Real-time
Infrastructure overhead   | Spark cluster + schedulers  | Single streaming database
Historical anomaly replay | Manual backfill required    | Replay from Kafka offsets

FAQ

How do we handle the high message rate from Phasor Measurement Units (PMUs), which can publish at 30–120 samples per second?
RisingWave is designed for high-throughput ingestion. It parallelizes consumption across Kafka partitions and uses incremental view maintenance to update aggregations efficiently. For very high-frequency sources, pre-aggregating at the RTU or using a shorter tumbling window (e.g., 1 second) reduces ingestion pressure while preserving anomaly detection fidelity.
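A sketch of that shorter-window pre-aggregation tier, which downstream views could read instead of the raw stream (view name is illustrative):

```sql
-- 1-second pre-aggregation to tame very high-rate PMU streams.
CREATE MATERIALIZED VIEW pmu_1s AS
SELECT
    device_id,
    window_end        AS ts,
    AVG(voltage_kv)   AS voltage_kv,
    AVG(frequency_hz) AS frequency_hz,
    MAX(power_mw)     AS power_mw
FROM TUMBLE(pmu_telemetry, ts, INTERVAL '1' SECOND)
GROUP BY device_id, window_end;
```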

Can we define custom anomaly thresholds per substation rather than global rules?
Yes. Store per-device thresholds in a reference table and JOIN it with the snapshot view—exactly as shown with substation_config above. Updating a threshold row in the reference table is reflected in the alert view immediately.
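A minimal sketch of such a thresholds table (all column names are assumptions), which would replace the hard-coded limits in the anomaly view:

```sql
-- Per-device limits; joined into the anomaly view like substation_config.
CREATE TABLE device_thresholds (
    device_id        VARCHAR PRIMARY KEY,
    min_frequency_hz DOUBLE PRECISION,
    max_frequency_hz DOUBLE PRECISION,
    voltage_sigma    DOUBLE PRECISION  -- z-score cutoff replacing the fixed 2
);

-- In the view, the fixed bounds become, e.g.:
--   JOIN device_thresholds t ON s.device_id = t.device_id
--   ... s.current_frequency < t.min_frequency_hz
--    OR s.current_frequency > t.max_frequency_hz
```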

How does the system behave during planned maintenance windows when readings are expected to be abnormal?
Maintain a maintenance_schedule table with device, start time, and end time. Add a LEFT JOIN and filter condition to the anomaly view to suppress alerts for devices within their scheduled window. No pipeline redeployment is needed—just an INSERT into the schedule table.
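A sketch of that suppression pattern (table and column names are assumptions):

```sql
-- Planned outages; rows here silence alerts for the covered interval.
CREATE TABLE maintenance_schedule (
    device_id  VARCHAR,
    start_time TIMESTAMPTZ,
    end_time   TIMESTAMPTZ
);

-- In the anomaly view, add an anti-join against active windows:
--   LEFT JOIN maintenance_schedule m
--     ON s.device_id = m.device_id
--    AND s.snapshot_time BETWEEN m.start_time AND m.end_time
--   WHERE ... AND m.device_id IS NULL
```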

Key Takeaways

  • Power grid anomaly detection requires sub-second or single-digit-second latency; batch pipelines cannot provide this without architectural changes.
  • Streaming SQL in RisingWave lets you express rolling baselines, z-score deviation, and threshold crossing in plain SQL with no custom application code.
  • Materialized views maintain state incrementally, meaning complex multi-window joins do not recompute from scratch on every query.
  • The Postgres wire protocol compatibility means existing monitoring stacks (Grafana, Metabase, custom dashboards) connect with zero modification.
  • Anomaly suppression for maintenance windows and per-device threshold tables are simple operational knobs, not pipeline reconfigurations.
