Real-Time Outage Detection and Root Cause Analysis

Grid outages cost utilities millions and frustrate customers. Traditional SCADA and EMS systems flag events after the fact. With RisingWave, a PostgreSQL-compatible streaming database, you can continuously query live telemetry to detect outages and surface root causes before the first call comes in.

Why Outage Detection Matters

Every minute of unplanned downtime has a direct financial and safety impact. A medium-voltage feeder trip affecting thousands of customers triggers regulatory reporting requirements, SLA penalties, and emergency dispatch costs. Worse, post-incident root cause analysis (RCA) today relies on engineers manually correlating SCADA event logs, AMI last-gasp signals, and protection relay records—a process that takes hours.

Streaming SQL changes the equation. By continuously joining real-time feeder telemetry, protection relay events, and AMI meter pings, you can:

  • Detect anomalies within seconds of the first sensor deviation.
  • Correlate upstream and downstream events to isolate the fault segment automatically.
  • Route alerts to field crews with structured context before they leave the depot.

The Streaming SQL Approach

RisingWave ingests raw telemetry from Kafka topics and processes it with continuously updated materialized views. Unlike batch ETL pipelines, these views are always fresh—new rows trigger incremental computation the instant they arrive. The result is sub-second latency from event to insight without any scheduled queries or polling loops.

Key RisingWave capabilities used in this tutorial:

  • CREATE SOURCE — connect to Kafka topics carrying SCADA and AMI events.
  • CREATE MATERIALIZED VIEW — define incremental aggregations and joins.
  • TUMBLE windows — bucket telemetry into fixed time intervals.
  • CREATE SINK — push alerts back to Kafka or a JDBC endpoint for ticketing systems.

Step-by-Step Tutorial

Step 1: Data Source Setup

Create sources for feeder telemetry and AMI last-gasp events arriving from your SCADA and AMI head-end via Kafka.

-- Feeder telemetry: voltage (kV), current (A), status from SCADA RTUs
CREATE SOURCE feeder_telemetry (
    feeder_id        VARCHAR,
    substation_id    VARCHAR,
    event_ts         TIMESTAMPTZ,
    voltage_kv       DOUBLE PRECISION,
    current_a        DOUBLE PRECISION,
    breaker_status   VARCHAR,   -- CLOSED | OPEN | FAULT
    mw_load          DOUBLE PRECISION
) WITH (
    connector = 'kafka',
    topic = 'scada.feeder.telemetry',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- AMI last-gasp: meters report outage signal before battery dies
CREATE SOURCE ami_last_gasp (
    meter_id         VARCHAR,
    feeder_id        VARCHAR,
    substation_id    VARCHAR,
    event_ts         TIMESTAMPTZ,
    signal_type      VARCHAR    -- LAST_GASP | RESTORE
) WITH (
    connector = 'kafka',
    topic = 'ami.events.lastgasp',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- Protection relay events from substation IEDs
CREATE SOURCE relay_events (
    relay_id         VARCHAR,
    feeder_id        VARCHAR,
    event_ts         TIMESTAMPTZ,
    event_type       VARCHAR,   -- TRIP | RECLOSE | LOCKOUT
    fault_current_a  DOUBLE PRECISION,
    zone             INTEGER
) WITH (
    connector = 'kafka',
    topic = 'protection.relay.events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

Step 2: Core Materialized View

Aggregate AMI last-gasp counts per feeder in 1-minute tumbling windows to detect mass outage events.

-- Count AMI last-gasp signals per feeder per minute
CREATE MATERIALIZED VIEW ami_outage_agg AS
SELECT
    feeder_id,
    substation_id,
    window_start,
    window_end,
    COUNT(*) FILTER (WHERE signal_type = 'LAST_GASP') AS last_gasp_count,
    COUNT(*) FILTER (WHERE signal_type = 'RESTORE')   AS restore_count
FROM TUMBLE(ami_last_gasp, event_ts, INTERVAL '1 MINUTE')
GROUP BY feeder_id, substation_id, window_start, window_end;
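Because materialized views are queryable with ordinary SQL, operators can spot-check restoration progress directly from `ami_outage_agg`. The following ad hoc query is a sketch: the one-hour lookback and the assumption that restores roughly offset last-gasps are illustrative choices, not part of the pipeline above.

```sql
-- Rough estimate of meters still dark per feeder over the last hour
SELECT
    feeder_id,
    SUM(last_gasp_count) - SUM(restore_count) AS meters_still_out
FROM ami_outage_agg
WHERE window_start > NOW() - INTERVAL '1 HOUR'
GROUP BY feeder_id
HAVING SUM(last_gasp_count) - SUM(restore_count) > 0
ORDER BY meters_still_out DESC;
```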

-- Track feeder voltage deviation: average kV vs nominal 11 kV
CREATE MATERIALIZED VIEW feeder_voltage_stats AS
SELECT
    feeder_id,
    substation_id,
    window_start,
    window_end,
    AVG(voltage_kv)                         AS avg_voltage_kv,
    MIN(voltage_kv)                         AS min_voltage_kv,
    COUNT(*) FILTER (WHERE breaker_status = 'FAULT') AS fault_count,
    MAX(mw_load)                            AS peak_load_mw
FROM TUMBLE(feeder_telemetry, event_ts, INTERVAL '1 MINUTE')
GROUP BY feeder_id, substation_id, window_start, window_end;

-- Join relay trips with AMI signals for root cause correlation
CREATE MATERIALIZED VIEW outage_correlation AS
SELECT
    r.feeder_id,
    r.relay_id,
    r.event_ts          AS relay_trip_ts,
    r.fault_current_a,
    r.zone              AS fault_zone,
    a.last_gasp_count,
    a.window_start      AS ami_window_start
FROM relay_events r
JOIN ami_outage_agg a
    ON  r.feeder_id = a.feeder_id
    AND r.event_ts BETWEEN a.window_start AND a.window_end
WHERE r.event_type = 'TRIP';
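The voltage statistics view can corroborate the relay/AMI correlation: a genuine fault usually shows a voltage sag in the same window as the last-gasp burst. As a sketch (the 9.9 kV sag threshold, roughly 90% of the 11 kV nominal, is an assumption for illustration):

```sql
-- Feeders showing both a voltage sag and last-gasp activity in the same window
CREATE MATERIALIZED VIEW sag_with_gasps AS
SELECT
    v.feeder_id,
    v.window_start,
    v.min_voltage_kv,
    a.last_gasp_count
FROM feeder_voltage_stats v
JOIN ami_outage_agg a
    ON  v.feeder_id = a.feeder_id
    AND v.window_start = a.window_start
WHERE v.min_voltage_kv < 9.9
  AND a.last_gasp_count > 0;
```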

Step 3: Alerting Logic

Generate structured outage alerts when last-gasp count crosses a threshold and a relay trip is confirmed.

-- Outage alert: feeder with ≥10 last-gasp signals in 1 minute
CREATE MATERIALIZED VIEW outage_alerts AS
SELECT
    oc.feeder_id,
    oc.relay_id,
    oc.relay_trip_ts,
    oc.fault_current_a,
    oc.fault_zone,
    oc.last_gasp_count,
    CASE
        -- Thresholds are illustrative. Note the ordering: very high fault
        -- current indicates a solid (bolted) fault, while high-impedance
        -- faults draw low current.
        WHEN oc.fault_current_a > 5000 THEN 'BOLTED_FAULT'
        WHEN oc.fault_current_a > 2000 THEN 'PHASE_TO_GROUND'
        ELSE 'HIGH_IMPEDANCE_OR_TRANSIENT'
    END AS probable_cause
    -- NOW() is intentionally omitted: RisingWave allows now() only in
    -- temporal filters, not in the select list of a streaming query.
FROM outage_correlation oc
WHERE oc.last_gasp_count >= 10;

-- Sink alerts to Kafka for dispatch and ticketing integration
CREATE SINK outage_alert_sink AS
SELECT * FROM outage_alerts
WITH (
    connector = 'kafka',
    topic = 'grid.alerts.outage',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON (
    -- required when sinking a non-append-only view to a PLAIN Kafka sink
    force_append_only = 'true'
);
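Since outage_alerts is itself queryable, dispatchers or dashboards can also pull the latest context on demand without waiting on the Kafka sink. A simple check might look like:

```sql
-- Most recent alerts with probable cause, newest first
SELECT feeder_id, probable_cause, last_gasp_count, relay_trip_ts
FROM outage_alerts
ORDER BY relay_trip_ts DESC
LIMIT 10;
```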

Comparison Table

Approach                        | Detection Latency | RCA Time   | Scalability
--------------------------------|-------------------|------------|-------------------
Manual SCADA review             | 5–30 minutes      | Hours      | Low
Batch ETL + SQL                 | 5–15 minutes      | 30–60 min  | Medium
Stream processing (custom code) | Seconds           | Minutes    | High (complex ops)
RisingWave streaming SQL        | Seconds           | Seconds    | High (SQL-native)

FAQ

Q: Can RisingWave ingest SCADA data that arrives via MQTT or OPC-UA rather than Kafka?
A: RisingWave natively supports Kafka. For MQTT or OPC-UA sources, a lightweight bridge (such as a Kafka Connect adapter) forwards messages to a Kafka topic, which RisingWave then consumes via CREATE SOURCE.
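For illustration only, a Confluent MQTT Source Connector configuration that forwards broker messages into the Kafka topic consumed in Step 1 might look like the following; the connector name, broker URI, and MQTT topic are assumptions for your environment.

```json
{
  "name": "scada-mqtt-bridge",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "mqtt.server.uri": "tcp://scada-broker:1883",
    "mqtt.topics": "scada/feeder/telemetry",
    "kafka.topic": "scada.feeder.telemetry",
    "tasks.max": "1"
  }
}
```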

Q: How do I handle duplicate or out-of-order SCADA events?
A: RisingWave's watermark mechanism and event-time windows (TUMBLE/HOP) handle late and out-of-order events gracefully. You can define a watermark delay on the source to buffer late arrivals before closing each window.
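As a sketch, a watermark is declared directly in the source schema. The variant below trims the feeder telemetry source to three columns and buffers 30 seconds for late arrivals; the delay value is an assumption you would tune to your network.

```sql
-- Hypothetical variant of the telemetry source with a 30-second watermark:
-- each window waits up to 30 s for late events before emitting results
CREATE SOURCE feeder_telemetry_wm (
    feeder_id   VARCHAR,
    event_ts    TIMESTAMPTZ,
    voltage_kv  DOUBLE PRECISION,
    WATERMARK FOR event_ts AS event_ts - INTERVAL '30' SECOND
) WITH (
    connector = 'kafka',
    topic = 'scada.feeder.telemetry',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```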

Q: What happens to historical data once the window closes?
A: Materialized views persist their aggregated state in RisingWave's storage layer. You can query them like any PostgreSQL table and also sink the results to Iceberg, S3, or a JDBC warehouse for long-term retention and batch analytics.
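As a hedged sketch, long-term retention could be wired up with a JDBC sink; the warehouse URL and table name below are placeholders, and connection credentials are omitted.

```sql
-- Hypothetical JDBC sink for long-term alert retention (credentials omitted)
CREATE SINK outage_history_sink FROM outage_alerts
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://warehouse:5432/grid',
    table.name = 'outage_alert_history',
    type = 'append-only',
    force_append_only = 'true'
);
```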

Key Takeaways

  • RisingWave can detect feeder outages in seconds by continuously joining SCADA telemetry, AMI last-gasp signals, and protection relay events.
  • Streaming SQL replaces brittle custom code: a few CREATE MATERIALIZED VIEW statements handle windowed aggregation, cross-stream joins, and probabilistic fault classification.
  • Alerts sink directly to Kafka or JDBC, integrating naturally with existing OMS and field dispatch workflows.
  • The PostgreSQL-compatible interface means your existing BI tools, JDBC drivers, and SQL skills work without modification.
