Real-time fault detection in power distribution networks uses a PostgreSQL-compatible streaming database to continuously evaluate protection relay events, current and voltage measurements, and SCADA state changes. By correlating signals across multiple devices, the system identifies and locates faults within seconds rather than waiting for field crew reports or manual SCADA analysis.
## Why This Matters for Energy Operators
Distribution network faults are a daily operational reality. In a typical utility, a medium-voltage distribution feeder experiences several faults per year. Most are temporary (transient faults cleared by automatic reclosers), but some are sustained, requiring crew dispatch and restoration switching. The critical metric is restoration time: every minute of outage adds to customer minutes interrupted, the primary reliability measure regulators track.
The challenge is fault location. Traditional fault analysis relies on:
- Relay event records (usually reviewed post-fault)
- Customer calls to the outage management system (delayed by call volume)
- Field crew inspection (slow and expensive)
Streaming SQL adds a real-time layer: it correlates relay events, current measurements, and voltage signatures from across the network the moment they arrive, producing a fault location estimate before the first customer call reaches the call center.
## How Streaming SQL Works for Energy Data
RisingWave ingests protection system events, SCADA state changes, and smart meter last-gasp signals from Kafka. Materialized views correlate these signals by location and time to identify fault patterns, estimate affected segments, and feed isolation and restoration switching decisions. The Postgres interface exposes this intelligence directly to outage management systems (OMS) and mobile crew dispatch tools.
## Building the System: Step by Step
### Step 1: Connect the Data Sources
Ingest relay events, SCADA states, and meter last-gasp signals:
```sql
-- Protection relay events (trip, reclose, lockout)
CREATE SOURCE relay_events (
    relay_id VARCHAR,
    feeder_id VARCHAR,
    segment_id VARCHAR,
    event_type VARCHAR,        -- 'TRIP', 'RECLOSE', 'LOCKOUT', 'CLOSE'
    fault_current_a DOUBLE PRECISION,
    phase VARCHAR,             -- 'A', 'B', 'C', 'ABC', 'G'
    event_ts TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'protection.relay.events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- SCADA device state (breaker/switch position)
CREATE SOURCE scada_states (
    device_id VARCHAR,
    feeder_id VARCHAR,
    segment_id VARCHAR,
    device_type VARCHAR,
    state VARCHAR,             -- 'OPEN', 'CLOSED'
    state_ts TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'scada.device.states',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```
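The smart meter last-gasp signals mentioned above follow the same source pattern. A minimal sketch, assuming a `metering.last.gasp` Kafka topic carrying JSON power-off notifications (the topic name and schema are illustrative, not from the original pipeline):

```sql
-- Smart meter last-gasp signals (emitted on loss of supply); hypothetical schema
CREATE SOURCE meter_last_gasp (
    meter_id VARCHAR,
    feeder_id VARCHAR,
    segment_id VARCHAR,
    gasp_ts TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'metering.last.gasp',    -- assumed topic name
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```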
### Step 2: Build Real-Time Aggregations
Identify relay operation clusters and correlate with device states:
```sql
-- Relay trips in the last 5 minutes grouped by feeder
CREATE MATERIALIZED VIEW recent_relay_trips AS
SELECT
    feeder_id,
    window_start,
    window_end,
    COUNT(*) AS trip_count,
    ARRAY_AGG(relay_id) AS tripped_relays,
    ARRAY_AGG(segment_id) AS affected_segments,
    MAX(fault_current_a) AS max_fault_current_a,
    MODE() WITHIN GROUP (ORDER BY phase) AS dominant_phase
FROM TUMBLE(relay_events, event_ts, INTERVAL '5' MINUTE)
WHERE event_type IN ('TRIP', 'LOCKOUT')
GROUP BY feeder_id, window_start, window_end;
```
```sql
-- Current breaker/switch positions (latest state per device),
-- expressed with the group Top-N pattern for streaming latest-per-key
CREATE MATERIALIZED VIEW current_device_states AS
SELECT device_id, feeder_id, segment_id, device_type, state, state_ts
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY device_id ORDER BY state_ts DESC) AS rn
    FROM scada_states
) ranked
WHERE rn = 1;
```
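Because these views are exposed over the Postgres wire protocol, an operator can spot-check them from psql or any Postgres client. An illustrative ad-hoc query during a storm:

```sql
-- Feeders with the most recent relay activity
SELECT feeder_id, window_end, trip_count, max_fault_current_a, dominant_phase
FROM recent_relay_trips
ORDER BY window_end DESC, trip_count DESC
LIMIT 10;
```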
### Step 3: Detect Anomalies and Generate Alerts
Correlate relay trips with device states to identify isolated fault segments:
```sql
CREATE MATERIALIZED VIEW fault_detections AS
SELECT
    t.feeder_id,
    t.window_end AS detection_time,
    t.trip_count,
    t.tripped_relays,
    t.max_fault_current_a,
    t.dominant_phase,
    COUNT(DISTINCT d.segment_id) AS de_energized_segments,
    ARRAY_AGG(DISTINCT d.segment_id) AS isolated_segments,
    CASE
        WHEN t.trip_count >= 3 THEN 'SUSTAINED_FAULT'
        WHEN t.trip_count = 1 THEN 'POSSIBLE_TRANSIENT'
        ELSE 'MULTIPLE_TRIP_SEQUENCE'
    END AS fault_classification,
    CASE
        WHEN t.max_fault_current_a > 5000 THEN 'CRITICAL'
        WHEN t.max_fault_current_a > 2000 THEN 'HIGH'
        ELSE 'MODERATE'
    END AS severity
FROM recent_relay_trips t
LEFT JOIN current_device_states d
    ON t.feeder_id = d.feeder_id
    AND d.state = 'OPEN'
    AND d.state_ts BETWEEN t.window_start AND t.window_end + INTERVAL '2' MINUTE
GROUP BY t.feeder_id, t.window_end, t.trip_count, t.tripped_relays,
         t.max_fault_current_a, t.dominant_phase;
```
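The comparison table below notes that customer impact can be estimated by joining with a customer-segment map. A minimal sketch, assuming a hypothetical `customer_segment_map` reference table loaded from GIS (the table name and columns are illustrative):

```sql
-- Hypothetical GIS-derived mapping of customers to network segments
CREATE TABLE customer_segment_map (
    segment_id VARCHAR,
    customer_count INT
);

-- Estimated customers affected per fault detection
CREATE MATERIALIZED VIEW fault_customer_impact AS
SELECT
    f.feeder_id,
    f.detection_time,
    f.severity,
    SUM(m.customer_count) AS estimated_customers_affected
FROM fault_detections f
JOIN customer_segment_map m
    ON m.segment_id = ANY(f.isolated_segments)
GROUP BY f.feeder_id, f.detection_time, f.severity;
```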
### Step 4: Integrate with SCADA/EMS Downstream
Send fault detections to OMS for work order creation and crew dispatch:
```sql
CREATE SINK fault_detection_sink
FROM fault_detections
WITH (
    connector = 'kafka',
    topic = 'distribution.faults',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON (
    -- fault_detections is an aggregation, so its changelog is not
    -- append-only; emit inserts only for the plain JSON format
    force_append_only = 'true'
);
```
OMS systems consume the topic to auto-create outage events, pre-populate affected customer counts, and suggest isolation/restoration switching sequences based on network topology.
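Downstream tools that prefer pull over push can read the same view over a standard Postgres connection instead of consuming Kafka. An illustrative dispatch-board query:

```sql
-- Active high-severity faults for the dispatch board
SELECT feeder_id, detection_time, fault_classification, severity, isolated_segments
FROM fault_detections
WHERE severity IN ('CRITICAL', 'HIGH')
ORDER BY detection_time DESC;
```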
## Comparison: Batch vs. Streaming
| Capability | Traditional Fault Analysis | Streaming SQL |
| --- | --- | --- |
| Fault detection time | Minutes to hours (manual) | Seconds after relay events |
| Fault location estimate | Field crew or morning review | Automated segment correlation |
| Customer impact estimate | Manual GIS lookup | JOIN with customer-segment map |
| OMS integration | Manual data entry | Automatic Kafka consumer |
| Relay event correlation | Post-event log review | Real-time multi-relay JOIN |
| Transient vs. sustained classification | Manual | Automated trip-count logic |
| Crew dispatch information | Phone or radio | Structured JSON to mobile app |
## FAQ
**How does the system distinguish between a transient fault (cleared by a recloser) and a sustained fault requiring crew dispatch?**
The relay event stream includes event types for TRIP, RECLOSE, and LOCKOUT. A recloser that trips and successfully recloses produces a TRIP followed by a CLOSE (or RECLOSE) event within seconds, while a lockout produces a TRIP with no CLOSE within the observation window. The trip_count logic in the fault classification view approximates this pattern; a dedicated view can pair each TRIP with any subsequent reclose, as sketched below.
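A minimal sketch of that pairing, assuming a 60-second reclose window (the interval and view name are illustrative; tune the window to the recloser settings):

```sql
-- Pair each TRIP with any RECLOSE/CLOSE on the same relay within 60 seconds.
-- In a streaming view, a trip first appears as NO_RECLOSE_OBSERVED and is
-- updated to CLEARED_TRANSIENT if a reclose arrives inside the window.
CREATE MATERIALIZED VIEW trip_outcomes AS
SELECT
    t.relay_id,
    t.feeder_id,
    t.event_ts AS trip_ts,
    CASE
        WHEN COUNT(r.event_ts) > 0 THEN 'CLEARED_TRANSIENT'
        ELSE 'NO_RECLOSE_OBSERVED'
    END AS outcome
FROM relay_events t
LEFT JOIN relay_events r
    ON r.relay_id = t.relay_id
    AND r.event_type IN ('RECLOSE', 'CLOSE')
    AND r.event_ts BETWEEN t.event_ts AND t.event_ts + INTERVAL '60' SECOND
WHERE t.event_type = 'TRIP'
GROUP BY t.relay_id, t.feeder_id, t.event_ts;
```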
**Our protection system uses IEC 61850 GOOSE messages, not Kafka. How do we bridge this?**

An IEC 61850 to Kafka gateway (available from several industrial IoT vendors or implementable with open-source libraries) subscribes to GOOSE publications and republishes them as structured JSON on Kafka topics. The RisingWave source schema above then applies without modification.
**Can the system recommend the switching sequence for fault isolation and restoration?**

The streaming SQL layer identifies the fault segment and the currently open devices. A topology engine (implemented as a reference table join, as sketched below, or as a separate microservice consuming the fault detection Kafka topic) can translate this into a switching sequence. The sequencing logic is typically graph-based and best implemented outside the SQL layer, consuming the fault detection output.
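For the reference-table variant, a minimal sketch, assuming a hypothetical `segment_topology` table that records each segment's upstream isolating device and alternate-feed tie switch (all names are illustrative; ordering the actual switching steps remains the topology engine's job):

```sql
-- Hypothetical topology table maintained from the network model
CREATE TABLE segment_topology (
    segment_id VARCHAR,
    upstream_device_id VARCHAR,   -- device that isolates the segment
    tie_switch_id VARCHAR         -- alternate-feed switch for restoration
);

-- Candidate isolation/restoration devices per detected fault
CREATE MATERIALIZED VIEW isolation_candidates AS
SELECT
    f.feeder_id,
    f.detection_time,
    topo.segment_id,
    topo.upstream_device_id,
    topo.tie_switch_id
FROM fault_detections f
JOIN segment_topology topo
    ON topo.segment_id = ANY(f.isolated_segments);
```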
## Key Takeaways
- Streaming SQL correlates relay events and SCADA state changes in real time, producing fault location estimates within seconds of the fault event.
- Transient vs. sustained fault classification is expressible as SQL logic based on trip count and reclose event sequences.
- Kafka sinks deliver structured fault records to OMS platforms, enabling automatic outage event creation without manual data entry.
- The architecture is additive: existing protection systems and SCADA infrastructure continue to operate unchanged, with streaming SQL as an intelligence layer above them.
- PostgreSQL compatibility means OMS vendors, GIS systems, and field crew mobile applications can query fault status and affected segment data through standard database connections.