Building a 5G Network Anomaly Detection System

Building a 5G Network Anomaly Detection System

5G NR networks generate vastly more telemetry per site than previous generations — beam-level metrics, xApp events from the O-RAN RIC, and per-slice KPIs all arrive continuously. RisingWave, a PostgreSQL-compatible streaming database, can evaluate anomaly detection rules against this stream in real time and alert NOC engineers before minor glitches cascade into outages.

Why 5G Anomaly Detection Matters

The density and complexity of 5G NR deployments — massive MIMO antennas, mmWave small cells, dynamic spectrum sharing — create failure modes that have no 4G equivalent. A misconfigured beam sweep can degrade coverage for an entire neighborhood in seconds. A signaling storm from a faulty UE can saturate an AMF. A rogue gNB can inject interference that affects dozens of co-channel cells.

Traditional threshold-based monitoring catches single-KPI violations but misses multi-dimensional anomalies. Real-time streaming analytics can evaluate composite conditions — for example, simultaneous RSRQ degradation on multiple beams at adjacent cells — that individually look like noise but together indicate interference.

Key 5G-specific signals:

  • RSRP / RSRQ per beam and per cell (dBm / dB)
  • SS-SINR (Secondary Synchronization Signal SINR, dB)
  • PDCP throughput per bearer and per slice
  • gNB cell ID and PLMN ID
  • UE count and active bearer count per cell
  • Handover ping-pong rate — rapid successive HOs indicating coverage hole
  • AMF N2 signaling rate — spike indicates potential storm

How Streaming SQL Solves This

RisingWave ingests telemetry from the O-RAN RIC xApp data bus, gNB performance counters, and core network event streams. Materialized views compute per-cell and per-cluster baselines continuously. Anomaly rules written in SQL compare current values against those baselines using sliding and tumbling windows.

Because all the logic lives in SQL, network engineers — not data engineers — can tune detection rules without redeploying code.

Step-by-Step Tutorial

Step 1: Data Source

Ingest 5G NR cell telemetry from Kafka. Each event is a beam-and-cell metric bundle from the gNB.

CREATE SOURCE nr_cell_metrics (
    gnb_id              VARCHAR,        -- gNB identifier
    cell_id             VARCHAR,        -- NR cell ID (NCI)
    plmn_id             VARCHAR,
    beam_id             INT,
    rsrp_dbm            DOUBLE PRECISION,
    rsrq_db             DOUBLE PRECISION,
    ss_sinr_db          DOUBLE PRECISION,
    pdcp_dl_mbps        DOUBLE PRECISION,
    pdcp_ul_mbps        DOUBLE PRECISION,
    ue_count            INT,
    active_bearers      INT,
    ho_attempts         BIGINT,
    ho_ping_pong_count  BIGINT,
    n2_msg_rate         DOUBLE PRECISION,  -- AMF signaling messages/sec
    event_time          TIMESTAMPTZ
)
WITH (
    connector      = 'kafka',
    topic          = 'telecom.5g.cell.metrics',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON
( TIMESTAMP COLUMN = event_time );

Step 2: Core View

Compute per-cell baseline statistics over a 15-minute tumbling window, then compare each incoming reading against the most recent baseline to derive anomaly scores.

CREATE MATERIALIZED VIEW nr_cell_baseline_15min AS
SELECT
    cell_id,
    gnb_id,
    window_start,
    window_end,
    AVG(rsrp_dbm)           AS avg_rsrp,
    STDDEV(rsrp_dbm)        AS std_rsrp,
    AVG(rsrq_db)            AS avg_rsrq,
    STDDEV(rsrq_db)         AS std_rsrq,
    AVG(ss_sinr_db)         AS avg_sinr,
    AVG(ue_count)           AS avg_ue_count,
    AVG(n2_msg_rate)        AS avg_n2_rate,
    STDDEV(n2_msg_rate)     AS std_n2_rate,
    AVG(ho_ping_pong_count) AS avg_ping_pong,
    COUNT(*)                AS sample_count
FROM TUMBLE(nr_cell_metrics, event_time, INTERVAL '15 MINUTES')
GROUP BY cell_id, gnb_id, window_start, window_end;

CREATE MATERIALIZED VIEW nr_anomaly_scores AS
SELECT
    m.cell_id,
    m.gnb_id,
    m.plmn_id,
    m.event_time,
    m.rsrp_dbm,
    m.rsrq_db,
    m.ss_sinr_db,
    m.ue_count,
    m.n2_msg_rate,
    m.ho_ping_pong_count,
    b.avg_rsrq,
    b.avg_n2_rate,
    b.std_n2_rate,
    -- Z-score for N2 signaling rate (potential signaling storm)
    CASE
        WHEN b.std_n2_rate > 0
        THEN (m.n2_msg_rate - b.avg_n2_rate) / b.std_n2_rate
        ELSE 0
    END AS n2_zscore,
    -- RSRQ deviation flag
    CASE
        WHEN b.std_rsrq > 0
        THEN ABS(m.rsrq_db - b.avg_rsrq) / b.std_rsrq
        ELSE 0
    END AS rsrq_zscore,
    -- Ping-pong HO anomaly
    m.ho_ping_pong_count > b.avg_ping_pong * 3 AS ping_pong_anomaly,
    -- Coverage degradation flag
    m.rsrp_dbm < -115 AND m.rsrq_db < -14 AS coverage_hole_flag
FROM nr_cell_metrics m
LEFT JOIN nr_cell_baseline_15min b
    ON  m.cell_id    = b.cell_id
    AND m.event_time >= b.window_start
    AND m.event_time <  b.window_end;

Step 3: Alerts and Sinks

Classify anomalies and emit structured alerts.

CREATE MATERIALIZED VIEW nr_anomaly_alerts AS
SELECT
    cell_id,
    gnb_id,
    plmn_id,
    event_time,
    rsrp_dbm,
    rsrq_db,
    n2_msg_rate,
    n2_zscore,
    rsrq_zscore,
    CASE
        WHEN n2_zscore > 5                         THEN 'SIGNALING_STORM'
        WHEN coverage_hole_flag                    THEN 'COVERAGE_HOLE'
        WHEN ping_pong_anomaly                     THEN 'HO_PING_PONG'
        WHEN rsrq_zscore > 4                       THEN 'INTERFERENCE_EVENT'
        ELSE 'ANOMALY'
    END AS anomaly_type,
    CASE
        WHEN n2_zscore > 5 OR coverage_hole_flag   THEN 'CRITICAL'
        WHEN n2_zscore > 3 OR rsrq_zscore > 4      THEN 'WARNING'
        ELSE 'INFO'
    END AS severity
FROM nr_anomaly_scores
WHERE n2_zscore > 3
   OR rsrq_zscore > 4
   OR ping_pong_anomaly
   OR coverage_hole_flag;

CREATE SINK nr_anomaly_alerts_sink
FROM nr_anomaly_alerts
WITH (
    connector  = 'kafka',
    topic      = 'telecom.5g.anomaly.alerts',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Comparison Table

ApproachMulti-DimensionalBaseline-Relative5G Beam MetricsLatency
Static threshold alarmsNoNoLimited15 min
ML anomaly detection (batch)YesYesYes1–4 hours
Custom Flink topologyYesPossibleYesSeconds (complex)
RisingWave streaming SQLYesYesYesSeconds

FAQ

Q: Can I detect anomalies across multiple adjacent cells simultaneously? Yes. Join the nr_anomaly_scores view with itself on a spatial adjacency table (cell-to-neighbor mapping). If three or more adjacent cells show simultaneous RSRQ degradation, that pattern indicates interference and deserves a higher severity alert.

Q: How do I tune the Z-score thresholds to reduce false positives? Start with a threshold of 3 (three standard deviations) and monitor alert volume for a week. Adjust upward if the NOC is overwhelmed, or add a temporal persistence filter — for example, require the condition to persist for two consecutive windows before alerting.

Q: Does RisingWave support late-arriving metrics from backhaul-constrained small cells? Yes. Configure a watermark delay on the source to allow late data to arrive before windows close. RisingWave's watermark mechanism lets you trade a small amount of additional latency for more complete window aggregations.

Key Takeaways

  • RisingWave evaluates multi-dimensional 5G anomaly conditions — signaling storms, coverage holes, HO ping-pong — using Z-score analysis written in plain SQL.
  • Baseline-relative detection catches anomalies that static thresholds miss, especially in networks where absolute KPI levels vary by location and time of day.
  • The PostgreSQL-compatible interface lets network engineers modify detection rules without involving data engineering teams.
  • Alerts reach the NOC within seconds, enabling response before a local anomaly propagates across the network.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.