Real-Time Network Performance Monitoring with SQL

Telecom network operations centers (NOCs) need sub-minute visibility into KPIs across thousands of cell towers and backhaul links. RisingWave, a PostgreSQL-compatible streaming database, lets NOC engineers write standard SQL to aggregate RSRP, RSRQ, throughput, and latency metrics continuously — no custom stream-processing code required.

Why Network Performance Monitoring Matters

A degraded cell site affects thousands of subscribers simultaneously. Conventional performance management systems poll counters every 15 minutes, producing PM files that NOC staff process in batch. By the time a degradation trend is visible on a dashboard, customers have already experienced poor service and may have called the helpdesk.

Real-time streaming KPI aggregation closes this gap. With continuous materialized views, engineers can see per-cell metrics refresh every few seconds, correlate KPIs across layers (4G/5G), and trigger automated remediation (load balancing, parameter changes) before SLAs are breached.

Core KPI signals for network performance monitoring:

RSRP (Reference Signal Received Power, dBm): downlink signal strength
RSRQ (Reference Signal Received Quality, dB): signal quality relative to interference
PRB utilization (Physical Resource Block %): radio capacity usage
Throughput (Mbps uplink/downlink per cell)
Handover success rate (%)
RRC connection setup success rate (%)
Latency / round-trip time (ms)

How Streaming SQL Solves This

RisingWave is a PostgreSQL-compatible streaming database that processes telemetry from Kafka topics as it arrives. Materialized views defined with standard SQL window functions aggregate KPIs continuously. NOC dashboards query these views using any PostgreSQL-compatible client — Grafana, Metabase, or custom tooling.

Engineers define thresholds once in SQL. Alerts are written to Kafka sinks or PostgreSQL tables and consumed by ticketing systems. There is no separate alerting engine to maintain.

Step-by-Step Tutorial

Step 1: Data Source

Ingest cell-level performance counters from a Kafka topic. Each record is a per-cell-per-minute counter snapshot produced by the base station or O-RAN RIC.

CREATE SOURCE cell_kpi_events (
    cell_id             VARCHAR,    -- eNB/gNB + cell index
    enb_id              VARCHAR,    -- base station ID
    site_id             VARCHAR,
    technology          VARCHAR,    -- LTE, NR, etc.
    rsrp_dbm            DOUBLE PRECISION,
    rsrq_db             DOUBLE PRECISION,
    prb_util_dl_pct     DOUBLE PRECISION,
    prb_util_ul_pct     DOUBLE PRECISION,
    throughput_dl_mbps  DOUBLE PRECISION,
    throughput_ul_mbps  DOUBLE PRECISION,
    rtt_ms              DOUBLE PRECISION,
    handover_attempts   BIGINT,
    handover_successes  BIGINT,
    rrc_attempts        BIGINT,
    rrc_successes       BIGINT,
    event_time          TIMESTAMPTZ
)
WITH (
    connector      = 'kafka',
    topic          = 'telecom.cell.kpi',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON
( TIMESTAMP COLUMN = event_time );

Step 2: Core View

Aggregate KPIs into 1-minute tumbling windows per cell and compute derived metrics including handover success rate and RRC success rate.

CREATE MATERIALIZED VIEW cell_kpi_1min AS
SELECT
    cell_id,
    enb_id,
    site_id,
    technology,
    window_start,
    window_end,
    AVG(rsrp_dbm)           AS avg_rsrp,
    AVG(rsrq_db)            AS avg_rsrq,
    AVG(prb_util_dl_pct)    AS avg_prb_dl,
    AVG(prb_util_ul_pct)    AS avg_prb_ul,
    AVG(throughput_dl_mbps) AS avg_throughput_dl,
    AVG(throughput_ul_mbps) AS avg_throughput_ul,
    AVG(rtt_ms)             AS avg_rtt_ms,
    SUM(handover_attempts)  AS total_ho_attempts,
    SUM(handover_successes) AS total_ho_successes,
    SUM(rrc_attempts)       AS total_rrc_attempts,
    SUM(rrc_successes)      AS total_rrc_successes,
    CASE
        WHEN SUM(handover_attempts) > 0
        THEN 100.0 * SUM(handover_successes) / SUM(handover_attempts)
        ELSE NULL
    END AS ho_success_rate_pct,
    CASE
        WHEN SUM(rrc_attempts) > 0
        THEN 100.0 * SUM(rrc_successes) / SUM(rrc_attempts)
        ELSE NULL
    END AS rrc_success_rate_pct
FROM TUMBLE(cell_kpi_events, event_time, INTERVAL '1 MINUTE')
GROUP BY cell_id, enb_id, site_id, technology, window_start, window_end;

-- Site-level rollup (aggregate all cells per site)
CREATE MATERIALIZED VIEW site_kpi_5min AS
SELECT
    site_id,
    window_start,
    window_end,
    COUNT(DISTINCT cell_id)         AS active_cells,
    AVG(avg_rsrp)                   AS site_avg_rsrp,
    AVG(avg_prb_dl)                 AS site_avg_prb_dl,
    SUM(avg_throughput_dl)          AS site_total_throughput_dl,
    AVG(avg_rtt_ms)                 AS site_avg_rtt,
    AVG(ho_success_rate_pct)        AS site_ho_success_rate,
    AVG(rrc_success_rate_pct)       AS site_rrc_success_rate
FROM TUMBLE(cell_kpi_1min, window_start, INTERVAL '5 MINUTES')
GROUP BY site_id, window_start, window_end;

Step 3: Alerts and Sinks

Define threshold-based alerts and push them to the NOC ticketing Kafka topic.

CREATE MATERIALIZED VIEW kpi_alerts AS
SELECT
    cell_id,
    site_id,
    technology,
    window_end AS alert_time,
    avg_rsrp,
    avg_rsrq,
    avg_prb_dl,
    ho_success_rate_pct,
    rrc_success_rate_pct,
    avg_rtt_ms,
    CASE
        WHEN avg_rsrp < -110                    THEN 'POOR_COVERAGE'
        WHEN avg_prb_dl > 90                    THEN 'CELL_CONGESTION'
        WHEN ho_success_rate_pct < 90           THEN 'HO_FAILURE'
        WHEN rrc_success_rate_pct < 95          THEN 'RRC_FAILURE'
        WHEN avg_rtt_ms > 100                   THEN 'HIGH_LATENCY'
        ELSE 'KPI_DEGRADATION'
    END AS alert_type,
    CASE
        WHEN avg_rsrp < -115 OR avg_prb_dl > 95 OR ho_success_rate_pct < 80 THEN 'CRITICAL'
        WHEN avg_rsrp < -110 OR avg_prb_dl > 85 OR ho_success_rate_pct < 90 THEN 'WARNING'
        ELSE 'INFO'
    END AS severity
FROM cell_kpi_1min
WHERE avg_rsrp < -110
   OR avg_prb_dl > 85
   OR ho_success_rate_pct < 90
   OR rrc_success_rate_pct < 95
   OR avg_rtt_ms > 100;

CREATE SINK kpi_alerts_sink
FROM kpi_alerts
WITH (
    connector  = 'kafka',
    topic      = 'telecom.noc.alerts',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Comparison Table

Approach	KPI Granularity	Alert Latency	SQL Interface	Scales to 10k Cells
15-min PM file batch	15 min	15–30 min	Via BI tool	Slow
Streaming Flink job	1 min	1–2 min	No	Yes (complex)
Time-series DB + alerting	1 min	1 min	Limited	Medium
RisingWave streaming SQL	Seconds	Seconds	Yes	Yes

FAQ

Q: Can I correlate KPIs from 4G and 5G cells at the same site? Yes. Both technology types feed the same cell_kpi_events source table, distinguished by the technology column. The site-level rollup view aggregates across technologies, and you can add a WHERE technology = 'NR' filter for 5G-specific analysis.

Q: How do I connect Grafana to RisingWave? Use the Grafana PostgreSQL data source plugin. Point it at your RisingWave cluster's SQL port and query materialized views directly. RisingWave's PostgreSQL compatibility means all standard Grafana transformations work without modification.

Q: What should I do when cells report zero counters during maintenance windows? Add a reading_count > 0 filter to the alert view and optionally join a maintenance window table. If a cell is in a planned maintenance window, suppress its alerts by excluding it from the alert sink.

Key Takeaways

RisingWave continuously aggregates cell KPIs into 1-minute and 5-minute views using standard SQL window functions — no stream-processing expertise required.
NOC dashboards built on Grafana or Metabase connect directly via the PostgreSQL protocol for live KPI displays.
Threshold-based alerts flow to ticketing systems within seconds, allowing proactive intervention before subscriber experience degrades.
Site-level rollup views give regional operations managers a single-pane view without additional aggregation infrastructure.

Get started | Slack