Building a Real-Time SLA Monitoring Dashboard

A real-time SLA monitoring dashboard built with streaming SQL gives SREs and platform engineers continuous visibility into API latencies, error rates, and uptime compliance without polling delays. RisingWave, a PostgreSQL-compatible streaming database, maintains materialized views that update incrementally as each request arrives, so p99 latencies and breach alerts reflect the current state of your services rather than a snapshot from 60 seconds ago.

Traditional SLA monitoring stacks query time-series databases on a schedule: every 15 seconds, every minute. During a fast-moving incident, that polling gap is the difference between catching a breach in its first minute and discovering it after five minutes of customer impact. This guide shows you how to replace that polling loop with continuously maintained SQL views that always reflect live data.

You will build a complete SLA dashboard covering p99 latency tracking, availability monitoring, SLA breach detection, and error budget burn rate calculations. All SQL is verified against RisingWave running locally.

Why Streaming SQL for SLA Monitoring

SLA monitoring has two distinct requirements that pull in opposite directions: you need historical aggregates to compute compliance percentages over rolling windows, and you need real-time freshness so breach alerts fire before your error budget is gone. Batch systems excel at the former; polling-based systems compromise on the latter.

Streaming SQL resolves this tension by maintaining query results incrementally. When a new API request lands in RisingWave, the database updates only the affected rows in your materialized views: the current window aggregate, the running p99, the cumulative availability percentage. No full recomputation, no polling delay.

Traditional Monitoring vs. Streaming SQL for SLAs

Dimension                 | Traditional (Prometheus + Grafana)       | Streaming SQL (RisingWave)
--------------------------|------------------------------------------|------------------------------------------
Alert latency             | 15-60 seconds (scrape interval)          | Sub-second (event-driven)
p99 computation           | Histogram approximation (HDR)            | Exact percentile via PERCENTILE_CONT
Multi-service correlation | Requires PromQL joins or recording rules | Standard SQL JOINs across views
Error budget tracking     | Requires custom recording rules          | Single materialized view with arithmetic
State management          | External TSDB required                   | Built-in, no external storage
Query language            | PromQL / custom DSL                      | Standard SQL

RisingWave does not replace metrics collection, but it replaces the alerting logic layer. You push your request events to Kafka (or insert directly during development), and RisingWave maintains the aggregates that drive your dashboard and alerts.

Setting Up the Data Model

The foundation of the SLA dashboard is a table that captures each API request with the fields needed for latency and availability computation.

Base Table: API Requests

In production, this table connects to a Kafka source. For local development and testing, CREATE TABLE lets you insert data directly:

CREATE TABLE sla_api_requests (
    request_id VARCHAR,
    service_name VARCHAR,
    endpoint VARCHAR,
    region VARCHAR,
    status_code INT,
    latency_ms DOUBLE PRECISION,
    ts TIMESTAMPTZ
);

To connect to Kafka in production, use CREATE SOURCE instead:

CREATE SOURCE sla_api_requests (
    request_id VARCHAR,
    service_name VARCHAR,
    endpoint VARCHAR,
    region VARCHAR,
    status_code INT,
    latency_ms DOUBLE PRECISION,
    ts TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'api_requests',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

For more on the difference between sources and tables in RisingWave, see the CREATE SOURCE documentation and the CREATE TABLE documentation.
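With the Kafka setup, each record on the api_requests topic is a JSON object whose fields match the source schema above. As an illustration, here is a minimal producer sketch in Python; the confluent-kafka client and the broker address are assumptions matching the WITH clause, and any Kafka client would do:

```python
import json
import uuid
from datetime import datetime, timezone

def build_event(service_name, endpoint, region, status_code, latency_ms):
    """Build one request event matching the sla_api_requests schema."""
    return {
        "request_id": str(uuid.uuid4()),
        "service_name": service_name,
        "endpoint": endpoint,
        "region": region,
        "status_code": status_code,
        "latency_ms": latency_ms,
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def main():
    # confluent-kafka is an assumed dependency here, not required by RisingWave.
    from confluent_kafka import Producer
    producer = Producer({"bootstrap.servers": "kafka:9092"})
    event = build_event("payments", "/api/payments", "us-east", 200, 45.2)
    producer.produce("api_requests", json.dumps(event).encode("utf-8"))
    producer.flush()

if __name__ == "__main__":
    main()
```

Because the source is declared FORMAT PLAIN ENCODE JSON, the JSON keys must match the column names exactly; unknown keys are ignored.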

SLA Targets Table

Each service has different SLA targets. Store them in a lookup table so your materialized views can join against them rather than hard-coding thresholds:

CREATE TABLE sla_targets (
    service_name VARCHAR,
    sla_availability_pct DOUBLE PRECISION,
    latency_sla_p99_ms DOUBLE PRECISION
);

INSERT INTO sla_targets VALUES
    ('payments', 99.9,  500.0),
    ('search',   99.9,  200.0),
    ('auth',     99.95, 300.0);

Sample Data

Insert representative data to verify all views. This dataset includes normal-latency requests alongside error responses and latency spikes to exercise the breach detection logic:

INSERT INTO sla_api_requests VALUES
    ('r001', 'payments', '/api/payments', 'us-east', 200,   45.2, '2026-04-01 10:00:05+00'),
    ('r002', 'payments', '/api/payments', 'us-east', 200,   52.1, '2026-04-01 10:00:10+00'),
    ('r003', 'payments', '/api/payments', 'us-east', 200,   48.8, '2026-04-01 10:00:15+00'),
    ('r004', 'payments', '/api/payments', 'us-east', 500, 5210.0, '2026-04-01 10:00:20+00'),
    ('r005', 'payments', '/api/payments', 'us-east', 200,   39.5, '2026-04-01 10:00:25+00'),
    ('r006', 'payments', '/api/payments', 'us-east', 500, 4890.0, '2026-04-01 10:00:30+00'),
    ('r007', 'payments', '/api/payments', 'us-east', 200,   55.3, '2026-04-01 10:00:35+00'),
    ('r008', 'payments', '/api/payments', 'us-east', 200,   41.2, '2026-04-01 10:00:40+00'),
    ('r009', 'payments', '/api/payments', 'us-east', 200,   47.6, '2026-04-01 10:00:45+00'),
    ('r010', 'payments', '/api/payments', 'us-east', 200,   50.1, '2026-04-01 10:00:50+00'),
    ('r011', 'search',   '/api/search',   'us-east', 200,   12.3, '2026-04-01 10:00:03+00'),
    ('r012', 'search',   '/api/search',   'us-east', 200,   15.4, '2026-04-01 10:00:13+00'),
    ('r013', 'search',   '/api/search',   'us-east', 200,    9.8, '2026-04-01 10:00:23+00'),
    ('r014', 'search',   '/api/search',   'us-east', 200,   11.2, '2026-04-01 10:00:33+00'),
    ('r015', 'search',   '/api/search',   'us-east', 200,   13.7, '2026-04-01 10:00:43+00'),
    ('r016', 'auth',     '/api/auth',     'us-east', 200,   88.5, '2026-04-01 10:00:06+00'),
    ('r017', 'auth',     '/api/auth',     'us-east', 200,  320.1, '2026-04-01 10:00:16+00'),
    ('r018', 'auth',     '/api/auth',     'us-east', 429,   92.3, '2026-04-01 10:00:26+00'),
    ('r019', 'auth',     '/api/auth',     'us-east', 200,   95.6, '2026-04-01 10:00:36+00'),
    ('r020', 'auth',     '/api/auth',     'us-east', 200,   88.9, '2026-04-01 10:00:46+00'),
    -- Second minute: payments recovers, auth degrades
    ('r021', 'payments', '/api/payments', 'us-east', 200,   42.1, '2026-04-01 10:01:05+00'),
    ('r022', 'payments', '/api/payments', 'us-east', 200,   49.8, '2026-04-01 10:01:10+00'),
    ('r023', 'payments', '/api/payments', 'us-east', 200,   53.2, '2026-04-01 10:01:15+00'),
    ('r024', 'payments', '/api/payments', 'us-east', 200,   44.7, '2026-04-01 10:01:20+00'),
    ('r025', 'payments', '/api/payments', 'us-east', 200,   51.3, '2026-04-01 10:01:25+00'),
    ('r026', 'payments', '/api/payments', 'us-east', 200,   46.9, '2026-04-01 10:01:30+00'),
    ('r027', 'payments', '/api/payments', 'us-east', 200,   48.2, '2026-04-01 10:01:35+00'),
    ('r028', 'payments', '/api/payments', 'us-east', 200,  450.5, '2026-04-01 10:01:40+00'),
    ('r029', 'payments', '/api/payments', 'us-east', 200,  980.3, '2026-04-01 10:01:45+00'),
    ('r030', 'payments', '/api/payments', 'us-east', 200, 1240.8, '2026-04-01 10:01:50+00'),
    ('r031', 'search',   '/api/search',   'us-east', 200,   10.5, '2026-04-01 10:01:03+00'),
    ('r032', 'search',   '/api/search',   'us-east', 200,   14.2, '2026-04-01 10:01:13+00'),
    ('r033', 'search',   '/api/search',   'us-east', 200,   11.8, '2026-04-01 10:01:23+00'),
    ('r034', 'search',   '/api/search',   'us-east', 200,   16.1, '2026-04-01 10:01:33+00'),
    ('r035', 'search',   '/api/search',   'us-east', 200,   12.9, '2026-04-01 10:01:43+00'),
    ('r036', 'auth',     '/api/auth',     'us-east', 200,   91.2, '2026-04-01 10:01:06+00'),
    ('r037', 'auth',     '/api/auth',     'us-east', 200,   89.4, '2026-04-01 10:01:16+00'),
    ('r038', 'auth',     '/api/auth',     'us-east', 500, 3200.0, '2026-04-01 10:01:26+00'),
    ('r039', 'auth',     '/api/auth',     'us-east', 200,   93.7, '2026-04-01 10:01:36+00'),
    ('r040', 'auth',     '/api/auth',     'us-east', 200,   87.6, '2026-04-01 10:01:46+00');

Computing p99 Latencies with Materialized Views

The p99 latency (the 99th percentile) is the gold standard metric for SLA compliance: it captures the worst experience your users encounter, excluding only the most extreme 1% tail. Unlike an average, which a handful of extreme outliers can drag far from what typical users see, p99 pins down the tail experience directly.

RisingWave supports exact percentile computation via PERCENTILE_CONT, which returns the precise value at the specified quantile using linear interpolation. This is more accurate than the histogram approximation used by Prometheus's histogram_quantile.
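To make the linear interpolation concrete, here is a plain-Python sketch of the standard PERCENTILE_CONT ordered-set semantics (an illustration, not RisingWave's internal implementation). Run against the auth service's five first-minute latencies from the sample data below, it reproduces the p95 and p99 values the view produces:

```python
import math

def percentile_cont(values, q):
    """Exact percentile with linear interpolation (PERCENTILE_CONT semantics)."""
    xs = sorted(values)
    # Fractional rank into the sorted list: q=0 maps to the min, q=1 to the max.
    h = q * (len(xs) - 1)
    lo = math.floor(h)
    hi = math.ceil(h)
    # Interpolate between the two neighboring sorted values.
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

# The auth service's first-minute latencies from the sample data.
auth_ms = [88.5, 320.1, 92.3, 95.6, 88.9]
print(round(percentile_cont(auth_ms, 0.95), 2))  # 275.2
print(round(percentile_cont(auth_ms, 0.99), 2))  # 311.12
```

Note that with only five values, p99 is almost entirely determined by the interpolation toward the maximum; exactness matters most at realistic request volumes.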

p99 Latency Materialized View

This view computes p50, p95, and p99 latencies per service and endpoint in 1-minute tumbling windows:

CREATE MATERIALIZED VIEW sla_p99_latency AS
SELECT
    service_name,
    endpoint,
    COUNT(*) AS request_count,
    ROUND(AVG(latency_ms)::numeric, 2) AS avg_latency_ms,
    ROUND(PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY latency_ms)::numeric, 2) AS p50_latency_ms,
    ROUND(PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms)::numeric, 2) AS p95_latency_ms,
    ROUND(PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms)::numeric, 2) AS p99_latency_ms,
    window_start,
    window_end
FROM TUMBLE(
    sla_api_requests,
    ts,
    INTERVAL '1' MINUTE
)
GROUP BY service_name, endpoint, window_start, window_end;

The TUMBLE function creates fixed non-overlapping windows. Each window closes at its window_end timestamp, at which point the final aggregates are committed. RisingWave updates the in-progress window incrementally as events arrive, so the view always reflects the latest data within the current window.
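Window assignment itself is simple arithmetic: each event timestamp is truncated down to the start of the window that contains it. An illustrative sketch of the mapping (RisingWave does this internally):

```python
from datetime import datetime, timedelta, timezone

def tumble_window(ts, size):
    """Map a timestamp to the [window_start, window_end) tumbling window containing it."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    offset = (ts - epoch) % size      # how far into the current window ts falls
    start = ts - offset
    return start, start + size

# Request r004 (10:00:20) lands in the 10:00:00 window.
ts = datetime(2026, 4, 1, 10, 0, 20, tzinfo=timezone.utc)
start, end = tumble_window(ts, timedelta(minutes=1))
print(start.isoformat())  # 2026-04-01T10:00:00+00:00
```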

Query the view:

SELECT
    service_name,
    endpoint,
    request_count,
    avg_latency_ms,
    p50_latency_ms,
    p95_latency_ms,
    p99_latency_ms,
    window_start
FROM sla_p99_latency
ORDER BY service_name, window_start;
 service_name |   endpoint    | request_count | avg_latency_ms | p50_latency_ms | p95_latency_ms | p99_latency_ms |       window_start
--------------+---------------+---------------+----------------+----------------+----------------+----------------+---------------------------
 auth         | /api/auth     |             5 |         137.08 |          92.30 |         275.20 |         311.12 | 2026-04-01 10:00:00+00:00
 auth         | /api/auth     |             5 |         712.38 |          91.20 |        2578.74 |        3075.75 | 2026-04-01 10:01:00+00:00
 payments     | /api/payments |            10 |        1047.98 |          49.45 |        5066.00 |        5181.20 | 2026-04-01 10:00:00+00:00
 payments     | /api/payments |            10 |         300.78 |          50.55 |        1123.58 |        1217.36 | 2026-04-01 10:01:00+00:00
 search       | /api/search   |             5 |          12.48 |          12.30 |          15.06 |          15.33 | 2026-04-01 10:00:00+00:00
 search       | /api/search   |             5 |          13.10 |          12.90 |          15.72 |          16.02 | 2026-04-01 10:01:00+00:00

Notice how the average (avg_latency_ms) for payments in the first window is 1047ms despite most requests completing in under 55ms. Two error responses with 5-second latencies pulled the average up dramatically. The p50 of 49ms tells the real story: half of users are seeing sub-50ms responses. The p99 of 5181ms shows the worst 1% are experiencing over 5 seconds, which is the SLA violation that matters.

The gap between p50 and p99 is diagnostic information. A large gap (as in the payments service) indicates a bimodal latency distribution, typically caused by a subset of slow requests: retries, database locks, or external service timeouts.

Measuring Availability and Error Rates

Availability SLAs are typically expressed as a percentage of requests that succeed within a defined time window. HTTP 5xx responses count as failures; 4xx responses are usually excluded because they represent client errors rather than service failures.

Availability Materialized View

CREATE MATERIALIZED VIEW sla_availability AS
SELECT
    service_name,
    endpoint,
    COUNT(*) AS total_requests,
    COUNT(*) FILTER (WHERE status_code < 500) AS successful_requests,
    COUNT(*) FILTER (WHERE status_code >= 500) AS error_requests,
    ROUND(
        COUNT(*) FILTER (WHERE status_code < 500)::NUMERIC
        / COUNT(*)::NUMERIC * 100,
        4
    ) AS availability_pct,
    ROUND(
        COUNT(*) FILTER (WHERE status_code >= 500)::NUMERIC
        / COUNT(*)::NUMERIC * 100,
        4
    ) AS error_rate_pct,
    window_start,
    window_end
FROM TUMBLE(
    sla_api_requests,
    ts,
    INTERVAL '1' MINUTE
)
GROUP BY service_name, endpoint, window_start, window_end;

The FILTER (WHERE ...) syntax is standard SQL conditional aggregation. It counts only the rows that match the condition, making the intent explicit without a CASE expression.
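In procedural terms, FILTER is equivalent to counting over a filtered iterator. A sketch of the same availability arithmetic, mirroring the view above:

```python
def availability(status_codes):
    """Availability %, counting only 5xx responses as failures, as in the view."""
    total = len(status_codes)
    errors = sum(1 for code in status_codes if code >= 500)
    return round((total - errors) / total * 100, 4)

# Payments' first minute in the sample data: 8 successes, 2 server errors.
print(availability([200] * 8 + [500] * 2))  # 80.0
# A 429 rate-limit response is a client error and does not hurt availability.
print(availability([200, 200, 429]))  # 100.0
```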

Query the view:

SELECT
    service_name,
    endpoint,
    total_requests,
    successful_requests,
    error_requests,
    availability_pct,
    error_rate_pct,
    window_start
FROM sla_availability
ORDER BY service_name, window_start;
 service_name |   endpoint    | total_requests | successful_requests | error_requests | availability_pct | error_rate_pct |       window_start
--------------+---------------+----------------+---------------------+----------------+------------------+----------------+---------------------------
 auth         | /api/auth     |              5 |                   5 |              0 |          100.000 |          0.000 | 2026-04-01 10:00:00+00:00
 auth         | /api/auth     |              5 |                   4 |              1 |           80.000 |         20.000 | 2026-04-01 10:01:00+00:00
 payments     | /api/payments |             10 |                   8 |              2 |           80.000 |         20.000 | 2026-04-01 10:00:00+00:00
 payments     | /api/payments |             10 |                  10 |              0 |          100.000 |          0.000 | 2026-04-01 10:01:00+00:00
 search       | /api/search   |              5 |                   5 |              0 |          100.000 |          0.000 | 2026-04-01 10:00:00+00:00
 search       | /api/search   |              5 |                   5 |              0 |          100.000 |          0.000 | 2026-04-01 10:01:00+00:00

Payments had 80% availability in the first minute (2 errors out of 10 requests) and recovered to 100% in the second minute. Auth had 100% availability initially but then hit a single 500 error, dropping to 80%. Search remained healthy throughout.

Notice that the auth service's 429 rate-limit response in the first window (request r018) does not count as an error because 4xx status codes indicate client-side failures, not service failures.

Detecting SLA Breaches

With separate views for latency and availability, you can now join them against the sla_targets table to identify breaches in real time. This breach detection view is where streaming SQL shines: rather than polling every minute and running a comparison query, RisingWave updates the breach status incrementally as each new request arrives.

SLA Breach Alerts Materialized View

CREATE MATERIALIZED VIEW sla_breach_alerts AS
SELECT
    p.service_name,
    p.endpoint,
    p.p99_latency_ms,
    p.avg_latency_ms,
    a.availability_pct,
    a.error_rate_pct,
    a.total_requests,
    CASE
        WHEN p.p99_latency_ms > 500 AND a.availability_pct < 99.9
            THEN 'CRITICAL'
        WHEN p.p99_latency_ms > 500
            THEN 'WARNING'
        WHEN a.availability_pct < 99.9
            THEN 'WARNING'
        ELSE 'OK'
    END AS breach_severity,
    CASE
        WHEN p.p99_latency_ms > 500 AND a.availability_pct < 99.9
            THEN 'Latency SLA breach (p99=' || ROUND(p.p99_latency_ms::numeric, 0)
                 || 'ms) AND availability breach (' || a.availability_pct || '%)'
        WHEN p.p99_latency_ms > 500
            THEN 'Latency SLA breach: p99=' || ROUND(p.p99_latency_ms::numeric, 0)
                 || 'ms exceeds 500ms target'
        WHEN a.availability_pct < 99.9
            THEN 'Availability SLA breach: ' || a.availability_pct
                 || '% below 99.9% target'
        ELSE 'Within SLA'
    END AS breach_reason,
    p.window_start,
    p.window_end
FROM sla_p99_latency p
JOIN sla_availability a
    ON p.service_name = a.service_name
    AND p.endpoint = a.endpoint
    AND p.window_start = a.window_start
WHERE p.p99_latency_ms > 500 OR a.availability_pct < 99.9;

The WHERE clause at the end filters to only the windows that are actually breaching. This keeps the view compact and makes it easy to use as an alert source: if a row exists, there is a breach.
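The CASE ladder evaluates top to bottom: the both-dimensions breach wins before either single-dimension branch. The same decision logic as a plain function (illustrative, with the thresholds hard-coded to match the view):

```python
def breach_severity(p99_latency_ms, availability_pct,
                    p99_target_ms=500.0, availability_target_pct=99.9):
    """Classify a window, mirroring the CASE expression in sla_breach_alerts."""
    latency_breach = p99_latency_ms > p99_target_ms
    availability_breach = availability_pct < availability_target_pct
    if latency_breach and availability_breach:
        return "CRITICAL"
    if latency_breach or availability_breach:
        return "WARNING"
    return "OK"

print(breach_severity(5181.20, 80.0))   # CRITICAL (payments, first window)
print(breach_severity(1217.36, 100.0))  # WARNING (payments, second window)
print(breach_severity(16.02, 100.0))    # OK (search)
```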

Query the view:

SELECT
    service_name,
    endpoint,
    p99_latency_ms,
    availability_pct,
    breach_severity,
    breach_reason,
    window_start
FROM sla_breach_alerts
ORDER BY breach_severity, window_start;
 service_name |   endpoint    | p99_latency_ms | availability_pct | breach_severity |                          breach_reason                           |       window_start
--------------+---------------+----------------+------------------+-----------------+------------------------------------------------------------------+---------------------------
 payments     | /api/payments |        5181.20 |            80.00 | CRITICAL        | Latency SLA breach (p99=5181ms) AND availability breach (80.00%) | 2026-04-01 10:00:00+00:00
 auth         | /api/auth     |        3075.75 |            80.00 | CRITICAL        | Latency SLA breach (p99=3076ms) AND availability breach (80.00%) | 2026-04-01 10:01:00+00:00
 payments     | /api/payments |        1217.36 |           100.00 | WARNING         | Latency SLA breach: p99=1217ms exceeds 500ms target              | 2026-04-01 10:01:00+00:00

Three breaches detected. Payments is CRITICAL in the first minute because both error rate and latency are out of bounds simultaneously. In the second minute, payments recovers its availability (no 500 errors) but the latency tail is still elevated by the spike requests, so it stays in WARNING. Auth deteriorates to CRITICAL in the second minute when a single 500 error appears combined with a 3-second response.

This layered severity model (CRITICAL when both conditions breach simultaneously, WARNING for single-dimension breaches) helps on-call engineers triage quickly. A WARNING on latency alone might be tolerable; CRITICAL means customers are both seeing errors and experiencing slow responses right now.

Calculating Error Budget Burn Rates

Error budgets formalize the relationship between reliability and engineering velocity. A 99.9% availability SLA gives you 43.2 minutes of allowed downtime per 30-day month. The burn rate tells you how fast you are consuming that budget: a burn rate of 1.0 means you are exactly on pace, and a burn rate of 10 means you will exhaust your monthly budget in 3 days.

The Google SRE book defines two alerting thresholds for burn rate: alert at burn rate 14.4 (which would exhaust the budget in 2 days if sustained) and at burn rate 6 (which would exhaust it in 5 days).

Error Budget Burn Rate Materialized View

CREATE MATERIALIZED VIEW sla_error_budget AS
SELECT
    a.service_name,
    a.endpoint,
    a.total_requests,
    a.availability_pct AS current_availability_pct,
    t.sla_availability_pct AS target_availability_pct,
    (100.0 - t.sla_availability_pct) AS monthly_error_budget_pct,
    ROUND(
        ((100.0 - a.availability_pct) / NULLIF(100.0 - t.sla_availability_pct, 0))::numeric,
        2
    ) AS burn_rate,
    CASE
        WHEN (100.0 - a.availability_pct) / NULLIF(100.0 - t.sla_availability_pct, 0) > 14.4
            THEN 'CRITICAL: 1-hour burn rate will exhaust monthly budget'
        WHEN (100.0 - a.availability_pct) / NULLIF(100.0 - t.sla_availability_pct, 0) > 6
            THEN 'HIGH: Burning budget 6x faster than target'
        WHEN (100.0 - a.availability_pct) / NULLIF(100.0 - t.sla_availability_pct, 0) > 1
            THEN 'ELEVATED: Burning budget faster than sustainable'
        ELSE 'HEALTHY: Within error budget'
    END AS budget_status,
    a.window_start,
    a.window_end
FROM sla_availability a
JOIN sla_targets t ON a.service_name = t.service_name;

The NULLIF(100.0 - t.sla_availability_pct, 0) guard prevents division by zero for a 100% SLA target. The burn rate formula is straightforward: actual error fraction divided by allowed error fraction. If your SLA allows 0.1% errors and your current window has 20% errors, your burn rate is 200.
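The same arithmetic in Python, including the time-to-exhaustion that a sustained burn rate implies (an illustrative sketch assuming a 30-day budget period):

```python
def burn_rate(availability_pct, target_pct):
    """Actual error fraction divided by the error fraction the SLA allows."""
    allowed = 100.0 - target_pct
    if allowed == 0:
        return None  # a 100% SLA has no error budget to burn
    return (100.0 - availability_pct) / allowed

def hours_to_exhaustion(rate, budget_days=30):
    """How long the monthly budget lasts if this burn rate is sustained."""
    return budget_days * 24 / rate

# Payments' bad minute: 80% availability against a 99.9% target.
rate = burn_rate(80.0, 99.9)
print(round(rate))                          # 200
print(round(hours_to_exhaustion(rate), 1))  # 3.6
# Auth's tighter 99.95% target doubles the burn rate for the same error rate.
print(round(burn_rate(80.0, 99.95)))        # 400
```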

Query the view sorted by burn rate:

SELECT
    service_name,
    endpoint,
    total_requests,
    current_availability_pct,
    target_availability_pct,
    burn_rate,
    budget_status,
    window_start
FROM sla_error_budget
ORDER BY burn_rate DESC NULLS LAST;
 service_name |   endpoint    | total_requests | current_availability_pct | target_availability_pct | burn_rate |                     budget_status                      |       window_start
--------------+---------------+----------------+--------------------------+-------------------------+-----------+--------------------------------------------------------+---------------------------
 auth         | /api/auth     |              5 |                    80.00 |                   99.95 |    400.00 | CRITICAL: 1-hour burn rate will exhaust monthly budget | 2026-04-01 10:01:00+00:00
 payments     | /api/payments |             10 |                    80.00 |                    99.9 |    200.00 | CRITICAL: 1-hour burn rate will exhaust monthly budget | 2026-04-01 10:00:00+00:00
 payments     | /api/payments |             10 |                   100.00 |                    99.9 |      0.00 | HEALTHY: Within error budget                           | 2026-04-01 10:01:00+00:00
 auth         | /api/auth     |              5 |                   100.00 |                   99.95 |      0.00 | HEALTHY: Within error budget                           | 2026-04-01 10:00:00+00:00
 search       | /api/search   |              5 |                   100.00 |                    99.9 |      0.00 | HEALTHY: Within error budget                           | 2026-04-01 10:00:00+00:00
 search       | /api/search   |              5 |                   100.00 |                    99.9 |      0.00 | HEALTHY: Within error budget                           | 2026-04-01 10:01:00+00:00

Auth has a burn rate of 400 during its bad minute. That means it is consuming its monthly error budget 400 times faster than sustainable. If that rate persisted, the entire monthly budget would be gone in under 2 hours. The payments service in its first bad minute shows a burn rate of 200. Both services recover to 0 burn rate in their healthy windows.

The auth service has a tighter SLA (99.95% vs 99.9% for payments), which is why the same 20% error rate produces a higher burn rate: the allowed budget fraction is smaller.

Building the Unified SLA Dashboard View

The final layer combines latency, availability, and SLA target data into a single view that maps directly to dashboard panels. This is the view you point Grafana or any other visualization tool at.

CREATE MATERIALIZED VIEW sla_dashboard AS
SELECT
    p.service_name,
    p.endpoint,
    p.request_count,
    p.avg_latency_ms,
    p.p50_latency_ms,
    p.p95_latency_ms,
    p.p99_latency_ms,
    a.availability_pct,
    a.error_rate_pct,
    t.sla_availability_pct AS sla_target_pct,
    t.latency_sla_p99_ms AS sla_p99_target_ms,
    CASE
        WHEN p.p99_latency_ms <= t.latency_sla_p99_ms
             AND a.availability_pct >= t.sla_availability_pct
            THEN 'COMPLIANT'
        ELSE 'BREACHED'
    END AS sla_status,
    p.window_start,
    p.window_end
FROM sla_p99_latency p
JOIN sla_availability a
    ON p.service_name = a.service_name
    AND p.endpoint = a.endpoint
    AND p.window_start = a.window_start
JOIN sla_targets t ON p.service_name = t.service_name;

Query the unified dashboard:

SELECT
    service_name,
    endpoint,
    request_count,
    p99_latency_ms,
    sla_p99_target_ms,
    availability_pct,
    sla_target_pct,
    sla_status,
    window_start
FROM sla_dashboard
ORDER BY sla_status, service_name, window_start;
 service_name |   endpoint    | request_count | p99_latency_ms | sla_p99_target_ms | availability_pct | sla_target_pct | sla_status |       window_start
--------------+---------------+---------------+----------------+-------------------+------------------+----------------+------------+---------------------------
 auth         | /api/auth     |             5 |         311.12 |               300 |           100.00 |          99.95 | BREACHED   | 2026-04-01 10:00:00+00:00
 auth         | /api/auth     |             5 |        3075.75 |               300 |            80.00 |          99.95 | BREACHED   | 2026-04-01 10:01:00+00:00
 payments     | /api/payments |            10 |        5181.20 |               500 |            80.00 |           99.9 | BREACHED   | 2026-04-01 10:00:00+00:00
 payments     | /api/payments |            10 |        1217.36 |               500 |           100.00 |           99.9 | BREACHED   | 2026-04-01 10:01:00+00:00
 search       | /api/search   |             5 |          15.33 |               200 |           100.00 |           99.9 | COMPLIANT  | 2026-04-01 10:00:00+00:00
 search       | /api/search   |             5 |          16.02 |               200 |           100.00 |           99.9 | COMPLIANT  | 2026-04-01 10:01:00+00:00

Search is the only compliant service. The auth service breached its 300ms p99 target even in the first minute when it had 100% availability, because request r017 took 320ms and pulled the p99 up to 311ms. This is exactly why p99 tracking matters: the service looked "up" by availability metrics but was actually failing its latency SLA.

Connecting to Grafana

Since RisingWave is PostgreSQL-compatible, Grafana connects to it using the standard PostgreSQL data source plugin. Point the data source at localhost:4566 (or your RisingWave host), with database dev and user root.

Each dashboard panel queries one of the materialized views:

  • SLA Status panel: SELECT service_name, sla_status FROM sla_dashboard WHERE window_start = (SELECT MAX(window_start) FROM sla_dashboard) gives the current compliance status per service
  • p99 Latency time series: Query sla_p99_latency with window_start as the time column and p99_latency_ms as the value
  • Burn Rate gauge: Query sla_error_budget for the latest window, display burn_rate with color thresholds at 1, 6, and 14.4

Because all these views are already precomputed and maintained incrementally by RisingWave, Grafana queries return in milliseconds even as the underlying data volume grows. The database does the heavy lifting at ingestion time, not at query time.
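The same PostgreSQL wire protocol works from any client, which is handy for smoke-testing the views before wiring up Grafana. A sketch using psycopg2 (an assumed dependency; the connection details mirror the defaults above):

```python
# Query for the most recent window's compliance status per service.
LATEST_STATUS_SQL = """
    SELECT service_name, sla_status
    FROM sla_dashboard
    WHERE window_start = (SELECT MAX(window_start) FROM sla_dashboard)
"""

def main():
    # psycopg2 is an assumed client library; RisingWave speaks the Postgres protocol.
    import psycopg2
    conn = psycopg2.connect(host="localhost", port=4566, dbname="dev", user="root")
    with conn.cursor() as cur:
        cur.execute(LATEST_STATUS_SQL)
        for service_name, sla_status in cur.fetchall():
            print(f"{service_name}: {sla_status}")
    conn.close()

if __name__ == "__main__":
    main()
```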

For a walkthrough of connecting RisingWave to Grafana with time-series panels, see Building Real-Time Dashboards from Kafka Data with SQL.

Sinking Breach Alerts to Kafka

To trigger PagerDuty pages or Slack messages when a breach occurs, sink the sla_breach_alerts view to a Kafka topic. Lightweight downstream consumers then route alerts to your notification channels.

CREATE SINK sla_alerts_sink FROM sla_breach_alerts
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'sla-breach-alerts'
)
FORMAT PLAIN ENCODE JSON;

Each row emitted to the topic is a JSON object containing the service name, endpoint, breach severity, breach reason, and window timestamps. A stateless consumer reads from this topic and calls your alerting API. No custom stream processing logic, no state management, just a Kafka consumer and an HTTP client.
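A consumer for that topic can be a few dozen lines. The sketch below is illustrative: the confluent-kafka client is an assumed dependency, the severity-to-channel mapping is hypothetical, and the actual PagerDuty/Slack HTTP calls are left as a comment. The routing decision itself is a pure function of the alert payload:

```python
import json

ROUTES = {
    # Hypothetical mapping from breach severity to a notification channel.
    "CRITICAL": "pagerduty",
    "WARNING": "slack",
}

def route_alert(alert):
    """Pick a notification channel from a sla_breach_alerts JSON payload."""
    return ROUTES.get(alert.get("breach_severity"), "slack")

def main():
    from confluent_kafka import Consumer
    consumer = Consumer({
        "bootstrap.servers": "kafka:9092",
        "group.id": "sla-alert-router",
        "auto.offset.reset": "latest",
    })
    consumer.subscribe(["sla-breach-alerts"])
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        alert = json.loads(msg.value())
        channel = route_alert(alert)
        # A real consumer would POST to the PagerDuty or Slack API here.
        print(f"[{channel}] {alert['service_name']}: {alert['breach_reason']}")

if __name__ == "__main__":
    main()
```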

For a complete guide to building alert pipelines with RisingWave sinks, see Building a Real-Time Alerting System on Streaming Data.

Windowing Strategy: Tumbling vs. Hopping

This tutorial uses 1-minute tumbling windows for all views. Tumbling windows are non-overlapping and contiguous: each request falls into exactly one window, and each window closes cleanly at its end time. This makes availability percentages straightforward to reason about.

For rolling metrics, such as p99 over the trailing five minutes or availability over a trailing contract window, you may want hopping windows instead. A hopping window slides forward on a step interval shorter than its size, giving you a rolling view:

-- p99 over the last 5 minutes, updated every minute
CREATE MATERIALIZED VIEW sla_p99_rolling_5m AS
SELECT
    service_name,
    endpoint,
    COUNT(*) AS request_count,
    ROUND(PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms)::numeric, 2) AS p99_latency_ms,
    window_start,
    window_end
FROM HOP(
    sla_api_requests,
    ts,
    INTERVAL '1' MINUTE,   -- slide interval
    INTERVAL '5' MINUTE    -- window size
)
GROUP BY service_name, endpoint, window_start, window_end;

The HOP function creates a window of 5 minutes that slides every 1 minute. At any point in time, each request contributes to 5 windows simultaneously. This gives you a smoother latency trend line on your dashboard and reduces alert flapping from momentary spikes.
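To make the five-windows-per-event claim concrete, here is an illustrative sketch that enumerates the hopping windows containing a given timestamp (RisingWave performs this assignment internally):

```python
from datetime import datetime, timedelta, timezone

def hop_windows(ts, slide, size):
    """All [start, end) hopping windows of the given size and slide containing ts."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Latest window start at or before ts, aligned to the slide interval.
    start = ts - (ts - epoch) % slide
    windows = []
    while start + size > ts:          # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return windows

ts = datetime(2026, 4, 1, 10, 0, 20, tzinfo=timezone.utc)
wins = hop_windows(ts, slide=timedelta(minutes=1), size=timedelta(minutes=5))
print(len(wins))  # 5
```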

For a deep dive on window function options in RisingWave, see Windowing in Stream Processing: Tumbling, Hopping, and Session Windows.

Architecture Summary

graph TD
    A[API Services] -->|request events| B[Kafka: api_requests]
    B -->|CREATE SOURCE| C[RisingWave]
    C --> D[sla_p99_latency MV]
    C --> E[sla_availability MV]
    F[sla_targets table] --> G[sla_breach_alerts MV]
    F --> H[sla_error_budget MV]
    F --> I[sla_dashboard MV]
    D --> G
    D --> I
    E --> G
    E --> H
    E --> I
    G -->|CREATE SINK| J[Kafka: sla-breach-alerts]
    J --> K[PagerDuty / Slack consumer]
    I -->|PostgreSQL wire protocol| L[Grafana Dashboard]

The architecture has four layers. Events flow from API services into Kafka. RisingWave reads from Kafka and maintains five materialized views. The breach alerts view fans out to a Kafka sink for alerting consumers. The dashboard view connects to Grafana for visualization. All the complexity lives in the materialized view definitions, which are standard SQL.

Frequently Asked Questions

How is p99 latency computed in RisingWave, and is it exact?

RisingWave computes p99 latency using the PERCENTILE_CONT ordered-set aggregate function, which produces an exact result via linear interpolation across all values in the window. This is different from Prometheus's histogram_quantile, which approximates the percentile from pre-bucketed histogram data. The tradeoff is memory: exact percentiles require storing all values in the window, while histogram approximations use fixed-size buckets. For most SLA monitoring use cases with minute-level windows, the memory cost is manageable and the accuracy benefit is significant, since approximation errors can mask whether you are actually meeting your p99 SLA.

What is an error budget burn rate, and why does it matter for SLAs?

An error budget is the maximum amount of unreliability your SLA allows. For a 99.9% availability SLA, the error budget is 0.1% of requests per month. The burn rate measures how fast you are consuming that budget relative to the sustainable pace. A burn rate of 1.0 means you will exactly exhaust your budget at the end of the month. A burn rate of 14.4 means you will exhaust it in 2 days if the current failure rate continues. Burn rate alerting is more actionable than point-in-time availability percentages because it captures the trajectory of your reliability, not just the current snapshot.

Can I use RisingWave alongside Prometheus rather than replacing it?

Yes. A common architecture uses Prometheus for metrics collection from infrastructure (CPU, memory, disk) and RisingWave for application-level SLA tracking from request events. You can expose RisingWave's materialized view results to Prometheus via a custom exporter if you need to unify dashboards in Grafana Mimir or Thanos. Alternatively, use Grafana's multi-datasource support to display Prometheus metrics and RisingWave SLA views side by side in the same dashboard. The two systems are complementary: Prometheus is optimized for pull-based metrics scraping, while RisingWave is optimized for continuous SQL aggregations over event streams.

How do I handle late-arriving events in SLA calculations?

By default, RisingWave's tumbling and hopping windows close when the watermark advances past the window end time. Events that arrive after the window closes are dropped from that window's aggregation. For SLA dashboards where you need to handle occasional late events (from retries, network partitions, or batch uploads), you can configure the watermark tolerance using the WATERMARK clause on your source. Setting a 30-second watermark tolerance allows events up to 30 seconds late to be included in the correct window, at the cost of delaying window closure by 30 seconds. The right tolerance depends on your data pipeline's end-to-end latency characteristics.

Conclusion

Building a real-time SLA monitoring dashboard with streaming SQL replaces the polling loop at the heart of traditional monitoring stacks with continuously maintained materialized views. The key takeaways:

  • p99 latency is computed exactly using PERCENTILE_CONT, giving you precise SLA compliance tracking without histogram approximation errors
  • Availability views use conditional aggregation with FILTER (WHERE status_code >= 500) to cleanly separate service errors from client errors
  • Breach alerts join latency and availability views with SLA target tables, so thresholds are managed in data rather than hard-coded in SQL
  • Burn rate calculations translate raw error percentages into actionable rate-of-budget-consumption metrics that map directly to Google SRE burn rate alerting thresholds
  • The unified dashboard view provides a single query surface for visualization tools like Grafana, with all aggregations precomputed at ingestion time

RisingWave handles the incremental maintenance of all these views. Each incoming API request updates the relevant window aggregates in sub-second time, so your SLA dashboard reflects the current state of your services rather than a snapshot from the last polling interval.


Ready to build your own SLA dashboard? Try RisingWave Cloud free, no credit card required. Join the RisingWave Slack community to ask questions and connect with other SREs and platform engineers building on streaming SQL.
