Multi-Tenant SaaS Real-Time Analytics Architecture with Streaming SQL

Multi-Tenant SaaS Real-Time Analytics Architecture with Streaming SQL

Every SaaS product reaches the point where customers ask for an in-app analytics dashboard. The product team wants it fast and live. The platform team is the one who has to make it work across thousands of tenants without one heavy customer ruining the dashboard latency for everyone else.

This post covers a workable multi-tenant streaming analytics architecture built on RisingWave, a PostgreSQL-compatible streaming database. The pieces are familiar: source with a tenant_id column, per-tenant aggregations as materialized views, tenant-scoped reads from the application, and an explicit strategy for the noisy-neighbor problem. All SQL is verified against RisingWave 2.8.0.

The Multi-Tenant Streaming Analytics Problem

The multi-tenant streaming analytics problem is delivering live per-customer dashboards without three failure modes: cross-tenant data leaks, latency that degrades as the customer base grows, and one heavy tenant starving everyone else. A streaming database solves the first two with tenant_id carried through every materialized view and a query pattern that filters early. The third one needs explicit architectural choices, not just SQL.

Most teams start by stitching together an OLTP database, a batch pipeline, and a BI iframe. That works for the first hundred customers. Around the time tenant counts move into the thousands, three things start to hurt: nightly rollups make "real time" feel like yesterday's news, per-tenant query plans get unpredictable as data shape diverges, and one tenant with 100x the event volume drives the cluster harder than the other 999 combined.

ConcernOLTP + batch warehouseStreaming SQL (RisingWave)
Dashboard freshnessMinutes to hours behindSub-second
Per-tenant aggregationsRecomputed nightlyMaintained incrementally
Tenant isolation at the SQL layerApplication filterSame: WHERE tenant_id = ?
Adding a new metricNew ETL + schedulerCREATE MATERIALIZED VIEW
Noisy-neighbor blast radiusSame instanceShared-nothing compute nodes

Streaming SQL does not magically make the noisy-neighbor problem disappear. It changes where you fight it, from query planner heuristics to capacity placement and isolation policy.

The Event Stream

A workable multi-tenant event stream is one row per user action with tenant_id as a first-class column on every event. For verification we use a regular table; in production this is a CREATE SOURCE over Kafka, with the same shape.

CREATE TABLE saas_events (
    tenant_id    VARCHAR,
    user_id      VARCHAR,
    event_type   VARCHAR,
    object_id    VARCHAR,
    duration_ms  INT,
    event_time   TIMESTAMPTZ
);

Seed three tenants with different load profiles. acme has steady mixed traffic, globex is light, and initech is the heavy hitter doing nothing but sync calls.

INSERT INTO saas_events VALUES
  ('acme','alice@acme','page_view','dashboard',     85, NOW() - INTERVAL '50 minutes'),
  ('acme','alice@acme','api_call','/v1/orders',    140, NOW() - INTERVAL '49 minutes'),
  ('acme','alice@acme','api_call','/v1/orders',    155, NOW() - INTERVAL '48 minutes'),
  ('acme','bob@acme',  'page_view','reports',       72, NOW() - INTERVAL '40 minutes'),
  ('acme','bob@acme',  'api_call','/v1/exports',   320, NOW() - INTERVAL '38 minutes'),
  ('acme','alice@acme','api_call','/v1/orders',    138, NOW() - INTERVAL '20 minutes'),
  ('acme','carol@acme','page_view','dashboard',     90, NOW() - INTERVAL '10 minutes'),
  ('acme','carol@acme','api_call','/v1/orders',    132, NOW() - INTERVAL '8 minutes'),
  ('acme','alice@acme','api_call','/v1/orders',    142, NOW() - INTERVAL '5 minutes'),
  ('globex','dave@globex', 'page_view','dashboard',  88, NOW() - INTERVAL '45 minutes'),
  ('globex','dave@globex', 'api_call','/v1/users',  120, NOW() - INTERVAL '42 minutes'),
  ('globex','eve@globex',  'api_call','/v1/users',  115, NOW() - INTERVAL '30 minutes'),
  ('globex','dave@globex', 'api_call','/v1/users',  118, NOW() - INTERVAL '4 minutes'),
  ('initech','frank@initech','api_call','/v1/sync', 480, NOW() - INTERVAL '50 minutes'),
  ('initech','frank@initech','api_call','/v1/sync', 510, NOW() - INTERVAL '49 minutes'),
  ('initech','frank@initech','api_call','/v1/sync', 495, NOW() - INTERVAL '48 minutes'),
  ('initech','frank@initech','api_call','/v1/sync', 520, NOW() - INTERVAL '47 minutes'),
  ('initech','frank@initech','api_call','/v1/sync', 502, NOW() - INTERVAL '46 minutes'),
  ('initech','frank@initech','api_call','/v1/sync', 540, NOW() - INTERVAL '45 minutes'),
  ('initech','grace@initech','api_call','/v1/sync', 470, NOW() - INTERVAL '10 minutes'),
  ('initech','grace@initech','api_call','/v1/sync', 488, NOW() - INTERVAL '9 minutes'),
  ('initech','grace@initech','api_call','/v1/sync', 495, NOW() - INTERVAL '8 minutes'),
  ('initech','grace@initech','api_call','/v1/sync', 511, NOW() - INTERVAL '7 minutes');

Two design choices worth flagging:

  • tenant_id is the leading column conceptually but not necessarily a partition key in the source. The materialized views below group by it, which is what enables tenant-scoped reads later.
  • One row per event. Resist the urge to pre-aggregate at the producer. Pre-aggregated rows lock the dashboard into a fixed grain.

Materialized View 1: Per-Tenant Lifetime Usage

The per-tenant lifetime usage view is the dashboard's bird's-eye number. It answers "how active is this tenant, total?" Every column groups by tenant_id, which is the entire point. The query plan against this view, with WHERE tenant_id = 'acme', hits exactly one row.

CREATE MATERIALIZED VIEW saas_tenant_usage AS
SELECT
    tenant_id,
    COUNT(*)                                          AS total_events,
    COUNT(DISTINCT user_id)                           AS active_users,
    COUNT(*) FILTER (WHERE event_type = 'api_call')   AS api_calls,
    COUNT(*) FILTER (WHERE event_type = 'page_view')  AS page_views,
    ROUND(AVG(duration_ms)::numeric, 1)               AS avg_duration_ms,
    MAX(event_time)                                   AS last_seen
FROM saas_events
GROUP BY tenant_id;

Query a single tenant:

SELECT * FROM saas_tenant_usage WHERE tenant_id = 'acme';

Or query all tenants for an internal ops dashboard:

SELECT * FROM saas_tenant_usage ORDER BY tenant_id;

Output:

 tenant_id | total_events | active_users | api_calls | page_views | avg_duration_ms |           last_seen
-----------+--------------+--------------+-----------+------------+-----------------+-------------------------------
 acme      |            9 |            3 |         6 |          3 |           141.6 | 2026-05-22 21:14:34.220+00:00
 globex    |            4 |            2 |         3 |          1 |           110.3 | 2026-05-22 21:15:34.220+00:00
 initech   |           10 |            2 |        10 |          0 |           501.1 | 2026-05-22 21:12:34.220+00:00

The shape of the data per tenant is already visible. initech is doing 5x the per-call latency of the others, which is going to matter for the noisy-neighbor section below.

Materialized View 2: Per-Tenant Rolling 5-Minute Activity

The per-tenant 5-minute activity view powers the in-app sparkline. Each tenant gets one row per 5-minute window, with event counts, unique users, average duration, max duration, and a count of slow calls. The window function is TUMBLE because the dashboard typically wants distinct bars rather than overlapping ones.

CREATE MATERIALIZED VIEW saas_tenant_activity_5m AS
SELECT
    tenant_id,
    window_start,
    window_end,
    COUNT(*)                                          AS events,
    COUNT(DISTINCT user_id)                           AS active_users,
    ROUND(AVG(duration_ms)::numeric, 1)               AS avg_duration_ms,
    MAX(duration_ms)                                  AS p_max_duration_ms,
    COUNT(*) FILTER (WHERE duration_ms > 400)         AS slow_call_count
FROM TUMBLE(saas_events, event_time, INTERVAL '5 minutes')
GROUP BY tenant_id, window_start, window_end;

The query the application sends, parameterized on tenant:

SELECT events, active_users, avg_duration_ms, slow_call_count, window_end
FROM saas_tenant_activity_5m
WHERE tenant_id = 'acme'
ORDER BY window_end DESC
LIMIT 12;

You can verify the filter pushes down with EXPLAIN:

BatchTopN { order: [window_end DESC], limit: 5, offset: 0 }
 └─BatchExchange { order: [], dist: Single }
   └─BatchTopN { order: [window_end DESC], limit: 5, offset: 0 }
     └─BatchScan { table: saas_tenant_activity_5m,
                   columns: [tenant_id, window_start, window_end, events, ...],
                   scan_ranges: [tenant_id = Utf8("acme")] }

The scan_ranges: [tenant_id = Utf8("acme")] line is what you want to see. The scan is bounded to one tenant's rows, not the entire view. That is the whole secret of multi-tenant scaling on a materialized view: keep tenant_id as the leading grouping column and the planner restricts the scan automatically.

Materialized View 3: Top Endpoints Per Tenant

The top-endpoints view is the per-tenant API performance breakdown. Same pattern: tenant_id is in the GROUP BY, the application filters on it for reads, and one heavy tenant's rows do not slow down another tenant's query.

CREATE MATERIALIZED VIEW saas_slow_endpoints_per_tenant AS
SELECT
    tenant_id,
    object_id AS endpoint,
    COUNT(*)                                  AS calls,
    ROUND(AVG(duration_ms)::numeric, 1)       AS avg_duration_ms,
    MAX(duration_ms)                          AS max_duration_ms,
    COUNT(*) FILTER (WHERE duration_ms > 400) AS slow_calls
FROM saas_events
WHERE event_type = 'api_call'
GROUP BY tenant_id, object_id;

Query for one tenant, sorted by traffic:

SELECT endpoint, calls, avg_duration_ms, slow_calls
FROM saas_slow_endpoints_per_tenant
WHERE tenant_id = 'initech'
ORDER BY calls DESC;

Output:

 endpoint  | calls | avg_duration_ms | slow_calls
-----------+-------+-----------------+------------
 /v1/sync  |    10 |           501.1 |         10

initech's sync calls are uniformly slow. The acme query against the same view returns its own slice cleanly, with no cross-tenant interference.

The Read Pattern from the Application

The application connects to RisingWave the same way it would connect to Postgres, over port 4566 with the wire protocol. The dashboard's queries are parameterized by tenant:

SELECT events, active_users, avg_duration_ms, window_end
FROM saas_tenant_activity_5m
WHERE tenant_id = $1
ORDER BY window_end DESC
LIMIT 12;

Three rules keep the read path safe:

  • Always filter by tenant in SQL. Do not rely on the application to filter the result set client-side. The query itself should contain WHERE tenant_id = $1. This is the SQL-layer guard against cross-tenant leaks.
  • Bind tenant_id from the auth token. The tenant the application binds into $1 should come from the user's session, not from a URL parameter or request body. This closes the IDOR (insecure direct object reference) class of bug at the boundary.
  • Use a read-only role with row-level security if you want belt and braces. RisingWave supports standard Postgres-style roles and grants. A read-only role scoped to the analytics views is the safe default.

The Noisy-Neighbor Problem

The noisy-neighbor problem in streaming analytics is when one tenant's event volume or query pattern consumes enough resources to degrade dashboard latency for every other tenant. The mitigations break into three layers: load-side, compute-side, and operational. None of them is solved by the streaming SQL engine alone; together they make the problem tractable.

Load-side controls. Rate-limit ingestion per tenant at the producer or at Kafka. If initech suddenly tries to push 10x its baseline event volume, the streaming engine should not be the first line of defense. Tenant-aware quotas at the producer keep the input bounded.

Compute-side isolation. RisingWave's architecture separates compute nodes from storage. You can run the materialized views for tier-1 tenants on a dedicated compute group, sharing only the storage layer with the rest. This is similar to what data warehouses call workload groups or resource queues. The trade-off is operational: more compute groups mean more capacity to plan and more configuration to keep tidy.

Operational guardrails. Materialize a tenant-level monitoring view of event rates and query QPS. When a tenant's per-minute event count exceeds a threshold, page the on-call engineer instead of letting the cluster absorb the spike. This works because the same SQL that powers in-app dashboards can power internal capacity dashboards.

These three layers are how every large multi-tenant data system handles the problem. The streaming database makes the queries fast and the rules expressible in SQL. It does not remove the need to draw the boundaries.

Sinking Per-Tenant Metrics Downstream

The materialized views above are also useful outside the in-app dashboard. Sink them to Kafka for billing, alerting, or warehouse load:

CREATE SINK tenant_usage_sink FROM saas_tenant_usage
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'analytics.tenant_usage'
) FORMAT PLAIN ENCODE JSON (force_append_only = 'false');

For long-term storage, sink to Apache Iceberg using RisingWave's Iceberg sink. The combination of incrementally-updated materialized views and an Iceberg sink replaces a lot of nightly ETL with one streaming pipeline.

Architecture Summary

The reference shape for multi-tenant streaming analytics on RisingWave:

  • One event source with tenant_id as a top-level column on every row.
  • Materialized views that always GROUP BY tenant_id first.
  • Application queries that always WHERE tenant_id = $1, with the value bound from the session.
  • Per-tenant ingestion quotas at the producer and per-tenant compute groups for top-tier customers.
  • Sinks to Kafka and Iceberg for downstream consumers.

The wire protocol stays Postgres. The rule language stays SQL. Adding a metric is CREATE MATERIALIZED VIEW, not a deployment.

Key Takeaways

A multi-tenant SaaS analytics dashboard does not need a custom stream processor or a per-tenant database. The pattern is simple and verifiable:

  • Carry tenant_id through every event and every aggregation.
  • Use materialized views to maintain per-tenant rollups incrementally.
  • Filter by tenant_id in every read so the scan is bounded to one tenant's rows.
  • Plan compute isolation explicitly. The streaming engine handles the SQL; you handle the boundary.

For teams already on a Postgres-compatible analytics stack, the migration path is straightforward: pipe events into RisingWave, define the same GROUP BY tenant_id aggregations as materialized views, and point the dashboard at port 4566. The SQL you would have written against a warehouse is the SQL you write here. The difference is that it stays fresh.

Try the pipeline yourself with the RisingWave quickstart. The seed data and views above run on a local single-node cluster in under a minute.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.