By Terry
Building rolling window metrics over raw event streams looks straightforward until you run the numbers. A 90-day API call history for ten thousand customers, where each customer fires thousands of calls per day, means hundreds of millions of rows in streaming state. That is not a scaling problem you want to discover when your churn model stops refreshing.
There is a better way. This post covers the two-tier aggregation pattern: pre-aggregate raw events into daily buckets (the "tile"), then compute your 7d, 28d, and 90d windows by summing those tiles. The approach is well known in batch data warehouses, but applying it deliberately in a streaming database like RisingWave gives you continuously fresh rolling metrics at a fraction of the state cost.
The example throughout this post uses a SaaS platform tracking API calls, feature engagement, and session activity per customer -- the kind of data that feeds churn prediction models and customer health scores.
Why Direct Rolling Windows on Raw Events Are Expensive
The intuitive approach to a 90-day rolling API call count looks like this:
-- Naive: 90-day rolling window directly on raw events
CREATE MATERIALIZED VIEW customer_api_calls_90d AS
SELECT
customer_id,
COUNT(*) AS api_call_count_90d
FROM api_events
WHERE event_ts >= NOW() - INTERVAL '90 days'
GROUP BY customer_id;
The problem is in how RisingWave, or any incremental view maintenance system, keeps this fresh. To answer "what is the rolling 90-day call count for customer X?", the engine must retain all raw events from the past 90 days in state. Every new event shifts the window boundary, potentially expiring old events and adding new ones.
For a platform with ten thousand customers averaging two thousand API calls per day, that is roughly 1.8 billion rows in state for a 90-day window. Worse, the rolling boundary NOW() - INTERVAL '90 days' advances with wall-clock time, so the engine must continuously find and evict rows that have aged out of the window, on top of folding each new event into the affected customer's aggregate.
Now consider that you also need 28-day and 7-day windows for the same metric. Creating separate materialized views for each multiplies your state size by three and runs three separate maintenance processes over the same raw events.
State complexity is O(customers x events per day x window days). That grows with raw event volume and has no natural ceiling.
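The arithmetic behind these claims is easy to check. A back-of-envelope sketch in Python, using the example numbers from this post (ten thousand customers, two thousand calls per day, a 90-day window):

```python
# Back-of-envelope state sizes for the example workload in this post.
customers = 10_000
calls_per_day = 2_000
window_days = 90

# Naive approach: every raw event inside the window lives in state.
naive_rows = customers * calls_per_day * window_days
print(naive_rows)  # 1_800_000_000 -> the "1.8 billion rows" figure

# Two-tier approach: at most one tile row per customer per day.
tiled_rows = customers * window_days
print(tiled_rows)  # 900_000

# The reduction factor is exactly the per-day event volume.
print(naive_rows // tiled_rows)  # 2000
```

Note that the reduction factor equals the per-customer daily event rate: the busier your customers, the more the tiling pays off.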
The Two-Tier Solution
The fix is to introduce a stable intermediate layer: the daily aggregate, or "tile."
Tier 1 aggregates raw API events into one row per customer per day using a simple GROUP BY with DATE_TRUNC. This is a tumbling window at a one-day grain. Once a calendar day is complete, that tile is stable. Each tile compresses thousands of individual events into a single row: call count, features used, total session minutes.
Tier 2 builds 7d, 28d, and 90d rolling windows by summing the daily tiles instead of the raw events. Each customer has at most 90 daily rows in the 90-day lookback, regardless of how many individual API calls they made.
State complexity drops to O(customers x 90 days). For ten thousand customers, that is at most 900,000 rows -- and in practice far fewer, since most customers do not have 90 consecutive active days.
Here is the data flow:
flowchart TD
A["api_events<br/>raw event stream"] --> B["Tier 1: customer_usage_1d<br/>DATE_TRUNC GROUP BY<br/>one row per customer per day"]
B --> C["Tier 2: customer_engagement_rolling<br/>SUM CASE WHEN over daily tiles<br/>7d / 28d / 90d windows"]
C --> D["Application queries<br/>churn model / health score API"]
style A fill:#f5f5f5,stroke:#999
style B fill:#dbeafe,stroke:#3b82f6
style C fill:#dcfce7,stroke:#16a34a
style D fill:#f5f5f5,stroke:#999
Building the Two Tiers in SQL
Tier 1: Daily aggregates per customer
The first materialized view groups raw API events by customer and day. RisingWave maintains this view incrementally: each new event updates exactly one daily bucket for that customer.
CREATE MATERIALIZED VIEW customer_usage_1d AS
SELECT
customer_id,
DATE_TRUNC('day', event_ts) AS usage_date,
COUNT(*) AS api_call_count,
COUNT(DISTINCT feature_name) AS features_used,
SUM(session_duration_sec) / 60.0 AS session_minutes
FROM api_events_enriched
GROUP BY customer_id, DATE_TRUNC('day', event_ts);
DATE_TRUNC('day', event_ts) is used as the grouping key, forming a tumbling window at a one-day grain. Each output row is bounded: once the calendar day closes, no new events will update that bucket (barring late arrivals -- addressed in the FAQ below).
A sample of what this view looks like at query time:
customer_id | usage_date | api_call_count | features_used | session_minutes
-------------+-----------------------+----------------+---------------+-----------------
cust_4821 | 2026-03-26 00:00:00 | 312 | 8 | 42.5
cust_4821 | 2026-03-27 00:00:00 | 89 | 3 | 11.2
cust_4821 | 2026-03-28 00:00:00 | 540 | 11 | 74.8
cust_7703 | 2026-03-28 00:00:00 | 201 | 5 | 28.3
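To make the tiling concrete, the same grouping can be mimicked in plain Python. This is an illustrative sketch, not RisingWave internals; the event tuples and feature names are hypothetical, and the dictionary key plays the role of GROUP BY customer_id, DATE_TRUNC('day', event_ts):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (customer_id, event_ts, feature_name, session_sec)
events = [
    ("cust_4821", datetime(2026, 3, 26, 9, 15), "search", 120),
    ("cust_4821", datetime(2026, 3, 26, 14, 2), "export", 300),
    ("cust_4821", datetime(2026, 3, 27, 8, 40), "search", 60),
    ("cust_7703", datetime(2026, 3, 28, 11, 5), "search", 90),
]

# Tier 1: one tile per (customer, day), mirroring the GROUP BY above.
tiles = defaultdict(lambda: {"api_call_count": 0, "features": set(), "session_minutes": 0.0})
for customer_id, event_ts, feature, session_sec in events:
    key = (customer_id, event_ts.date())  # the Python analogue of DATE_TRUNC('day', event_ts)
    tile = tiles[key]
    tile["api_call_count"] += 1
    tile["features"].add(feature)
    tile["session_minutes"] += session_sec / 60.0

# Two events on 2026-03-26 collapsed into a single tile for cust_4821.
print(tiles[("cust_4821", datetime(2026, 3, 26).date())]["api_call_count"])  # 2
print(len(tiles))  # 3 tiles for 4 events
```

Each incoming event touches exactly one bucket, which is why incremental maintenance of Tier 1 stays cheap no matter the event volume.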
Tier 2: Rolling windows from daily tiles
The second materialized view reads from customer_usage_1d and computes rolling sums using conditional aggregation. The pattern is a SUM(CASE WHEN usage_date >= ... THEN value ELSE 0 END) for each window length.
CREATE MATERIALIZED VIEW customer_engagement_rolling AS
SELECT
customer_id,
-- 7-day window
SUM(CASE WHEN usage_date >= CURRENT_DATE - INTERVAL '7 days' THEN api_call_count ELSE 0 END) AS api_calls_7d,
COUNT(CASE WHEN usage_date >= CURRENT_DATE - INTERVAL '7 days' AND api_call_count > 0 THEN 1 END) AS active_days_7d,
-- 28-day window
SUM(CASE WHEN usage_date >= CURRENT_DATE - INTERVAL '28 days' THEN api_call_count ELSE 0 END) AS api_calls_28d,
COUNT(CASE WHEN usage_date >= CURRENT_DATE - INTERVAL '28 days' AND api_call_count > 0 THEN 1 END) AS active_days_28d,
-- 90-day window
SUM(CASE WHEN usage_date >= CURRENT_DATE - INTERVAL '90 days' THEN api_call_count ELSE 0 END) AS api_calls_90d,
SUM(CASE WHEN usage_date >= CURRENT_DATE - INTERVAL '90 days' THEN session_minutes ELSE 0 END) AS session_minutes_90d
FROM customer_usage_1d
-- Temporal filter: tiles older than the longest window contribute zero to every
-- sum, so drop them from state to keep the view bounded at ~90 rows per customer.
WHERE usage_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY customer_id;
A sample query result:
customer_id | api_calls_7d | active_days_7d | api_calls_28d | active_days_28d | api_calls_90d | session_minutes_90d
-------------+--------------+----------------+---------------+-----------------+---------------+--------------------
cust_4821 | 2841 | 7 | 9203 | 26 | 31450 | 4820.3
cust_7703 | 601 | 4 | 2100 | 14 | 7340 | 980.1
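The SUM(CASE WHEN ...) pattern translates directly into a conditional sum over tiles. A minimal Python sketch of what Tier 2 computes, assuming tiles are (usage_date, api_call_count) pairs for one customer and "today" is pinned to a fixed date:

```python
from datetime import date, timedelta

today = date(2026, 3, 28)  # stands in for CURRENT_DATE in the view

# Daily tiles for one customer: (usage_date, api_call_count)
tiles = [
    (date(2026, 3, 26), 312),
    (date(2026, 3, 27), 89),
    (date(2026, 3, 28), 540),
    (date(2026, 2, 1), 200),    # inside the 90d window, outside 7d and 28d
    (date(2025, 12, 1), 1000),  # outside every window
]

def rolling_sum(tiles, days):
    """SUM(CASE WHEN usage_date >= today - days THEN count ELSE 0 END)."""
    cutoff = today - timedelta(days=days)
    return sum(count for usage_date, count in tiles if usage_date >= cutoff)

print(rolling_sum(tiles, 7))   # 941 = 312 + 89 + 540
print(rolling_sum(tiles, 28))  # 941: the February tile misses the 28d cutoff
print(rolling_sum(tiles, 90))  # 1141: the February tile is back in
```

Every window length is just a different cutoff over the same handful of rows, which is why adding a new window is a SQL edit rather than a new stateful job.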
Your application reads from customer_engagement_rolling directly using any PostgreSQL-compatible client. RisingWave keeps both materialized views fresh as new API events stream in. The churn model or health score endpoint gets consistently fresh data without polling batch jobs or managing refresh schedules.
Querying results
For a single customer lookup, the query is a point read against the materialized view:
SELECT
customer_id,
api_calls_7d,
active_days_7d,
api_calls_28d,
active_days_28d,
api_calls_90d,
session_minutes_90d
FROM customer_engagement_rolling
WHERE customer_id = 'cust_4821';
Because RisingWave serves materialized views as pre-computed tables, this query returns in milliseconds regardless of how many events have been processed upstream.
State Size Comparison
| Approach | State size | Update cost per event |
| --- | --- | --- |
| Direct 90d window on raw API events | O(customers x API calls in 90d) | Event appended; aged-out events evicted; window membership re-evaluated per customer |
| Two-tier: daily tiles + rolling sum | O(customers x 90 daily rows) | Event updates one daily bucket; rolling MV re-sums up to 90 rows |
| Separate MVs per window (7d, 28d, 90d) | O(customers x raw events per window, duplicated across windows) | Three independent window maintenance passes per event |
The two-tier approach keeps state bounded at the daily grain. The update cost per incoming API event is minimal: the event touches one daily tile row, and the rolling view re-sums a small, fixed number of rows per customer.
Why Daily Tiles Are the Right Primitive
Bounded grain. A daily bucket is complete once the calendar day closes. That tile is stable after that point, which makes it a reliable foundation for downstream computation. Raw events never have this property.
Cheap reuse. The same customer_usage_1d view feeds your 7d, 28d, and 90d rolling windows. You create one daily aggregation job, not three separate jobs each scanning raw events. Adding a 14-day window for a new cohort analysis means adding two CASE WHEN clauses to the rolling view, not a new stateful job.
Alignment with batch data. A daily row in your streaming materialized view maps exactly to a daily row in your data warehouse. When you compare streaming results against batch output for data quality checks, the grain aligns. This is much harder when comparing raw-event streaming windows against daily batch aggregates.
Simpler backfill. If your API event schema changes (a new field, a corrected session duration calculation), you recompute only the affected daily tiles and the rolling view re-aggregates automatically. There is no need to replay 90 days of raw events through a modified window function.
Why Not Create Separate Views for Each Window?
A common alternative is three independent materialized views, each scanning raw events over its own window:
-- Avoid this pattern
CREATE MATERIALIZED VIEW engagement_7d AS ... WHERE event_ts >= NOW() - INTERVAL '7 days';
CREATE MATERIALIZED VIEW engagement_28d AS ... WHERE event_ts >= NOW() - INTERVAL '28 days';
CREATE MATERIALIZED VIEW engagement_90d AS ... WHERE event_ts >= NOW() - INTERVAL '90 days';
Each view is its own maintenance burden. State is duplicated: an API call from three days ago lives in all three views simultaneously. Storage is proportional to the number of windows multiplied by raw event volume. Update cost multiplies accordingly. Modifying business logic (changing which events count as "active") means updating three separate view definitions instead of one.
Daily tiles are the more general primitive. You pay the aggregation cost once per day per customer, and every window length draws from the same compact output.
FAQ
Does this pattern handle late-arriving API events correctly?
Yes. Because Tier 1 uses DATE_TRUNC('day', event_ts) as the grouping key rather than a hard watermark cutoff, a late-arriving event updates the correct daily tile when it lands. The tile for that customer and day is revised, and the rolling view picks up the change on its next maintenance cycle. If you need strict control over late data, RisingWave supports watermarks on source definitions that you can tune to match your lateness tolerance.
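The mechanics are easy to see in the tile model: the bucket key comes from event_ts, not arrival time, so a late event revises the tile for the day it actually occurred. An illustrative sketch under that assumption:

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

tiles = defaultdict(int)  # (customer_id, day) -> api_call_count

def ingest(customer_id, event_ts):
    # Keyed by event_ts, so a late event lands in the tile
    # for the day it occurred, not the day it arrived.
    tiles[(customer_id, event_ts.date())] += 1

ingest("cust_4821", datetime(2026, 3, 27, 10, 0))
ingest("cust_4821", datetime(2026, 3, 28, 9, 0))

# A late event for March 27 arrives on March 28:
ingest("cust_4821", datetime(2026, 3, 27, 23, 55))

# The March 27 tile was revised in place; the rolling view re-sums it.
print(tiles[("cust_4821", date(2026, 3, 27))])  # 2

today = date(2026, 3, 28)
calls_7d = sum(n for (cust, day), n in tiles.items()
               if cust == "cust_4821" and day >= today - timedelta(days=7))
print(calls_7d)  # 3
```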
What if I need sub-day windows, like hourly rolling metrics?
The same pattern applies at a finer grain. Create an hourly tile (Tier 1 uses DATE_TRUNC('hour', ...)) and build your 6-hour, 12-hour, and 24-hour windows from it. State grows proportionally with granularity, but the trade-off still beats maintaining raw event state across multi-day windows.
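The grain swap is mechanical: truncate to the hour instead of the day, and sum hourly tiles over hour-based cutoffs. A hedged sketch of the finer-grained version (timestamps and customer IDs are hypothetical):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def trunc_hour(ts):
    """Python analogue of DATE_TRUNC('hour', event_ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)

hourly_tiles = defaultdict(int)  # (customer_id, hour) -> api_call_count
for ts in [datetime(2026, 3, 28, 9, 5), datetime(2026, 3, 28, 9, 40),
           datetime(2026, 3, 28, 14, 10)]:
    hourly_tiles[("cust_4821", trunc_hour(ts))] += 1

# Two 09:xx events share one tile; the 14:10 event gets its own.
print(len(hourly_tiles))  # 2

now = datetime(2026, 3, 28, 15, 0)
calls_6h = sum(n for (cust, hour), n in hourly_tiles.items()
               if hour >= now - timedelta(hours=6))
print(calls_6h)  # 3: the 09:00 tile sits exactly on the 6-hour cutoff
```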
How does RisingWave handle the CURRENT_DATE reference in the rolling view?
In a streaming materialized view, RisingWave treats CURRENT_DATE (like NOW()) as a continuously advancing expression rather than a constant frozen at view creation. The rolling windows therefore move forward as the calendar date changes, with no manual intervention or job restart: the incremental maintenance engine handles the daily boundary shift, retracting tiles that age out of a window and admitting new ones.
Can I push these rolling metrics to an external system?
Yes. Use a sink connector to push results from customer_engagement_rolling downstream. For example, to keep a PostgreSQL serving database synchronized:
CREATE SINK engagement_metrics_to_pg AS
SELECT * FROM customer_engagement_rolling
WITH (
connector = 'jdbc',
jdbc.url = 'jdbc:postgresql://serving-db:5432/metrics',
table.name = 'customer_engagement_rolling',
type = 'upsert',
primary_key = 'customer_id'
);
RisingWave pushes incremental changes to the sink as the materialized view updates, so your serving database stays current without polling.
How does this compare to scheduling dbt jobs for the same metrics?
dbt runs batch transformations on a schedule, so rolling metrics are only as fresh as your last job run -- typically hourly or daily. The two-tier pattern in RisingWave maintains both tiers continuously, meaning api_calls_7d reflects the last completed event within seconds. For churn models and health scores that inform real-time product decisions (in-app warnings, CSM alerts), that latency gap matters.
Conclusion
The two-tier aggregation pattern solves a real cost problem in streaming analytics. Instead of holding 90 days of raw API events in state per customer, you hold 90 compact daily rows. The rolling windows your application queries are always fresh because RisingWave maintains both tiers incrementally as new events arrive.
Key points:
- Daily tiles cap state at the day grain. Once a day closes, the tile is stable and cheap to maintain.
- A single daily aggregation materialized view feeds every rolling window without duplicated state.
- The update path per incoming API event is narrow: one daily tile row changes, and the rolling view re-sums a bounded number of rows.
- Daily tiles align with batch data warehouse grains, making reconciliation and backfill straightforward.
- Adding new window lengths is a SQL edit in one view, not a new stateful streaming job.
For teams building SaaS engagement metrics, customer health scores, or any streaming aggregation with multiple lookback windows, this pattern is the right default starting point.
Try it yourself. RisingWave Cloud has a free tier with no credit card required. Sign up here and run these queries in minutes.
Join our Slack community to ask questions and share how you are using rolling window patterns in your own pipelines.

