A/B Testing Infrastructure for Games with Streaming SQL

RisingWave enables real-time A/B test analysis for games by continuously aggregating player behavior metrics—retention, session length, spend—per experiment variant as events stream in from Kafka. Instead of waiting hours for batch results, game teams see live statistical signals within minutes of launching a test.

The Cost of Slow A/B Test Results

A/B testing is table stakes for game studios optimizing monetization, onboarding, and feature rollouts. But traditional experimentation platforms rely on batch ETL pipelines: events land in S3, a Spark job runs every few hours, and analysts check dashboards the next day.

For live-service games, that lag is expensive. A pricing experiment on a battle pass bundle might run for 48 hours before you see results—even if the control variant is clearly underperforming. A bad onboarding A/B test costs you real players every hour it stays live. Real-time experiment infrastructure means you can call winning variants faster and kill losers before they drain LTV.

Architecture Overview

The streaming A/B testing pipeline has three layers:

  1. Event ingestion: Player actions (purchases, level completions, session starts) flow into Kafka.
  2. Experiment assignment table: A reference table maps each player to their current experiment variant.
  3. Streaming aggregation: RisingWave materialized views join events with assignments and compute per-variant metrics continuously.

Setting Up the Experiment Assignment Reference Table

Experiment assignments are typically managed by your feature flag system and ingested via CDC or a direct insert:

CREATE TABLE experiment_assignments (
    player_id       BIGINT,
    experiment_id   VARCHAR,
    variant         VARCHAR,     -- 'control', 'treatment_a', 'treatment_b'
    assigned_at     TIMESTAMPTZ,
    PRIMARY KEY (player_id, experiment_id)
);

This table is populated by your assignment service and stays in RisingWave as a queryable reference for joins.
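As a minimal sketch of what that population looks like, the assignment service (or a backfill script) can write rows directly over the PostgreSQL wire protocol — the player IDs and experiment name below are illustrative placeholders:

```sql
-- Hypothetical assignment rows written by the flag service.
-- Player IDs and the experiment name are examples only.
INSERT INTO experiment_assignments VALUES
    (1001, 'battle_pass_price_v2', 'control',     NOW()),
    (1002, 'battle_pass_price_v2', 'treatment_a', NOW());
```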

Ingesting Player Action Events

CREATE SOURCE player_actions (
    player_id       BIGINT,
    game_id         VARCHAR,
    action_type     VARCHAR,     -- 'purchase', 'session_start', 'level_complete', 'churn'
    action_value    DOUBLE PRECISION,
    action_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'player-action-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

Real-Time Per-Variant Metrics Materialized View

This materialized view joins streaming events with experiment assignments and computes the core experiment metrics per variant in real time:

CREATE MATERIALIZED VIEW ab_test_results AS
SELECT
    ea.experiment_id,
    ea.variant,
    window_start,
    window_end,
    COUNT(DISTINCT pa.player_id)                                        AS active_players,
    COUNT(*) FILTER (WHERE pa.action_type = 'purchase')                 AS purchase_count,
    SUM(pa.action_value) FILTER (WHERE pa.action_type = 'purchase')     AS total_revenue,
    AVG(pa.action_value) FILTER (WHERE pa.action_type = 'purchase')     AS avg_order_value,
    COUNT(DISTINCT pa.player_id) FILTER (WHERE pa.action_type = 'purchase')
        AS converting_players,
    ROUND(
        COUNT(DISTINCT pa.player_id) FILTER (WHERE pa.action_type = 'purchase')::NUMERIC
        / NULLIF(COUNT(DISTINCT pa.player_id), 0) * 100, 2
    )                                                                    AS conversion_rate_pct
FROM TUMBLE(player_actions, action_time, INTERVAL '1 hour') pa
JOIN experiment_assignments ea
  ON pa.player_id = ea.player_id
GROUP BY ea.experiment_id, ea.variant, window_start, window_end;

Each hourly window yields a row per (experiment_id, variant) with live conversion rates and revenue, updated incrementally as events arrive. No batch job required.
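Because materialized views are queryable like ordinary tables, an analyst can poll results with plain SQL; a sketch, using an illustrative experiment name:

```sql
-- Latest hourly results for one experiment, newest window first.
SELECT variant, window_start, conversion_rate_pct, total_revenue
FROM ab_test_results
WHERE experiment_id = 'battle_pass_price_v2'
ORDER BY window_start DESC, variant
LIMIT 10;
```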

Session-Level Retention Tracking

For longer-horizon metrics like D1 and D7 retention, use a HOP (sliding) window that advances daily over a 7-day lookback:

CREATE MATERIALIZED VIEW retention_by_variant AS
SELECT
    ea.experiment_id,
    ea.variant,
    window_end::DATE                AS cohort_snapshot,
    COUNT(DISTINCT pa.player_id)    AS retained_players
FROM HOP(player_actions, action_time, INTERVAL '1 day', INTERVAL '7 days') pa
JOIN experiment_assignments ea
  ON pa.player_id = ea.player_id
WHERE pa.action_type = 'session_start'
GROUP BY ea.experiment_id, ea.variant, window_end;

Query this view alongside the assignment date to compute D1/D7 rates per variant in real time.
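One way to make the D1 calculation concrete is a dedicated view that compares each session date to the player's assignment date. This is a sketch, assuming assigned_at is populated for every assignment row; the view name is ours:

```sql
-- Sketch: players who returned exactly one day after assignment (D1).
-- Divide d1_retained_players by cohort size downstream to get the rate.
CREATE MATERIALIZED VIEW d1_retention_by_variant AS
SELECT
    ea.experiment_id,
    ea.variant,
    ea.assigned_at::DATE            AS cohort_date,
    COUNT(DISTINCT pa.player_id)    AS d1_retained_players
FROM player_actions pa
JOIN experiment_assignments ea
  ON pa.player_id = ea.player_id
WHERE pa.action_type = 'session_start'
  AND pa.action_time::DATE = (ea.assigned_at + INTERVAL '1 day')::DATE
GROUP BY ea.experiment_id, ea.variant, ea.assigned_at::DATE;
```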

Comparison: Batch vs. Streaming A/B Testing

| Dimension | Batch ETL Approach | Streaming with RisingWave |
| --- | --- | --- |
| Result latency | 4–24 hours | Under 1 minute |
| Experiment iteration speed | 1–3 days per cycle | Same day |
| Infrastructure complexity | S3 + Spark + Airflow + BI | Single streaming database |
| Statistical noise window | Long tail of late events | Continuous window updates |
| Cost to kill a bad variant | High (lost LTV) | Minimal (fast detection) |
| Developer interface | HiveQL / SparkSQL / custom | Standard PostgreSQL |

Sinking Live Results to Your Dashboard

Push experiment results to a downstream OLAP store or Kafka topic for your experimentation platform:

CREATE SINK ab_results_to_kafka
FROM ab_test_results
WITH (
    connector = 'kafka',
    topic = 'ab-test-live-results',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Your experimentation dashboard consumes this topic and updates significance calculations in near real time.
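Much of the significance math can itself be pushed into SQL before it ever reaches the dashboard. As a sketch, a pooled two-proportion z-score comparing control against one treatment can be computed directly from the materialized view; the variant names here are assumptions carried over from the schema comment earlier, and translating z into a p-value still belongs in your stats layer:

```sql
-- Sketch: pooled two-proportion z-score, control vs. treatment_a,
-- per hourly window. Convert z to a p-value downstream.
SELECT
    c.experiment_id,
    c.window_start,
    (t.converting_players::NUMERIC / t.active_players
     - c.converting_players::NUMERIC / c.active_players)
    / SQRT(
        ((c.converting_players + t.converting_players)::NUMERIC
         / (c.active_players + t.active_players))
        * (1 - (c.converting_players + t.converting_players)::NUMERIC
             / (c.active_players + t.active_players))
        * (1.0 / c.active_players + 1.0 / t.active_players)
      ) AS z_score
FROM ab_test_results c
JOIN ab_test_results t
  ON c.experiment_id = t.experiment_id
 AND c.window_start  = t.window_start
WHERE c.variant = 'control'
  AND t.variant = 'treatment_a'
  AND c.active_players > 0
  AND t.active_players > 0;
```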

Guardrails and Best Practices

Novelty effect handling: Use an assigned_at + INTERVAL '24 hours' filter to exclude actions within the first 24 hours after assignment if the novelty effect is a concern for your game's genre.
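Concretely, the filter slots into the same join used by the metrics view. A trimmed sketch with a hypothetical view name:

```sql
-- Sketch: same join as ab_test_results, but each player's first
-- 24 hours after assignment are excluded to damp the novelty effect.
CREATE MATERIALIZED VIEW ab_test_results_no_novelty AS
SELECT
    ea.experiment_id,
    ea.variant,
    window_start,
    COUNT(DISTINCT pa.player_id) AS active_players
FROM TUMBLE(player_actions, action_time, INTERVAL '1 hour') pa
JOIN experiment_assignments ea
  ON pa.player_id = ea.player_id
WHERE pa.action_time >= ea.assigned_at + INTERVAL '24 hours'
GROUP BY ea.experiment_id, ea.variant, window_start;
```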

Multiple testing correction: RisingWave handles the data aggregation layer. Apply Bonferroni or Benjamini-Hochberg corrections in your statistics layer downstream, feeding from the live aggregations.

Holdout groups: Add a 'holdout' variant to the assignment table. Players in holdout never see any experiment—crucial for measuring long-term game health independently of ongoing tests.

FAQ

Q: How do I handle experiment assignment changes mid-experiment? A: Store the assigned_at timestamp in experiment_assignments and filter player actions to only those after assignment. If a player is re-assigned, insert a new row—or update the existing row and use assigned_at as the action filter boundary.

Q: Can RisingWave run statistical significance tests natively? A: RisingWave handles aggregation (counts, sums, averages, conversion rates). For p-values and confidence intervals, connect a downstream Python service or your stats platform to the materialized view via the PostgreSQL wire protocol and compute significance there.

Q: What if two experiments affect the same player simultaneously? A: Each (player_id, experiment_id) pair is a separate row in experiment_assignments. The materialized view groups by experiment_id, so the same player action can appear in multiple experiment result rows without contamination—as long as you're careful about correlated experiments.

Q: How do we ensure low-latency results without overloading the database? A: RisingWave's incremental computation only processes the delta of new events per window interval. Resource usage scales with event rate, not total data size. For most game studios, even peak events (new season launches) are handled without manual scaling.

Q: Can we run multivariate (A/B/C/D) tests? A: Yes. Add more variant values to the variant column in experiment_assignments. The materialized view groups by variant, so all four variants appear as separate rows automatically—no schema changes needed.


Get Started

Ship faster, smarter experiments with real-time streaming SQL.
