RisingWave enables real-time A/B test analysis for games by continuously aggregating player behavior metrics—retention, session length, spend—per experiment variant as events stream in from Kafka. Instead of waiting hours for batch results, game teams see live statistical signals within minutes of launching a test.
The Cost of Slow A/B Test Results
A/B testing is table stakes for game studios optimizing monetization, onboarding, and feature rollouts. But traditional experimentation platforms rely on batch ETL pipelines: events land in S3, a Spark job runs every few hours, and analysts check dashboards the next day.
For live-service games, that lag is expensive. A pricing experiment on a battle pass bundle might run for 48 hours before you see results—even if the control variant is clearly underperforming. A bad onboarding A/B test costs you real players every hour it stays live. Real-time experiment infrastructure means you can call winning variants faster and kill losers before they drain LTV.
Architecture Overview
The streaming A/B testing pipeline has three layers:
- Event ingestion: Player actions (purchases, level completions, session starts) flow into Kafka.
- Experiment assignment table: A reference table maps each player to their current experiment variant.
- Streaming aggregation: RisingWave materialized views join events with assignments and compute per-variant metrics continuously.
Setting Up the Experiment Assignment Reference Table
Experiment assignments are typically managed by your feature flag system and ingested via CDC or a direct insert:
CREATE TABLE experiment_assignments (
    player_id     BIGINT,
    experiment_id VARCHAR,
    variant       VARCHAR, -- 'control', 'treatment_a', 'treatment_b'
    assigned_at   TIMESTAMPTZ,
    PRIMARY KEY (player_id, experiment_id)
);
This table is populated by your assignment service and stays in RisingWave as a queryable reference for joins.
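For local testing, you can seed a few assignments with a plain INSERT. The experiment name and player IDs below are made up for illustration; in production your assignment service writes these rows:

INSERT INTO experiment_assignments VALUES
    -- (player_id, experiment_id, variant, assigned_at); values are illustrative
    (1001, 'battle_pass_pricing_v2', 'control',     '2024-05-01 00:00:00+00'),
    (1002, 'battle_pass_pricing_v2', 'treatment_a', '2024-05-01 00:00:00+00'),
    (1003, 'battle_pass_pricing_v2', 'holdout',     '2024-05-01 00:00:00+00');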
Ingesting Player Action Events
CREATE SOURCE player_actions (
    player_id    BIGINT,
    game_id      VARCHAR,
    action_type  VARCHAR, -- 'purchase', 'session_start', 'level_complete', 'churn'
    action_value DOUBLE PRECISION,
    action_time  TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'player-action-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
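Each Kafka message is a flat JSON object matching the source schema. An illustrative payload (all field values are made up):

{"player_id": 1001, "game_id": "galaxy_raiders", "action_type": "purchase", "action_value": 9.99, "action_time": "2024-05-01T12:03:11Z"}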
Real-Time Per-Variant Metrics Materialized View
This materialized view joins streaming events with experiment assignments and computes the core experiment metrics per variant in real time:
CREATE MATERIALIZED VIEW ab_test_results AS
SELECT
    ea.experiment_id,
    ea.variant,
    window_start,
    window_end,
    COUNT(DISTINCT pa.player_id) AS active_players,
    COUNT(*) FILTER (WHERE pa.action_type = 'purchase') AS purchase_count,
    SUM(pa.action_value) FILTER (WHERE pa.action_type = 'purchase') AS total_revenue,
    AVG(pa.action_value) FILTER (WHERE pa.action_type = 'purchase') AS avg_order_value,
    COUNT(DISTINCT pa.player_id) FILTER (WHERE pa.action_type = 'purchase')
        AS converting_players,
    ROUND(
        COUNT(DISTINCT pa.player_id) FILTER (WHERE pa.action_type = 'purchase')::NUMERIC
        / NULLIF(COUNT(DISTINCT pa.player_id), 0) * 100, 2
    ) AS conversion_rate_pct
FROM TUMBLE(player_actions, action_time, INTERVAL '1 hour') pa
JOIN experiment_assignments ea
    ON pa.player_id = ea.player_id
GROUP BY ea.experiment_id, ea.variant, window_start, window_end;
Every hour, you see a fresh row per (experiment_id, variant) with live conversion rates and revenue. No batch job required.
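You can sanity-check the view with an ad-hoc query over the PostgreSQL wire protocol. A minimal sketch, reusing the hypothetical experiment name from the seed data above:

-- Latest hourly windows for one experiment, one row per variant.
SELECT variant, active_players, converting_players, conversion_rate_pct, total_revenue
FROM ab_test_results
WHERE experiment_id = 'battle_pass_pricing_v2'
ORDER BY window_start DESC, variant
LIMIT 6;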
Session-Level Retention Tracking
For longer-horizon metrics like D1 and D7 retention, use a HOP (sliding) window over session-start events that advances one day at a time across a 7-day lookback:
CREATE MATERIALIZED VIEW retention_by_variant AS
SELECT
    ea.experiment_id,
    ea.variant,
    window_end::DATE AS cohort_snapshot,
    COUNT(DISTINCT pa.player_id) AS retained_players
FROM HOP(player_actions, action_time, INTERVAL '1 day', INTERVAL '7 days') pa
JOIN experiment_assignments ea
    ON pa.player_id = ea.player_id
WHERE pa.action_type = 'session_start'
GROUP BY ea.experiment_id, ea.variant, window_end;
Query this view alongside the assignment date to compute D1/D7 rates per variant in real time.
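One way to turn the raw counts into a rate, sketched under the assumption that each player has a single assignment row per experiment: divide retained players by the variant's assigned population.

-- Sketch: D7-style retention = players active in the trailing 7-day
-- window / players ever assigned to the variant (an approximation;
-- strict D7 would also anchor on each player's assigned_at date).
SELECT
    r.experiment_id,
    r.variant,
    r.cohort_snapshot,
    ROUND(r.retained_players::NUMERIC
          / NULLIF(a.assigned_players, 0) * 100, 2) AS retention_pct
FROM retention_by_variant r
JOIN (
    SELECT experiment_id, variant, COUNT(*) AS assigned_players
    FROM experiment_assignments
    GROUP BY experiment_id, variant
) a ON r.experiment_id = a.experiment_id AND r.variant = a.variant;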
Comparison: Batch vs. Streaming A/B Testing
| Dimension | Batch ETL Approach | Streaming with RisingWave |
|---|---|---|
| Result latency | 4–24 hours | Under 1 minute |
| Experiment iteration speed | 1–3 days per cycle | Same day |
| Infrastructure complexity | S3 + Spark + Airflow + BI | Single streaming database |
| Statistical noise window | Long tail of late events | Continuous window updates |
| Cost to kill a bad variant | High (lost LTV) | Minimal (fast detection) |
| Developer interface | HiveQL / SparkSQL / custom | Standard PostgreSQL |
Sinking Live Results to Your Dashboard
Push experiment results to a downstream OLAP store or Kafka topic for your experimentation platform:
-- ab_test_results is an aggregation, so it emits updates rather than
-- append-only rows. Sink it as UPSERT keyed by the grouping columns
-- (alternatively, keep FORMAT PLAIN and set force_append_only = 'true').
CREATE SINK ab_results_to_kafka
FROM ab_test_results
WITH (
    connector = 'kafka',
    topic = 'ab-test-live-results',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'experiment_id,variant,window_start'
)
FORMAT UPSERT ENCODE JSON;
Your experimentation dashboard consumes this topic and updates significance calculations in near real time.
Guardrails and Best Practices
Novelty effect handling: If the novelty effect is a concern for your game's genre, exclude each player's actions within the first 24 hours after assignment using an assigned_at + INTERVAL '24 hours' predicate, as sketched below.
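A minimal sketch of the filter, reusing the hourly view's join (the view name and the trimmed-down column list are assumptions):

CREATE MATERIALIZED VIEW ab_test_results_no_novelty AS
SELECT
    ea.experiment_id,
    ea.variant,
    window_start,
    window_end,
    COUNT(DISTINCT pa.player_id) AS active_players,
    COUNT(*) FILTER (WHERE pa.action_type = 'purchase') AS purchase_count
FROM TUMBLE(player_actions, action_time, INTERVAL '1 hour') pa
JOIN experiment_assignments ea
    ON pa.player_id = ea.player_id
-- Drop each player's first 24 hours after assignment.
WHERE pa.action_time >= ea.assigned_at + INTERVAL '24 hours'
GROUP BY ea.experiment_id, ea.variant, window_start, window_end;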
Multiple testing correction: RisingWave handles the data aggregation layer. Apply Bonferroni or Benjamini-Hochberg corrections in your statistics layer downstream, feeding from the live aggregations.
Holdout groups: Add a 'holdout' variant to the assignment table. Players in holdout never see any experiment—crucial for measuring long-term game health independently of ongoing tests.
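A sketch of a holdout-only health view (the view name and daily granularity are assumptions, and a player marked holdout in one experiment may still be in others):

CREATE MATERIALIZED VIEW holdout_health AS
SELECT
    window_start::DATE AS day,
    COUNT(DISTINCT pa.player_id) AS active_players,
    SUM(pa.action_value) FILTER (WHERE pa.action_type = 'purchase') AS revenue
FROM TUMBLE(player_actions, action_time, INTERVAL '1 day') pa
JOIN experiment_assignments ea
    ON pa.player_id = ea.player_id
WHERE ea.variant = 'holdout'
GROUP BY window_start;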
FAQ
Q: How do I handle experiment assignment changes mid-experiment?
A: Store the assigned_at timestamp in experiment_assignments and filter player actions to only those after assignment. If a player is re-assigned, insert a new row—or update the existing row and use assigned_at as the action filter boundary.
Q: Can RisingWave run statistical significance tests natively?
A: RisingWave handles aggregation (counts, sums, averages, conversion rates). For p-values and confidence intervals, connect a downstream Python service or your stats platform to the materialized view via the PostgreSQL wire protocol and compute significance there.
Q: What if two experiments affect the same player simultaneously?
A: Each (player_id, experiment_id) pair is a separate row in experiment_assignments. The materialized view groups by experiment_id, so the same player action can appear in multiple experiment result rows without contamination—as long as you're careful about correlated experiments.
Q: How do we ensure low-latency results without overloading the database?
A: RisingWave's incremental computation processes only the delta of new events per window, so resource usage scales with event rate, not total data size. For most game studios, even peak event rates (e.g., a new season launch) are handled without manual scaling.
Q: Can we run multivariate (A/B/C/D) tests?
A: Yes. Add more variant values to the variant column in experiment_assignments. The materialized view groups by variant, so all four variants appear as separate rows automatically—no schema changes needed.
Get Started
Ship faster, smarter experiments with real-time streaming SQL:
- Documentation: docs.risingwave.com/get-started
- Community: Join the conversation on RisingWave Slack