Real-Time A/B Testing with Streaming SQL

Most A/B testing platforms show you results with a delay. You launch an experiment on Monday, and by Wednesday you are still waiting for a batch pipeline to crunch the numbers. If a variant is actively hurting conversion, that is three days of lost revenue.

The problem is not the statistics. It is the data pipeline. Traditional A/B testing stacks rely on batch ETL jobs that aggregate events once per hour or once per day. By the time you see results, the damage (or the opportunity) has already passed.

Streaming SQL changes this equation. Instead of scheduling batch aggregations, you define materialized views that continuously compute conversion rates, funnel metrics, and revenue per variant as each event arrives. In this guide, you will build a complete real-time A/B testing pipeline using RisingWave, a streaming database that keeps your experiment metrics fresh with sub-second latency.

Why Do Batch Pipelines Slow Down A/B Testing?

A traditional A/B testing data pipeline looks something like this:

  1. Your application assigns users to variants and logs events to a message queue or event store.
  2. A scheduled batch job (Airflow, dbt, Spark) runs every hour or every day to aggregate metrics.
  3. Results land in a data warehouse where a dashboard queries them.

This architecture introduces two kinds of delay. First, the batch interval itself: with an hourly job, an event waits 30 minutes on average (and up to a full hour) before it is even aggregated. Second, query latency: your dashboard has to scan potentially millions of rows each time it refreshes.

For growth teams running experiments on checkout flows, pricing pages, or onboarding funnels, these delays create real risk. A broken variant can run for hours before anyone notices. A winning variant sits idle while you wait for statistical confidence to build in yesterday's data.

Netflix recognized this problem and developed sequential testing methods that allow continuous monitoring of experiment metrics. Their approach works because the underlying data pipeline delivers results in real time. You need the same kind of infrastructure.

How Does a Streaming A/B Testing Pipeline Work?

A streaming approach replaces the batch aggregation step with continuously maintained materialized views. Here is the architecture:

  1. Your application publishes experiment assignment events and user interaction events to Kafka (or another message broker).
  2. RisingWave ingests these streams and joins them in real time.
  3. Materialized views continuously compute conversion rates, funnel breakdowns, and revenue metrics per variant.
  4. Your dashboard or alerting system queries RisingWave directly, getting fresh results on every request.

The key difference: there is no batch job to schedule, no waiting for the next ETL run. When a user converts, the conversion rate updates within seconds.
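The assignment step itself (point 1 above) is usually a deterministic hash of the user and experiment IDs, so the same user always lands in the same variant without a lookup at request time. A minimal sketch in Python, reusing the experiment and variant names from this tutorial (the function name and bucketing scheme are illustrative, not a specific library's API):

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, variants: list[str]) -> str:
    """Deterministically bucket a user into a variant.

    Hashing user_id together with experiment_id means the same user
    always gets the same variant for a given experiment, and different
    experiments bucket independently.
    """
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Same inputs always produce the same assignment:
v1 = assign_variant("u001", "checkout_redesign", ["control", "variant_a"])
assert v1 == assign_variant("u001", "checkout_redesign", ["control", "variant_a"])
```

The assignment event (user, experiment, variant, timestamp) is then published alongside the interaction events, which is exactly what the `experiment_assignments` stream below captures.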

In production, you would create Kafka sources to ingest events. The DDL looks like this:

-- Source for experiment assignments (from Kafka in production)
CREATE SOURCE experiment_assignments_source (
    user_id VARCHAR,
    experiment_id VARCHAR,
    variant VARCHAR,
    assigned_at TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'experiment.assignments',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Source for page/interaction events
CREATE SOURCE page_events_source (
    event_id VARCHAR,
    user_id VARCHAR,
    experiment_id VARCHAR,
    event_type VARCHAR,
    page_url VARCHAR,
    revenue DOUBLE PRECISION,
    event_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'page.events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

For this tutorial, we will use regular tables so you can follow along on any local RisingWave instance without needing Kafka. The SQL patterns are identical either way.

How to Build Experiment Tracking with Streaming SQL

Let's build the pipeline step by step. All SQL below has been verified against RisingWave 2.8.0.

Step 1: Create the base tables

You need two tables: one for tracking which users are assigned to which experiment variants, and one for capturing user interactions.

CREATE TABLE experiment_assignments (
    user_id VARCHAR,
    experiment_id VARCHAR,
    variant VARCHAR,
    assigned_at TIMESTAMPTZ
);

CREATE TABLE page_events (
    event_id VARCHAR,
    user_id VARCHAR,
    experiment_id VARCHAR,
    event_type VARCHAR,
    page_url VARCHAR,
    revenue DOUBLE PRECISION,
    event_time TIMESTAMPTZ
);

Step 2: Insert sample data

Let's simulate a checkout redesign experiment. We have 10 users split evenly between a control group and variant A. The variant group has a higher conversion rate.

-- Assign users to experiment variants
INSERT INTO experiment_assignments VALUES
('u001', 'checkout_redesign', 'control', '2026-03-29 10:00:00+00'),
('u002', 'checkout_redesign', 'variant_a', '2026-03-29 10:00:05+00'),
('u003', 'checkout_redesign', 'control', '2026-03-29 10:00:10+00'),
('u004', 'checkout_redesign', 'variant_a', '2026-03-29 10:00:15+00'),
('u005', 'checkout_redesign', 'control', '2026-03-29 10:00:20+00'),
('u006', 'checkout_redesign', 'variant_a', '2026-03-29 10:00:25+00'),
('u007', 'checkout_redesign', 'control', '2026-03-29 10:00:30+00'),
('u008', 'checkout_redesign', 'variant_a', '2026-03-29 10:00:35+00'),
('u009', 'checkout_redesign', 'control', '2026-03-29 10:00:40+00'),
('u010', 'checkout_redesign', 'variant_a', '2026-03-29 10:00:45+00');

-- Log user interaction events
INSERT INTO page_events VALUES
-- Control group: 5 users, 2 purchase
('e001', 'u001', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:00+00'),
('e002', 'u001', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:02:00+00'),
('e003', 'u001', 'checkout_redesign', 'purchase', '/checkout/confirm', 49.99, '2026-03-29 10:03:00+00'),
('e004', 'u003', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:30+00'),
('e005', 'u003', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:02:30+00'),
('e006', 'u005', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:45+00'),
('e007', 'u007', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:02:00+00'),
('e008', 'u007', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:03:00+00'),
('e009', 'u007', 'checkout_redesign', 'purchase', '/checkout/confirm', 35.00, '2026-03-29 10:04:00+00'),
('e010', 'u009', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:02:15+00'),
-- Variant A group: 5 users, 4 purchase
('e011', 'u002', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:00+00'),
('e012', 'u002', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:01:30+00'),
('e013', 'u002', 'checkout_redesign', 'purchase', '/checkout/confirm', 59.99, '2026-03-29 10:02:00+00'),
('e014', 'u004', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:15+00'),
('e015', 'u004', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:02:00+00'),
('e016', 'u004', 'checkout_redesign', 'purchase', '/checkout/confirm', 72.50, '2026-03-29 10:02:30+00'),
('e017', 'u006', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:30+00'),
('e018', 'u006', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:02:15+00'),
('e019', 'u006', 'checkout_redesign', 'purchase', '/checkout/confirm', 45.00, '2026-03-29 10:03:00+00'),
('e020', 'u008', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:01:45+00'),
('e021', 'u008', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:02:30+00'),
('e022', 'u010', 'checkout_redesign', 'page_view', '/checkout', NULL, '2026-03-29 10:02:00+00'),
('e023', 'u010', 'checkout_redesign', 'click', '/checkout/submit', NULL, '2026-03-29 10:02:45+00'),
('e024', 'u010', 'checkout_redesign', 'purchase', '/checkout/confirm', 89.99, '2026-03-29 10:03:30+00');

Step 3: Create the experiment summary view

This materialized view continuously computes the key metrics for each variant: total users, conversions, conversion rate, total revenue, and revenue per user.

CREATE MATERIALIZED VIEW experiment_summary AS
SELECT
    a.experiment_id,
    a.variant,
    COUNT(DISTINCT a.user_id) AS total_users,
    COUNT(DISTINCT CASE WHEN e.event_type = 'purchase' THEN e.user_id END) AS conversions,
    ROUND(
        COUNT(DISTINCT CASE WHEN e.event_type = 'purchase' THEN e.user_id END)::NUMERIC
        / COUNT(DISTINCT a.user_id)::NUMERIC * 100, 2
    ) AS conversion_rate,
    ROUND(COALESCE(SUM(CASE WHEN e.event_type = 'purchase' THEN e.revenue END), 0)::NUMERIC, 2) AS total_revenue,
    ROUND(
        COALESCE(SUM(CASE WHEN e.event_type = 'purchase' THEN e.revenue END), 0)::NUMERIC
        / COUNT(DISTINCT a.user_id)::NUMERIC, 2
    ) AS revenue_per_user
FROM experiment_assignments a
LEFT JOIN page_events e
    ON a.user_id = e.user_id AND a.experiment_id = e.experiment_id
GROUP BY a.experiment_id, a.variant;

Query the view:

SELECT * FROM experiment_summary ORDER BY variant;
   experiment_id   |  variant  | total_users | conversions | conversion_rate | total_revenue | revenue_per_user
-------------------+-----------+-------------+-------------+-----------------+---------------+------------------
 checkout_redesign | control   |           5 |           2 |           40.00 |         84.99 |            17.00
 checkout_redesign | variant_a |           5 |           4 |           80.00 |        267.48 |            53.50

The variant has double the conversion rate and more than triple the revenue per user. In a traditional batch pipeline, you would need to wait for the next scheduled job to see these numbers. Here, they update the moment a new event arrives.
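As a sanity check, the view's arithmetic is easy to reproduce by hand from the sample data (plain Python over the purchase amounts inserted above):

```python
# Purchase amounts per variant, taken from the sample events above.
purchases = {
    "control":   [49.99, 35.00],                # u001, u007
    "variant_a": [59.99, 72.50, 45.00, 89.99],  # u002, u004, u006, u010
}
users_per_variant = 5

for variant, amounts in purchases.items():
    conversion_rate = len(amounts) / users_per_variant * 100
    revenue_per_user = round(sum(amounts) / users_per_variant, 2)
    print(variant, conversion_rate, revenue_per_user)
# Matches the view: control 40.0 / 17.0, variant_a 80.0 / 53.5
```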

How to Track Funnel Drop-Off Per Variant

Conversion rate alone does not tell the full story. You want to know where users drop off in each variant. A funnel materialized view breaks the user journey into stages.

CREATE MATERIALIZED VIEW conversion_funnel AS
SELECT
    a.experiment_id,
    a.variant,
    COUNT(DISTINCT a.user_id) AS assigned_users,
    COUNT(DISTINCT CASE WHEN e.event_type = 'page_view' THEN e.user_id END) AS viewed_page,
    COUNT(DISTINCT CASE WHEN e.event_type = 'click' THEN e.user_id END) AS clicked,
    COUNT(DISTINCT CASE WHEN e.event_type = 'purchase' THEN e.user_id END) AS purchased,
    ROUND(
        COUNT(DISTINCT CASE WHEN e.event_type = 'click' THEN e.user_id END)::NUMERIC
        / NULLIF(COUNT(DISTINCT CASE WHEN e.event_type = 'page_view' THEN e.user_id END), 0)::NUMERIC * 100, 1
    ) AS view_to_click_pct,
    ROUND(
        COUNT(DISTINCT CASE WHEN e.event_type = 'purchase' THEN e.user_id END)::NUMERIC
        / NULLIF(COUNT(DISTINCT CASE WHEN e.event_type = 'click' THEN e.user_id END), 0)::NUMERIC * 100, 1
    ) AS click_to_purchase_pct
FROM experiment_assignments a
LEFT JOIN page_events e
    ON a.user_id = e.user_id AND a.experiment_id = e.experiment_id
GROUP BY a.experiment_id, a.variant;

Query the view:

SELECT * FROM conversion_funnel ORDER BY variant;
   experiment_id   |  variant  | assigned_users | viewed_page | clicked | purchased | view_to_click_pct | click_to_purchase_pct
-------------------+-----------+----------------+-------------+---------+-----------+-------------------+-----------------------
 checkout_redesign | control   |              5 |           5 |       3 |         2 |              60.0 |                  66.7
 checkout_redesign | variant_a |              5 |           5 |       5 |         4 |             100.0 |                  80.0

This tells a clear story. In the control group, only 60% of users who viewed the checkout page clicked the submit button, and 66.7% of those who clicked actually completed the purchase. In variant A, every user who saw the page clicked submit (100%), and 80% completed the purchase. The redesigned checkout flow reduces friction at every step.

Because this is a materialized view, these funnel metrics stay current as new events stream in. No need to re-run a query against the raw events table.
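The stage counts themselves can be sketched in plain Python over (user_id, event_type) pairs — a toy, batch-style equivalent of what the view computes incrementally:

```python
def funnel(events, stages):
    """Distinct users reaching each stage, from (user_id, event_type) pairs.

    An in-memory equivalent of the COUNT(DISTINCT ...) per event_type
    that the materialized view maintains over the stream.
    """
    reached = {stage: set() for stage in stages}
    for user_id, event_type in events:
        if event_type in reached:
            reached[event_type].add(user_id)
    return {stage: len(users) for stage, users in reached.items()}

# Control-group events from the sample data:
control_events = [
    ("u001", "page_view"), ("u001", "click"), ("u001", "purchase"),
    ("u003", "page_view"), ("u003", "click"),
    ("u005", "page_view"),
    ("u007", "page_view"), ("u007", "click"), ("u007", "purchase"),
    ("u009", "page_view"),
]
print(funnel(control_events, ["page_view", "click", "purchase"]))
# {'page_view': 5, 'click': 3, 'purchase': 2}
```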

How to Detect the Winning Variant Automatically

Manual monitoring is fine for a handful of experiments. But if your team runs dozens of experiments simultaneously, you want automated winner detection. This materialized view compares each variant against the control and computes the lift.

CREATE MATERIALIZED VIEW experiment_winner AS
WITH stats AS (
    SELECT
        experiment_id,
        variant,
        total_users,
        conversions,
        conversion_rate,
        revenue_per_user
    FROM experiment_summary
),
control AS (
    SELECT * FROM stats WHERE variant = 'control'
),
variants AS (
    SELECT * FROM stats WHERE variant != 'control'
)
SELECT
    v.experiment_id,
    v.variant,
    v.conversion_rate AS variant_conversion_rate,
    c.conversion_rate AS control_conversion_rate,
    ROUND(v.conversion_rate - c.conversion_rate, 2) AS conversion_lift,
    v.revenue_per_user AS variant_rev_per_user,
    c.revenue_per_user AS control_rev_per_user,
    ROUND(
        (v.revenue_per_user - c.revenue_per_user)
        / NULLIF(c.revenue_per_user, 0) * 100, 1
    ) AS revenue_lift_pct,
    CASE
        WHEN v.total_users >= 5 AND c.total_users >= 5
             AND v.conversion_rate > c.conversion_rate
        THEN 'variant_leading'
        WHEN v.total_users < 5 OR c.total_users < 5
        THEN 'insufficient_data'
        ELSE 'control_leading'
    END AS status
FROM variants v
JOIN control c ON v.experiment_id = c.experiment_id;

Query the view:

SELECT * FROM experiment_winner;
   experiment_id   |  variant  | variant_conversion_rate | control_conversion_rate | conversion_lift | variant_rev_per_user | control_rev_per_user | revenue_lift_pct |     status
-------------------+-----------+------------------------+------------------------+-----------------+----------------------+----------------------+------------------+-----------------
 checkout_redesign | variant_a |                  80.00 |                   40.00 |           40.00 |                53.50 |                17.00 |            214.7 | variant_leading

This is a materialized view built on top of another materialized view, a pattern RisingWave calls MV-on-MV. When a new event flows into page_events, it updates experiment_summary, which in turn updates experiment_winner. The entire chain recomputes incrementally, not from scratch.

You can connect this view to an alerting system. For example, a simple polling query from your application can check for experiments where status = 'variant_leading' and conversion_lift > 10 to trigger a Slack notification or auto-promote the variant.
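A minimal polling sketch, assuming a default local RisingWave instance (port 4566, user root, database dev) and the psycopg2 client — any PostgreSQL driver works, since RisingWave speaks the Postgres wire protocol. The threshold and alert payload shape here are illustrative:

```python
ALERT_SQL = """
    SELECT experiment_id, variant, conversion_lift
    FROM experiment_winner
    WHERE status = 'variant_leading' AND conversion_lift > 10
"""

def leading_variants(rows):
    """Pure decision step: turn result rows into alert payloads."""
    return [
        {"experiment": exp, "variant": var, "lift": float(lift)}
        for exp, var, lift in rows
    ]

def poll_and_alert():
    import psycopg2  # deferred import; connection details assume a local default setup
    conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
    try:
        with conn.cursor() as cur:
            cur.execute(ALERT_SQL)
            for alert in leading_variants(cur.fetchall()):
                # Replace print with a Slack webhook, PagerDuty event, etc.
                print(f"{alert['experiment']}: {alert['variant']} leading "
                      f"by {alert['lift']} pts")
    finally:
        conn.close()
```

Running `poll_and_alert()` on a schedule (cron, a sidecar loop) is enough; the heavy lifting already happened inside the materialized views.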

Adding time-windowed trend analysis

For experiments that run over days or weeks, you want to see how metrics trend over time. A tumbling window groups events into fixed intervals:

CREATE MATERIALIZED VIEW hourly_experiment_metrics AS
SELECT
    a.experiment_id,
    a.variant,
    e.window_start,
    COUNT(DISTINCT a.user_id) AS users,
    COUNT(DISTINCT CASE WHEN e.event_type = 'purchase' THEN e.user_id END) AS conversions,
    ROUND(COALESCE(SUM(CASE WHEN e.event_type = 'purchase' THEN e.revenue END), 0)::NUMERIC, 2) AS revenue
FROM experiment_assignments a
JOIN TUMBLE(page_events, event_time, INTERVAL '1 HOUR') e
    ON a.user_id = e.user_id AND a.experiment_id = e.experiment_id
GROUP BY a.experiment_id, a.variant, e.window_start;

Query the view:

SELECT * FROM hourly_experiment_metrics ORDER BY variant, window_start;
   experiment_id   |  variant  |       window_start        | users | conversions | revenue
-------------------+-----------+---------------------------+-------+-------------+---------
 checkout_redesign | control   | 2026-03-29 10:00:00+00:00 |     5 |           2 |   84.99
 checkout_redesign | variant_a | 2026-03-29 10:00:00+00:00 |     5 |           4 |  267.48

As more events arrive across different hours, this view accumulates per-hour snapshots, giving you a time series to plot in Grafana or any dashboard tool.
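Conceptually, TUMBLE assigns each event to exactly one fixed-size bucket by flooring its timestamp to the window start. The bucketing can be mimicked in a few lines of Python (a sketch using epoch arithmetic; RisingWave does this incrementally inside the view):

```python
from datetime import datetime, timedelta, timezone

def tumble_window_start(ts: datetime, size: timedelta) -> datetime:
    """Floor a timestamp to the start of its tumbling window."""
    epoch = ts.timestamp()
    size_s = size.total_seconds()
    start = epoch - (epoch % size_s)
    return datetime.fromtimestamp(start, tz=timezone.utc)

e1 = datetime(2026, 3, 29, 10, 3, 0, tzinfo=timezone.utc)
e2 = datetime(2026, 3, 29, 10, 59, 59, tzinfo=timezone.utc)
# Both events land in the one-hour window starting at 10:00.
assert tumble_window_start(e1, timedelta(hours=1)) == tumble_window_start(e2, timedelta(hours=1))
```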

How Does This Compare to Batch A/B Testing?

 Aspect               | Batch pipeline                             | Streaming SQL (RisingWave)
----------------------+--------------------------------------------+---------------------------------------------------
 Metric freshness     | Minutes to hours (depends on schedule)     | Sub-second (continuous)
 Infrastructure       | Scheduler + ETL + warehouse + dashboard    | Streaming database + dashboard
 Funnel analysis      | Re-scan raw events on each query           | Pre-computed, incrementally maintained
 Winner detection     | Manual or scheduled script                 | Continuous, queryable at any time
 Scaling experiments  | Each experiment adds ETL complexity        | Each experiment is just another materialized view
 Rollback speed       | Wait for next batch cycle to detect issues | Detect regressions within seconds

The streaming approach is not about replacing your statistical methodology. You still need proper sample sizes and significance testing. The difference is in how fast you can see the data that feeds those tests. Netflix's engineering team has shown that sequential testing on streaming data can detect large regressions within 60 seconds, compared to hours with batch approaches.

What About Statistical Significance?

A fair question: can you really make decisions on streaming data without introducing peeking bias?

Yes, but you need the right statistical framework. Traditional frequentist A/B tests assume a fixed sample size. If you check results repeatedly as data streams in, you inflate the false positive rate. This is called the peeking problem.
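The inflation from peeking is easy to see in a small simulation: run A/A experiments (no real difference between arms) and compare the false-positive rate of a single end-of-test look against checking at several interim points with the same naive z-test. This is an illustrative sketch with a fixed seed; exact rates depend on the look schedule:

```python
import math
import random

def z_test_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a naive two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

random.seed(7)
runs, n, p, alpha = 1000, 1000, 0.05, 0.05
looks = [200, 400, 600, 800, 1000]  # interim analysis points

final_fp = peeking_fp = 0
for _ in range(runs):
    a = [random.random() < p for _ in range(n)]  # both arms convert at 5%
    b = [random.random() < p for _ in range(n)]
    if z_test_p(sum(a), n, sum(b), n) < alpha:
        final_fp += 1  # false positive at the single final look
    if any(z_test_p(sum(a[:k]), k, sum(b[:k]), k) < alpha for k in looks):
        peeking_fp += 1  # false positive at any of the five looks

print(f"single-look FPR: {final_fp / runs:.3f}")  # close to the nominal 0.05
print(f"peeking FPR:     {peeking_fp / runs:.3f}")  # noticeably higher
```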

The solution is sequential testing (also called anytime-valid inference). Instead of computing a single p-value at the end, sequential methods produce confidence sequences that remain valid regardless of when you check. Netflix, Spotify, and other companies that run experiments on streaming data use these methods.

In a RisingWave-based pipeline, you separate the concerns:

  • RisingWave handles the data layer: continuously computing per-variant aggregates (conversion counts, revenue sums, sample sizes).
  • Your application handles the statistics: reading the aggregates from RisingWave and applying sequential testing logic (e.g., mixture sequential probability ratio tests or always-valid confidence intervals).

This separation is clean and practical. RisingWave gives you the fresh data. Your statistics library (Python's statsmodels, R, or a custom service) applies the appropriate test. You do not need to embed complex statistical functions in SQL.
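To make the application side concrete, here is a textbook Wald sequential probability ratio test over a stream of 0/1 conversions — a simplified sketch with fixed simple hypotheses, not the mixture-SPRT variants production systems use for composite hypotheses:

```python
import math

def sprt(events, p0, p1, alpha=0.05, beta=0.2):
    """Wald SPRT: test H0: p = p0 vs H1: p = p1 on a stream of 0/1 conversions.

    Returns ('accept_h1' | 'accept_h0' | 'continue', n_observed).
    Unlike a fixed-horizon z-test, it is valid to check after every event.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 above this
    lower = math.log(beta / (1 - alpha))   # accept H0 below this
    llr = 0.0  # cumulative log-likelihood ratio
    for i, x in enumerate(events, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", i
        if llr <= lower:
            return "accept_h0", i
    return "continue", len(events)

# A stream converting at ~70% quickly favors p1 = 0.4 over p0 = 0.2:
decision, n = sprt([1, 0, 1, 1, 0, 1, 0, 1, 1, 1] * 5, p0=0.2, p1=0.4)
assert decision == "accept_h1"
```

In practice the inputs would come from the per-variant aggregates RisingWave maintains, with p0 and p1 encoding the baseline conversion rate and the minimum effect worth detecting.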

FAQ

What is real-time A/B testing?

Real-time A/B testing is the practice of monitoring experiment metrics (conversion rates, revenue, engagement) continuously as user events arrive, rather than waiting for batch aggregation jobs. It enables faster detection of winning or losing variants and reduces the risk of running a harmful experiment for too long.

Can I use RisingWave for A/B testing without Kafka?

Yes. RisingWave supports multiple data ingestion methods. You can use tables with direct inserts for prototyping, connect to Kafka or Redpanda for production event streams, or use the PostgreSQL CDC connector to capture experiment events from your application database.

How does streaming A/B testing handle the peeking problem?

Streaming data pipelines solve the data freshness problem, not the statistical problem. To avoid inflated false positive rates from continuous monitoring, use sequential testing methods such as always-valid confidence intervals or sequential probability ratio tests. RisingWave computes the aggregates in real time, and your application applies the appropriate statistical test on those aggregates.

How many concurrent experiments can RisingWave handle?

Each experiment is a set of materialized views. RisingWave is designed to maintain thousands of materialized views concurrently, so running dozens or even hundreds of experiments simultaneously is well within its capacity. Resource usage scales with the volume of events, not the number of experiments.

Conclusion

Here is what you have built in this guide:

  • Experiment summary view that tracks conversion rate and revenue per variant in real time.
  • Funnel analysis view that shows drop-off at each stage (view, click, purchase) per variant.
  • Winner detection view that compares variants against the control and flags leaders automatically.
  • Time-windowed metrics for tracking experiment trends over hours and days.

All of these are materialized views that update incrementally as new events arrive. No batch jobs, no scheduling, no stale dashboards.

The pattern works for any kind of experiment: checkout flows, pricing pages, onboarding sequences, recommendation algorithms, or email subject lines. Define your events, assign your variants, and let the materialized views do the rest.
