Real-Time GMV Tracking for E-Commerce with Streaming SQL

Gross merchandise value is the single most-watched metric in e-commerce operations — and it is also the one most likely to be stale. With RisingWave, a PostgreSQL-compatible streaming database, you can build GMV dashboards that update in real time as orders flow through Kafka, without recomputing aggregations on every dashboard refresh.

The Problem: Batch GMV Is Always Wrong

GMV measures the total value of merchandise sold through a platform over a period of time. For a merchant running a flash sale, knowing the current GMV at minute-level granularity is operationally critical: it determines when to stop discounts, when to trigger inventory reservations, and whether the campaign is tracking toward its revenue target.

Batch pipelines compute GMV on a schedule — every 15 minutes, every hour, or once a day. Between runs, the number on the dashboard is wrong. Not wrong by a rounding error, but wrong by all the transactions that have occurred since the last batch completed.

The cost of stale GMV is real:

  • Merchant trust: Merchants see different numbers in the dashboard versus their own records.
  • Operations: Campaign managers make budget and inventory decisions on outdated figures.
  • Finance: Reconciliation requires manual adjustments when batch numbers don't match real-time event counts.

SHOPLINE, Asia's leading commerce enablement platform, solved this by adopting RisingWave and achieving a 76.7% reduction in API response times — the kind of result that comes from pre-computing GMV continuously rather than at query time.

How Streaming SQL Solves This

RisingWave maintains materialized views incrementally. When a new order event arrives in Kafka, RisingWave computes the minimal delta to update every dependent materialized view — hourly windows, daily totals, merchant breakdowns, channel splits — in a single pass. The results are always current, and dashboard queries read pre-computed values rather than scanning raw event logs.

This is fundamentally different from a stream processing engine like Flink, which computes results and writes them to an external store for serving. With RisingWave, the serving layer is built in: you query materialized views directly with ordinary Postgres-compatible SQL.

Building Real-Time GMV Tracking

Step 1: Data Source Setup

Connect RisingWave to your order event stream. A single Kafka topic carries all order lifecycle events across channels:

CREATE SOURCE orders (
    order_id        VARCHAR,
    merchant_id     VARCHAR,
    customer_id     VARCHAR,
    channel         VARCHAR,        -- 'web', 'mobile', 'pos', 'marketplace'
    payment_method  VARCHAR,        -- 'card', 'alipay', 'paypal', 'cod'
    subtotal        NUMERIC,
    discount        NUMERIC,
    total           NUMERIC,
    status          VARCHAR,
    created_at      TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ecommerce.orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

For platforms with multiple Kafka clusters per region (e.g., separate topics for APAC and EU storefronts), you can create multiple sources and UNION them in downstream views.
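As a sketch of that pattern, suppose one source per cluster (the names orders_apac and orders_eu are hypothetical; both must share the schema above for the UNION ALL to type-check):

CREATE MATERIALIZED VIEW orders_all AS
SELECT 'apac' AS region, * FROM orders_apac   -- hypothetical APAC-cluster source
UNION ALL
SELECT 'eu'   AS region, * FROM orders_eu;    -- hypothetical EU-cluster source

Downstream views then read from orders_all instead of a single regional source.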

Step 2: Core Materialized View

The primary GMV view uses tumbling windows to produce non-overlapping hourly snapshots. Each row represents one hour of activity for one merchant on one channel:

CREATE MATERIALIZED VIEW mv_gmv_hourly AS
SELECT
    merchant_id,
    channel,
    WINDOW_START                                AS hour_start,
    WINDOW_END                                  AS hour_end,
    COUNT(*)                                    AS order_count,
    SUM(total)                                  AS gmv,
    SUM(subtotal)                               AS gross_subtotal,
    SUM(discount)                               AS total_discount,
    ROUND(AVG(total), 2)                        AS aov,
    COUNT(DISTINCT customer_id)                 AS unique_buyers,
    COUNT(CASE WHEN status = 'refunded'
               THEN 1 END)                     AS refund_count,
    ROUND(
        COUNT(CASE WHEN status = 'refunded'
                   THEN 1 END)::NUMERIC /
        NULLIF(COUNT(*), 0) * 100,
        2
    )                                           AS refund_rate_pct
FROM TUMBLE(orders, created_at, INTERVAL '1 HOUR')
WHERE status NOT IN ('cancelled')
GROUP BY merchant_id, channel, WINDOW_START, WINDOW_END;

This view is always current — RisingWave updates it incrementally with every new order event.
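Because RisingWave speaks the Postgres protocol, any client can query the view directly. A sketch of an ad hoc check on one merchant's last 24 hours (the merchant id is a placeholder):

SELECT hour_start, channel, order_count, gmv, aov
FROM mv_gmv_hourly
WHERE merchant_id = 'm_12345'                      -- placeholder id
  AND hour_start >= NOW() - INTERVAL '24 HOURS'
ORDER BY hour_start DESC, channel;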

Step 3: Aggregations — Daily Rollup and Running Totals

Build a daily summary view on top of the hourly view for efficient period-over-period comparison:

CREATE MATERIALIZED VIEW mv_gmv_daily AS
SELECT
    merchant_id,
    channel,
    DATE_TRUNC('day', hour_start)               AS day,
    SUM(order_count)                            AS daily_orders,
    SUM(gmv)                                    AS daily_gmv,
    SUM(total_discount)                         AS daily_discount,
    ROUND(AVG(aov), 2)                          AS avg_aov,      -- average of hourly AOVs, not order-weighted
    SUM(unique_buyers)                          AS daily_buyers, -- approximate: a buyer active in several hours is counted once per hour
    ROUND(
        SUM(refund_count)::NUMERIC /
        NULLIF(SUM(order_count), 0) * 100,
        2
    )                                           AS daily_refund_rate_pct
FROM mv_gmv_hourly
GROUP BY merchant_id, channel, DATE_TRUNC('day', hour_start);

For campaign monitoring, a 5-minute tumbling window lets operations teams watch GMV velocity during flash sales:

CREATE MATERIALIZED VIEW mv_gmv_5min AS
SELECT
    merchant_id,
    channel,
    WINDOW_START                                AS window_start,
    COUNT(*)                                    AS orders_in_window,
    SUM(total)                                  AS gmv_in_window,
    ROUND(AVG(total), 2)                        AS aov_in_window
FROM TUMBLE(orders, created_at, INTERVAL '5 MINUTES')
WHERE status NOT IN ('cancelled')
GROUP BY merchant_id, channel, WINDOW_START;
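To turn these windows into a velocity signal, each window can be compared to the previous one with LAG. A sketch of an ad hoc query over the view (exact window-function support may vary by RisingWave version):

SELECT
    merchant_id,
    channel,
    window_start,
    gmv_in_window,
    gmv_in_window - LAG(gmv_in_window) OVER (
        PARTITION BY merchant_id, channel
        ORDER BY window_start
    ) AS gmv_delta                              -- positive delta = accelerating sales
FROM mv_gmv_5min;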

Step 4: Downstream / Serving

Push GMV metrics to the application database that dashboard APIs query:

CREATE SINK sink_gmv_hourly
FROM mv_gmv_hourly
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://app-db:5432/commerce',
    table.name = 'gmv_hourly',
    type = 'upsert',
    primary_key = 'merchant_id,channel,hour_start'
);

CREATE SINK sink_gmv_daily
FROM mv_gmv_daily
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://app-db:5432/commerce',
    table.name = 'gmv_daily',
    type = 'upsert',
    primary_key = 'merchant_id,channel,day'
);

Dashboard API endpoints now execute simple SELECT queries against pre-populated tables — no aggregations, no warehouse round trips.
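For example, an endpoint serving today's hourly GMV curve for one merchant reduces to a sketch like this against the sink table in Postgres ($1 and $2 are bind parameters supplied by the API layer):

SELECT hour_start, order_count, gmv, refund_rate_pct
FROM gmv_hourly
WHERE merchant_id = $1
  AND channel = $2
  AND hour_start >= DATE_TRUNC('day', NOW())
ORDER BY hour_start;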

Batch GMV vs. Streaming GMV

Dimension               | Batch Pipeline                    | Streaming SQL (RisingWave)
Freshness               | Stale until next batch run        | Continuously updated
Granularity             | Hourly or daily only              | Any window size (minutes, hours, days)
API latency             | High — aggregation at query time  | Low — pre-computed results
Consistency             | Inconsistent during batch runs    | Always consistent
Backfill                | Re-run job from start             | Replay Kafka offset
Multi-channel breakdown | Often skipped (too slow)          | Native — group by channel in SQL
Flash sale monitoring   | Not possible                      | 5-minute windows, continuously updated

FAQ

Q: How does RisingWave handle order status updates (e.g., a delivered order that gets refunded)? RisingWave sources can process update and delete events, not just inserts. If your Kafka topic carries change events keyed by order id, you can ingest them as an upsert stream so GMV stays accurate as order statuses change. For CDC sources (PostgreSQL or MySQL), updates and deletes are handled natively.
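As one hedged sketch of the upsert pattern, a compacted topic keyed by order_id can be ingested by declaring a table with a primary key (the topic name and the abbreviated schema here are illustrative):

CREATE TABLE orders_upsert (
    order_id    VARCHAR,
    merchant_id VARCHAR,
    status      VARCHAR,
    total       NUMERIC,
    created_at  TIMESTAMPTZ,
    PRIMARY KEY (order_id)                       -- later events with the same key update the row
) WITH (
    connector = 'kafka',
    topic = 'ecommerce.orders.compacted',        -- illustrative topic name
    properties.bootstrap.server = 'kafka:9092'
) FORMAT UPSERT ENCODE JSON;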

Q: Can I compute GMV across all merchants (platform-level) and per-merchant simultaneously? Yes — create separate materialized views with different GROUP BY clauses. RisingWave computes them independently but shares the same input processing, so there is no duplicate work reading from Kafka.
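A platform-level rollup might look like the following sketch, reusing the same orders source with a coarser GROUP BY:

CREATE MATERIALIZED VIEW mv_gmv_platform_hourly AS
SELECT
    WINDOW_START                 AS hour_start,
    COUNT(*)                     AS order_count,
    SUM(total)                   AS gmv,
    COUNT(DISTINCT merchant_id)  AS active_merchants
FROM TUMBLE(orders, created_at, INTERVAL '1 HOUR')
WHERE status NOT IN ('cancelled')
GROUP BY WINDOW_START;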

Q: What happens if Kafka has a gap or the consumer falls behind? RisingWave's internal checkpointing ensures exactly-once processing semantics. If the consumer falls behind (e.g., during a Kafka partition rebalance), it will catch up automatically from the last committed checkpoint without double-counting events.

Key Takeaways

  • Batch GMV pipelines produce stale data by design — streaming SQL keeps GMV continuously current.
  • RisingWave's TUMBLE window function produces non-overlapping time buckets at any granularity, from 5-minute flash sale windows to daily summaries.
  • Hierarchical materialized views (hourly → daily) reduce redundant computation and keep downstream queries fast.
  • Pre-computed results mean dashboard API latency is a point lookup, not an aggregation query — this is the structural source of SHOPLINE's 76.7% API latency reduction.
  • A single SQL source definition feeds hourly, daily, and 5-minute views simultaneously.
