Real-Time Clickstream Analytics: How E-Commerce Teams Understand Shopper Behavior Live

Google Analytics tells you what happened yesterday. Your merchandising team needs to know what is happening right now: which products are spiking, where users are dropping out of the funnel, and which sessions are about to abandon. Streaming SQL on raw clickstream events — captured to Kafka and processed continuously — gives you that live visibility without custom application code or expensive analytics platforms.


The Problem with Delayed Clickstream Analytics

Standard web analytics platforms aggregate data in reporting windows. By the time a dashboard reflects a broken checkout flow or a 404 on your top landing page, thousands of sessions have already been lost.

The delay is not just a UI problem. It is a data pipeline problem. Most analytics stacks send events to a cloud collector, batch-load them into a data warehouse, run transformation jobs, and serve results through a BI layer. Each step adds latency. The whole chain typically takes 30 minutes to 4 hours to produce a usable number.

For e-commerce operations running flash sales, launching new products, or running paid campaigns, 30-minute delayed feedback is not analytics — it is a post-mortem.


What Raw Clickstream Enables That Aggregated Analytics Cannot

When you control your raw clickstream — events sent directly from your frontend to Kafka — you can:

  • Reconstruct sessions in real time, calculating time-on-page and navigation sequences
  • Detect funnel drop-off as it happens, not after the session closes
  • Identify cart abandonment within seconds of the signal, enabling real-time intervention
  • Measure live conversion rates per product, per campaign, per traffic source
  • Detect anomalies like sudden traffic spikes, bot sessions, or a payment gateway going down

None of this requires a data science team. It requires a streaming database that can process the event stream with SQL.


Clickstream Event Schema

The foundation is a well-structured event schema. Every user interaction fires a JSON event to Kafka:

CREATE SOURCE clickstream_events (
    event_id       VARCHAR,
    session_id     VARCHAR,
    user_id        VARCHAR,         -- NULL for anonymous sessions
    event_type     VARCHAR,         -- 'page_view', 'product_view', 'add_to_cart',
                                    -- 'checkout_start', 'purchase', 'search'
    page_url       VARCHAR,
    product_id     VARCHAR,         -- NULL for non-product pages
    category_id    VARCHAR,
    search_query   VARCHAR,         -- populated for search events
    traffic_source VARCHAR,         -- 'organic', 'paid_search', 'email', 'direct'
    campaign_id    VARCHAR,
    device_type    VARCHAR,         -- 'mobile', 'desktop', 'tablet'
    event_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'clickstream',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
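For reference, a single event conforming to this schema might look like the following Python sketch; all field values are illustrative:

```python
import json
from datetime import datetime, timezone

# Illustrative clickstream event matching the source schema above.
event = {
    "event_id": "e-8f2a41",
    "session_id": "s-19c3",
    "user_id": None,                      # anonymous session
    "event_type": "add_to_cart",
    "page_url": "/products/trail-shoe-42",
    "product_id": "p-1042",
    "category_id": "c-footwear",
    "search_query": None,
    "traffic_source": "paid_search",
    "campaign_id": "cmp-summer",
    "device_type": "mobile",
    "event_time": datetime.now(timezone.utc).isoformat(),
}

# This JSON string is what the ingest layer would publish to the
# 'clickstream' Kafka topic, keyed by session_id.
payload = json.dumps(event)
```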

RisingWave is a PostgreSQL-compatible streaming database: open source under Apache 2.0, built in Rust, with S3 as its durable storage layer. Once this source is defined, you write SQL materialized views to derive every analytics metric you need.


Live Funnel Metrics

The most critical e-commerce metric is funnel conversion: how many sessions that start product discovery make it to purchase. Track it as a live tumbling-window aggregate:

CREATE MATERIALIZED VIEW funnel_metrics_10min AS
SELECT
    WINDOW_START,
    WINDOW_END,
    traffic_source,
    device_type,
    COUNT(DISTINCT session_id)
        FILTER (WHERE event_type = 'page_view')          AS sessions,
    COUNT(DISTINCT session_id)
        FILTER (WHERE event_type = 'product_view')       AS product_views,
    COUNT(DISTINCT session_id)
        FILTER (WHERE event_type = 'add_to_cart')        AS add_to_carts,
    COUNT(DISTINCT session_id)
        FILTER (WHERE event_type = 'checkout_start')     AS checkout_starts,
    COUNT(DISTINCT session_id)
        FILTER (WHERE event_type = 'purchase')           AS purchases,
    ROUND(
        COUNT(DISTINCT session_id) FILTER (WHERE event_type = 'purchase')::NUMERIC
        / NULLIF(COUNT(DISTINCT session_id) FILTER (WHERE event_type = 'page_view'), 0) * 100,
        2
    )                                                    AS overall_conversion_rate
FROM TUMBLE(
    clickstream_events,
    event_time,
    INTERVAL '10 MINUTES'
)
GROUP BY WINDOW_START, WINDOW_END, traffic_source, device_type;

Your live dashboard queries funnel_metrics_10min and gets a conversion rate that reflects the last 10 minutes of actual shopper behavior, broken down by traffic source and device type.
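Under the hood this is just distinct-session counting per funnel stage. A small Python sketch of the same arithmetic, over made-up sample events:

```python
# Sketch of the funnel arithmetic: count distinct sessions per stage,
# conversion = purchasing sessions / all sessions with a page_view.
events = [
    {"session_id": "s1", "event_type": "page_view"},
    {"session_id": "s1", "event_type": "product_view"},
    {"session_id": "s1", "event_type": "purchase"},
    {"session_id": "s2", "event_type": "page_view"},
    {"session_id": "s2", "event_type": "add_to_cart"},
    {"session_id": "s3", "event_type": "page_view"},
]

def stage_sessions(events, event_type):
    """Distinct sessions that fired a given event type."""
    return {e["session_id"] for e in events if e["event_type"] == event_type}

sessions = stage_sessions(events, "page_view")
purchases = stage_sessions(events, "purchase")
conversion_rate = (
    round(len(purchases) / len(sessions) * 100, 2) if sessions else None
)
# 1 purchasing session out of 3 -> 33.33
```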


Session Reconstruction

Reconstructing sessions — understanding the navigation sequence and time-on-page within each visit — requires aggregating every event that shares a session_id. A streaming aggregation grouped by session builds each profile incrementally:

CREATE MATERIALIZED VIEW session_summary AS
SELECT
    session_id,
    user_id,
    traffic_source,
    device_type,
    MIN(event_time)                                      AS session_start,
    MAX(event_time)                                      AS last_event_time,
    EXTRACT(EPOCH FROM (MAX(event_time) - MIN(event_time))) AS session_duration_seconds,
    COUNT(*)                                             AS total_events,
    COUNT(*) FILTER (WHERE event_type = 'page_view')    AS page_views,
    COUNT(*) FILTER (WHERE event_type = 'product_view') AS product_views,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart')  AS add_to_carts,
    BOOL_OR(event_type = 'purchase')                    AS converted,
    ARRAY_AGG(product_id ORDER BY event_time)
        FILTER (WHERE product_id IS NOT NULL)           AS viewed_products
FROM clickstream_events
GROUP BY session_id, user_id, traffic_source, device_type;

This view builds up incrementally as each event arrives. By the time a session ends, session_summary contains the complete session profile without any post-processing step.
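Conceptually, the view folds each arriving event into a per-session accumulator. A minimal Python equivalent of that fold, using invented events and numeric timestamps:

```python
# Fold events into per-session summaries, mirroring session_summary.
events = [
    {"session_id": "s1", "event_type": "page_view",    "event_time": 100, "product_id": None},
    {"session_id": "s1", "event_type": "product_view", "event_time": 130, "product_id": "p-7"},
    {"session_id": "s1", "event_type": "purchase",     "event_time": 200, "product_id": "p-7"},
]

sessions = {}
for e in events:
    s = sessions.setdefault(e["session_id"], {
        "start": e["event_time"], "last": e["event_time"],
        "total_events": 0, "converted": False, "viewed_products": [],
    })
    s["start"] = min(s["start"], e["event_time"])     # MIN(event_time)
    s["last"] = max(s["last"], e["event_time"])       # MAX(event_time)
    s["total_events"] += 1                            # COUNT(*)
    s["converted"] = s["converted"] or e["event_type"] == "purchase"
    if e["product_id"] is not None:                   # ARRAY_AGG ... FILTER
        s["viewed_products"].append(e["product_id"])

summary = sessions["s1"]
duration = summary["last"] - summary["start"]   # 100 seconds
```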


Real-Time Abandonment Detection

Cart abandonment detection is traditionally a batch job that runs hours after the session closes. With streaming SQL, you can identify sessions that added to cart but have not progressed in the last 15 minutes:

CREATE MATERIALIZED VIEW cart_abandonment_candidates AS
SELECT
    s.session_id,
    s.user_id,
    s.last_event_time,            -- consumers derive idle time from this
    s.add_to_carts,
    s.converted
FROM session_summary s
-- NOW() is valid in a materialized view only as a temporal filter in the
-- WHERE clause, so the idle duration is not projected as a column here.
WHERE s.add_to_carts > 0
  AND s.converted = FALSE
  AND s.last_event_time < NOW() - INTERVAL '15 MINUTES'
  AND s.last_event_time > NOW() - INTERVAL '60 MINUTES';
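The view's filter reduces to a simple predicate over each session summary. A Python sketch of the same decision, with the 15- and 60-minute thresholds copied from the view:

```python
def is_abandonment_candidate(add_to_carts: int, converted: bool,
                             idle_minutes: float) -> bool:
    """Mirror of the cart_abandonment_candidates filter: the session
    carted something, never purchased, and has been idle for between
    15 and 60 minutes."""
    return add_to_carts > 0 and not converted and 15 < idle_minutes < 60
```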

Pair this with a RisingWave sink that pushes these session IDs to your marketing automation platform:

CREATE SINK cart_abandonment_alerts
FROM cart_abandonment_candidates
WITH (
    connector = 'kafka',
    topic = 'cart_abandonment_triggers',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Your email platform subscribes to cart_abandonment_triggers and fires a recovery email within minutes of the inactivity signal — not hours later when a nightly job processes abandoned carts.


Live Product Performance Dashboard

Track how each product is performing right now — views, adds-to-cart, and purchases in the last hour:

CREATE MATERIALIZED VIEW product_performance_1h AS
SELECT
    product_id,
    category_id,
    COUNT(*) FILTER (WHERE event_type = 'product_view')  AS views_1h,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart')   AS add_to_carts_1h,
    COUNT(*) FILTER (WHERE event_type = 'purchase')      AS purchases_1h,
    ROUND(
        COUNT(*) FILTER (WHERE event_type = 'purchase')::NUMERIC
        / NULLIF(COUNT(*) FILTER (WHERE event_type = 'product_view'), 0) * 100,
        2
    )                                                    AS view_to_purchase_rate,
    WINDOW_START,
    WINDOW_END
FROM TUMBLE(
    clickstream_events,
    event_time,
    INTERVAL '1 HOUR'
)
WHERE product_id IS NOT NULL
GROUP BY product_id, category_id, WINDOW_START, WINDOW_END;

Merchandising teams use this to spot underperforming products that need better placement and high-converting products that deserve more prominent positioning — in real time, during a sale, not the morning after.


Search Query Analytics

Understanding what shoppers search for is a directional signal for inventory gaps and merchandising opportunities:

CREATE MATERIALIZED VIEW search_analytics_1h AS
SELECT
    LOWER(search_query)          AS search_term,
    COUNT(*)                     AS search_count,
    COUNT(DISTINCT session_id)   AS unique_sessions,
    WINDOW_START,
    WINDOW_END
FROM TUMBLE(
    clickstream_events,
    event_time,
    INTERVAL '1 HOUR'
)
WHERE event_type = 'search'
  AND search_query IS NOT NULL
GROUP BY LOWER(search_query), WINDOW_START, WINDOW_END;
-- Sort by search_count when querying the view; a bare ORDER BY is not
-- maintained by a streaming materialized view.

A search term that jumps from 5 to 500 queries per hour during a news cycle is an inventory signal, not just a search analytics metric.
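Turning that observation into an alert is a matter of comparing a term's current hourly count against its trailing average. A sketch, with illustrative numbers:

```python
def spike_ratio(current_count: int, baseline_avg: float) -> float:
    """Ratio of a term's current hourly search count to its trailing
    baseline; a brand-new term with no baseline counts as an infinite spike."""
    if baseline_avg == 0:
        return float("inf") if current_count > 0 else 0.0
    return current_count / baseline_avg

# A term jumping from ~5/hour to 500/hour is a 100x spike.
ratio = spike_ratio(500, 5)
```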


Anomaly Detection: Traffic Spikes and Drops

Sudden traffic changes — a campaign going viral, a bot attack, or a service outage — show up in clickstream before they show up anywhere else:

CREATE MATERIALIZED VIEW traffic_anomaly_detection AS
WITH current_window AS (
    SELECT
        COUNT(DISTINCT session_id)  AS current_sessions,
        WINDOW_START,
        WINDOW_END
    FROM TUMBLE(clickstream_events, event_time, INTERVAL '5 MINUTES')
    GROUP BY WINDOW_START, WINDOW_END
),
baseline AS (
    SELECT AVG(current_sessions) AS avg_sessions
    FROM current_window
    WHERE WINDOW_END < NOW() - INTERVAL '5 MINUTES'
      AND WINDOW_END > NOW() - INTERVAL '2 HOURS'
)
SELECT
    cw.WINDOW_START,
    cw.WINDOW_END,
    cw.current_sessions,
    b.avg_sessions,
    ROUND((cw.current_sessions - b.avg_sessions) / NULLIF(b.avg_sessions, 0) * 100, 1) AS pct_deviation
FROM current_window cw
CROSS JOIN baseline b;
-- Sort by WINDOW_START at query time; streaming materialized views do not
-- maintain a bare ORDER BY.

Sessions more than 50% above or below the rolling 2-hour baseline trigger an alert. This kind of real-time anomaly detection catches payment gateway outages (sudden conversion drop) and viral social traffic (sudden session spike) within minutes.
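The 50% alerting rule on top of this view is plain arithmetic. A Python sketch of the threshold check, with invented session counts:

```python
def pct_deviation(current_sessions: int, baseline_avg: float):
    """Percentage deviation of the current window from the baseline,
    rounded to one decimal place; None when there is no baseline yet."""
    if baseline_avg == 0:
        return None
    return round((current_sessions - baseline_avg) / baseline_avg * 100, 1)

def should_alert(current_sessions: int, baseline_avg: float,
                 threshold_pct: float = 50) -> bool:
    """Alert when the window is more than threshold_pct above OR below
    the baseline (spikes and drops both matter)."""
    dev = pct_deviation(current_sessions, baseline_avg)
    return dev is not None and abs(dev) > threshold_pct

# e.g. 1,800 sessions against a 1,000-session baseline deviates +80%.
```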


Comparison: Analytics Platform Approaches

Dimension                 | Google Analytics 4  | Data Warehouse + dbt    | Streaming SQL (RisingWave)
--------------------------|---------------------|-------------------------|---------------------------
Data freshness            | 24–48 hours         | 30 min–4 hours          | Seconds
Raw event access          | No                  | Yes                     | Yes
Session reconstruction    | Pre-built (limited) | Post-processing job     | Live materialized view
Abandonment detection     | Batch (daily)       | Batch (hourly at best)  | Real-time
Custom funnel definitions | Limited             | Flexible                | Full SQL flexibility
Infrastructure            | SaaS                | Cloud DW + orchestrator | Streaming database
Cost at scale             | Per-event pricing   | Warehouse compute       | Storage on S3

Getting Events into Kafka

Your frontend sends events via a lightweight client library that posts to an ingest API, which writes to Kafka. A minimal setup:

  • Frontend: A 1KB JavaScript snippet that fires events on page load, product view, add-to-cart, and checkout steps
  • Ingest API: A stateless HTTP endpoint (NGINX + a thin Go/Node service) that validates and writes to Kafka
  • Kafka: A single topic clickstream partitioned by session_id to preserve per-session ordering

This architecture handles hundreds of thousands of events per second. The Kafka topic is your durable event log. RisingWave reads from it continuously and maintains all the materialized views described above.
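The validation step in such an ingest service can be sketched as follows; the field list is taken from the schema above, and the function name is hypothetical:

```python
# Minimal event validation an ingest API might run before writing to Kafka.
REQUIRED_FIELDS = {"event_id", "session_id", "event_type", "page_url",
                   "device_type", "event_time"}
VALID_EVENT_TYPES = {"page_view", "product_view", "add_to_cart",
                     "checkout_start", "purchase", "search"}

def validate_event(event: dict):
    """Return (ok, reason): reject events missing required fields or
    carrying an event_type the pipeline does not know about."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    if event["event_type"] not in VALID_EVENT_TYPES:
        return False, f"unknown event_type: {event['event_type']}"
    return True, "ok"
```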


Frequently Asked Questions

Q: How does this handle anonymous users who later log in? Track a session_id cookie for all visitors from first touch. When a user logs in, emit a user_identified event that maps session_id to user_id. A materialized view that joins session_summary on this mapping can retroactively associate the pre-login portion of the session with the identified user.
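For intuition, the stitching logic looks like this outside SQL (a simplified in-memory sketch; the user_identified event type comes from the answer above):

```python
# Map anonymous sessions to users once a user_identified event arrives,
# so the pre-login portion of the session can be attributed retroactively.
session_to_user: dict[str, str] = {}

def handle_event(event: dict) -> None:
    """Record the session-to-user mapping on login."""
    if event["event_type"] == "user_identified":
        session_to_user[event["session_id"]] = event["user_id"]

def resolve_user(session_id: str):
    """Return the identified user for a session, or None if still anonymous."""
    return session_to_user.get(session_id)

handle_event({"event_type": "user_identified",
              "session_id": "s-19c3", "user_id": "u-88"})
```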

Q: Does this replace Google Analytics? It is complementary. Google Analytics excels at acquisition attribution and benchmarking against industry data. Streaming SQL on raw clickstream excels at live operational metrics, custom funnel definitions, and real-time interventions like abandonment recovery. Most teams run both.

Q: How do we handle bot traffic? Add a bot filtering step in your ingest API before events reach Kafka — filter known bot user agents and apply rate limiting per IP. You can also create a bot_sessions materialized view in RisingWave to identify sessions with anomalously high event rates (e.g., more than 200 events per minute) and exclude them from aggregations.
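The event-rate heuristic is straightforward to express. A Python sketch using the 200-events-per-minute threshold mentioned above:

```python
def is_bot_session(event_count: int, duration_seconds: float,
                   threshold_per_minute: float = 200) -> bool:
    """Flag sessions whose sustained event rate exceeds the threshold.
    A zero-duration session with many events is treated as a burst."""
    if duration_seconds <= 0:
        return event_count > threshold_per_minute
    events_per_minute = event_count / (duration_seconds / 60)
    return events_per_minute > threshold_per_minute

# 600 events in 60 seconds is 600/min, well above the 200/min threshold.
```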

Q: What is the storage cost for raw clickstream events? Kafka retains the raw event stream. Storage cost depends on your retention window and event volume. At 10,000 events per minute with a 72-hour retention window, you are storing roughly 43 million events — manageable with Kafka's retention settings and tiered storage. RisingWave's materialized view state lives in S3, which is significantly cheaper than keeping a large Kafka retention.
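As a back-of-the-envelope check, with the event volume and retention from above and an assumed 1 KB average event size:

```python
# Storage estimate; the 1 KB average event size is an assumption.
events_per_minute = 10_000
retention_hours = 72
avg_event_bytes = 1_000

total_events = events_per_minute * 60 * retention_hours
total_gb = total_events * avg_event_bytes / 1e9

# 43.2 million events, roughly 43 GB before Kafka replication.
```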

Q: How long does it take to set up this pipeline? A working pipeline — Kafka source, basic funnel metrics materialized view, and a Postgres-compatible endpoint your BI tool can query — takes a few hours to stand up. The harder work is agreeing on event schema and ensuring your frontend fires events consistently across all pages and devices.
