Real-Time Clickstream Analytics: Track User Behavior as It Happens

Real-Time Clickstream Analytics: Track User Behavior as It Happens

·

12 min read

Every product team wants to understand what users do on their site. Which pages do they visit? Where do they drop off? How many actually complete a purchase? Traditional analytics pipelines batch this data overnight, which means you find out about problems hours or even days after they happen.

Real-time clickstream analytics changes that equation. By processing user behavior events as they arrive, you can detect funnel drop-offs within seconds, trigger re-engagement campaigns while users are still on your site, and watch conversion rates update live during a product launch. The challenge has always been that building these pipelines required specialized stream processing frameworks with steep learning curves.

In this tutorial, you will build a complete real-time clickstream analytics pipeline using RisingWave, a streaming database that lets you process event streams with standard SQL. You will ingest clickstream events from Kafka, sessionize user activity, compute page view funnels, and detect drop-offs, all with materialized views that stay up to date automatically.

What Is Clickstream Analytics?

Clickstream analytics is the practice of collecting and analyzing the sequence of pages and actions a user takes while navigating a website or application. Each click, page view, scroll, or button tap generates an event, and the ordered sequence of these events forms a "clickstream."

A typical clickstream event contains:

  • User identifier - a cookie ID, logged-in user ID, or anonymous session token
  • Event type - page_view, click, add_to_cart, purchase, scroll, form_submit
  • Page URL - the page where the event occurred
  • Referrer - where the user came from
  • Device and browser metadata - device type, browser, screen resolution
  • Timestamp - when the event happened

Product teams use clickstream data for conversion funnel analysis, A/B test evaluation, personalization, and UX optimization. The difference between batch and real-time clickstream analytics is latency: batch pipelines aggregate data in hourly or daily windows, while streaming pipelines process each event as it arrives.

Architecture: Kafka to RisingWave

A production clickstream pipeline typically follows this architecture:

graph LR
    A[Web/Mobile App] -->|Events| B[Kafka]
    B -->|Stream| C[RisingWave]
    C -->|Materialized Views| D[Dashboard / API]
    C -->|Alerts| E[Notification Service]

Your application emits clickstream events to Apache Kafka, which acts as the durable event log. RisingWave connects to Kafka as a source, ingests the events, and maintains materialized views that continuously compute session metrics, funnel conversions, and drop-off alerts. Downstream systems query RisingWave directly or consume changes via sinks.

Ingesting From Kafka

To connect RisingWave to a Kafka topic containing clickstream events, you create a source with the schema matching your event format:

CREATE SOURCE clickstream_kafka_source (
    event_id VARCHAR,
    user_id VARCHAR,
    session_id VARCHAR,
    event_type VARCHAR,
    page_url VARCHAR,
    referrer_url VARCHAR,
    device_type VARCHAR,
    browser VARCHAR,
    country VARCHAR,
    event_timestamp TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'clickstream-events',
    properties.bootstrap.server = 'your-kafka-broker:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

This DDL tells RisingWave to consume JSON messages from the clickstream-events topic starting from the earliest available offset. Once the source exists, you can build materialized views on top of it that RisingWave keeps incrementally updated as new events arrive.

For the hands-on examples below, we will use a regular table so you can follow along without setting up Kafka. The SQL for materialized views is identical regardless of whether the data comes from a Kafka source or a table.

Hands-On: Building Clickstream Analytics with SQL

All SQL in this section was verified against RisingWave 2.8.0. You can install RisingWave locally with brew install risingwave and connect with any PostgreSQL client.

Step 1: Create the Clickstream Events Table

CREATE TABLE clickstream_events (
    event_id VARCHAR,
    user_id VARCHAR,
    session_id VARCHAR,
    event_type VARCHAR,
    page_url VARCHAR,
    referrer_url VARCHAR,
    device_type VARCHAR,
    browser VARCHAR,
    country VARCHAR,
    event_timestamp TIMESTAMPTZ
);

In production, replace this CREATE TABLE with the CREATE SOURCE shown above to ingest directly from Kafka.

Step 2: Load Sample Data

Let's insert clickstream events representing six user sessions with different behaviors. Some users complete a purchase, some abandon their cart, and some leave early.

INSERT INTO clickstream_events VALUES
('e001','u101','s001','page_view','/home','https://google.com','desktop','Chrome','US','2026-04-01 10:00:00+00'),
('e002','u101','s001','page_view','/products','/home','desktop','Chrome','US','2026-04-01 10:01:30+00'),
('e003','u101','s001','page_view','/products/widget-pro','/products','desktop','Chrome','US','2026-04-01 10:03:00+00'),
('e004','u101','s001','add_to_cart','/products/widget-pro','/products/widget-pro','desktop','Chrome','US','2026-04-01 10:03:45+00'),
('e005','u101','s001','page_view','/checkout','/products/widget-pro','desktop','Chrome','US','2026-04-01 10:04:30+00'),
('e006','u101','s001','purchase','/checkout/confirm','/checkout','desktop','Chrome','US','2026-04-01 10:06:00+00'),
('e007','u102','s002','page_view','/home','https://twitter.com','mobile','Safari','UK','2026-04-01 10:00:30+00'),
('e008','u102','s002','page_view','/products','/home','mobile','Safari','UK','2026-04-01 10:02:00+00'),
('e009','u102','s002','page_view','/products/widget-pro','/products','mobile','Safari','UK','2026-04-01 10:04:00+00'),
('e010','u102','s002','add_to_cart','/products/widget-pro','/products/widget-pro','mobile','Safari','UK','2026-04-01 10:04:30+00'),
('e011','u103','s003','page_view','/home','https://google.com','desktop','Firefox','DE','2026-04-01 10:01:00+00'),
('e012','u103','s003','page_view','/pricing','/home','desktop','Firefox','DE','2026-04-01 10:02:30+00'),
('e013','u103','s003','page_view','/docs','/pricing','desktop','Firefox','DE','2026-04-01 10:05:00+00'),
('e014','u104','s004','page_view','/home','https://linkedin.com','tablet','Chrome','US','2026-04-01 10:00:00+00'),
('e015','u104','s004','page_view','/products','/home','tablet','Chrome','US','2026-04-01 10:01:00+00'),
('e016','u104','s004','page_view','/products/widget-pro','/products','tablet','Chrome','US','2026-04-01 10:02:00+00'),
('e017','u104','s004','page_view','/checkout','/products/widget-pro','tablet','Chrome','US','2026-04-01 10:03:00+00'),
('e018','u105','s005','page_view','/home','https://google.com','mobile','Chrome','JP','2026-04-01 10:02:00+00'),
('e019','u105','s005','page_view','/products','/home','mobile','Chrome','JP','2026-04-01 10:03:30+00'),
('e020','u106','s006','page_view','/home','https://facebook.com','desktop','Chrome','US','2026-04-01 10:05:00+00'),
('e021','u106','s006','page_view','/products','/home','desktop','Chrome','US','2026-04-01 10:06:00+00'),
('e022','u106','s006','page_view','/products/widget-pro','/products','desktop','Chrome','US','2026-04-01 10:07:00+00'),
('e023','u106','s006','add_to_cart','/products/widget-pro','/products/widget-pro','desktop','Chrome','US','2026-04-01 10:07:30+00'),
('e024','u106','s006','page_view','/checkout','/products/widget-pro','desktop','Chrome','US','2026-04-01 10:08:00+00'),
('e025','u106','s006','purchase','/checkout/confirm','/checkout','desktop','Chrome','US','2026-04-01 10:09:00+00');

Step 3: Sessionize Users with a Materialized View

Session analysis is one of the first things product teams want. How long do sessions last? How many pages does a user view? Do they add items to their cart?

This materialized view continuously computes per-session metrics:

CREATE MATERIALIZED VIEW mv_realtime_sessions AS
SELECT
    session_id,
    user_id,
    device_type,
    browser,
    country,
    MIN(event_timestamp) AS session_start,
    MAX(event_timestamp) AS session_end,
    COUNT(*) AS total_events,
    COUNT(*) FILTER (WHERE event_type = 'page_view') AS page_views,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS add_to_carts,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases
FROM clickstream_events
GROUP BY session_id, user_id, device_type, browser, country;

Query it to see session-level summaries:

SELECT * FROM mv_realtime_sessions ORDER BY session_id;
 session_id | user_id | device_type | browser | country |       session_start       |        session_end        | total_events | page_views | add_to_carts | purchases
------------+---------+-------------+---------+---------+---------------------------+---------------------------+--------------+------------+--------------+-----------
 s001       | u101    | desktop     | Chrome  | US      | 2026-04-01 10:00:00+00:00 | 2026-04-01 10:06:00+00:00 |            6 |          4 |            1 |         1
 s002       | u102    | mobile      | Safari  | UK      | 2026-04-01 10:00:30+00:00 | 2026-04-01 10:04:30+00:00 |            4 |          3 |            1 |         0
 s003       | u103    | desktop     | Firefox | DE      | 2026-04-01 10:01:00+00:00 | 2026-04-01 10:05:00+00:00 |            3 |          3 |            0 |         0
 s004       | u104    | tablet      | Chrome  | US      | 2026-04-01 10:00:00+00:00 | 2026-04-01 10:03:00+00:00 |            4 |          4 |            0 |         0
 s005       | u105    | mobile      | Chrome  | JP      | 2026-04-01 10:02:00+00:00 | 2026-04-01 10:03:30+00:00 |            2 |          2 |            0 |         0
 s006       | u106    | desktop     | Chrome  | US      | 2026-04-01 10:05:00+00:00 | 2026-04-01 10:09:00+00:00 |            6 |          4 |            1 |         1

You can immediately see that sessions s001 and s006 completed a purchase, while s002 added to cart but never bought. Because this is a materialized view, RisingWave updates these results incrementally as new events arrive, with no re-scanning of historical data.

Step 4: Compute a Page View Funnel

Funnel analysis answers the question: at each step in the conversion path, how many users proceed to the next step? This materialized view computes a six-stage e-commerce funnel:

CREATE MATERIALIZED VIEW mv_funnel_analysis AS
WITH session_flags AS (
    SELECT
        session_id,
        bool_or(page_url = '/products') AS saw_products,
        bool_or(page_url LIKE '/products/%' AND page_url != '/products')
            AS saw_product_detail,
        bool_or(event_type = 'add_to_cart') AS did_add_to_cart,
        bool_or(page_url = '/checkout') AS reached_checkout,
        bool_or(event_type = 'purchase') AS did_purchase
    FROM clickstream_events
    GROUP BY session_id
)
SELECT
    COUNT(*) AS total_sessions,
    COUNT(*) FILTER (WHERE saw_products) AS viewed_products,
    COUNT(*) FILTER (WHERE saw_product_detail) AS viewed_product_detail,
    COUNT(*) FILTER (WHERE did_add_to_cart) AS added_to_cart,
    COUNT(*) FILTER (WHERE reached_checkout) AS reached_checkout,
    COUNT(*) FILTER (WHERE did_purchase) AS completed_purchase
FROM session_flags;

The CTE first flags each session for whether it hit each funnel stage using bool_or, then the outer query counts how many sessions reached each stage. Query the results:

SELECT * FROM mv_funnel_analysis;
 total_sessions | viewed_products | viewed_product_detail | added_to_cart | reached_checkout | completed_purchase
----------------+-----------------+-----------------------+---------------+------------------+--------------------
              6 |               5 |                     4 |             3 |                3 |                  2

Reading this funnel: of 6 total sessions, 5 viewed the products page (83%), 4 viewed a product detail page (67%), 3 added to cart (50%), 3 reached checkout (50%), and 2 completed a purchase (33%). The biggest drop-off is from product detail to add-to-cart, which suggests the product page may need a stronger call to action.

Step 5: Track Page Popularity

Understanding which pages attract the most traffic helps prioritize optimization efforts:

CREATE MATERIALIZED VIEW mv_page_popularity AS
SELECT
    page_url,
    COUNT(*) AS total_views,
    COUNT(DISTINCT user_id) AS unique_visitors,
    COUNT(DISTINCT session_id) AS unique_sessions
FROM clickstream_events
WHERE event_type = 'page_view'
GROUP BY page_url;
SELECT * FROM mv_page_popularity ORDER BY total_views DESC;
       page_url       | total_views | unique_visitors | unique_sessions
----------------------+-------------+-----------------+-----------------
 /home                |           6 |               6 |               6
 /products            |           5 |               5 |               5
 /products/widget-pro |           4 |               4 |               4
 /checkout            |           3 |               3 |               3
 /pricing             |           1 |               1 |               1
 /docs                |           1 |               1 |               1

Every session starts at /home, and the traffic naturally narrows as users progress through the funnel. The /pricing and /docs pages show organic exploration by users who took a different path than the main conversion flow.

Step 6: Detect Drop-Offs in Real Time

This is where streaming analytics really shows its value. Instead of discovering cart abandonment in yesterday's batch report, you can flag it the moment it happens and trigger an immediate response.

CREATE MATERIALIZED VIEW mv_dropoff_alerts AS
WITH last_pages AS (
    SELECT DISTINCT ON (session_id)
        session_id,
        page_url AS last_page
    FROM clickstream_events
    WHERE event_type = 'page_view'
    ORDER BY session_id, event_timestamp DESC
),
session_flags AS (
    SELECT
        session_id,
        user_id,
        device_type,
        bool_or(event_type = 'add_to_cart') AS added_to_cart,
        bool_or(event_type = 'purchase') AS completed_purchase
    FROM clickstream_events
    GROUP BY session_id, user_id, device_type
)
SELECT
    sf.session_id,
    sf.user_id,
    sf.device_type,
    lp.last_page,
    CASE
        WHEN sf.added_to_cart AND NOT sf.completed_purchase
            THEN 'cart_abandonment'
        WHEN lp.last_page = '/checkout' AND NOT sf.completed_purchase
            THEN 'checkout_abandonment'
        WHEN lp.last_page IN ('/products', '/products/widget-pro')
            AND NOT sf.added_to_cart
            THEN 'browse_abandonment'
        ELSE 'early_exit'
    END AS dropoff_type
FROM session_flags sf
JOIN last_pages lp ON sf.session_id = lp.session_id
WHERE NOT sf.completed_purchase;

This view combines two CTEs. The first uses DISTINCT ON to find the last page each session viewed. The second flags whether the session added to cart or purchased. The outer query classifies each incomplete session by its drop-off type.

SELECT * FROM mv_dropoff_alerts ORDER BY session_id;
 session_id | user_id | device_type |      last_page       |     dropoff_type
------------+---------+-------------+----------------------+----------------------
 s002       | u102    | mobile      | /products/widget-pro | cart_abandonment
 s003       | u103    | desktop     | /docs                | early_exit
 s004       | u104    | tablet      | /checkout            | checkout_abandonment
 s005       | u105    | mobile      | /products            | browse_abandonment

Four sessions did not complete a purchase, and each has a different drop-off classification:

  • s002: Added to cart on mobile Safari but never proceeded to checkout (cart abandonment)
  • s003: Browsed the pricing and docs pages but never looked at products (early exit)
  • s004: Reached the checkout page on a tablet but did not complete payment (checkout abandonment)
  • s005: Viewed the products listing page on mobile but never clicked into a product (browse abandonment)

In a production system, you can create a RisingWave sink to push changes from mv_dropoff_alerts into a downstream system like Kafka, a webhook, or a database. For example, when a new cart_abandonment row appears, your notification service could trigger an email or push notification within seconds.

Why Use a Streaming Database for Clickstream Analytics?

Traditional approaches to clickstream analytics involve either batch ETL pipelines (Spark, Airflow) or custom stream processing applications (Flink, Kafka Streams). A streaming database like RisingWave offers a different tradeoff:

AspectBatch (Spark/Airflow)Stream Processing (Flink)Streaming Database (RisingWave)
LatencyMinutes to hoursSub-secondSub-second
Query languageSQL + custom DAGsSQL or Java/ScalaStandard PostgreSQL SQL
State managementExternal (S3, HDFS)Built-in, complex tuningBuilt-in, automatic
Serving layer needed?Yes (separate DB)Yes (separate DB)No, query directly
Operational complexityHigh (multiple systems)High (JVM tuning, checkpoints)Low (single system)

RisingWave acts as both the stream processor and the serving layer. You write SQL to define your computations as materialized views, and RisingWave keeps them incrementally updated. Any PostgreSQL-compatible client can query the results directly, which eliminates the need for a separate database between your stream processor and your dashboards.

For clickstream analytics specifically, this means:

  • No separate sessionization job: The mv_realtime_sessions materialized view replaces what would otherwise be a Flink job or Spark task
  • No ETL to a serving database: Grafana, Metabase, or your custom dashboard can query RisingWave directly using its PostgreSQL-compatible interface
  • Simpler operations: One system to monitor instead of Kafka + Flink + PostgreSQL + Redis

What Is the Difference Between Clickstream Analytics and Web Analytics?

Web analytics tools like Google Analytics provide pre-built reports on traffic, bounce rates, and conversions. Clickstream analytics gives you raw, event-level data that you fully control. The key differences are flexibility and ownership.

With clickstream analytics, you define your own funnel stages, sessionization logic, and attribution rules. You can join clickstream data with backend events (payment completions, API errors, feature flags) that web analytics tools never see. You also own the data, which matters for compliance with privacy regulations like GDPR.

Web analytics is a good starting point for most teams. Clickstream analytics becomes essential when you need custom funnel definitions, real-time alerting, or integration with internal systems.

How Does RisingWave Handle Late-Arriving Clickstream Events?

Late-arriving events are common in clickstream data, especially from mobile devices that may lose connectivity temporarily. RisingWave handles this through its incremental computation model: when a late event arrives, the affected materialized views are automatically updated to reflect the new data.

For example, if a mobile user's add_to_cart event arrives 30 seconds late, the mv_realtime_sessions view will update the session's add_to_carts count, and the mv_dropoff_alerts view will reclassify the session from browse_abandonment to the correct category. This happens automatically without any special configuration.

For use cases that require strict event ordering or watermark-based windowing, RisingWave supports temporal filters and watermarks that let you control how late data is handled.

Can I Use RisingWave for Clickstream Analytics at Scale?

RisingWave is designed for production workloads. It supports horizontal scaling by distributing computation across multiple nodes, and its storage layer uses a shared-storage architecture that separates compute from storage for independent scaling.

For clickstream analytics, the key scaling factors are:

  • Event throughput: RisingWave can ingest millions of events per second from Kafka
  • Number of concurrent sessions: Materialized views partition state by group key (e.g., session_id), so more sessions means more distributed state
  • Query concurrency: Since materialized views precompute results, read queries are fast regardless of the underlying data volume

Teams processing billions of clickstream events daily use RisingWave in production. You can start with a single node for development and scale out as your traffic grows. RisingWave Cloud handles the scaling automatically if you prefer a managed service.

Conclusion

Real-time clickstream analytics transforms user behavior data from a historical reporting tool into an operational asset. With streaming SQL, you can:

  • Sessionize users in real time by aggregating events with materialized views that update incrementally
  • Compute conversion funnels using bool_or flags and FILTER clauses to track progression through funnel stages
  • Detect drop-offs instantly by classifying incomplete sessions and routing alerts to downstream systems
  • Eliminate pipeline complexity by using a streaming database that serves as both processor and query engine

The SQL patterns shown here, including DISTINCT ON for last-page detection, bool_or for session flags, and FILTER for conditional aggregation, are building blocks you can adapt to any clickstream schema.


Ready to build your own clickstream pipeline? Try RisingWave Cloud free, no credit card required. Sign up here.

Join our Slack community to ask questions and connect with other stream processing developers.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.