In e-commerce, the window of opportunity is measured in seconds. A customer showing strong buying intent might leave your site moments later. A high-value shopper might become dormant after a single bad experience. Relying on daily or hourly batch reports to understand these changes is like trying to navigate a racetrack by looking in the rearview mirror.
To win, businesses need to operate on their customers' time: right now.
This requires a shift from static analysis to a proactive, event-driven approach. Instead of asking "Who were our most valuable customers yesterday?", we need to be able to react the instant a user's behavior changes their segment. This is the promise of real-time segmentation.
This post walks through a demo that shows you how to build the engine for this capability, using the power of a streaming database like RisingWave and Apache Kafka.
The Goal: An Engine for Proactive Engagement
Before diving into the tech, let's be clear about the goal. We are building a system that continuously and instantly categorizes every user into a specific segment based on their latest actions.
This isn't just for a dashboard. This real-time segmentation feed becomes a source of truth that can trigger a multitude of business actions:
Automated Marketing Triggers: A user moves into the "High Potential" segment? Instantly send them a targeted email with a limited-time offer.
Dynamic Personalization: A "Dormant High-Value" customer logs in? Immediately personalize the homepage to showcase products they've previously purchased or viewed.
Smarter Ad Spend: A user is flagged as "Churned"? Automatically remove them from expensive re-targeting campaigns to improve ROI.
Proactive Customer Support: An "Active Customer" starts a support chat? Flag their ticket with high priority and route it to a senior agent.
A Hybrid Approach to Segmentation
To make these segments meaningful, our demo combines two powerful models:
RFM Model (Historical Value): This classic framework scores users based on their long-term value, looking at the Recency, Frequency, and Monetary value of their past purchases.
Behavioral Patterns (Real-Time Intent): This model provides an immediate pulse on what a user is doing right now, tracking their
view_count,cart_count, andpurchase_countwithin a recent window.
By blending historical context with real-time behavior, we can create nuanced segments like "Dormant High-Value" (high historical RFM, low current activity) or "High Potential" (high current activity, low purchase history).
Architecture of the Real-Time Engine

The demo is built on a modern, scalable streaming stack where each component has a distinct role:
Apache Kafka: The event streaming platform. Kafka acts as the central nervous system for our data, reliably capturing streams of raw user events (clicks, views, orders) and making them available for processing in real-time. It decouples the data producers from the data consumers.
RisingWave: The real-time processing engine. RisingWave connects to Kafka, ingests these event streams, and acts as the 'brain' of the operation. It uses Materialized Views to continuously maintain the complex calculations and segmentation logic, keeping the state of every user segment perpetually up-to-date.
Python/Flask: A lightweight service layer that simulates data and exposes the results from RisingWave. In a production scenario, this layer would be responsible for sending segmentation events to downstream marketing or operational systems.
Dashboard: A simple visualization tool to prove the engine is working and to see segment counts change in real-time.
Build the Engine Yourself: A Step-by-Step Guide
The demo is fully self-contained. The instructions in the GitHub repository will guide you through cloning the project and running a script to get it up and running in minutes.
Under the Hood: How Streaming SQL Powers the Logic
The elegance of this solution lies in expressing complex, stateful segmentation logic in simple SQL. Unlike a traditional database where a query is run once, RisingWave's queries are stateful and continuously update their results as new data arrives. Let's break down how it works.
Step 1: Connect to the Data Streams
First, we tell RisingWave to connect to our Kafka topics. The CREATE SOURCE command defines the connection and schema but doesn't copy the data.
-- User behavior source
CREATE SOURCE event_stream (
user_id VARCHAR,
event_id VARCHAR,
event_type VARCHAR,
product_id VARCHAR,
product_name VARCHAR,
price DOUBLE PRECISION,
event_time TIMESTAMP,
page_url VARCHAR
) WITH (
connector = 'kafka',
topic = 'clickstream',
properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
-- Orders source
CREATE SOURCE order_stream (
order_id VARCHAR,
user_id VARCHAR,
event_id VARCHAR,
order_status VARCHAR,
price DOUBLE PRECISION,
order_time TIMESTAMP
) WITH (
connector = 'kafka',
topic = 'order_events',
properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
Step 2: Calculate Real-Time Statistics with Materialized Views
This is where the magic happens. We use CREATE MATERIALIZED VIEW to perform aggregations. In RisingWave, a materialized view is not a static snapshot. It's a query whose results are incrementally and efficiently updated in real-time as new data flows in.
-- Calculate user behavior statistics
CREATE MATERIALIZED VIEW click_behavior_stats AS
SELECT
user_id,
COUNT(*) FILTER (WHERE event_type = 'view') AS view_count,
COUNT(*) FILTER (WHERE event_type = 'cart') AS cart_count,
COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count
FROM event_stream
WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '2 months'
GROUP BY user_id;
-- Calculate order status statistics
CREATE MATERIALIZED VIEW user_order_stats AS
SELECT
1 AS id,
user_id,
COUNT(*) FILTER (WHERE order_status = 'completed') AS completed_orders,
SUM(price) AS total_spent,
MAX(order_time) AS last_order_time
FROM order_stream
GROUP BY user_id;
-- Calculate global max order time
CREATE MATERIALIZED VIEW global_order_stats AS
SELECT
1 AS id,
MAX(order_time) AS max_order_time
FROM order_stream;
CREATE MATERIALIZED VIEW order_behavior_stats AS
SELECT
u.user_id AS user_id,
u.completed_orders AS completed_orders,
u.total_spent AS total_spent,
u.last_order_time AS last_order_time,
g.max_order_time AS max_order_time
FROM user_order_stats u
JOIN global_order_stats g
ON u.id = g.id;
A key pattern to notice here is the use of global_order_stats. To calculate recency for each user, we need to compare their last_order_time with the most recent order time across the entire system (max_order_time). The global_order_stats view efficiently maintains this single value. We then join it back to each user's statistics, which is a necessary pattern in streaming SQL to perform this type of comparison efficiently.
The temporal filter WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '2 months' is especially important. In a streaming context, this defines a rolling 2-month time window. RisingWave continuously maintains this window, automatically including new events and, just as importantly, expiring old events that no longer meet the time criteria. This ensures the results always reflect recent behavior and prevents the state from growing indefinitely.
Step 3: Score User Behavior and Value
With our statistical building blocks in place, we can now create materialized views that apply our scoring logic. These views read from the views created in Step 2 and will update automatically whenever their underlying statistics change. First, we score the RFM (Recency, Frequency, Monetary) dimensions:
CREATE MATERIALIZED VIEW rfm_scores AS
SELECT
user_id,
-- Recency Score: Recent purchases get higher scores
CASE
WHEN max_order_time - last_order_time <= INTERVAL '3 days' THEN 5
WHEN max_order_time - last_order_time <= INTERVAL '10 days' THEN 4
WHEN max_order_time - last_order_time <= INTERVAL '30 days' THEN 3
WHEN max_order_time - last_order_time <= INTERVAL '60 days' THEN 2
ELSE 1
END AS r_score,
-- Frequency Score: Higher purchase frequency gets higher scores
CASE
WHEN completed_orders >= 15 THEN 5
WHEN completed_orders >= 8 THEN 4
WHEN completed_orders >= 5 THEN 3
WHEN completed_orders >= 2 THEN 2
ELSE 1
END AS f_score,
-- Monetary Score: Higher spending gets higher scores
CASE
WHEN total_spent >= 5000 THEN 5
WHEN total_spent >= 2500 THEN 4
WHEN total_spent >= 1000 THEN 3
WHEN total_spent >= 200 THEN 2
ELSE 1
END AS m_score
FROM order_behavior_stats;
Next, we score the real-time behavioral patterns in a similar fashion:
CREATE MATERIALIZED VIEW behavior_scores AS
SELECT
u.user_id,
-- Browsing Activity Score
CASE
WHEN view_count >= 30 THEN 5
...
END AS browse_score,
-- Purchase Intent Score
CASE
WHEN cart_count >= 10 THEN 5
...
END AS intent_score
FROM click_behavior_stats u;
Step 4: Define the Final Segments with a Streaming Join
Finally, we join the results from our behavioral and RFM scoring views to produce the final customer segment. Because all the underlying views are continuously updated, this streaming LEFT JOIN is also maintained in real-time.
CREATE MATERIALIZED VIEW customer_segmentation AS
SELECT
b.user_id,
b.browse_score AS browse_score,
b.intent_score AS intent_score,
COALESCE(r.r_score, 1) AS r_score,
COALESCE(r.f_score, 1) AS f_score,
COALESCE(r.m_score, 1) AS m_score,
CASE
-- High-value active customers (high RFM + high activity)
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4
AND browse_score >= 3 THEN 'ACTIVE'
-- New customers (first purchase within 7 days)
WHEN r_score >= 4 AND f_score = 1 THEN 'NEW'
-- High-potential customers (high browse/cart but no purchase)
WHEN browse_score >= 4 AND intent_score >= 3 AND COALESCE(r_score, 1) <= 2 THEN 'HIGH_POTENTIAL'
-- Dormant high-value customers (low activity but high historical value)
WHEN COALESCE(r_score, 1) <= 2 AND m_score >= 4 THEN 'DORMANT_HIGH_VALUE'
-- Average active customers
WHEN r_score >= 2 AND f_score >= 2 AND m_score >= 2 THEN 'AVERAGE_ACTIVE'
-- Churned customers
WHEN COALESCE(r_score, 1) <= 2 AND COALESCE(f_score, 1) <= 2 THEN 'CHURNED'
ELSE 'OTHER'
END AS segment
FROM behavior_scores b
LEFT JOIN rfm_scores r
ON r.user_id = b.user_id;
Note the use of COALESCE(r.r_score, 1). This is important because the LEFT JOIN ensures we include users who have browsing behavior but have never made a purchase. For these users, the RFM scores will be NULL. COALESCE allows us to assign them a default low score of 1, ensuring they can still be segmented correctly (e.g., as "High Potential").
This final customer_segmentation view is the core output of our engine. It contains the most up-to-date segment for every single user, ready to be queried by any downstream application.
Step 5: Querying and Visualizing the Results
Once the engine is built, you can query the final materialized view to power applications. The demo includes a simple Python script that queries the view to aggregate the total count for each segment and sends it to the dashboard.
SELECT count(*), segment
FROM customer_segmentation
GROUP by segment
ORDER BY segment;

Conclusion: From Insight to Action
This demo provides a tangible blueprint for moving beyond static reports and building a truly proactive e-commerce business. By leveraging a real-time streaming stack, you can create a powerful segmentation engine that closes the gap between insight and action.
The dashboard is just the beginning. The real question is: what will you build with it?
We encourage you to check out the GitHub repository, where you can run the demo yourself, experiment with the segmentation logic, and start thinking about how to connect this real-time intelligence into your own operational systems.
Get Started with RisingWave
Try RisingWave Today:
Download the open-sourced version of RisingWave to deploy on your own infrastructure.
Get started quickly with RisingWave Cloud for a fully managed experience.
Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.
Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.
If you’d like to see a personalized demo or discuss how this could work for your use case, please contact our sales team.

