Building a Real-Time Recommendation Engine Without a Data Science Team
Most product recommendation engines require a machine learning team, a batch training pipeline, and a serving layer that is always a few hours stale. The core signals — what products people buy together, what is trending right now, what a specific customer gravitates toward — can be maintained incrementally with standard SQL on a streaming database. No Python, no Spark, no model retraining cycles.
The Problem with Batch Recommendations
A recommendation engine trained overnight on yesterday's data misses the most valuable signal: what is happening right now. A product going viral this afternoon will not appear in "trending" recommendations until tomorrow's batch run. A user who just bought a tent will still see tent recommendations for the next several hours.
Batch pipelines are not slow because the engineers built them poorly. They are slow because the underlying data infrastructure is not designed to process events incrementally. Every recommendation refresh requires re-scanning the entire purchase history, recomputing co-purchase matrices, and re-ranking results.
The alternative is to treat every purchase, click, and view as an event that incrementally updates the recommendation signals — in SQL, without any retraining loop.
What You Actually Need for Basic Recommendations
The most effective e-commerce recommendation strategies boil down to a handful of signals:
- Co-purchase affinity — products frequently bought together in the same order
- Trending products — products with high purchase velocity in the last 1–24 hours
- User affinity scores — categories or brands a specific user has purchased or viewed recently
- "Customers also viewed" — session-level co-views
All four can be maintained as streaming SQL materialized views. The views update incrementally as events arrive, so the signals are always current.
Setting Up Event Sources in RisingWave
RisingWave is a PostgreSQL-compatible streaming database, open source under the Apache 2.0 license, built in Rust, and backed by S3 for persistent storage. It reads from Kafka natively and exposes results as queryable materialized views through a standard PostgreSQL interface.
-- Purchase events: fired when an order is confirmed
CREATE SOURCE purchase_events (
order_id VARCHAR,
user_id VARCHAR,
product_id VARCHAR,
category_id VARCHAR,
brand_id VARCHAR,
price NUMERIC,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'purchase_events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
-- View events: fired when a product detail page is loaded
CREATE SOURCE view_events (
session_id VARCHAR,
user_id VARCHAR,
product_id VARCHAR,
category_id VARCHAR,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'product_view_events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
Signal 1: Trending Products (Rolling Window)
What is selling right now — not yesterday, not this week, but in the last four hours:
CREATE MATERIALIZED VIEW trending_products_4h AS
SELECT
product_id,
category_id,
COUNT(*) AS purchase_count,
SUM(price) AS revenue,
WINDOW_START AS window_start,
WINDOW_END AS window_end
FROM TUMBLE(
purchase_events,
event_time,
INTERVAL '4 HOURS'
)
GROUP BY
product_id,
category_id,
WINDOW_START,
WINDOW_END;
For a sliding window — "trending in the last 4 hours, updated every 10 minutes":
CREATE MATERIALIZED VIEW trending_products_sliding AS
SELECT
product_id,
COUNT(*) AS purchase_count,
WINDOW_START,
WINDOW_END
FROM HOP(
purchase_events,
event_time,
INTERVAL '10 MINUTES',
INTERVAL '4 HOURS'
)
GROUP BY product_id, WINDOW_START, WINDOW_END;
Your recommendation service queries trending_products_sliding and gets current results with sub-second latency, always reflecting purchases from the last four hours.
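For example, the serving path can pull the top sellers from the most recently closed window. This query is a sketch that assumes the view columns defined above; the limit of 10 is illustrative:

```sql
-- Top 10 sellers in the most recently closed 4-hour window
SELECT product_id, purchase_count
FROM trending_products_sliding
WHERE window_end = (SELECT MAX(window_end) FROM trending_products_sliding)
ORDER BY purchase_count DESC
LIMIT 10;
```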
Signal 2: Co-Purchase Affinity (What Is Bought Together)
Co-purchase affinity requires pairing products that appear in the same order. This is a self-join of the purchase stream on order_id, grouped by product pair:
CREATE MATERIALIZED VIEW co_purchase_affinity AS
SELECT
a.product_id AS product_a,
b.product_id AS product_b,
COUNT(DISTINCT a.order_id) AS co_purchase_count
FROM purchase_events a
JOIN purchase_events b
ON a.order_id = b.order_id
AND a.product_id <> b.product_id -- exclude self-pairs; both orientations are stored, so lookups by either product are direct
GROUP BY a.product_id, b.product_id;
This view updates incrementally. When a new order arrives containing products P1 and P2, RisingWave increments the co_purchase_count for the (P1, P2) pair without rescanning historical data.
To serve "frequently bought together" recommendations:
SELECT
product_b AS recommended_product_id,
co_purchase_count
FROM co_purchase_affinity
WHERE product_a = 'SKU-88412'
ORDER BY co_purchase_count DESC
LIMIT 5;
Signal 3: User Affinity Scores
Track each user's purchase behavior by category and brand to surface personalized recommendations:
CREATE MATERIALIZED VIEW user_category_affinity AS
SELECT
user_id,
category_id,
COUNT(*) AS purchase_count,
SUM(price) AS total_spend,
MAX(event_time) AS last_purchase_at
FROM purchase_events
GROUP BY user_id, category_id;
CREATE MATERIALIZED VIEW user_brand_affinity AS
SELECT
user_id,
brand_id,
COUNT(*) AS purchase_count,
MAX(event_time) AS last_purchase_at
FROM purchase_events
GROUP BY user_id, brand_id;
When a user loads their homepage, query these views to determine which categories and brands to feature:
-- Top 3 categories for a specific user
SELECT category_id, purchase_count
FROM user_category_affinity
WHERE user_id = 'user_29847'
ORDER BY purchase_count DESC
LIMIT 3;
Because these views are incrementally maintained, they reflect the user's most recent purchase immediately — not after the next batch run.
Signal 4: Session Co-Views ("Customers Also Viewed")
Co-views within a session are a powerful signal for "customers also viewed" carousels. Products viewed in the same session tend to be substitutes or complements:
CREATE MATERIALIZED VIEW session_coview_affinity AS
SELECT
a.product_id AS product_a,
b.product_id AS product_b,
COUNT(DISTINCT a.session_id) AS co_view_count
FROM view_events a
JOIN view_events b
ON a.session_id = b.session_id
AND a.product_id <> b.product_id -- exclude self-pairs; both orientations are stored
GROUP BY a.product_id, b.product_id;
This is the same pattern as co-purchase affinity but uses session-level co-views rather than order-level co-purchases. Co-view signals update in real time as visitors browse.
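Serving a "customers also viewed" carousel is then a direct lookup, mirroring the co-purchase query earlier (the SKU is the same illustrative one used above):

```sql
-- "Customers also viewed" for a product detail page
SELECT
product_b AS also_viewed_product_id,
co_view_count
FROM session_coview_affinity
WHERE product_a = 'SKU-88412'
ORDER BY co_view_count DESC
LIMIT 8;
```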
Combining Signals into a Recommendation Score
A simple linear combination of signals gives you a blended recommendation score without needing a trained model:
CREATE MATERIALIZED VIEW blended_recommendations AS
SELECT
cp.product_a AS anchor_product,
cp.product_b AS recommended_product,
(cp.co_purchase_count * 3.0
+ COALESCE(cv.co_view_count, 0) * 1.0
) AS recommendation_score
FROM co_purchase_affinity cp
LEFT JOIN session_coview_affinity cv
ON cp.product_a = cv.product_a
AND cp.product_b = cv.product_b;
Adjust the weights (3.0 for purchase, 1.0 for view) based on your conversion data. The entire recommendation signal set updates continuously, with no model training loop.
Comparison: Batch vs. Streaming Recommendation Signals
| Dimension | Nightly Batch ML | Hourly Refresh | Streaming SQL (RisingWave) |
| --- | --- | --- | --- |
| Signal freshness | 12–24 hours | 1 hour | Seconds |
| Captures today's trends | No | Partial | Yes |
| Reflects user's latest purchase | Next day | Within 1 hour | Immediately |
| Infrastructure required | ML platform, Spark/Airflow | Data warehouse + ETL | Streaming database |
| Team required | Data science + ML ops | Data engineering | Any SQL engineer |
| Recommendation logic in | Python/notebook | SQL + scheduler | SQL only |
Serving Recommendations to Your Application
Because RisingWave exposes a PostgreSQL wire protocol, your application queries recommendation signals the same way it queries any database:
-- Fetch "frequently bought together" for a product page
SELECT r.recommended_product, p.name, p.price, r.recommendation_score
FROM blended_recommendations r
JOIN product_catalog p ON r.recommended_product = p.product_id
WHERE r.anchor_product = $1
ORDER BY r.recommendation_score DESC
LIMIT 8;
Lookups return in milliseconds because the view is pre-computed and can be indexed on the query key. That is read performance you cannot get from a streaming job that computes results on the fly.
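To make the product-page lookup an index scan rather than a full view scan, create an index on the anchor column. This is a sketch using RisingWave's CREATE INDEX on a materialized view; the index name is illustrative:

```sql
-- Serve WHERE anchor_product = $1 via an index scan
CREATE INDEX idx_blended_anchor
ON blended_recommendations (anchor_product);
```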
What Streaming SQL Cannot Replace
Streaming SQL recommendation signals handle the concrete behavioral logic well: what is selling, what people buy together, what this user has shown interest in. They do not replace deep learning models that understand semantic product similarity (a user who bought hiking boots might want trail mix, even if those two products never appeared in the same order).
For most mid-market e-commerce operations, the streaming SQL signals outperform complex ML models in practice because they reflect current behavior. Save the ML investment for semantic product understanding and ranking refinement, not for signals that SQL can compute more reliably in real time.
Frequently Asked Questions
Q: Does this approach require any machine learning?
No. The signals described here — co-purchases, trending products, user affinity — are aggregate statistics maintained with SQL GROUP BY operations. No model training is required. The signals are interpretable and auditable by anyone who reads SQL.
Q: How does this handle a new product with no purchase history?
New products will not appear in co-purchase or co-view signals until they accumulate interactions. The standard solution is to surface new products in a "new arrivals" slot powered by the trending_products_4h view, which will pick them up as soon as they start selling.
Q: What if two users simultaneously buy the same product? Will the co-purchase view double-count?
No. Each order event has a unique order_id. The COUNT(DISTINCT order_id) in the co-purchase view counts distinct orders, not raw event counts. Concurrent purchases from different orders correctly increment the co-purchase count once per order.
Q: How many products can this scale to?
The co-purchase affinity materialized view grows quadratically with SKU count in the worst case (every product paired with every other). In practice, most products are bought together with a sparse subset of the catalog. RisingWave handles millions of product pairs without issue; beyond that, you can prune the view to retain only pairs with co_purchase_count > N.
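The pruning mentioned above can itself be a materialized view stacked on the affinity view. This sketch keeps only pairs seen together in more than five orders; the threshold and view name are illustrative:

```sql
-- Downstream view: retain only pairs with meaningful support
CREATE MATERIALIZED VIEW co_purchase_affinity_pruned AS
SELECT product_a, product_b, co_purchase_count
FROM co_purchase_affinity
WHERE co_purchase_count > 5;
```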
Q: Can I add a decay function so older co-purchases matter less?
Yes. Replace the simple count with a time-weighted sum using a sliding window. Use a HOP window on purchase events so that only purchases within the last 90 days contribute to the affinity score. As older purchases age out of the window, their contribution to the score drops automatically.
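One variant of this is to keep daily co-purchase counts in the view and apply an exponential weight at query time. The sketch below assumes daily tumbling windows; the view name and the 30-day decay constant (2,592,000 seconds) are illustrative:

```sql
-- Daily co-purchase counts: self-join within the same day's window
CREATE MATERIALIZED VIEW co_purchase_daily AS
SELECT
a.product_id AS product_a,
b.product_id AS product_b,
a.window_start AS day_start,
COUNT(DISTINCT a.order_id) AS co_purchase_count
FROM TUMBLE(purchase_events, event_time, INTERVAL '1 DAY') a
JOIN TUMBLE(purchase_events, event_time, INTERVAL '1 DAY') b
ON a.order_id = b.order_id
AND a.window_start = b.window_start
AND a.product_id <> b.product_id
GROUP BY a.product_id, b.product_id, a.window_start;

-- Query-time decay: each day's count weighted by exp(-age / 30 days)
SELECT
product_a,
product_b,
SUM(co_purchase_count
* EXP(-EXTRACT(EPOCH FROM (NOW() - day_start)) / 2592000.0)) AS decayed_score
FROM co_purchase_daily
WHERE day_start > NOW() - INTERVAL '90 DAYS'
GROUP BY product_a, product_b
ORDER BY decayed_score DESC;
```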

