Building a Real-Time Recommendation Engine Without a Data Science Team
Most product recommendation engines require a machine learning team, a batch training pipeline, and a serving layer that is always a few hours stale. The core signals — what products people buy together, what is trending right now, what a specific customer gravitates toward — can be maintained incrementally with standard SQL on a streaming database. No Python, no Spark, no model retraining cycles.
The Problem with Batch Recommendations
A recommendation engine trained overnight on yesterday's data misses the most valuable signal: what is happening right now. A product going viral this afternoon will not appear in "trending" recommendations until tomorrow's batch run. A user who just bought a tent will still see tent recommendations for the next several hours.
Batch pipelines are not slow because the engineers built them poorly. They are slow because the underlying data infrastructure is not designed to process events incrementally. Every recommendation refresh requires re-scanning the entire purchase history, recomputing co-purchase matrices, and re-ranking results.
The alternative is to treat every purchase, click, and view as an event that incrementally updates the recommendation signals — in SQL, without any retraining loop.
What You Actually Need for Basic Recommendations
The most effective e-commerce recommendation strategies boil down to a handful of signals:
- Co-purchase affinity — products frequently bought together in the same order
- Trending products — products with high purchase velocity in the last 1–24 hours
- User affinity scores — categories or brands a specific user has purchased or viewed recently
- "Customers also viewed" — session-level co-views
All four can be maintained as streaming SQL materialized views. The views update incrementally as events arrive, so the signals are always current.
Setting Up Event Sources in RisingWave
RisingWave is a PostgreSQL-compatible streaming database, open source under the Apache 2.0 license, built in Rust, and backed by S3 for persistent storage. It reads from Kafka natively and exposes results as queryable materialized views through a standard PostgreSQL interface.
-- Purchase events: fired when an order is confirmed
CREATE SOURCE purchase_events (
order_id VARCHAR,
user_id VARCHAR,
product_id VARCHAR,
category_id VARCHAR,
brand_id VARCHAR,
price NUMERIC,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'purchase_events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
-- View events: fired when a product detail page is loaded
CREATE SOURCE view_events (
session_id VARCHAR,
user_id VARCHAR,
product_id VARCHAR,
category_id VARCHAR,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'product_view_events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
Signal 1: Trending Products (Rolling Window)
What is selling right now — not yesterday, not this week, but in the last four hours:
CREATE MATERIALIZED VIEW trending_products_4h AS
SELECT
product_id,
category_id,
COUNT(*) AS purchase_count,
SUM(price) AS revenue,
WINDOW_START AS window_start,
WINDOW_END AS window_end
FROM TUMBLE(
purchase_events,
event_time,
INTERVAL '4 HOURS'
)
GROUP BY
product_id,
category_id,
WINDOW_START,
WINDOW_END;
For a sliding window — "trending in the last 4 hours, updated every 10 minutes":
CREATE MATERIALIZED VIEW trending_products_sliding AS
SELECT
product_id,
COUNT(*) AS purchase_count,
WINDOW_START,
WINDOW_END
FROM HOP(
purchase_events,
event_time,
INTERVAL '10 MINUTES',
INTERVAL '4 HOURS'
)
GROUP BY product_id, WINDOW_START, WINDOW_END;
Your recommendation service queries trending_products_sliding and gets current results with sub-second latency, always reflecting purchases from the last four hours.
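For example, the serving path can pull the top sellers from the most recently closed window. This query is a sketch that assumes the view columns defined above; the limit of 10 is illustrative:

```sql
-- Top 10 sellers in the most recently closed 4-hour window
SELECT product_id, purchase_count
FROM trending_products_sliding
WHERE window_end = (SELECT MAX(window_end) FROM trending_products_sliding)
ORDER BY purchase_count DESC
LIMIT 10;
```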
Signal 2: Co-Purchase Affinity (What Is Bought Together)
Co-purchase affinity requires pairing products that appear in the same order. This is a self-join of the purchase stream on order_id, grouped by product pair:
CREATE MATERIALIZED VIEW co_purchase_affinity AS
SELECT
a.product_id AS product_a,
b.product_id AS product_b,
COUNT(DISTINCT a.order_id) AS co_purchase_count
FROM purchase_events a
JOIN purchase_events b
ON a.order_id = b.order_id
AND a.product_id <> b.product_id -- exclude self-pairs; both orientations are stored, so lookups by either product are direct
GROUP BY a.product_id, b.product_id;
This view updates incrementally. When a new order arrives containing products P1 and P2, RisingWave increments the co_purchase_count for the (P1, P2) pair without rescanning historical data.
To serve "frequently bought together" recommendations:
SELECT
product_b AS recommended_product_id,
co_purchase_count
FROM co_purchase_affinity
WHERE product_a = 'SKU-88412'
ORDER BY co_purchase_count DESC
LIMIT 5;
Signal 3: User Affinity Scores
Track each user's purchase behavior by category and brand to surface personalized recommendations:
CREATE MATERIALIZED VIEW user_category_affinity AS
SELECT
user_id,
category_id,
COUNT(*) AS purchase_count,
SUM(price) AS total_spend,
MAX(event_time) AS last_purchase_at
FROM purchase_events
GROUP BY user_id, category_id;
CREATE MATERIALIZED VIEW user_brand_affinity AS
SELECT
user_id,
brand_id,
COUNT(*) AS purchase_count,
MAX(event_time) AS last_purchase_at
FROM purchase_events
GROUP BY user_id, brand_id;
When a user loads their homepage, query these views to determine which categories and brands to feature:
-- Top 3 categories for a specific user
SELECT category_id, purchase_count
FROM user_category_affinity
WHERE user_id = 'user_29847'
ORDER BY purchase_count DESC
LIMIT 3;
Because these views are incrementally maintained, they reflect the user's most recent purchase immediately — not after the next batch run.
Signal 4: Session Co-Views ("Customers Also Viewed")
Co-views within a session are a powerful signal for "customers also viewed" carousels. Products viewed in the same session tend to be substitutes or complements:
CREATE MATERIALIZED VIEW session_coview_affinity AS
SELECT
a.product_id AS product_a,
b.product_id AS product_b,
COUNT(DISTINCT a.session_id) AS co_view_count
FROM view_events a
JOIN view_events b
ON a.session_id = b.session_id
AND a.product_id <> b.product_id -- exclude self-pairs; both orientations are stored
GROUP BY a.product_id, b.product_id;
This is the same pattern as co-purchase affinity but uses session-level co-views rather than order-level co-purchases. Co-view signals update in real time as visitors browse.
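Serving a "customers also viewed" carousel is then a direct lookup, mirroring the co-purchase query earlier (the SKU is the same illustrative one used above):

```sql
-- "Customers also viewed" for a product detail page
SELECT
product_b AS also_viewed_product_id,
co_view_count
FROM session_coview_affinity
WHERE product_a = 'SKU-88412'
ORDER BY co_view_count DESC
LIMIT 8;
```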
Combining Signals into a Recommendation Score
A simple linear combination of signals gives you a blended recommendation score without needing a trained model:
CREATE MATERIALIZED VIEW blended_recommendations AS
SELECT
cp.product_a AS anchor_product,
cp.product_b AS recommended_product,
(cp.co_purchase_count * 3.0
+ COALESCE(cv.co_view_count, 0) * 1.0
) AS recommendation_score
FROM co_purchase_affinity cp
LEFT JOIN session_coview_affinity cv
ON cp.product_a = cv.product_a
AND cp.product_b = cv.product_b;
Adjust the weights (3.0 for purchase, 1.0 for view) based on your conversion data. The entire recommendation signal set updates continuously, with no model training loop.
Comparison: Batch vs. Streaming Recommendation Signals
| Dimension | Nightly Batch ML | Hourly Refresh | Streaming SQL (RisingWave) |
| --- | --- | --- | --- |
| Signal freshness | 12–24 hours | 1 hour | Seconds |
| Captures today's trends | No | Partial | Yes |
| Reflects user's latest purchase | Next day | Within 1 hour | Immediately |
| Infrastructure required | ML platform, Spark/Airflow | Data warehouse + ETL | Streaming database |
| Team required | Data science + ML ops | Data engineering | Any SQL engineer |
| Recommendation logic in | Python/notebook | SQL + scheduler | SQL only |
Serving Recommendations to Your Application
Because RisingWave exposes a PostgreSQL wire protocol, your application queries recommendation signals the same way it queries any database:
-- Fetch "frequently bought together" for a product page
SELECT r.recommended_product, p.name, p.price, r.recommendation_score
FROM blended_recommendations r
JOIN product_catalog p ON r.recommended_product = p.product_id
WHERE r.anchor_product = $1
ORDER BY r.recommendation_score DESC
LIMIT 8;
Lookups return in milliseconds because the view is pre-computed and can be indexed on the query key. That is read performance you cannot get from a streaming job that computes results on the fly.
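To make the product-page lookup an index scan rather than a full view scan, create an index on the anchor column. This is a sketch using RisingWave's CREATE INDEX on a materialized view; the index name is illustrative:

```sql
-- Serve WHERE anchor_product = $1 via an index scan
CREATE INDEX idx_blended_anchor
ON blended_recommendations (anchor_product);
```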
What Streaming SQL Cannot Replace
Streaming SQL recommendation signals handle the concrete behavioral logic well: what is selling, what people buy together, what this user has shown interest in. They do not replace deep learning models that understand semantic product similarity (a user who bought hiking boots might want trail mix, even if those two products never appeared in the same order).
For most mid-market e-commerce operations, the streaming SQL signals outperform complex ML models in practice because they reflect current behavior. Save the ML investment for semantic product understanding and ranking refinement, not for signals that SQL can compute more reliably in real time.
Frequently Asked Questions
Q: Does this approach require any machine learning?
No. The signals described here — co-purchases, trending products, user affinity — are aggregate statistics maintained with SQL GROUP BY operations. No model training is required. The signals are interpretable and auditable by anyone who reads SQL.
Q: How does this handle a new product with no purchase history?
New products will not appear in co-purchase or co-view signals until they accumulate interactions. The standard solution is to surface new products in a "new arrivals" slot powered by the trending_products_4h view, which will pick them up as soon as they start selling.
Q: What if two users simultaneously buy the same product? Will the co-purchase view double-count?
No. Each order event has a unique order_id. The COUNT(DISTINCT order_id) in the co-purchase view counts distinct orders, not raw event counts. Concurrent purchases from different orders correctly increment the co-purchase count once per order.
Q: How many products can this scale to?
The co-purchase affinity materialized view grows quadratically with SKU count in the worst case (every product paired with every other). In practice, most products are bought together with a sparse subset of the catalog. RisingWave handles millions of product pairs without issue; beyond that, you can prune the view to retain only pairs with co_purchase_count > N.
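The pruning mentioned above can itself be a materialized view stacked on the affinity view. This sketch keeps only pairs seen together in more than five orders; the threshold and view name are illustrative:

```sql
-- Downstream view: retain only pairs with meaningful support
CREATE MATERIALIZED VIEW co_purchase_affinity_pruned AS
SELECT product_a, product_b, co_purchase_count
FROM co_purchase_affinity
WHERE co_purchase_count > 5;
```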
Q: Can I add a decay function so older co-purchases matter less?
Yes. Replace the simple count with a time-weighted sum using a sliding window. Use a HOP window on purchase events so that only purchases within the last 90 days contribute to the affinity score. As older purchases age out of the window, their contribution to the score drops automatically.
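One variant of this is to keep daily co-purchase counts in the view and apply an exponential weight at query time. The sketch below assumes daily tumbling windows; the view name and the 30-day decay constant (2,592,000 seconds) are illustrative:

```sql
-- Daily co-purchase counts: self-join within the same day's window
CREATE MATERIALIZED VIEW co_purchase_daily AS
SELECT
a.product_id AS product_a,
b.product_id AS product_b,
a.window_start AS day_start,
COUNT(DISTINCT a.order_id) AS co_purchase_count
FROM TUMBLE(purchase_events, event_time, INTERVAL '1 DAY') a
JOIN TUMBLE(purchase_events, event_time, INTERVAL '1 DAY') b
ON a.order_id = b.order_id
AND a.window_start = b.window_start
AND a.product_id <> b.product_id
GROUP BY a.product_id, b.product_id, a.window_start;

-- Query-time decay: each day's count weighted by exp(-age / 30 days)
SELECT
product_a,
product_b,
SUM(co_purchase_count
* EXP(-EXTRACT(EPOCH FROM (NOW() - day_start)) / 2592000.0)) AS decayed_score
FROM co_purchase_daily
WHERE day_start > NOW() - INTERVAL '90 DAYS'
GROUP BY product_a, product_b
ORDER BY decayed_score DESC;
```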

