Real-Time Feature Engineering for Recommendations

Your recommendation model is only as good as the features it sees at inference time. And right now, most recommendation systems are looking at a version of the user that is hours old.

A batch pipeline runs overnight, computes features like "items this user clicked in the past 7 days" and "trending products by category," writes them to a feature store, and the model reads from that store at serving time. A user who just spent 10 minutes browsing running shoes will still see recommendations based on the hiking boots they looked at yesterday. An item that went viral 20 minutes ago will not surface as trending until the next batch run.

This freshness gap is where recommendation quality silently degrades. The model is not broken -- it is simply making decisions with stale inputs.

This article shows you how to build real-time recommendation features using streaming SQL in RisingWave, a PostgreSQL-compatible streaming database. You will compute session-level behavior features, item popularity with time-windowed scoring, user-item interaction counts, and collaborative filtering signals -- all as materialized views that update continuously as events arrive.

Why Do Batch Features Hurt Recommendations?

Recommendation systems rely on three categories of features: user features (preferences, history, demographics), item features (popularity, attributes, freshness), and interaction features (clicks, ratings, purchases between a specific user and item). All three categories are time-sensitive.

User context shifts within minutes. A user's intent changes throughout a session. They might start browsing electronics, pivot to kitchen gadgets after seeing a deal, then settle on searching for coffee makers. Batch features computed from last night's data cannot capture this session-level context.

Item popularity is a moving target. A product mentioned by an influencer can go from obscure to trending in minutes. A flash sale creates a spike in demand that disappears just as quickly. Batch pipelines that refresh hourly or daily will either miss these signals entirely or surface them long after the moment has passed.

Interaction recency matters for ranking. A user who just added an item to their cart should not see that item recommended again. A user who just dismissed a suggestion is telling you something. These signals have a half-life measured in seconds, not hours.

Streaming SQL solves these problems by computing features incrementally, event by event. A materialized view in RisingWave updates within milliseconds of each new click, view, or purchase -- so your recommendation model always sees the freshest version of every feature.

Setting Up the Event Sources

Every recommendation feature pipeline starts with user interaction events. In production, these typically flow through Kafka from your application's event tracking layer.

Clickstream source

CREATE SOURCE user_interactions (
    event_id VARCHAR,
    user_id VARCHAR,
    item_id VARCHAR,
    event_type VARCHAR,        -- 'view', 'click', 'add_to_cart', 'purchase', 'dismiss'
    category VARCHAR,
    session_id VARCHAR,
    device_type VARCHAR,
    event_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user.interactions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

Item catalog table

You also need item metadata to enrich interaction events. This table can be populated via CDC from your product database or through batch loads:

CREATE TABLE item_catalog (
    item_id VARCHAR PRIMARY KEY,
    title VARCHAR,
    category VARCHAR,
    subcategory VARCHAR,
    price DECIMAL,
    created_at TIMESTAMPTZ
);

For local testing without Kafka, use a table for interactions too:

CREATE TABLE user_interactions (
    event_id VARCHAR,
    user_id VARCHAR,
    item_id VARCHAR,
    event_type VARCHAR,
    category VARCHAR,
    session_id VARCHAR,
    device_type VARCHAR,
    event_time TIMESTAMPTZ
);

With events flowing in and item metadata available for joins, you can now build the four categories of recommendation features.

Session-Level Behavior Features

Session features capture what a user is doing right now -- the strongest signal for real-time recommendations. A user's in-session behavior reveals their current intent far better than their historical profile.

Active session summary

This materialized view computes a rolling summary of each user's current session:

CREATE MATERIALIZED VIEW session_features AS
SELECT
    session_id,
    user_id,
    COUNT(*) AS total_events,
    COUNT(*) FILTER (WHERE event_type = 'view') AS views_count,
    COUNT(*) FILTER (WHERE event_type = 'click') AS clicks_count,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS cart_adds,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases,
    COUNT(DISTINCT item_id) AS distinct_items_viewed,
    COUNT(DISTINCT category) AS distinct_categories,
    MIN(event_time) AS session_start,
    MAX(event_time) AS last_activity,
    MAX(event_time) - MIN(event_time) AS session_duration
FROM user_interactions
GROUP BY session_id, user_id;

Your recommendation model can now query this view at inference time to answer questions like: "Has this user been browsing broadly (many categories) or deeply (many views in one category)?" Broad browsing suggests the user is in exploration mode, where diversity matters. Deep browsing suggests high purchase intent in a specific category, where precision matters.
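As a sketch of that serving-time lookup (the user ID 'u_123' and the 3-category threshold are illustrative assumptions, not values from the pipeline above), the serving layer can classify the session in the same query it uses to fetch the features:

```sql
-- Fetch the user's most recent session and classify browse mode.
-- 'u_123' and the threshold of 3 categories are illustrative.
SELECT
    session_id,
    total_events,
    distinct_categories,
    CASE
        WHEN distinct_categories >= 3 THEN 'explore'  -- favor diversity
        ELSE 'focused'                                -- favor precision
    END AS browse_mode
FROM session_features
WHERE user_id = 'u_123'
ORDER BY last_activity DESC
LIMIT 1;
```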

Session category affinity

To make the session signal more actionable, compute a per-category breakdown of in-session behavior:

CREATE MATERIALIZED VIEW session_category_affinity AS
SELECT
    session_id,
    user_id,
    category,
    COUNT(*) AS interaction_count,
    COUNT(*) FILTER (WHERE event_type = 'click') AS clicks,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS cart_adds,
    -- Weighted engagement score: views=1, clicks=3, cart=5, purchase=10
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 3
            WHEN 'add_to_cart' THEN 5
            WHEN 'purchase' THEN 10
            ELSE 0
        END
    ) AS engagement_score
FROM user_interactions
GROUP BY session_id, user_id, category;

At inference time, you query this view sorted by engagement_score to get the user's top categories in this session. Feed that as a feature vector to your ranking model, and recommendations will reflect what the user cares about right now -- not what they cared about last week.
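A minimal version of that lookup might read as follows (the session ID is a placeholder for a bind parameter supplied by the serving layer):

```sql
-- Top categories for the current session, by weighted engagement.
SELECT category, engagement_score
FROM session_category_affinity
WHERE session_id = 's_456'
ORDER BY engagement_score DESC
LIMIT 3;
```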

Item Popularity with Time Windows

Raw popularity counts ("most viewed items") are a blunt instrument. An item that received 10,000 views last month but none today is not trending. You need popularity scores that emphasize recent activity.

In RisingWave, the NOW() function is only allowed in WHERE, HAVING, and ON clauses of streaming queries, so you cannot compute a decay-weighted score directly in a SELECT expression. Instead, you create time-windowed materialized views where the WHERE clause acts as a temporal filter -- new events enter the window automatically and expired ones drop out:

CREATE MATERIALIZED VIEW item_popularity_1h AS
SELECT
    item_id,
    category,
    COUNT(*) AS interactions_1h,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases_1h,
    COUNT(DISTINCT user_id) AS unique_users_1h,
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 2
            WHEN 'add_to_cart' THEN 3
            WHEN 'purchase' THEN 5
            ELSE 0
        END
    ) AS weighted_score_1h,
    MAX(event_time) AS last_interaction
FROM user_interactions
WHERE event_time > NOW() - INTERVAL '1 hour'
GROUP BY item_id, category;

Broader popularity (last 24 hours)

CREATE MATERIALIZED VIEW item_popularity_24h AS
SELECT
    item_id,
    category,
    COUNT(*) AS interactions_24h,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases_24h,
    COUNT(DISTINCT user_id) AS unique_users_24h,
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 2
            WHEN 'add_to_cart' THEN 3
            WHEN 'purchase' THEN 5
            ELSE 0
        END
    ) AS weighted_score_24h
FROM user_interactions
WHERE event_time > NOW() - INTERVAL '24 hours'
GROUP BY item_id, category;

By comparing weighted_score_1h against weighted_score_24h / 24, you can compute a velocity ratio at query time to detect items that are trending above their daily average. Items with a ratio above 1.0 are gaining momentum; items below 1.0 are cooling off.
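One way to sketch that velocity ratio is a serving-time join between the two views; the cast and the NULLIF guard against integer division and empty 24-hour windows are assumptions about how you would harden it:

```sql
-- Items trending above their own 24-hour baseline.
SELECT
    h.item_id,
    h.weighted_score_1h,
    d.weighted_score_24h,
    h.weighted_score_1h::DOUBLE PRECISION
        / NULLIF(d.weighted_score_24h / 24.0, 0) AS velocity_ratio
FROM item_popularity_1h h
JOIN item_popularity_24h d USING (item_id, category)
ORDER BY velocity_ratio DESC
LIMIT 20;
```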

For discovery-oriented recommendations (homepages, category landing pages), you also want to know which categories are trending:

CREATE MATERIALIZED VIEW category_trending AS
SELECT
    category,
    COUNT(DISTINCT item_id) AS active_items,
    COUNT(DISTINCT user_id) AS unique_users,
    COUNT(*) AS interactions_1h,
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 2
            WHEN 'add_to_cart' THEN 3
            WHEN 'purchase' THEN 5
            ELSE 0
        END
    ) AS weighted_score_1h
FROM user_interactions
WHERE event_time > NOW() - INTERVAL '1 hour'
GROUP BY category;

User-Item Interaction Counts

Recommendation models need to know the history between a specific user and a specific item. Has this user seen this item before? How many times? Did they ever add it to their cart without purchasing? These interaction counts are essential for both candidate filtering and ranking.

Pairwise interaction history

CREATE MATERIALIZED VIEW user_item_interactions AS
SELECT
    user_id,
    item_id,
    COUNT(*) AS total_interactions,
    COUNT(*) FILTER (WHERE event_type = 'view') AS view_count,
    COUNT(*) FILTER (WHERE event_type = 'click') AS click_count,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS cart_count,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
    COUNT(*) FILTER (WHERE event_type = 'dismiss') AS dismiss_count,
    MIN(event_time) AS first_seen,
    MAX(event_time) AS last_seen,
    MAX(event_time) - MIN(event_time) AS consideration_time
FROM user_interactions
GROUP BY user_id, item_id;

This view is directly useful for ranking. If a user has viewed an item three times but never purchased it, that is a different signal than a first-time view. The consideration_time field helps distinguish impulse browsing from deliberate evaluation.
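For example, a candidate-boosting query for high-intent items might look like this (the user ID and the thresholds on view and cart counts are illustrative):

```sql
-- Items this user has repeatedly viewed or carted but never purchased.
SELECT item_id, view_count, cart_count, consideration_time
FROM user_item_interactions
WHERE user_id = 'u_123'
  AND purchase_count = 0
  AND (view_count >= 3 OR cart_count >= 1)
ORDER BY last_seen DESC;
```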

Suppression filter

One of the simplest ways to improve recommendations is to stop suggesting items the user has already bought or dismissed:

CREATE MATERIALIZED VIEW items_to_suppress AS
SELECT DISTINCT
    user_id,
    item_id
FROM user_interactions
WHERE event_type IN ('purchase', 'dismiss');

Your serving layer can join recommendation candidates against this view to filter out items that should not appear. This view updates the instant a purchase or dismissal happens, so the user never sees a just-purchased item recommended back to them.
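As a sketch of that filtering join -- assuming the serving layer stages its candidate list in a relation named rec_candidates(user_id, item_id, score), which is hypothetical and not defined above -- an anti-join removes suppressed items:

```sql
-- Keep only candidates with no purchase/dismiss history for this user.
SELECT c.user_id, c.item_id, c.score
FROM rec_candidates c
LEFT JOIN items_to_suppress s
    ON c.user_id = s.user_id AND c.item_id = s.item_id
WHERE s.item_id IS NULL
ORDER BY c.score DESC;
```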

Collaborative Signals

Collaborative filtering relies on finding patterns across users: "users who bought X also bought Y." Computing these signals in real time lets your recommendations reflect what is happening across your user base right now, not what happened in last night's batch.

Item co-occurrence matrix

This materialized view tracks which items are frequently interacted with together by the same user:

CREATE MATERIALIZED VIEW item_cooccurrence AS
SELECT
    a.item_id AS item_a,
    b.item_id AS item_b,
    COUNT(DISTINCT a.user_id) AS shared_users,
    COUNT(*) FILTER (
        WHERE a.event_type = 'purchase' AND b.event_type = 'purchase'
    ) AS co_purchases
FROM user_interactions a
JOIN user_interactions b
    ON a.user_id = b.user_id
    AND a.item_id < b.item_id
    AND a.event_time > NOW() - INTERVAL '7 days'
    AND b.event_time > NOW() - INTERVAL '7 days'
GROUP BY a.item_id, b.item_id
HAVING COUNT(DISTINCT a.user_id) >= 3;

The HAVING clause filters out noise by requiring at least three shared users. The a.item_id < b.item_id condition avoids counting each pair twice. The 7-day temporal filter in the ON clause keeps the signal fresh -- co-occurrences from months ago naturally drop out.
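Because each pair is stored only once with item_a < item_b, a "frequently bought together" lookup has to check both columns. A sketch, with 'i_789' standing in for the item being viewed:

```sql
-- Related items for 'i_789', ranked by co-purchase strength.
SELECT
    CASE WHEN item_a = 'i_789' THEN item_b ELSE item_a END AS related_item,
    shared_users,
    co_purchases
FROM item_cooccurrence
WHERE item_a = 'i_789' OR item_b = 'i_789'
ORDER BY co_purchases DESC, shared_users DESC
LIMIT 10;
```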

User similarity for neighborhood-based recs

For "users like you also liked" recommendations, compute user similarity based on recent category preferences:

CREATE MATERIALIZED VIEW user_category_profile AS
SELECT
    user_id,
    category,
    COUNT(*) AS interaction_count,
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 3
            WHEN 'add_to_cart' THEN 5
            WHEN 'purchase' THEN 10
            ELSE 0
        END
    ) AS weighted_score
FROM user_interactions
WHERE event_time > NOW() - INTERVAL '30 days'
GROUP BY user_id, category;

Your serving layer can compare category distributions to find similar users. Because the materialized view updates continuously, a user who develops interest in a new category will be matched with users in that category within seconds.
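One simple way to sketch that comparison is an unnormalized dot product over shared categories -- not a true cosine similarity, since the scores are not normalized, but enough to rank candidate neighbors:

```sql
-- Users whose recent category profile overlaps with 'u_123'.
SELECT
    other.user_id AS similar_user,
    SUM(me.weighted_score * other.weighted_score) AS affinity
FROM user_category_profile me
JOIN user_category_profile other
    ON me.category = other.category
    AND other.user_id <> me.user_id
WHERE me.user_id = 'u_123'
GROUP BY other.user_id
ORDER BY affinity DESC
LIMIT 10;
```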

Architecture: From Events to Recommendations

A production recommendation feature pipeline with RisingWave fits into three layers:

Ingestion layer. Your application publishes interaction events to Kafka. RisingWave consumes them through a CREATE SOURCE statement. Item catalog data flows in via CDC from your product database or a direct table load.

Feature computation layer. Materialized views define every feature: session summaries, item popularity with time windows, user-item interaction counts, collaborative signals. Each view is maintained incrementally by RisingWave's streaming engine. When a new event arrives, only the affected rows are updated.

Serving layer. Your recommendation service queries features from RisingWave at inference time over a standard PostgreSQL connection. For models that need features in a specific format, you can also sink materialized views to downstream systems:

CREATE SINK popularity_to_kafka FROM item_popularity_1h
WITH (
    connector = 'kafka',
    topic = 'recommendation.item-popularity',
    properties.bootstrap.server = 'broker:9092',
    type = 'upsert',
    primary_key = 'item_id,category'
)
FORMAT UPSERT ENCODE JSON;

This architecture replaces what typically requires Kafka + Flink + a feature store + a batch scheduler with Kafka + RisingWave. You write SQL instead of Flink jobs, and you eliminate the Airflow DAGs that orchestrate hourly feature refreshes.

FAQ

What are real-time recommendation features?

Real-time recommendation features are numerical signals -- such as session click counts, time-windowed item popularity, and user-item interaction histories -- that are computed continuously from live event streams rather than from periodic batch jobs. They capture user intent and item trends as they happen, enabling recommendation models to adapt within seconds of each user action.

How do time-windowed popularity scores differ from raw counts?

Raw counts treat an interaction from last week the same as one from five minutes ago. Time-windowed popularity uses temporal filters (e.g., WHERE event_time > NOW() - INTERVAL '1 hour') so that only recent interactions contribute to the score. Items that were popular days ago naturally drop out of the window, ensuring that trending items reflect current demand rather than historical accumulation.

Can streaming SQL handle collaborative filtering?

Streaming SQL can efficiently compute the building blocks of collaborative filtering -- item co-occurrence matrices, user category profiles, and pairwise interaction counts. These materialized views update incrementally as events arrive. The actual similarity computation or matrix factorization still happens in your ML training pipeline, but the feature inputs are always fresh.

How does this approach scale with millions of users?

RisingWave distributes materialized view computation across multiple compute nodes. Each GROUP BY user_id query is partitioned by user, so adding more nodes increases throughput linearly. State is persisted to S3-compatible storage with automatic checkpointing, so node failures do not cause data loss or require reprocessing from scratch.

Do I need to retrain my model to use real-time features?

Not necessarily. If your model already uses features like "number of user clicks on this item" or "item popularity score," switching from batch-computed to streaming-computed versions improves serving freshness without changing the model architecture. The model sees the same feature schema -- just with fresher values.

Conclusion

Real-time feature engineering transforms recommendation systems from reactive to adaptive. Here are the key takeaways:

  • Session features capture current intent. Computing in-session behavior (categories browsed, items clicked, engagement depth) as materialized views gives your model the strongest signal for what a user wants right now.
  • Time-windowed popularity surfaces what is actually trending. Temporal filters ensure that recently popular items rank above historically popular ones, making trend-sensitive recommendations possible.
  • User-item interaction counts enable smart filtering. Tracking views, cart additions, purchases, and dismissals per user-item pair lets you suppress already-purchased items and identify high-intent candidates -- all in real time.
  • Collaborative signals stay fresh. Item co-occurrence and user similarity profiles update continuously, so "users like you" recommendations reflect current behavior patterns, not last week's snapshot.
  • SQL is all you need. Every feature in this article is a standard SQL query maintained as a materialized view. No Java, no Scala, no custom stream processing code.

Ready to build real-time recommendation features? Sign up for RisingWave Cloud -- free, no credit card required -- and start computing features in minutes. Or follow the Quickstart to run RisingWave locally.

Join our Slack community to ask questions and share what you are building.
