Real-Time Feature Engineering for Recs

Real-Time Feature Engineering for Recs

Your recommendation model is only as good as the features it sees at inference time. And right now, most recommendation systems are looking at a version of the user that is hours old.

A batch pipeline runs overnight, computes features like "items this user clicked in the past 7 days" and "trending products by category," writes them to a feature store, and the model reads from that store at serving time. A user who just spent 10 minutes browsing running shoes will still see recommendations based on the hiking boots they looked at yesterday. An item that went viral 20 minutes ago will not surface as trending until the next batch run.

This freshness gap is where recommendation quality silently degrades. The model is not broken – it is simply making decisions with stale inputs.

This article shows you how to build real-time recommendation features using streaming SQL in RisingWave, a PostgreSQL-compatible streaming database. You will compute session-level behavior features, item popularity with time decay, user-item interaction counts, and collaborative filtering signals – all as materialized views that update continuously as events arrive.

Why Do Batch Features Hurt Recommendations?

Recommendation systems rely on three categories of features: user features (preferences, history, demographics), item features (popularity, attributes, freshness), and interaction features (clicks, ratings, purchases between a specific user and item). All three categories are time-sensitive.

User context shifts within minutes. A user's intent changes throughout a session. They might start browsing electronics, pivot to kitchen gadgets after seeing a deal, then settle on searching for coffee makers. Batch features computed from last night's data cannot capture this session-level context.

Item popularity is a moving target. A product mentioned by an influencer can go from obscure to trending in minutes. A flash sale creates a spike in demand that disappears just as quickly. Batch pipelines that refresh hourly or daily will either miss these signals entirely or surface them long after the moment has passed.

Interaction recency matters for ranking. A user who just added an item to their cart should not see that item recommended again. A user who just dismissed a suggestion is telling you something. These signals have a half-life measured in seconds, not hours.

Streaming SQL solves these problems by computing features incrementally, event by event. A materialized view in RisingWave updates within milliseconds of each new click, view, or purchase – so your recommendation model always sees the freshest version of every feature.

Setting Up the Event Sources

Every recommendation feature pipeline starts with user interaction events. In production, these typically flow through Kafka from your application's event tracking layer.

Clickstream source

CREATE SOURCE user_interactions (
    event_id VARCHAR,
    user_id VARCHAR,
    item_id VARCHAR,
    event_type VARCHAR,        -- 'view', 'click', 'add_to_cart', 'purchase', 'dismiss'
    category VARCHAR,
    session_id VARCHAR,
    device_type VARCHAR,
    event_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user.interactions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

Item catalog table

You also need item metadata to enrich interaction events. This table can be populated via CDC from your product database or through batch loads:

CREATE TABLE item_catalog (
    item_id VARCHAR PRIMARY KEY,
    title VARCHAR,
    category VARCHAR,
    subcategory VARCHAR,
    price DECIMAL,
    created_at TIMESTAMPTZ
);

With events flowing in and item metadata available for joins, you can now build the four categories of recommendation features.

Session-Level Behavior Features

Session features capture what a user is doing right now – the strongest signal for real-time recommendations. A user's in-session behavior reveals their current intent far better than their historical profile.

Active session summary

This materialized view computes a rolling summary of each user's current session:

CREATE MATERIALIZED VIEW session_features AS
SELECT
    session_id,
    user_id,
    COUNT(*) AS total_events,
    COUNT(*) FILTER (WHERE event_type = 'view') AS views_count,
    COUNT(*) FILTER (WHERE event_type = 'click') AS clicks_count,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS cart_adds,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases,
    COUNT(DISTINCT item_id) AS distinct_items_viewed,
    COUNT(DISTINCT category) AS distinct_categories,
    MIN(event_time) AS session_start,
    MAX(event_time) AS last_activity,
    MAX(event_time) - MIN(event_time) AS session_duration
FROM user_interactions
GROUP BY session_id, user_id;

Your recommendation model can now query this view at inference time to answer questions like: "Has this user been browsing broadly (many categories) or deeply (many views in one category)?" Broad browsing suggests the user is in exploration mode, where diversity matters. Deep browsing suggests high purchase intent in a specific category, where precision matters.

Session category affinity

To make the session signal more actionable, compute a per-category breakdown of in-session behavior:

CREATE MATERIALIZED VIEW session_category_affinity AS
SELECT
    session_id,
    user_id,
    category,
    COUNT(*) AS interaction_count,
    COUNT(*) FILTER (WHERE event_type = 'click') AS clicks,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS cart_adds,
    -- Weighted engagement score: views=1, clicks=3, cart=5, purchase=10
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 3
            WHEN 'add_to_cart' THEN 5
            WHEN 'purchase' THEN 10
            ELSE 0
        END
    ) AS engagement_score
FROM user_interactions
GROUP BY session_id, user_id, category;

At inference time, you query this view sorted by engagement_score to get the user's top categories in this session. Feed that as a feature vector to your ranking model, and recommendations will reflect what the user cares about right now – not what they cared about last week.

Item Popularity with Time Decay

Raw popularity counts ("most viewed items") are a blunt instrument. An item that received 10,000 views last month but none today is not trending. You need popularity scores that decay over time, giving more weight to recent interactions.

This materialized view computes a popularity score that applies exponential decay based on how recently each interaction occurred:

CREATE MATERIALIZED VIEW item_popularity_scores AS
SELECT
    item_id,
    category,
    COUNT(*) AS total_interactions,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
    -- Time-decayed popularity: recent events count more
    SUM(
        CASE event_type
            WHEN 'view' THEN 1.0
            WHEN 'click' THEN 2.0
            WHEN 'add_to_cart' THEN 3.0
            WHEN 'purchase' THEN 5.0
            ELSE 0.0
        END
        * EXP(-0.01 * EXTRACT(EPOCH FROM (NOW() - event_time)) / 3600.0)
    ) AS decay_weighted_score,
    -- Velocity: interactions in last hour
    COUNT(*) FILTER (
        WHERE event_time > NOW() - INTERVAL '1 hour'
    ) AS interactions_last_hour,
    MAX(event_time) AS last_interaction
FROM user_interactions
GROUP BY item_id, category;

The EXP(-0.01 * hours_since_event) term means an interaction from one hour ago retains about 99% of its weight, while an interaction from 24 hours ago retains about 79%, and one from a week ago retains about 19%. You can tune the decay rate (0.01) to match how quickly trends shift in your domain.

For discovery-oriented recommendations (homepages, category landing pages), you also want to know which categories are trending:

CREATE MATERIALIZED VIEW category_trending AS
SELECT
    category,
    COUNT(DISTINCT item_id) AS active_items,
    COUNT(DISTINCT user_id) AS unique_users,
    COUNT(*) FILTER (
        WHERE event_time > NOW() - INTERVAL '1 hour'
    ) AS interactions_last_hour,
    COUNT(*) FILTER (
        WHERE event_time > NOW() - INTERVAL '24 hours'
    ) AS interactions_last_day,
    CASE
        WHEN COUNT(*) FILTER (WHERE event_time > NOW() - INTERVAL '24 hours') = 0 THEN 0
        ELSE ROUND(
            COUNT(*) FILTER (WHERE event_time > NOW() - INTERVAL '1 hour')::NUMERIC
            / (COUNT(*) FILTER (WHERE event_time > NOW() - INTERVAL '24 hours')::NUMERIC / 24.0),
            2
        )
    END AS velocity_ratio
FROM user_interactions
GROUP BY category;

A velocity_ratio above 1.0 means the category is getting more traffic than its daily average – a signal to boost items from that category in recommendation results.

User-Item Interaction Counts

Recommendation models need to know the history between a specific user and a specific item. Has this user seen this item before? How many times? Did they ever add it to their cart without purchasing? These interaction counts are essential for both candidate filtering and ranking.

Pairwise interaction history

CREATE MATERIALIZED VIEW user_item_interactions AS
SELECT
    user_id,
    item_id,
    COUNT(*) AS total_interactions,
    COUNT(*) FILTER (WHERE event_type = 'view') AS view_count,
    COUNT(*) FILTER (WHERE event_type = 'click') AS click_count,
    COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS cart_count,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
    COUNT(*) FILTER (WHERE event_type = 'dismiss') AS dismiss_count,
    MIN(event_time) AS first_seen,
    MAX(event_time) AS last_seen,
    MAX(event_time) - MIN(event_time) AS consideration_time
FROM user_interactions
GROUP BY user_id, item_id;

This view is directly useful for ranking. If a user has viewed an item three times but never purchased it, that is a different signal than a first-time view. The consideration_time field helps distinguish impulse browsing from deliberate evaluation.

Suppression filter

One of the simplest ways to improve recommendations is to stop suggesting items the user has already bought or dismissed:

CREATE MATERIALIZED VIEW items_to_suppress AS
SELECT DISTINCT
    user_id,
    item_id
FROM user_interactions
WHERE event_type IN ('purchase', 'dismiss');

Your serving layer can join recommendation candidates against this view to filter out items that should not appear. This view updates the instant a purchase or dismissal happens, so the user never sees a just-purchased item recommended back to them.

Collaborative Signals

Collaborative filtering relies on finding patterns across users: "users who bought X also bought Y." Computing these signals in real time lets your recommendations reflect what is happening across your user base right now, not what happened in last night's batch.

Item co-occurrence matrix

This materialized view tracks which items are frequently interacted with together by the same user:

CREATE MATERIALIZED VIEW item_cooccurrence AS
SELECT
    a.item_id AS item_a,
    b.item_id AS item_b,
    COUNT(DISTINCT a.user_id) AS shared_users,
    COUNT(*) FILTER (WHERE a.event_type = 'purchase' AND b.event_type = 'purchase') AS co_purchases
FROM user_interactions a
JOIN user_interactions b
    ON a.user_id = b.user_id
    AND a.item_id < b.item_id
    AND a.event_time > NOW() - INTERVAL '7 days'
    AND b.event_time > NOW() - INTERVAL '7 days'
GROUP BY a.item_id, b.item_id
HAVING COUNT(DISTINCT a.user_id) >= 3;

The HAVING clause filters out noise by requiring at least three shared users. The a.item_id < b.item_id condition avoids counting each pair twice. The 7-day window keeps the signal fresh – co-occurrences from months ago naturally drop out.

User similarity for neighborhood-based recs

For "users like you also liked" recommendations, compute user similarity based on recent category preferences:

CREATE MATERIALIZED VIEW user_category_profile AS
SELECT
    user_id,
    category,
    COUNT(*) AS interaction_count,
    SUM(
        CASE event_type
            WHEN 'view' THEN 1
            WHEN 'click' THEN 3
            WHEN 'add_to_cart' THEN 5
            WHEN 'purchase' THEN 10
            ELSE 0
        END
    ) AS weighted_score
FROM user_interactions
WHERE event_time > NOW() - INTERVAL '30 days'
GROUP BY user_id, category;

Your serving layer can compare category distributions to find similar users. Because the materialized view updates continuously, a user who develops interest in a new category will be matched with users in that category within seconds.

Architecture: From Events to Recommendations

A production recommendation feature pipeline with RisingWave fits into three layers:

Ingestion layer. Your application publishes interaction events to Kafka. RisingWave consumes them through a CREATE SOURCE statement. Item catalog data flows in via CDC from your product database or a direct table load.

Feature computation layer. Materialized views define every feature: session summaries, item popularity with decay, user-item interaction counts, collaborative signals. Each view is maintained incrementally by RisingWave's streaming engine. When a new event arrives, only the affected rows are updated.

Serving layer. Your recommendation service queries features from RisingWave at inference time over a standard PostgreSQL connection. For models that need features in a specific format, you can also sink materialized views to a downstream feature store or cache:

CREATE SINK session_features_to_redis FROM session_features
WITH (
    connector = 'redis',
    primary_key = 'session_id',
    redis.url = 'redis://feature-cache:6379'
);

Or sink to PostgreSQL for models served through a traditional feature store:

CREATE SINK popularity_to_pg FROM item_popularity_scores
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://feature-db:5432/features',
    table.name = 'item_popularity_scores',
    type = 'upsert',
    primary_key = 'item_id'
);

This architecture replaces what typically requires Kafka + Flink + a feature store + a batch scheduler with Kafka + RisingWave. You write SQL instead of Flink jobs, and you eliminate the Airflow DAGs that orchestrate hourly feature refreshes.

FAQ

What are real-time recommendation features?

Real-time recommendation features are numerical signals – such as session click counts, time-decayed item popularity, and user-item interaction histories – that are computed continuously from live event streams rather than from periodic batch jobs. They capture user intent and item trends as they happen, enabling recommendation models to adapt within seconds of each user action.

How does time-decayed popularity differ from raw counts?

Raw counts treat an interaction from last week the same as one from five minutes ago. Time-decayed popularity applies an exponential decay function that reduces the weight of older interactions, so recently popular items score higher than items that were popular days ago. This prevents stale "all-time bestsellers" from dominating recommendation slots when fresher, more relevant items are trending.

Can streaming SQL handle collaborative filtering?

Streaming SQL can efficiently compute the building blocks of collaborative filtering – item co-occurrence matrices, user category profiles, and pairwise interaction counts. These materialized views update incrementally as events arrive. The actual similarity computation or matrix factorization still happens in your ML training pipeline, but the feature inputs are always fresh.

How does this approach scale with millions of users?

RisingWave distributes materialized view computation across multiple compute nodes. Each GROUP BY user_id query is partitioned by user, so adding more nodes increases throughput linearly. State is persisted to S3-compatible storage with automatic checkpointing, so node failures do not cause data loss or require reprocessing from scratch.

Do I need to retrain my model to use real-time features?

Not necessarily. If your model already uses features like "number of user clicks on this item" or "item popularity score," switching from batch-computed to streaming-computed versions improves serving freshness without changing the model architecture. The model sees the same feature schema – just with fresher values.

Conclusion

Real-time feature engineering transforms recommendation systems from reactive to adaptive. Here are the key takeaways:

  • Session features capture current intent. Computing in-session behavior (categories browsed, items clicked, engagement depth) as materialized views gives your model the strongest signal for what a user wants right now.
  • Time-decayed popularity surfaces what is actually trending. Exponential decay ensures that recently popular items rank above historically popular ones, making trend-sensitive recommendations possible.
  • User-item interaction counts enable smart filtering. Tracking views, cart additions, purchases, and dismissals per user-item pair lets you suppress already-purchased items and identify high-intent candidates – all in real time.
  • Collaborative signals stay fresh. Item co-occurrence and user similarity profiles update continuously, so "users like you" recommendations reflect current behavior patterns, not last week's snapshot.
  • SQL is all you need. Every feature in this article is a standard SQL query maintained as a materialized view. No Java, no Scala, no custom stream processing code.

Ready to build real-time recommendation features? Sign up for RisingWave Cloud – free, no credit card required – and start computing features in minutes. Or follow the Quickstart to run RisingWave locally.

Join our Slack community to ask questions and share what you are building.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.