Real-Time Features for Recommendation Ranking: Beyond Collaborative Filtering

Collaborative filtering is one of the most durable ideas in machine learning. User-item interaction matrices, matrix factorization, two-tower models built on implicit feedback -- these approaches work because they encode stable long-term preferences. If you bought hiking boots six months ago and rated them five stars, that signal still matters when you return to the site.

But long-term preference is not the same as current intent. You searched for "camping gear" three minutes ago, clicked on two tent listings, and added a sleeping bag to your cart. The ranking model should be surfacing tents and camp cookware right now -- not the running shoes your CF embeddings think you like based on last winter's activity.

This gap is where real-time features earn their place. This article is for ML engineers building or improving ranking pipelines. The goal is not to replace offline embeddings -- they still carry most of the weight in most models. The goal is to add session-level signals that CF structurally cannot provide, computed fresh enough to matter.

Two Types of Recommendation Signals

Every practical ranking model mixes two fundamentally different kinds of signals.

Long-term preference is what collaborative filtering captures well. It reflects the accumulated history of a user's interactions: what they bought, what they rated, how long they spent on product pages. Matrix factorization and two-tower architectures distill this into dense embedding vectors. These embeddings are powerful because they generalize -- a user who consistently buys technical outdoor gear has a latent representation that transfers across product categories they have not yet seen.

The limitation is temporal. These embeddings are typically trained on data that is days or weeks old. Even with daily retraining, the model does not know what the user was looking at an hour ago.

Short-term intent is what the user is signaling right now. It lives in the session: the search query they typed, the categories they browsed, the items they compared. This signal is highly predictive for the immediate request but decays quickly. A user comparing tents is probably still comparing tents in five minutes. They are probably not still comparing tents next week.

The practical implication: short-term intent should dominate ranking for in-session requests (search results, related items, cart recommendations). Long-term preference should dominate for return-visit personalization (home feed after a two-week absence).

Most ranking models handle long-term preference well. The gap is short-term intent -- and specifically, computing those features reliably and freshly at serving time.

Key Real-Time Features for Ranking

Here are the features that move the needle for in-session ranking. Each one is straightforward to compute; the challenge is making the computation available at inference latency.

Session Click Sequence

The ordered sequence of items a user clicked in the current session is the strongest short-term signal available. It encodes both category interest (what they are shopping for) and price/quality tier (where in the search results they are clicking). A user who clicked three items in the $50-80 range is not the same as a user who clicked three items in the $200+ range, even if their long-term preference embeddings are similar.

For use in a ranking model, you typically want: the last N categories visited, the last N item IDs visited, and aggregate counts by category within the session.
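A minimal sketch of deriving those three feature groups from an ordered list of session clicks (field names like item_id and category are illustrative, not a fixed schema):

```python
from collections import Counter

def session_sequence_features(click_events, n=5):
    """Derive ranking features from one session's ordered click events.

    click_events: list of dicts with 'item_id' and 'category' keys,
    ordered oldest -> newest. Returns most-recent-first sequences.
    """
    recent = click_events[-n:]  # last N clicks
    return {
        "last_n_categories": [e["category"] for e in reversed(recent)],
        "last_n_item_ids": [e["item_id"] for e in reversed(recent)],
        # aggregate counts by category across the whole session
        "category_counts": dict(Counter(e["category"] for e in click_events)),
    }
```

The most-recent-first ordering matches how the ranking model typically consumes these sequences: position 0 is the strongest intent signal.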

Recent Search Queries

Search queries are explicit intent signals. When a user searches for "waterproof hiking boots size 10," they have told you more about their immediate need than any behavioral model can infer. Capturing these queries and matching them against item attributes at ranking time allows the model to boost items that directly satisfy the stated need.
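A crude lexical sketch of that query-to-attribute matching (simple token overlap; a production system would use proper query analyzers or embeddings, and every name here is illustrative):

```python
def query_match_score(query, item_attributes):
    """Fraction of query tokens that appear in the item's attribute text.

    query: the raw search string from the session.
    item_attributes: list of attribute strings for a candidate item.
    """
    tokens = set(query.lower().split())
    if not tokens:
        return 0.0
    attr_tokens = set(" ".join(item_attributes).lower().split())
    return len(tokens & attr_tokens) / len(tokens)
```

The score can be fed directly into the ranking model as a feature, letting the model learn how strongly to boost exact attribute matches.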

This is especially useful for navigational searches -- users who know what they want and are trying to find it.

Contextual Trending

Trending at the platform level is a weak signal. What matters is trending within the user's active context: the category they are browsing, the price range they have been clicking in, or the geographic region (if inventory or delivery speed is relevant).

A product that went from 50 to 500 views in the last hour has a different demand signal than one that has been at 500 views for three days. The conversion rate trend matters more than the raw view count.
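One way to capture that distinction is a velocity score: the last hour's view rate relative to a trailing baseline. This is an illustrative formula, not the article's SQL view; the smoothing constant is an assumed tuning knob that damps noise for low-traffic items.

```python
def trending_score(views_last_hour, views_trailing_24h, smoothing=10.0):
    """Ratio of the last hour's views to the trailing 24h hourly average.

    A flat item sits near 1.0; an accelerating item scores well above it.
    """
    hourly_baseline = views_trailing_24h / 24.0
    return (views_last_hour + smoothing) / (hourly_baseline + smoothing)
```

The item that jumped from 50 to 500 views scores far above 1.0, while the item sitting at a steady 500 views per hour scores almost exactly 1.0, which is the distinction the raw count misses.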

Inventory Pressure

Low-inventory items that a user has viewed or that match their intent create urgency. This is not just a merchandising trick -- it is a genuine relevance signal. If a user is comparing sleeping bags and the one that matches their price and rating preferences has two units left, that item deserves a ranking boost for that user specifically.

This feature requires near-real-time inventory data joined to user session context.

Data Sources

To compute these features, you need events flowing into the system continuously:

  • Click events: user ID, session ID, item ID, item category, item price, timestamp
  • Search events: user ID, session ID, query string, result set (or top N item IDs), timestamp
  • Cart events: user ID, session ID, item ID, action (add/remove), timestamp
  • Purchase events: user ID, order ID, item IDs, timestamp
  • Inventory events: item ID, warehouse ID, stock count, timestamp

These events typically come from a message bus (Kafka is common). The key requirement is that they arrive with low latency and are processed in event-time order, not ingestion-time order.
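A sketch of the click-event record as a typed structure (field names mirror the list above but should match whatever schema your event bus actually carries):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClickEvent:
    """One click event from the message bus; illustrative schema."""
    user_id: str
    session_id: str
    item_id: str
    item_category: str
    item_price: float
    timestamp: float  # event time (epoch seconds), not ingestion time
```

Keeping event time as an explicit field, distinct from any ingestion timestamp, is what makes event-time-order processing possible downstream.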

SQL Implementation

Streaming SQL materialized views are a practical way to maintain these features without building a dedicated feature computation service. The views stay current as events arrive; the ranking service queries them like a regular database table.

Session Activity View

-- Session-level user intent (last 30 min activity)
CREATE MATERIALIZED VIEW user_session_intent AS
SELECT
    user_id,
    session_id,
    array_agg(item_category ORDER BY event_time DESC) AS recent_categories,
    array_agg(item_id ORDER BY event_time DESC) AS recent_items,
    COUNT(*) AS session_event_count,
    MAX(event_time) AS last_activity
FROM user_events
WHERE event_type IN ('click', 'view', 'add_to_cart')
  AND event_time >= NOW() - INTERVAL '30 minutes'
GROUP BY user_id, session_id;

This view gives you the recent category and item sequence for each active session. The 30-minute window is intentionally short -- it captures current intent without diluting it with activity from earlier in the day.

At serving time, the ranking service queries this view by user_id and session_id and uses the resulting arrays to boost items in the same categories as recent_categories[1] (the most recent category visited).
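A sketch of that boost in the serving layer, assuming candidates arrive with a base score from the ranking model; the boost and decay constants are illustrative tuning knobs, and recent_categories is most-recent-first as produced by the view:

```python
def boost_by_session_categories(candidates, recent_categories,
                                boost=0.2, decay=0.5):
    """Re-rank candidates with a multiplicative boost for categories
    the user visited this session. Most recent category boosts most.
    """
    weights = {}
    for i, cat in enumerate(recent_categories):
        # First (most recent) occurrence of a category gets the largest weight.
        weights.setdefault(cat, boost * (decay ** i))
    boosted = [
        {**c, "score": c["score"] * (1.0 + weights.get(c["category"], 0.0))}
        for c in candidates
    ]
    return sorted(boosted, key=lambda c: c["score"], reverse=True)
```

In practice the session features usually enter the model as inputs rather than as a post-hoc multiplier, but the multiplicative form makes the mechanics easy to see.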

Item Trending View

-- Item trending score (real-time popularity)
CREATE MATERIALIZED VIEW item_trending AS
SELECT
    item_id,
    item_category,
    COUNT(*) FILTER (WHERE event_type = 'view') AS views_1h,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases_1h,
    COUNT(*) FILTER (WHERE event_type = 'purchase')::float /
        NULLIF(COUNT(*) FILTER (WHERE event_type = 'view'), 0)
        AS conversion_rate_1h
FROM user_events
WHERE event_time >= NOW() - INTERVAL '1 hour'
GROUP BY item_id, item_category;

The conversion rate computed here is the real-time conversion rate, not the historical average. An item with a historical conversion rate of 3% that is currently converting at 8% is a different item from the model's perspective -- something has changed (a price drop, a viral mention, low inventory creating urgency). The real-time rate captures this.

User-Item Affinity View

For users with sufficient session history, you can compute a soft affinity score that blends session recency with interaction strength:

CREATE MATERIALIZED VIEW user_item_affinity AS
SELECT
    user_id,
    item_category,
    SUM(
        CASE event_type
            WHEN 'purchase'    THEN 10
            WHEN 'add_to_cart' THEN 5
            WHEN 'click'       THEN 1
            ELSE 0
        END
        * EXP(-0.1 * EXTRACT(EPOCH FROM (NOW() - event_time)) / 3600)
    ) AS affinity_score,
    MAX(event_time) AS last_interaction
FROM user_events
WHERE event_time >= NOW() - INTERVAL '7 days'
GROUP BY user_id, item_category;

The exponential decay factor (EXP(-0.1 * ...)) weights recent interactions more heavily. A purchase 10 minutes ago has a much stronger effect than a click six days ago. The 7-day window balances recency with enough history to be useful for less active users.
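The per-event contribution can be checked in isolation. This mirrors the SQL expression above (base weight by event type times EXP(-0.1 * hours_ago)):

```python
import math

def interaction_weight(event_type, event_ts, now_ts, rate=0.1):
    """One event's contribution to the affinity score: a base weight
    by event type, decayed by exp(-rate * hours_since_event)."""
    base = {"purchase": 10, "add_to_cart": 5, "click": 1}.get(event_type, 0)
    hours_ago = (now_ts - event_ts) / 3600.0
    return base * math.exp(-rate * hours_ago)
```

A purchase 10 minutes ago contributes nearly its full weight of 10, while a click six days (144 hours) ago has decayed by a factor of e^-14.4 and contributes essentially nothing.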

Joining Real-Time Features with Offline Embeddings at Serving Time

The architecture most production ranking systems use looks like this:

  1. Offline model training: a learning-to-rank model (LambdaMART, a neural ranker, or similar) trained on historical features plus labels (clicks, purchases). Real-time features are included as training features using point-in-time-correct snapshots.

  2. Candidate generation: a retrieval step (ANN search over CF embeddings, or a two-tower model) that produces a candidate set of 100-1000 items from a catalog of millions.

  3. Feature assembly at serving time: for each candidate, assemble the full feature vector. This includes:

    • Offline features: user embedding, item embedding, user-item similarity score (from the candidate generation step)
    • Real-time features: queried from the streaming SQL views
  4. Ranking: the assembled feature vectors are scored by the ranking model. The top K results are returned.

The join at step 3 is typically done in the serving layer. The ranking service fetches the user's current session intent from user_session_intent, the trending scores from item_trending, and assembles a feature row per candidate item. This assembly adds latency, so the views need to be queryable at single-digit milliseconds.

One practical pattern: precompute per-user feature rows asynchronously and cache them in a low-latency store (Redis or a similar KV store). The streaming SQL system writes to this cache as features update. The serving layer reads from the cache, not directly from the SQL system. This decouples serving latency from feature computation latency.
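A sketch of the cache-read side of that pattern, assuming a Redis-style get/set interface and JSON-serialized feature rows (the key format and fallback shape are assumptions, not a fixed contract):

```python
import json

def get_session_features(cache, user_id, session_id, fallback):
    """Read precomputed session features from a low-latency KV cache.

    On a cache miss, return neutral defaults so serving never blocks
    on the streaming system.
    """
    raw = cache.get(f"session:{user_id}:{session_id}")
    if raw is None:
        return dict(fallback)
    return json.loads(raw)
```

The fallback path matters: a cold cache or a lagging feature pipeline should degrade to ranking on offline features alone, not add latency or fail the request.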

Feature Freshness SLAs by Recommendation Context

Not all recommendation surfaces have the same freshness requirement. Applying the same SLA everywhere wastes engineering effort.

Context, acceptable staleness, and why:

  • Home feed (logged-in, return visit) -- 5-15 minutes. User intent has shifted since the last session; session features are less relevant.
  • Category browse -- 1-2 minutes. The user is actively exploring; the recent click sequence matters.
  • Search results -- under 1 second. The query is an explicit intent signal; features must reflect the current session.
  • Related items (PDP) -- 30-60 seconds. The user is in comparison mode; session recency matters, but not at sub-second precision.
  • Cart recommendations -- 10-30 seconds. Cart contents are the strongest signal and should reflect the current cart state.
  • Email / push (batch) -- hours are acceptable. The user is not active; historical features dominate.

The implication for infrastructure: not all features need to be maintained at the same update frequency. Trending scores computed over a 1-hour window can tolerate 60-second refresh intervals with negligible quality impact. Session click sequences for search results need to be current at the time of the query.

Streaming materialized views handle this naturally -- they update as events arrive, so the freshness is determined by event latency and processing lag, not by a scheduled refresh interval.

Measuring Feature Impact on Ranking Metrics

Adding real-time features is not automatically beneficial. They need to be validated before deployment.

Offline evaluation should compare NDCG (Normalized Discounted Cumulative Gain) with and without real-time features, using point-in-time-correct feature snapshots. The key word is point-in-time-correct: you need the feature values as they existed at the time of each training example, not the current feature values. Using a streaming system that supports time-travel queries or event-time windowing makes this feasible.
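For reference, a minimal NDCG@k implementation over graded relevance labels in ranked order (the grading scheme in the comment is an illustrative choice, not a standard):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list. relevances are graded labels in
    model-ranked order (e.g. 0 = no click, 1 = click, 3 = purchase)."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

Run it twice per query, once on the ranking produced with real-time features and once without, using the same point-in-time-correct labels, and compare the averages.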

A useful breakdown: evaluate NDCG separately for early-session queries (first 1-2 interactions) and late-session queries (5+ interactions). Real-time features typically show larger NDCG lift on late-session queries, where there is more session context to exploit. If you see no lift on late-session queries, the features are probably not being used correctly by the model.

Online evaluation should measure CTR (Click-Through Rate) and, where possible, conversion rate and session depth (number of items viewed per session). Real-time session features tend to show their impact most clearly in CTR on search result pages and related-item carousels -- contexts where the user's current intent is explicit.

A reasonable A/B test design: split traffic on session ID, not user ID. Session-level splitting avoids contamination effects where the same user receives different feature treatment across sessions in the same experiment.
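A sketch of deterministic session-level bucketing via hashing (the experiment salt and split percentage are illustrative parameters):

```python
import hashlib

def assign_bucket(session_id, experiment="rt_features_v1", treatment_pct=50):
    """Deterministic session-level bucketing: hash(salt + session_id)
    mod 100. Every request in a session lands in the same arm, but a
    user's next session is assigned independently.
    """
    h = hashlib.sha256(f"{experiment}:{session_id}".encode()).hexdigest()
    return "treatment" if int(h, 16) % 100 < treatment_pct else "control"
```

Salting with the experiment name keeps bucket assignments independent across concurrent experiments.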

Watch for session length effects. If real-time features are working correctly, users should find relevant items faster -- which may reduce session length. This can be misinterpreted as engagement decline if you are using session length as a proxy for engagement quality.

FAQ

Won't CF embeddings already capture some of this? My model trains daily.

Daily training on historical data does capture long-term preference well. It does not capture within-session behavior. A user who starts a new shopping session for camping gear will have CF embeddings shaped by their entire history -- which may include lots of non-camping purchases. The session features specifically capture what the user is doing right now, which the daily-trained model cannot know.

How do I include real-time features in model training if they are ephemeral?

You need to log feature values at the time of each impression, not recompute them at training time. Add a feature logging step to your serving pipeline that records the real-time feature values alongside each recommendation impression. These logged values become training features; the user's subsequent clicks or purchases become the labels. This is the standard approach for any feature that changes over time -- log at serving, join at training.
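A sketch of that logging step, assuming an append-style sink (a Kafka producer, a file writer, or similar) and a join key of (request_id, item_id); all names are illustrative:

```python
import json
import time

def log_impressions(log_write, request_id, user_id, ranked_items, rt_features):
    """Record real-time feature values exactly as served, one row per
    impression, so training can join them back to clicks later.

    log_write: any callable that appends one serialized row.
    """
    ts = time.time()
    for position, item in enumerate(ranked_items):
        log_write(json.dumps({
            "request_id": request_id,
            "user_id": user_id,
            "item_id": item["item_id"],
            "position": position,
            "served_at": ts,
            "rt_features": rt_features,  # snapshot; never recomputed at training
        }))
```

Logging the position alongside the features also lets you correct for position bias during training.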

What about new users with no session history?

Session features for new users will be sparse or empty. Handle this with fallbacks: use platform-level trending features (available to all users) and item-level popularity scores. For search-result ranking specifically, the query itself is often sufficient to provide good results without session context. The session features become more valuable after the user has made 3-5 interactions in the session.

Isn't this just building a feature store? That's a lot of infrastructure.

A dedicated feature store (Feast, Tecton, Hopsworks) adds operational overhead: a separate system to deploy, monitor, and maintain. Streaming SQL materialized views give you a significant subset of feature store functionality -- specifically the real-time computation and serving parts -- using infrastructure you may already have (a streaming SQL engine connected to Kafka). For teams that do not already have a feature store, this is a practical starting point. Teams with an existing feature store can use streaming SQL as the computation layer that feeds it.

How many real-time features actually matter?

In practice, three to five features typically explain most of the incremental lift you will see from real-time signals: current session category sequence, recent search query, cart contents, item trending score in the user's active category, and time-since-last-interaction. More features than this tend to suffer from feature correlation and may degrade out-of-distribution performance. Start minimal, measure carefully, add features one at a time.


Real-time features are not a replacement for collaborative filtering -- they are a complement. CF encodes who a user is over time. Session features encode what a user wants right now. Both matter, and ranking models that use both consistently outperform models that rely on either alone. The engineering challenge is not building the features; it is making them available at serving time with the freshness and latency that each recommendation context requires.
