Every December, Spotify Wrapped generates millions of social media posts. People share their most-played artists, their total listening minutes, their embarrassing top genres. What makes it land is not the novelty of the data. It is the accuracy. Spotify knew your most-played artist before December 1st. It tracked every play, computed running totals continuously, and had the answer ready before you even thought to ask.
Now contrast that with a typical e-commerce site. The "personalized for you" section on the homepage was probably computed last night in a batch job. It knows you bought hiking boots in March. It does not know you just spent the last 20 minutes browsing sustainable fashion. It does not know you added three linen shirts to your wishlist, viewed a dozen pairs of travel pants, and hovered over a brand you had never interacted with before. All of that signal was generated in the last half hour, and the batch job cannot see it yet.
This gap between when behavior happens and when the personalization system learns about it is the core problem this article addresses. We will build a streaming preference scoring pipeline using RisingWave, a PostgreSQL-compatible streaming database, that maintains continuously updating user preference profiles as materialized views. When a user acts, their profile updates within milliseconds. The very next page load reflects what they just did.
What a User Preference Profile Actually Is
Before writing any SQL, it helps to be precise about what we mean by a "preference profile." It is not a single score. It is a set of dimensional scores that together describe how a user engages with your product catalog.
Category affinity answers the question: which product categories does this user care about, and how much? A user who has purchased three times in Electronics and viewed twenty items there has stronger Electronics affinity than one who browsed it once.
Brand affinity captures loyalty signals. Some users are brand-agnostic and shop by price and review. Others consistently return to the same two or three brands. A user who has purchased from a brand four times is a brand loyalist, and surfacing that brand's new arrivals prominently makes sense.
Price sensitivity is more nuanced than the above two. It is not just about whether someone buys cheap or expensive items. It is about their conversion rate by price tier. A user might view dozens of luxury items but only purchase mid-range ones. That gap between browsing behavior and purchase behavior is the signal. It tells you what they aspire to versus what they actually buy, and it should inform what you show them when inventory is tight or when you want to drive conversion.
Session-level recency is the most underused dimension. What is the user interested in right now, in this browsing session? Long-term profile data is useful, but a user who has been clicking on running shoes for the last 30 minutes has declared a current intent that should outweigh their six-month history of browsing across categories.
All four of these dimensions need to be maintained continuously to be useful for real-time personalization.
Setting Up the Event Source
The raw material for preference scoring is a stream of user interaction events. Every view, cart addition, wishlist save, and purchase carries signal.
-- User event stream from Kafka
CREATE SOURCE user_events_source (
    event_id TEXT,
    user_id TEXT,
    item_id TEXT,
    brand TEXT,
    category TEXT,
    price DECIMAL,
    event_type TEXT,
    session_id TEXT,
    event_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user.events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

CREATE TABLE user_events (
    event_id TEXT,
    user_id TEXT,
    item_id TEXT,
    brand TEXT,
    category TEXT,
    price DECIMAL,
    event_type TEXT,
    session_id TEXT,
    event_time TIMESTAMPTZ
) FROM user_events_source;
The event_type field carries one of four values: 'view', 'add_to_cart', 'wishlist', or 'purchase'. Each represents a different level of intent, and we will assign weights accordingly throughout the pipeline.
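For context, a payload on the `user.events` topic would look roughly like the sketch below. The field values and the `validate_event` helper are illustrative, not part of the pipeline; only the field names and the four event types come from the schema above.

```python
import json
from datetime import datetime, timezone

# Illustrative event payload matching the source schema above.
event = {
    "event_id": "evt-001",
    "user_id": "u-42",
    "item_id": "sku-123",
    "brand": "Acme",
    "category": "Electronics",
    "price": 79.99,
    "event_type": "add_to_cart",
    "session_id": "sess-9",
    "event_time": datetime.now(timezone.utc).isoformat(),
}

REQUIRED_FIELDS = {"event_id", "user_id", "item_id", "brand", "category",
                   "price", "event_type", "session_id", "event_time"}
VALID_EVENT_TYPES = {"view", "add_to_cart", "wishlist", "purchase"}

def validate_event(e: dict) -> bool:
    """Check the payload carries every schema field and a known event_type."""
    return REQUIRED_FIELDS <= e.keys() and e["event_type"] in VALID_EVENT_TYPES

payload = json.dumps(event)  # what a producer would publish to Kafka
```

Validating on the producer side keeps malformed events out of the stream before they ever reach the views.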
The SQL Pipeline: Four Materialized Views
The preference scoring pipeline is four materialized views built on top of the event table. Each view captures a different dimension of user preferences, and all four update incrementally as events arrive.
Category Affinity
CREATE MATERIALIZED VIEW user_category_profile AS
SELECT
    user_id,
    category,
    SUM(CASE
        WHEN event_type = 'purchase' THEN 10
        WHEN event_type = 'wishlist' THEN 7
        WHEN event_type = 'add_to_cart' THEN 5
        WHEN event_type = 'view' THEN 1
        ELSE 0
    END) AS affinity_score,
    COUNT(DISTINCT item_id) AS unique_items_interacted,
    MAX(event_time) AS last_engaged_at
FROM user_events
GROUP BY user_id, category;
The weighted scoring scheme assigns 10 points for a purchase, 7 for a wishlist save, 5 for a cart addition, and 1 for a view. These weights encode the intent ladder: every purchase is ten times more informative than a casual view. You can tune the weights to match your domain. On a subscription platform, for example, you might weight trial starts heavily. On a content site, you might count scroll depth as a proxy for engagement.
unique_items_interacted tells you whether a user is deeply focused (high score, low unique items) or broadly browsing (high score spread across many items). This dimension matters for recommendation diversity.
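As a mental model for how this view maintains state, the sketch below folds events into per-(user, category) scores one at a time. It illustrates incremental aggregation in plain Python; it is not RisingWave's actual implementation, and `apply_event` is a made-up name.

```python
from collections import defaultdict

EVENT_WEIGHTS = {"purchase": 10, "wishlist": 7, "add_to_cart": 5, "view": 1}

def apply_event(profile: dict, event: dict) -> dict:
    """Incrementally update one (user, category) row, mimicking how the
    materialized view folds in a single event rather than rescanning history."""
    key = (event["user_id"], event["category"])
    profile[key]["score"] += EVENT_WEIGHTS.get(event["event_type"], 0)
    profile[key]["items"].add(event["item_id"])
    return profile

profile = defaultdict(lambda: {"score": 0, "items": set()})
events = [
    {"user_id": "u1", "category": "Electronics", "item_id": "i1", "event_type": "view"},
    {"user_id": "u1", "category": "Electronics", "item_id": "i1", "event_type": "purchase"},
    {"user_id": "u1", "category": "Fashion", "item_id": "i2", "event_type": "view"},
]
for e in events:
    apply_event(profile, e)
# ("u1", "Electronics") ends with score 11 (1 view + 10 purchase), 1 unique item
```

The point of the exercise: each event touches exactly one row, which is why the view can stay fresh at millisecond latency regardless of how much history has accumulated.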
Brand Affinity
CREATE MATERIALIZED VIEW user_brand_profile AS
SELECT
    user_id,
    brand,
    SUM(CASE
        WHEN event_type = 'purchase' THEN 10
        WHEN event_type = 'wishlist' THEN 7
        WHEN event_type = 'add_to_cart' THEN 5
        WHEN event_type = 'view' THEN 1
        ELSE 0
    END) AS brand_affinity,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
    MAX(event_time) AS last_brand_interaction
FROM user_events
GROUP BY user_id, brand;
The purchase_count column is important here beyond the raw affinity score. A user might have a high affinity score for a brand simply from browsing without ever converting. Separating browsing affinity from purchase history lets the serving layer distinguish "user is interested in this brand" from "user is a loyal buyer of this brand."
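A serving layer might combine the two columns like this hypothetical helper. The thresholds and category names are made up for illustration; tune them against your own conversion data.

```python
def brand_relationship(brand_affinity: float, purchase_count: int) -> str:
    """Classify a user's relationship with a brand from the two view columns.
    Thresholds are illustrative, not part of the article's pipeline."""
    if purchase_count >= 3:
        return "loyal_buyer"        # repeat converter: surface new arrivals
    if purchase_count >= 1:
        return "converted"          # has bought at least once
    if brand_affinity >= 10:
        return "interested_browser" # high engagement, zero purchases
    return "low_signal"
```

A "loyal_buyer" and an "interested_browser" can have identical affinity scores; only `purchase_count` separates them, which is exactly why the view carries both.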
Price Sensitivity
CREATE MATERIALIZED VIEW user_price_sensitivity AS
SELECT
    user_id,
    CASE
        WHEN price < 25 THEN 'budget'
        WHEN price < 100 THEN 'mid-range'
        WHEN price < 500 THEN 'premium'
        ELSE 'luxury'
    END AS price_tier,
    COUNT(*) FILTER (WHERE event_type = 'view') AS views,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases,
    COUNT(*) FILTER (WHERE event_type = 'purchase')::FLOAT
        / NULLIF(COUNT(*) FILTER (WHERE event_type = 'view'), 0) AS conversion_rate
FROM user_events
WHERE price IS NOT NULL
GROUP BY user_id, CASE
    WHEN price < 25 THEN 'budget'
    WHEN price < 100 THEN 'mid-range'
    WHEN price < 500 THEN 'premium'
    ELSE 'luxury'
END;
The conversion_rate column is the key metric here: purchases divided by views, per price tier. A user with a 0.20 conversion rate in the mid-range tier but 0.02 in the premium tier is telling you something clear about where they are comfortable spending. The NULLIF prevents division-by-zero for price tiers a user has never browsed.
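The same tiering and ratio logic, sketched in Python for reference. `price_tier` and `conversion_rate` are illustrative helpers mirroring the SQL CASE expression and the NULLIF guard.

```python
def price_tier(price: float) -> str:
    """Bucket a price into the same four tiers as the SQL CASE expression."""
    if price < 25:
        return "budget"
    if price < 100:
        return "mid-range"
    if price < 500:
        return "premium"
    return "luxury"

def conversion_rate(purchases: int, views: int):
    """Purchases per view; None when there are no views, mirroring NULLIF."""
    return purchases / views if views else None
```

Note the boundaries are half-open: an item priced exactly at 100 falls into 'premium', not 'mid-range', and the serving-time query must bucket item prices with the identical expression or the join will silently miss rows.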
Recent Activity (Session-Level Signal)
CREATE MATERIALIZED VIEW user_recent_activity AS
SELECT
    user_id,
    category,
    COUNT(*) AS recent_interactions,
    window_start,
    window_end
FROM TUMBLE(user_events, event_time, INTERVAL '1' HOUR)
GROUP BY user_id, category, window_start, window_end;
This view uses a TUMBLE window to break the event stream into hourly buckets per user and category. Each row represents how many interactions a user had in a given category during a specific hour. At query time, you filter to the last few hours to get the user's session-level signal.
The TUMBLE window approach is important for keeping the view computationally efficient. Instead of maintaining a running count across all historical events, RisingWave maintains exactly one row per (user, category, hour) combination. Older windows naturally expire from the query filter without needing explicit cleanup.
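Conceptually, a tumbling window just aligns each event timestamp to the start of its fixed-size bucket. This sketch mimics that alignment in Python; the `tumble_start` helper is illustrative, not a RisingWave API.

```python
from datetime import datetime, timedelta
from collections import Counter

def tumble_start(ts: datetime, size: timedelta = timedelta(hours=1)) -> datetime:
    """Align a timestamp to the start of its tumbling window,
    mirroring TUMBLE(..., INTERVAL '1' HOUR)."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // size) * size

# Count interactions per (user, category, window), as the view does.
counts = Counter()
events = [
    ("u1", "Shoes", datetime(2024, 5, 1, 13, 5)),
    ("u1", "Shoes", datetime(2024, 5, 1, 13, 50)),
    ("u1", "Shoes", datetime(2024, 5, 1, 14, 10)),
]
for user_id, category, ts in events:
    counts[(user_id, category, tumble_start(ts))] += 1
# Two events land in the 13:00 window, one in the 14:00 window.
```

Because every event maps to exactly one window, row counts stay bounded and old windows simply fall outside the serving query's time filter.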
Using the Preference Profile at Serving Time
The materialized views are live, continuously updated, and queryable over a standard PostgreSQL connection. Here is how to pull a complete user profile at page load time:
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=4566,
    database="dev", user="root", password=""
)

def get_user_profile(user_id: str) -> dict:
    with conn.cursor() as cur:
        # Top categories by lifetime affinity
        cur.execute("""
            SELECT category, affinity_score
            FROM user_category_profile
            WHERE user_id = %s
            ORDER BY affinity_score DESC
            LIMIT 5
        """, (user_id,))
        top_categories = cur.fetchall()

        # Preferred price tier by conversion rate
        cur.execute("""
            SELECT price_tier, conversion_rate
            FROM user_price_sensitivity
            WHERE user_id = %s
            ORDER BY conversion_rate DESC
            LIMIT 1
        """, (user_id,))
        preferred_tier = cur.fetchone()

        # Recent session activity (last 3 hours)
        cur.execute("""
            SELECT category, SUM(recent_interactions) AS recent_score
            FROM user_recent_activity
            WHERE user_id = %s
              AND window_end >= NOW() - INTERVAL '3 hours'
            GROUP BY category
            ORDER BY recent_score DESC
            LIMIT 3
        """, (user_id,))
        recent_categories = cur.fetchall()

    return {
        "top_categories": top_categories,
        "preferred_price_tier": preferred_tier,
        "recent_session_categories": recent_categories,
    }
This function returns everything you need to personalize: what categories the user cares about long-term, what price range they actually convert in, and what they are actively browsing right now. Each field answers a different personalization question.
Personalized Homepage Ranking
With preference profiles available, the serving layer can score and rank catalog items for each user at page load time. This query joins product metadata against all three profile dimensions:
-- Score items for a specific user based on their preference profile
SELECT
    i.item_id,
    i.title,
    i.category,
    i.brand,
    i.price,
    COALESCE(cat.affinity_score, 0) * 0.4 +
    COALESCE(brand.brand_affinity, 0) * 0.3 +
    COALESCE(ps.conversion_rate * 100, 0) * 0.3 AS personalization_score
FROM items i
LEFT JOIN user_category_profile cat
    ON cat.user_id = $1 AND cat.category = i.category
LEFT JOIN user_brand_profile brand
    ON brand.user_id = $1 AND brand.brand = i.brand
LEFT JOIN user_price_sensitivity ps
    ON ps.user_id = $1
    AND ps.price_tier = CASE
        WHEN i.price < 25 THEN 'budget'
        WHEN i.price < 100 THEN 'mid-range'
        WHEN i.price < 500 THEN 'premium'
        ELSE 'luxury'
    END
WHERE i.in_stock = true
ORDER BY personalization_score DESC
LIMIT 50;
The weights (40% category, 30% brand, 30% price tier) reflect a judgment that category fit matters most, with brand loyalty and price comfort contributing roughly equally. These are starting values; A/B testing will reveal the right weights for your catalog and user base.
COALESCE(..., 0) handles the cold start problem gracefully. A new user with no brand interaction history gets a zero for the brand component rather than a null that breaks the query. Items still get ranked by whatever signals are available.
The Session-Aware Twist
The preference profile built so far captures lifetime behavior. But there is a critical second layer: what is the user interested in right now?
If someone has spent the last 30 minutes browsing sustainable fashion, that session signal should carry more weight than their historical preference for electronics. Session intent is a strong, transient signal that should outweigh durable but less-current preferences at ranking time.
Here is how to combine both signals in the personalization score:
-- Combined scoring: session signal (60%) + lifetime profile (40%)
SELECT
    i.item_id,
    i.title,
    i.category,
    i.brand,
    i.price,
    -- Long-term profile component (40% weight)
    (
        COALESCE(cat.affinity_score, 0) * 0.4 +
        COALESCE(brand.brand_affinity, 0) * 0.3 +
        COALESCE(ps.conversion_rate * 100, 0) * 0.3
    ) * 0.4 +
    -- Session signal component (60% weight)
    COALESCE(recent.recent_score, 0) * 0.6 AS combined_personalization_score
FROM items i
LEFT JOIN user_category_profile cat
    ON cat.user_id = $1 AND cat.category = i.category
LEFT JOIN user_brand_profile brand
    ON brand.user_id = $1 AND brand.brand = i.brand
LEFT JOIN user_price_sensitivity ps
    ON ps.user_id = $1
    AND ps.price_tier = CASE
        WHEN i.price < 25 THEN 'budget'
        WHEN i.price < 100 THEN 'mid-range'
        WHEN i.price < 500 THEN 'premium'
        ELSE 'luxury'
    END
LEFT JOIN (
    SELECT category, SUM(recent_interactions) AS recent_score
    FROM user_recent_activity
    WHERE user_id = $1
      AND window_end >= NOW() - INTERVAL '3 hours'
    GROUP BY category
) recent ON recent.category = i.category
WHERE i.in_stock = true
ORDER BY combined_personalization_score DESC
LIMIT 50;
The 60/40 session-to-lifetime split is intentional. Most personalization systems get this backwards: they compute a detailed long-term profile and treat session signals as a minor adjustment. But session signals are more temporally precise. A user browsing sustainable fashion right now is much more likely to convert on a linen shirt than on the electronics items they bought three months ago. Weighting session signals at 60% keeps recommendations responsive to current intent while still benefiting from historical context.
If the session is very short (under 5 minutes) or the user has no recent activity, recent.recent_score will be null or zero, and the lifetime profile takes over. The fallback is built into the COALESCE.
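The blending and fallback reduce to a few lines of arithmetic. `combined_score` is a hypothetical helper mirroring the query's math, with None standing in for a missing session signal.

```python
def combined_score(lifetime_score: float, recent_score=None,
                   session_weight: float = 0.6) -> float:
    """Blend session and lifetime signals; a missing session signal (None)
    contributes zero, as COALESCE does in the query, so the lifetime
    component dominates by default."""
    recent = recent_score if recent_score is not None else 0.0
    return recent * session_weight + lifetime_score * (1.0 - session_weight)
```

With no session activity, the score is simply 40% of the lifetime component; as soon as in-session interactions accumulate, they dominate the ranking.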
Why This Is Better Than Batch
To understand the concrete improvement, consider what happens with a batch pipeline when a user adds a product to cart.
In a batch system: the cart event writes to your application database. Tonight's batch job picks it up. By tomorrow morning, the user's preference profile reflects the cart addition. The next time they visit, recommendations have caught up, 12 to 24 hours later.
In a streaming system: the cart event publishes to Kafka. RisingWave ingests it within milliseconds. The user_category_profile and user_brand_profile materialized views update immediately. The very next page the user loads (even if it is the next click in this session) reflects the cart action in their preference scores.
This is not a marginal improvement. It is the difference between a personalization system that responds to the current session and one that only knows who the user was yesterday. For fast-moving intent (fashion, trending products, news) the batch lag means your personalization is perpetually out of phase with actual user behavior.
Comparison: Batch vs. Streaming Profile Computation
| Dimension | Batch Profile Computation | Streaming Materialized Views |
| --- | --- | --- |
| Profile freshness | Hours to days old | Milliseconds after each event |
| Session awareness | None | Full: TUMBLE window per hour |
| Infrastructure | Spark/dbt + scheduler | RisingWave materialized views |
| Cart-event latency | Next batch run | Sub-second |
| New user cold start | Wait for first batch run | First event creates profile row |
| Operational overhead | Pipeline scheduling, failure handling, backfill | None: views self-maintain |
| Serving latency | Fast (pre-computed snapshots) | Fast (materialized view reads) |
| Scale-out model | Distributed batch jobs | RisingWave compute node scaling |
The batch approach has one genuine advantage: pre-computed snapshots can be cached aggressively. But RisingWave's materialized views are themselves pre-computed: every view update is an incremental write, not a query-time computation. Reads are fast regardless of how many events have been processed.
Architecture Summary
The full real-time personalization pipeline has three layers:
Stream ingestion. User interaction events flow from your application to Kafka, and RisingWave consumes them through the user_events table. No additional ingestion infrastructure is needed.
Preference computation. Four materialized views maintain the preference profile continuously: user_category_profile, user_brand_profile, user_price_sensitivity, and user_recent_activity. RisingWave updates each view incrementally with every new event: only the rows affected by the event are recomputed, not the full view.
Serving. Your application queries preference data over a standard psycopg2 connection to RisingWave at port 4566. The personalization ranking query joins product catalog data against the preference views and returns a ranked list. Because the views are maintained in memory with state persisted to S3-compatible storage, query latency is consistent even under load.
This replaces what typically requires a Kafka consumer, a Spark streaming job, an Airflow schedule, and a separate feature store with a single streaming database and four SQL queries.
Getting Started
If you want to try this locally, RisingWave runs in Docker with a single command. Connect with any PostgreSQL client on port 4566 and create the views as shown. For production use, RisingWave Cloud provides a managed version with no cluster management required.
The preference scoring pipeline in this article is a foundation. Once you have the four base views, you can extend them with:
- Time-weighted decay to reduce the influence of interactions older than 30 days without dropping them entirely
- Cross-category affinity inference: users who frequently buy Accessories when they buy Fashion likely want coordinated suggestions
- Price tier drift detection: a user whose browsing is shifting from mid-range to premium is signaling a spending pattern change worth acting on
All of these extensions are additional materialized views on top of the same event table, following the same pattern.
Ready to build your own real-time preference scoring pipeline? Sign up for RisingWave Cloud, free with no credit card required. Or run locally with the Quickstart guide.
Join the RisingWave Slack community to ask questions and share what you build.

