Real-Time Content Recommendation Engine with Streaming SQL

Real-Time Content Recommendation Engine with Streaming SQL

RisingWave powers real-time content recommendation by maintaining continuously updated user-interest profiles and content-affinity scores as engagement events arrive. Instead of refreshing recommendations nightly, publishers and marketers can serve content suggestions that reflect a user's last 30 minutes of behavior — increasing click-through rates and time on site.

Why Staleness Kills Recommendation Quality

A recommendation engine is only as good as the behavioral signal it uses. If a user spent the last 20 minutes reading about electric vehicles, the next recommendation should reflect that interest — not last week's browsing history.

Most recommendation pipelines fail this test. Feature engineering runs in batch. User interest vectors update overnight. By the time the recommendation model uses fresh data, the user's session interest has shifted completely. You end up recommending content the user already read, or content from a topic they visited weeks ago.

Streaming SQL solves the freshness problem at the data layer. RisingWave maintains user interest profiles as continuously updated materialized views. The recommendation model reads a feature store that is always current — reflecting the last few seconds of user behavior.

Ingesting Engagement Events

Content engagement signals flow from your publishing platform into Kafka:

CREATE SOURCE content_events (
    event_id       VARCHAR,
    user_id        VARCHAR,
    content_id     VARCHAR,
    event_type     VARCHAR,   -- 'view', 'scroll', 'share', 'like', 'comment', 'click'
    dwell_seconds  INT,
    event_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'content.engagement',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

CREATE TABLE content_metadata (
    content_id     VARCHAR PRIMARY KEY,
    title          VARCHAR,
    category       VARCHAR,
    subcategory    VARCHAR,
    tags           VARCHAR,
    published_at   TIMESTAMPTZ,
    author_id      VARCHAR
);

The content_metadata table is populated from your CMS via a CDC connector or manual upserts. RisingWave propagates metadata changes to all downstream materialized views automatically.

Real-Time User Interest Profiles

The core of any recommendation engine is the user interest profile — a vector of topic affinities derived from recent behavior. Build this as a sliding window materialized view:

CREATE MATERIALIZED VIEW user_interest_profile AS
SELECT
    e.user_id,
    m.category,
    window_start,
    window_end,
    -- Engagement scoring: weight by engagement depth
    SUM(
        CASE e.event_type
            WHEN 'view'    THEN 1
            WHEN 'scroll'  THEN 2
            WHEN 'like'    THEN 5
            WHEN 'share'   THEN 8
            WHEN 'comment' THEN 10
            ELSE 1
        END
        + LEAST(e.dwell_seconds / 30, 5)  -- bonus for dwell time, capped at 5
    ) AS interest_score,
    COUNT(*) AS interaction_count,
    MAX(e.event_time) AS last_interaction
FROM HOP(
    content_events,
    event_time,
    INTERVAL '5 minutes',
    INTERVAL '2 hours'
) e
JOIN content_metadata m ON e.content_id = m.content_id
GROUP BY e.user_id, m.category, window_start, window_end;

This sliding 2-hour window updates every 5 minutes. A user who starts reading tech content at 2 PM has a rising interest_score in the 'technology' category by 2:05 PM — ready for the recommendation API to use.

Recommendations need two signals: individual user interest AND what is currently trending. A popular article in a user's interest category should rank higher than an obscure one. Build a trending view:

CREATE MATERIALIZED VIEW trending_content AS
SELECT
    content_id,
    window_start,
    window_end,
    COUNT(*) FILTER (WHERE event_type IN ('view', 'click')) AS views,
    COUNT(*) FILTER (WHERE event_type = 'share')            AS shares,
    COUNT(*) FILTER (WHERE event_type = 'like')             AS likes,
    AVG(dwell_seconds)                                      AS avg_dwell,
    -- Trending score: recency-weighted engagement
    COUNT(*) * 1.0
        + COUNT(*) FILTER (WHERE event_type = 'share') * 3.0
        + COUNT(*) FILTER (WHERE event_type = 'like') * 2.0 AS trending_score
FROM TUMBLE(
    content_events,
    event_time,
    INTERVAL '30 minutes'
)
GROUP BY content_id, window_start, window_end;

Your recommendation API joins user_interest_profile with trending_content filtered by category to surface articles that are both personally relevant and currently popular.

Comparison: Recommendation Freshness Architectures

ApproachInterest Profile FreshnessTrending SignalInfrastructure ComplexityCTR Impact
Nightly batch ML pipeline12–24 hours staleDailyHigh (Spark + model serving)Baseline
Hourly feature refresh30–60 minutes staleHourlyMedium+5–10%
Lambda (batch + stream)MinutesMinutesVery high+10–15%
RisingWave streaming SQLSecondsSecondsLow+15–25%

Session-Level Interest Detection

For recommendation systems that need to understand in-session behavior, SESSION windows capture real-time reading patterns within a single visit:

CREATE MATERIALIZED VIEW session_interest AS
SELECT
    e.user_id,
    window_start,
    window_end,
    COUNT(DISTINCT e.content_id)           AS articles_read,
    COUNT(DISTINCT m.category)             AS categories_explored,
    -- Dominant category in this session
    MAX(m.category)                        AS primary_session_topic,
    SUM(e.dwell_seconds)                   AS total_session_dwell
FROM SESSION(
    content_events,
    event_time,
    INTERVAL '15 minutes'    -- 15 min inactivity = new session
) e
JOIN content_metadata m ON e.content_id = m.content_id
GROUP BY e.user_id, window_start, window_end;

The primary_session_topic field feeds directly into the recommendation API for in-session personalization — a user who is clearly in a 'finance' reading session gets finance-adjacent recommendations, even if their long-term profile skews toward technology.

Sinking Profiles to Your Feature Store

Push user interest profiles to your ML feature store or recommendation microservice via Kafka:

CREATE SINK user_profiles_to_feature_store
FROM user_interest_profile
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'ml.features.user-interest'
)
FORMAT UPSERT ENCODE JSON (
    force_append_only = false
);

Your recommendation model reads from the feature store as usual — but the features now reflect behavior from the last few minutes rather than the previous night's batch run.

FAQ

Q: Does RisingWave replace the recommendation model itself? A: No. RisingWave handles the feature engineering layer — computing user interest profiles, content affinity scores, and trending signals continuously. The recommendation model (collaborative filtering, matrix factorization, neural networks) runs in your ML serving infrastructure and reads from features maintained by RisingWave.

Q: How do I handle cold-start users with no history? A: Create a fallback materialized view that tracks global popularity by category. New users receive category-level trending recommendations until they accumulate enough interaction history for personalized recommendations — typically 5–10 events.

Q: Can I use RisingWave for email newsletter personalization? A: Yes. At newsletter send time, query user_interest_profile for each subscriber to get current category affinities. Unlike batch-built interest profiles, the RisingWave profile reflects recent behavior up to the moment of send — improving open rates for newsletters sent in response to user triggers.

Q: How does RisingWave handle a content catalog with millions of items? A: The materialized view computes aggregate interest by category, not per content item — keeping the state manageable regardless of catalog size. Per-item recommendations are computed at query time by joining the category interest score with a pre-filtered candidate set from your content database.

Q: What is the recommended architecture for serving recommendations at high throughput? A: Cache the user_interest_profile output in Redis for the hottest users (top 10% by activity volume), and query RisingWave directly for the rest. RisingWave handles thousands of queries per second on materialized views, so most recommendation APIs do not need an additional cache layer.

Get Started

Build a real-time recommendation feature store with the RisingWave quickstart guide.

Discuss recommendation architectures in the RisingWave Slack community.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.