Real-Time Recommendation System in 2026: Architecture, Algorithms, and SQL


What Changed for Recommendations in 2026

The recommendation problem did not get easier in 2026. If anything, user expectations moved faster than the infrastructure that served them.

Three shifts defined the year.

The first was the rise of AI agents as personalization intermediaries. In previous years, a recommendation system meant an algorithm producing a ranked list that a UI displayed in a carousel. In 2026, AI agents handle a growing share of personalization decisions. A shopping assistant agent does not display a carousel -- it asks about your needs, queries live recommendation signals, and synthesizes a suggestion that accounts for what you said, what you did, and what others with similar behavior bought next. The interface changed; the need for fresh signal did not.

The second shift was the commoditization of vector similarity. Embedding-based semantic search, once requiring a dedicated vector database and a complex embedding pipeline, became a native capability of streaming databases. RisingWave, for example, ships with a vector(n) type, a cosine similarity operator (<=>), HNSW indexing, and an openai_embedding() function that computes embeddings inline. You no longer need to maintain a separate Pinecone or Weaviate instance alongside your operational database. Semantic retrieval is a SQL query.

The third shift was Model Context Protocol (MCP) becoming the standard interface between AI agents and data systems. Instead of writing custom API adapters for each data source an agent needs to query, MCP provides a common protocol. RisingWave ships an official MCP server that exposes live materialized views to any MCP-compatible agent. The implication for recommendations is significant: an agent handling a customer conversation can query recommendation_scores the same way a developer would query a table in psql -- and it gets data that is seconds old, not hours old.

Together, these shifts produced a new architecture for recommendations in 2026. The core insight is unchanged -- fresh signals produce better recommendations -- but the mechanisms for computing those signals, storing them, and consuming them all moved.

The Business Stakes

Amazon attributes roughly 35% of its total revenue to recommendations. Netflix has estimated its recommendation system saves approximately $1 billion per year in subscriber retention. These numbers are not from 2026; they are older benchmarks that have only become more relevant as personalization systems matured.

What they reflect is a durable truth: purchase intent peaks at the moment of engagement. A user who just spent 30 minutes browsing trail running gear has high purchase intent for trail running gear right now. A recommendation system that saw that session in last night's batch run acts on stale intent. A system that updated its signals within seconds of each event acts on current intent.

The gap between those two systems is the gap between a recommendation that converts and one that the user ignores.

The Three Recommendation Approaches in 2026

Every production recommendation system in 2026 uses at least one of three approaches. Most use all three, layered.

Heuristic: Trending Plus Category Affinity

The simplest approach is SQL-only and requires no ML infrastructure. You compute what is trending in each category over a rolling window, and you compute each user's category affinity from their interaction history. Combine them: recommend trending items from the categories the user engages with most.

This approach has a straightforward property -- it works without any interaction history for an item, and it requires no model training. The recommendations are always explainable. The freshness depends entirely on how often you recompute the signals, which in a streaming database is continuous.

The limitation is that it does not capture item-level preferences. A user with high affinity for "running" will see the same trending running items as every other user with high affinity for "running." It is personalized at the category level, not the item level.

Collaborative Filtering: Item-Based Co-Engagement

Item-based collaborative filtering finds items that are frequently co-viewed or co-purchased within the same session, then recommends items similar to what the current user engaged with. It is computable entirely in SQL.

The co-view matrix answers: "users who viewed item A also viewed item B." The recommendation is: "the user just engaged with item A, so show item B." It is personalized at the item level and captures behavioral patterns across the entire user base.

The limitation is that it requires sufficient interaction history to produce a meaningful co-view matrix. For new items (or new users), collaborative filtering has a cold start problem. It also does not capture semantic relationships -- two items might be conceptually similar but never co-viewed because they are in different parts of the catalog.

Semantic: Vector Similarity

Vector similarity finds items with similar semantic properties by comparing embedding vectors. Two items might never be co-viewed, but if their descriptions and attributes produce similar embeddings, a semantic system will surface them as related.

In 2026, this is a SQL query against a column of type vector(n) using the <=> cosine similarity operator. The embedding computation can happen inline using openai_embedding() or using precomputed embeddings stored in the item table.
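A retrieval query along these lines is a sketch, not RisingWave-verified syntax: it assumes an items table carrying a precomputed embedding column and follows PostgreSQL-style semantics for the cosine operator.

```sql
-- Hypothetical: items semantically closest to a given item.
-- Assumes items has an embedding vector(1536) column (dimension is an assumption).
SELECT
    i.item_id,
    i.item_name,
    i.embedding <=> t.embedding AS distance
FROM items i,
     (SELECT embedding FROM items WHERE item_id = $1) t
WHERE i.item_id <> $1
ORDER BY distance
LIMIT 5;
```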

The limitation is compute cost -- computing embeddings for every item is non-trivial, and the HNSW index needs to be built and maintained. For most recommendation use cases, heuristic and collaborative filtering cover the majority of traffic. Semantic similarity adds value at the edges: new items, niche interests, and cross-category discovery.

The Complete SQL Pipeline

The following SQL runs on RisingWave. It ingests user events from Kafka and maintains all recommendation signals as incremental materialized views.

Define the Event Stream

-- User interaction stream from Kafka
CREATE TABLE user_events (
    event_id    TEXT,
    user_id     TEXT,
    item_id     TEXT,
    event_type  TEXT,  -- 'view', 'add_to_cart', 'purchase', 'like'
    category    TEXT,
    session_id  TEXT,
    event_time  TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'user-events',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
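The serving queries later in this pipeline join an items catalog table that the listings do not define. A minimal sketch of what it might look like -- the column names and embedding dimension are assumptions, not part of the original pipeline:

```sql
-- Hypothetical item catalog referenced by the serving queries.
-- The embedding column supports the semantic layer (Layer 3).
CREATE TABLE items (
    item_id   TEXT PRIMARY KEY,
    item_name TEXT,
    category  TEXT,
    embedding vector(1536)
);
```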

User-Item Interaction Scores

This view maintains a weighted interaction score for every user-item pair. It updates incrementally as events arrive -- no batch job.

CREATE MATERIALIZED VIEW user_item_scores AS
SELECT
    user_id,
    item_id,
    SUM(CASE
        WHEN event_type = 'purchase'    THEN 10
        WHEN event_type = 'add_to_cart' THEN 5
        WHEN event_type = 'like'        THEN 3
        WHEN event_type = 'view'        THEN 1
        ELSE 0
    END) AS score,
    MAX(event_time) AS last_interaction
FROM user_events
GROUP BY user_id, item_id;

A score of 10 or higher is used as a proxy for purchase -- note that ten plain views would also cross that threshold, so a stricter system would track purchases explicitly. The serving query filters these items out to avoid recommending what the user already bought.

Item Co-View Matrix

This is the core of item-based collaborative filtering. It counts how often two items were both viewed within the same session.

CREATE MATERIALIZED VIEW item_coview AS
SELECT
    a.item_id AS item_a,
    b.item_id AS item_b,
    COUNT(*) AS coview_count
FROM user_events a
JOIN user_events b
    ON a.session_id = b.session_id
    AND a.item_id < b.item_id
    AND a.event_type = 'view'
    AND b.event_type = 'view'
GROUP BY a.item_id, b.item_id;

The a.item_id < b.item_id condition ensures each pair is counted once, not twice. This is a self-join on the event stream -- RisingWave maintains it incrementally as new sessions arrive.

Trending Items

TUMBLE windows compute trending signals in fixed, non-overlapping time buckets. A new window opens every hour. Items with the most interactions in the current window are the trending items.

CREATE MATERIALIZED VIEW trending_items AS
SELECT
    item_id,
    category,
    COUNT(*) AS interaction_count,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
    window_start,
    window_end
FROM TUMBLE(user_events, event_time, INTERVAL '1' HOUR)
GROUP BY item_id, category, window_start, window_end;

You can query this view for the most recent completed window to get the current trending list. Because TUMBLE uses event time, the result is deterministic even if events arrive out of order.
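A query for the most recent completed window might look like the following sketch; the category value is illustrative:

```sql
-- Top trending items in the latest hourly window that has fully closed.
SELECT item_id, interaction_count, purchase_count
FROM trending_items
WHERE category = 'running'
  AND window_end <= NOW()
  AND window_start = (
      SELECT MAX(window_start) FROM trending_items WHERE window_end <= NOW()
  )
ORDER BY interaction_count DESC
LIMIT 10;
```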

Category Affinity per User

This view captures how much each user has engaged with each category. It drives the heuristic layer: recommend trending items from the user's highest-affinity categories.

CREATE MATERIALIZED VIEW user_category_affinity AS
SELECT
    user_id,
    category,
    SUM(CASE
        WHEN event_type = 'purchase'    THEN 10
        WHEN event_type = 'add_to_cart' THEN 5
        WHEN event_type = 'view'        THEN 1
        ELSE 0
    END) AS affinity_score
FROM user_events
GROUP BY user_id, category;
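With trending_items and user_category_affinity both in place, the heuristic layer described earlier -- trending items ranked by the user's category affinity -- reduces to a join of the two views. A sketch; a production query would also restrict trending_items to the latest window:

```sql
-- Trending items drawn from the user's highest-affinity categories.
SELECT t.item_id, t.category, t.interaction_count, a.affinity_score
FROM trending_items t
JOIN user_category_affinity a
    ON t.category = a.category
WHERE a.user_id = $1
ORDER BY a.affinity_score DESC, t.interaction_count DESC
LIMIT 10;
```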

Combined Recommendation Scores

This view joins user-item scores with the co-view matrix to produce a recommendation score for each user-item pair. A user with a high score for item A propagates signal to all items that frequently co-appear with item A.

CREATE MATERIALIZED VIEW recommendation_scores AS
SELECT
    uis.user_id,
    cv.item_b AS recommended_item,
    SUM(uis.score * cv.coview_count) AS recommendation_score
FROM user_item_scores uis
JOIN item_coview cv ON uis.item_id = cv.item_a
GROUP BY uis.user_id, cv.item_b;

This is the materialized view that your serving layer queries -- and that AI agents read via MCP.

The Serving Query

The serving query applies three filters before returning results: it restricts to items in categories the user has affinity for, it excludes items the user already purchased, and it orders by the recommendation score.

SELECT
    r.recommended_item,
    r.recommendation_score,
    i.category,
    i.item_name
FROM recommendation_scores r
JOIN user_category_affinity a
    ON r.user_id = a.user_id
JOIN items i
    ON r.recommended_item = i.item_id
    AND i.category = a.category
WHERE r.user_id = $1
  AND r.recommended_item NOT IN (
      SELECT item_id
      FROM user_item_scores
      WHERE user_id = $1
        AND score >= 10  -- exclude purchased items
  )
ORDER BY r.recommendation_score DESC
LIMIT 10;

Because all the upstream views are incremental materialized views, this query returns results in milliseconds. No computation happens at query time -- RisingWave maintained the pre-computed scores incrementally as the events arrived.

AI Agents and Recommendations in 2026

The serving query above is what a backend API would run on each page load. In 2026, there is a second consumer: AI agents.

An agent handling a shopping conversation does not use a carousel. It uses a conversation. The agent needs to query recommendation_scores with the current user's ID, combine the results with what the user said in the conversation, and produce a natural language suggestion.

With RisingWave's MCP server, the agent can do this directly. The MCP tool call is equivalent to the serving query above -- the agent gets back the top-10 recommendation candidates and uses its reasoning to select and present the best one given the conversation context.

The critical point: because recommendation_scores is a materialized view maintained by streaming SQL, the agent always reads signal that reflects the user's current session. If the user browsed for 30 minutes before opening the chat, all 30 minutes of interaction are already in the score. The agent does not need to "catch up" on what the user did -- the database already did that.

This is what "next best action" means in 2026. Not a static list of recommended products, but a live query that the agent uses to ground its suggestions in current user behavior.

Batch vs Streaming Freshness: What You Actually Lose

Consider a user who arrives at 2pm and browses trail running gear for 30 minutes. They view six items, add one to their cart, remove it, and view two more before leaving without purchasing.

In a batch recommendation system (nightly recompute at 3am):

  • At 2:30pm when they leave, the system still thinks they are a "generic sports shopper" based on last night's data.
  • If they return at 6pm, they will see the same generic sports recommendations as before.
  • Their 30 minutes of high-intent trail running behavior is invisible until 3am the next day.
  • Even after the batch run, the co-view signals from their session compete with millions of other sessions -- the signal is diluted.

In a streaming recommendation system (RisingWave, continuous):

  • By 2:01pm, the first view events have updated user_item_scores and item_coview.
  • By 2:05pm, user_category_affinity already shows elevated affinity for trail running.
  • By 2:30pm, recommendation_scores reflects the full 30-minute session.
  • If they return at 6pm, they see trail running recommendations informed by exactly what they looked at.

The 30-minute session window is exactly the "purchase window" -- the period of highest purchase intent. Batch systems are structurally blind to it. Streaming systems are built for it.

What ML Models Still Add

To be direct: SQL heuristics and collaborative filtering do not replace deep learning recommendation models. They handle the majority of cases well. ML models add precision where SQL falls short.

The cases where ML wins:

Cold start for new users. A new user has no interaction history. user_item_scores is empty. recommendation_scores produces nothing. An ML model trained on demographic and contextual signals can generate initial recommendations. SQL cannot.

Cold start for new items. A new item has no co-view history. item_coview has no rows for it. An ML model can use item attribute embeddings to position a new item in the recommendation space before it accumulates interactions.

Deep taste modeling. A user with 10,000 interactions has a complex preference profile. SQL can capture what they have engaged with; ML models can capture patterns in the sequence of engagement that predict what they want next -- even if they have never explicitly engaged with that category.

Cross-domain signals. A user who reads extensively about outdoor camping might be interested in trail running gear, even if they have never browsed that category. Collaborative filtering cannot infer this without the explicit behavioral bridge. Embedding models can.

The layered architecture for 2026: use SQL heuristics and collaborative filtering (fast, always fresh, fully explainable) for the 80% of recommendations served at high traffic volumes. Use ML model outputs (precise, computationally expensive, batch-updated) for the 20% of cases where the user is new, the item is new, or the SQL signals are thin. Use vector similarity as the semantic bridge between the two layers.
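Blending the batch ML layer into the streaming serving path can be sketched as a join against a static score table. The ml_scores table and the weights are hypothetical -- in practice you would normalize both scores to a common scale before blending:

```sql
-- Hypothetical blend of streaming CF score and batch ML score.
-- ml_scores is a static table refreshed by the offline pipeline.
SELECT
    r.recommended_item,
    0.7 * r.recommendation_score + 0.3 * COALESCE(m.ml_score, 0) AS blended_score
FROM recommendation_scores r
LEFT JOIN ml_scores m
    ON r.user_id = m.user_id
    AND r.recommended_item = m.item_id
WHERE r.user_id = $1
ORDER BY blended_score DESC
LIMIT 10;
```

The LEFT JOIN with COALESCE means users or items absent from the ML table fall back cleanly to the streaming score alone.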

The Complete 2026 Recommendation Architecture

Putting it together:

Layer 1: SQL heuristics (RisingWave)

  • Trending items per category (TUMBLE window, hourly)
  • Category affinity per user (continuous materialized view)
  • Output: trending items ranked by user affinity

Layer 2: Collaborative filtering (RisingWave)

  • Item co-view matrix (session-scoped self-join, continuous)
  • User-item interaction scores (weighted event counts, continuous)
  • Combined recommendation scores (join of the above, continuous)
  • Output: personalized item scores updated within seconds of each event

Layer 3: Semantic similarity (RisingWave vector engine)

  • Item embeddings stored as vector(n) columns in the items table
  • HNSW index for approximate nearest neighbor queries
  • Cross-category discovery via cosine similarity (<=>)
  • Output: semantically similar items for cold start and edge cases

Layer 4: ML model (offline, batch)

  • Two-tower neural network or sequence model trained on interaction history
  • Scores refreshed hourly or daily
  • Written to RisingWave as a static table that the serving layer can join
  • Output: deep preference scores for users with rich history

Serving layer:

  • A single SQL query joins signals from all four layers
  • PostgreSQL wire protocol on port 4566 -- any PostgreSQL client works
  • AI agents query via MCP, backend APIs via standard connection pool

AI agent layer (MCP):

  • RisingWave MCP server exposes materialized views as queryable tools
  • Agents query recommendation_scores with user ID and conversation context
  • Agent synthesizes natural language recommendation from structured results

The RisingWave components are open source under the Apache 2.0 license. You connect using the PostgreSQL wire protocol on port 4566 -- the same client libraries you already use for PostgreSQL work with no modification.

Running in Production

A few operational notes for teams deploying this architecture:

Event volume scaling. The co-view matrix is a self-join, and self-joins can be expensive at high event volume. If you have millions of events per day, consider partitioning by event time and limiting the session window used for the co-view join (e.g., only match events within the same two-hour session block rather than the entire session history).
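Bounding the self-join by time proximity might look like the variant below. This is a sketch: the two-hour interval is illustrative, and it assumes your RisingWave version supports interval conditions in streaming joins.

```sql
-- Co-view matrix with the join bounded to events within two hours of each other,
-- limiting state growth for long-lived sessions.
CREATE MATERIALIZED VIEW item_coview_bounded AS
SELECT
    a.item_id AS item_a,
    b.item_id AS item_b,
    COUNT(*) AS coview_count
FROM user_events a
JOIN user_events b
    ON a.session_id = b.session_id
    AND a.item_id < b.item_id
    AND a.event_type = 'view'
    AND b.event_type = 'view'
    AND b.event_time BETWEEN a.event_time - INTERVAL '2 hours'
                         AND a.event_time + INTERVAL '2 hours'
GROUP BY a.item_id, b.item_id;
```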

Score decay. The interaction score view above is a lifetime accumulation. A purchase made three years ago will still contribute score. For most applications, you want recency weighting. You can add this by computing scores on a windowed view (e.g., events from the last 90 days) rather than the full event history.
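A recency-windowed variant of the score view is sketched below. It relies on RisingWave's temporal filter support (a NOW()-based predicate inside a materialized view); the 90-day window is illustrative.

```sql
-- Interaction scores limited to the last 90 days via a temporal filter.
-- Rows older than the window are retired from the view automatically.
CREATE MATERIALIZED VIEW user_item_scores_90d AS
SELECT
    user_id,
    item_id,
    SUM(CASE
        WHEN event_type = 'purchase'    THEN 10
        WHEN event_type = 'add_to_cart' THEN 5
        WHEN event_type = 'like'        THEN 3
        WHEN event_type = 'view'        THEN 1
        ELSE 0
    END) AS score,
    MAX(event_time) AS last_interaction
FROM user_events
WHERE event_time > NOW() - INTERVAL '90 days'
GROUP BY user_id, item_id;
```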

Bootstrapping. The co-view matrix starts empty. For an initial deployment, backfill historical session data into the event stream so the matrix has signal before the first user query.
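One way to backfill is to replay historical sessions through the same table the stream feeds. This sketch assumes your RisingWave version permits DML on a connector-backed table and that historical_events is a hypothetical export of past sessions with a matching schema; replaying the events into the Kafka topic itself is an alternative.

```sql
-- Hypothetical backfill: replay exported historical sessions into the event table
-- so the co-view matrix has signal before the first user query.
INSERT INTO user_events
SELECT event_id, user_id, item_id, event_type, category, session_id, event_time
FROM historical_events;
```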

Sink to Iceberg. If you need a historical audit trail of recommendation signals, RisingWave can sink any materialized view to an Iceberg table. The live view stays in RisingWave for serving; the historical record accumulates in the lakehouse.
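A sink definition along these lines is a sketch only: the connector parameter names vary by catalog setup and RisingWave version, so treat every WITH entry below as a placeholder to check against your deployment.

```sql
-- Sketch: persist the live view's history to an Iceberg table in the lakehouse.
-- Parameter names and values are placeholders, not verified configuration.
CREATE SINK recommendation_scores_history
FROM recommendation_scores
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id,recommended_item',
    warehouse.path = 's3://lakehouse/warehouse',
    database.name = 'recs',
    table.name = 'recommendation_scores_history'
);
```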

The recommendation problem has not changed. What changed is that the infrastructure to solve it correctly -- streaming SQL, built-in vector support, and agent-ready query interfaces -- is now available in a single open source system without the operational complexity of assembling five separate tools.
