Retargeting Data Pipelines with Streaming SQL

Retargeting Data Pipelines with Streaming SQL

RisingWave enables low-latency retargeting pipelines by continuously updating audience segment membership as behavioral events arrive. Users who visited a product page, abandoned a cart, or reached a checkout threshold are added to retargeting segments within seconds — not hours — allowing ad campaigns to reach high-intent users while purchase intent is still warm.

The Intent Decay Problem

Retargeting works because of intent. A user who just viewed a product page has demonstrated interest — they are far more likely to convert than an anonymous visitor. But intent decays fast. Research consistently shows that retargeting ads served within an hour of a site visit are 3–5x more effective than ads served 24 hours later.

Most retargeting pipelines fail this test. Behavioral events are batched into hourly or daily audience exports. By the time the export runs, uploads to the ad platform, and the campaign serves impressions, the user's intent has cooled. You are spending money retargeting users who have already converted elsewhere — or simply moved on.

Streaming SQL closes this gap. RisingWave continuously evaluates retargeting eligibility as events arrive, maintaining audience segments that update in near real time.

Pipeline Architecture

A streaming retargeting pipeline has four stages:

  1. Event ingestion: Clickstream and behavioral events flow into Kafka from your tag management system
  2. Segment computation: RisingWave evaluates segment criteria continuously using materialized views
  3. Audience export: Qualifying users are sinked to your ad platform's audience API or a Kafka topic consumed by your DMP
  4. Suppression: Converted users are removed from retargeting audiences in real time to prevent wasted spend

Setting Up Event Sources

CREATE SOURCE site_events (
    event_id       VARCHAR,
    user_id        VARCHAR,
    session_id     VARCHAR,
    event_type     VARCHAR,   -- 'pageview', 'product_view', 'add_to_cart', 'checkout_start', 'purchase'
    product_id     VARCHAR,
    category       VARCHAR,
    page_url       VARCHAR,
    value_usd      DOUBLE PRECISION,
    event_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'web.site.events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

CREATE SOURCE purchase_events (
    order_id       VARCHAR,
    user_id        VARCHAR,
    total_usd      DOUBLE PRECISION,
    event_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'web.purchases',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

Building Retargeting Audience Segments

Create a materialized view that tracks user activity within a recency window. We use a HOP window to slide every 30 minutes over a 7-day lookback — giving us a continuously updated view of who was active recently:

CREATE MATERIALIZED VIEW retargeting_audiences AS
WITH session_activity AS (
    SELECT
        user_id,
        window_start,
        window_end,
        COUNT(*) FILTER (WHERE event_type = 'product_view')      AS product_views,
        COUNT(*) FILTER (WHERE event_type = 'add_to_cart')        AS cart_adds,
        COUNT(*) FILTER (WHERE event_type = 'checkout_start')     AS checkout_starts,
        COUNT(DISTINCT product_id)                                AS unique_products,
        MAX(value_usd)                                            AS max_product_value,
        MAX(event_time)                                           AS last_active
    FROM HOP(
        site_events,
        event_time,
        INTERVAL '30 minutes',
        INTERVAL '7 days'
    )
    GROUP BY user_id, window_start, window_end
),
converters AS (
    SELECT DISTINCT user_id
    FROM purchase_events
    WHERE event_time >= NOW() - INTERVAL '7 days'
)
SELECT
    sa.user_id,
    sa.window_start,
    sa.window_end,
    sa.last_active,
    sa.product_views,
    sa.cart_adds,
    sa.max_product_value,
    CASE
        WHEN sa.checkout_starts > 0          THEN 'cart_abandoner'
        WHEN sa.cart_adds > 0                THEN 'cart_viewer'
        WHEN sa.product_views >= 3           THEN 'high_intent_browser'
        WHEN sa.product_views >= 1           THEN 'product_viewer'
        ELSE 'site_visitor'
    END AS audience_segment,
    -- Exclude recent converters from retargeting
    CASE WHEN c.user_id IS NOT NULL THEN TRUE ELSE FALSE END AS recently_converted
FROM session_activity sa
LEFT JOIN converters c ON sa.user_id = c.user_id;

The recently_converted flag is critical. Without real-time conversion suppression, you will serve ads to users who just completed a purchase — wasting budget and creating a poor user experience.

Comparison: Retargeting Pipeline Approaches

ApproachAudience FreshnessConversion SuppressionSetup EffortAd Platform Compatibility
Pixel-only (platform native)MinutesDelayedMinimalSingle platform
Nightly CRM export24 hoursNext-dayLowMulti-platform
Hourly batch pipeline1–2 hoursHourlyMediumMulti-platform
RisingWave streaming SQLSecondsReal-timeLowMulti-platform

Session-Based Audience Detection

Some retargeting use cases require session-level analysis — for example, identifying users whose session exceeded a high-value threshold. SESSION windows in RisingWave group events by inactivity gaps:

CREATE MATERIALIZED VIEW high_value_sessions AS
SELECT
    user_id,
    window_start,
    window_end,
    COUNT(*)                       AS events_in_session,
    SUM(value_usd)                 AS session_value,
    MAX(event_type)                AS deepest_funnel_stage
FROM SESSION(
    site_events,
    event_time,
    INTERVAL '30 minutes'     -- session gap: 30 min inactivity = new session
)
GROUP BY user_id, window_start, window_end
HAVING SUM(value_usd) > 100;  -- only sessions with product value > $100

Users in high_value_sessions represent your highest-intent retargeting audience — they spent a meaningful amount of time engaging with high-value products within a single session.

Exporting Audiences to Ad Platforms

Push segment membership updates to Kafka for downstream ad platform connectors:

CREATE SINK retargeting_audience_export
FROM retargeting_audiences
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'retargeting.audiences.updates'
)
FORMAT UPSERT ENCODE JSON (
    force_append_only = false
);

A lightweight consumer reads from this topic and calls your ad platform's Customer Match or Custom Audience API to add or remove users in near real time. When recently_converted flips to TRUE, the consumer removes the user from active retargeting campaigns within seconds of purchase.

FAQ

Q: How does RisingWave handle users who appear in multiple retargeting segments? A: The CASE statement in the materialized view assigns a single priority segment. For multi-segment targeting, add separate boolean columns (is_cart_abandoner, is_high_intent, etc.) so users can qualify for multiple campaigns simultaneously.

Q: Can I use RisingWave for cross-device retargeting? A: RisingWave operates on user identifiers. If your identity graph maps multiple device IDs to a single user ID, pass the resolved user ID in events. If you use a probabilistic identity graph, join against a device_identity reference table using RisingWave's temporal join capability.

Q: What happens to audience segments when RisingWave restarts? A: RisingWave persists materialized view state to storage. On restart, the state is recovered and processing resumes from the last checkpointed Kafka offset. Audience segments are not lost.

Q: How do I comply with GDPR right-to-erasure in a streaming pipeline? A: Use the CDC connector to capture user deletion events from your consent management system. A deletion event triggers a delete in the source table, which propagates through materialized views and removes the user from all active audience segments.

Q: Can I A/B test different audience definitions? A: Yes. Create two materialized views with different segment criteria (e.g., cart abandoner with 1-day vs 7-day lookback) and route a percentage of traffic to each via your ad platform's experiment framework.

Get Started

Build a real-time retargeting pipeline in under 30 minutes with the RisingWave quickstart guide.

Share your retargeting architecture in the RisingWave Slack community.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.