Multi-Touch Marketing Attribution with Streaming SQL

Multi-Touch Marketing Attribution with Streaming SQL

RisingWave enables real-time multi-touch attribution by continuously joining touchpoint streams with conversion events and distributing credit across the customer journey as each conversion arrives. Instead of running attribution models in overnight batch jobs, marketing teams get continuously updated channel credit that reflects today's conversion data — enabling intra-day budget decisions based on accurate attribution.

The Limits of Last-Touch Attribution

Last-touch attribution is the default for most marketing teams — and it is also the most misleading model available. Crediting 100% of conversion value to the final touchpoint ignores every upstream channel that built awareness, sparked interest, and moved the customer toward conversion.

In a typical customer journey: paid social introduces the brand (ignored), organic search educates the buyer (ignored), email nurtures the decision (ignored), and direct converts (gets all credit). The result is systematic undervaluation of top-of-funnel channels, chronic underinvestment in brand-building, and budget concentration in last-click channels that deliver diminishing returns.

Multi-touch attribution distributes credit across the full journey. The challenge is computing it continuously — traditional multi-touch models require full journey reconstruction, which is expensive in batch and was nearly impossible in real time. Streaming SQL changes this.

Data Model: Touchpoints and Conversions

Build the foundational data model to capture the full customer journey:

CREATE SOURCE touchpoint_events (
    touchpoint_id  VARCHAR,
    user_id        VARCHAR,
    channel        VARCHAR,
    campaign_id    VARCHAR,
    ad_group_id    VARCHAR,
    creative_id    VARCHAR,
    touchpoint_type VARCHAR,  -- 'impression', 'click', 'email_open', 'email_click', 'visit'
    cost_usd       DOUBLE PRECISION,
    event_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'attribution.touchpoints',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

CREATE SOURCE conversion_events (
    conversion_id   VARCHAR,
    user_id         VARCHAR,
    revenue_usd     DOUBLE PRECISION,
    conversion_type VARCHAR,
    converted_at    TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'attribution.conversions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

-- Attribution window configuration per channel
CREATE TABLE attribution_config (
    channel              VARCHAR PRIMARY KEY,
    click_window_days    INT,
    view_window_days     INT,
    model_weight         DOUBLE PRECISION  -- for position-based model
);

INSERT INTO attribution_config VALUES
    ('paid_search',    30, 1,  0.4),
    ('paid_social',    28, 1,  0.2),
    ('email',          30, 0,  0.2),
    ('organic',        30, 0,  0.1),
    ('direct',          0, 0,  0.1);

Linear Attribution Model

Linear attribution distributes conversion credit equally across all touchpoints in the conversion window. This is the simplest multi-touch model and a good starting point for most teams:

CREATE MATERIALIZED VIEW linear_attribution AS
WITH conversion_windows AS (
    SELECT
        cv.conversion_id,
        cv.user_id,
        cv.revenue_usd,
        cv.converted_at,
        t.touchpoint_id,
        t.channel,
        t.campaign_id,
        t.cost_usd                         AS touchpoint_cost,
        t.event_time                       AS touchpoint_time,
        -- Count total touchpoints in the attribution window per conversion
        COUNT(t.touchpoint_id) OVER (
            PARTITION BY cv.conversion_id
        )                                  AS total_touchpoints_in_window
    FROM conversion_events cv
    JOIN touchpoint_events t
        ON cv.user_id = t.user_id
        AND t.event_time BETWEEN cv.converted_at - INTERVAL '30 days'
                             AND cv.converted_at
)
SELECT
    conversion_id,
    user_id,
    channel,
    campaign_id,
    converted_at,
    touchpoint_time,
    revenue_usd,
    -- Linear: equal credit to each touchpoint
    revenue_usd / total_touchpoints_in_window           AS attributed_revenue,
    1.0 / total_touchpoints_in_window                   AS credit_fraction,
    total_touchpoints_in_window
FROM conversion_windows;

This view updates incrementally with each new conversion. The moment a conversion arrives in Kafka, RisingWave evaluates all touchpoints within the 30-day window and distributes credit — no batch job required.

Time-Decay Attribution Model

Time-decay attribution gives more credit to touchpoints that occurred closer to conversion — reflecting the intuition that recent interactions had more influence on the decision:

CREATE MATERIALIZED VIEW time_decay_attribution AS
WITH scored_touchpoints AS (
    SELECT
        cv.conversion_id,
        cv.user_id,
        cv.revenue_usd,
        cv.converted_at,
        t.touchpoint_id,
        t.channel,
        t.campaign_id,
        t.event_time                        AS touchpoint_time,
        -- Time decay: half-life of 7 days
        POWER(0.5,
            EXTRACT(EPOCH FROM (cv.converted_at - t.event_time))
            / (7 * 86400.0)
        )                                   AS decay_weight
    FROM conversion_events cv
    JOIN touchpoint_events t
        ON cv.user_id = t.user_id
        AND t.event_time BETWEEN cv.converted_at - INTERVAL '30 days'
                             AND cv.converted_at
),
weighted AS (
    SELECT *,
        SUM(decay_weight) OVER (PARTITION BY conversion_id) AS total_weight
    FROM scored_touchpoints
)
SELECT
    conversion_id,
    user_id,
    channel,
    campaign_id,
    converted_at,
    touchpoint_time,
    revenue_usd,
    decay_weight / total_weight                           AS credit_fraction,
    revenue_usd * (decay_weight / total_weight)           AS attributed_revenue
FROM weighted;

The time-decay model naturally rewards the channels that close deals (search, direct) while still crediting the channels that warmed up the prospect (social, display). Both models update continuously as new conversions stream in.

Channel Attribution Summary

Build a rollup view that aggregates attribution by channel for budget decision-making:

CREATE MATERIALIZED VIEW channel_attribution_summary AS
SELECT
    channel,
    campaign_id,
    window_start,
    window_end,
    COUNT(DISTINCT conversion_id)        AS attributed_conversions,
    SUM(attributed_revenue)              AS attributed_revenue,
    SUM(touchpoint_cost)                 AS total_channel_spend,
    SUM(attributed_revenue) / NULLIF(SUM(touchpoint_cost), 0) AS attributed_roas
FROM (
    SELECT tda.*, s.spend_usd AS touchpoint_cost
    FROM time_decay_attribution tda
    LEFT JOIN touchpoint_events s ON tda.touchpoint_id = s.touchpoint_id
) enriched
JOIN (
    SELECT window_start, window_end
    FROM TUMBLE(conversion_events, converted_at, INTERVAL '1 day')
) windows ON converted_at BETWEEN window_start AND window_end
GROUP BY channel, campaign_id, window_start, window_end;

Comparison: Attribution Model Approaches

ModelCredit LogicTop-Funnel VisibilityImplementation ComplexityStreaming Feasible
Last-touch100% to final clickNoneVery lowYes
First-touch100% to first clickFullVery lowYes
LinearEqual credit all touchesFullLowYes (RisingWave)
Time-decayMore credit near conversionPartialMediumYes (RisingWave)
Data-driven (ML)Model-optimized weightsFullHighHybrid

Publishing Attribution Results

Sink attribution results to your data warehouse for historical analysis and to Kafka for real-time consumption by bid management systems:

CREATE SINK attribution_to_warehouse
FROM channel_attribution_summary
WITH (
    connector = 'iceberg',
    type = 'upsert',
    catalog.type = 'glue',
    s3.bucket = 'your-data-lake-bucket',
    database.name = 'marketing_attribution',
    table.name = 'channel_daily_attribution'
);

Budget optimization systems subscribing to the Kafka sink receive continuously updated attribution signals and can adjust bid strategies accordingly — without waiting for the overnight attribution run.

FAQ

Q: How does RisingWave handle the lookback window for attribution — won't the state become very large? A: RisingWave manages state incrementally. For a 30-day attribution window, the system retains touchpoint events for 30 days per user. State size scales with your active user count and event volume, not total history. Configure watermarks to expire state beyond the attribution window.

Q: Can I implement position-based (U-shaped) attribution in SQL? A: Yes. U-shaped attribution gives 40% to first touch, 40% to last touch, and distributes 20% equally among middle touches. Implement this using ROW_NUMBER() and COUNT() window functions to identify the first and last touchpoints per conversion, then apply the weight formula in a CASE expression.

Q: How do I handle view-through attribution alongside click-through? A: Add touchpoint_type to the attribution window join condition. Include both 'impression' (for view-through) and 'click' events, but apply different weights. Use the attribution_config table to define per-channel view window lengths and disable view-through for channels where it doesn't make sense.

Q: Does streaming attribution produce different results than batch attribution? A: For completed journeys (where the attribution window has closed), streaming and batch produce identical results. The difference is timing — streaming attribution results are available seconds after each conversion rather than the next morning.

Q: How do I reconcile streaming attribution with the numbers in Google Analytics or Meta Ads Manager? A: Platform-reported conversions use platform-proprietary attribution models and never match cross-channel attribution exactly. Establish your RisingWave attribution model as the internal source of truth for budget decisions, and treat platform numbers as directional signals rather than ground truth.

Get Started

Implement real-time multi-touch attribution with the RisingWave quickstart guide.

Discuss attribution model design with the community in RisingWave Slack.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.