Real-Time Ad Click Attribution with Streaming SQL

Real-Time Ad Click Attribution with Streaming SQL

Why Ad Attribution Is Broken (and How to Fix It)

Marketing teams spend millions on ad campaigns across Google Ads, Facebook, LinkedIn, TikTok, and email. But when a customer finally converts, who gets the credit? The Google Ad they clicked three days ago? The retargeting ad on Facebook? The email that sealed the deal?

This is the multi-touch attribution problem, and most teams solve it badly. They either run nightly batch jobs that produce stale results, or they default to last-click attribution because it is the easiest to implement. Both approaches leave money on the table. By the time yesterday's batch job reveals that a campaign is burning budget with no conversions, thousands of dollars have already been wasted.

Real-time ad click attribution changes the game. By joining click events with conversion events as they happen, within defined time windows, you can compute attribution scores that update continuously. In this article, you will build a complete multi-touch attribution pipeline using streaming SQL in RisingWave, a streaming database that lets you express complex event processing logic as standard SQL materialized views.

The Architecture of Real-Time Attribution

Traditional attribution pipelines batch-process data overnight. A typical setup involves extracting click logs from ad platforms, loading conversion events from a data warehouse, running a Spark or Airflow job to join them, and writing results to a BI tool. Latency is measured in hours.

A streaming attribution pipeline looks different:

graph LR
    A[Ad Platforms] -->|Click events| B[Kafka / Kinesis]
    C[App Backend] -->|Conversion events| B
    B --> D[RisingWave]
    D -->|Materialized Views| E[Attribution Scores]
    E --> F[Dashboard / BI Tool]
    E --> G[Campaign Optimizer]

Click events and conversion events flow into a message broker like Apache Kafka. RisingWave ingests both streams, joins them within a configurable attribution window, and computes attribution scores via materialized views. Downstream systems, whether dashboards, alerting tools, or campaign optimizers, consume the continuously updated results.

The key advantage: materialized views in RisingWave are incrementally maintained. When a new click or conversion arrives, only the affected rows are recomputed. You don't re-process the entire dataset.

Setting Up the Data Model

Let's build this step by step. All SQL in this article has been verified against RisingWave 2.8.0.

Defining the Source Tables

In a production deployment, you would create these as source tables connected to Kafka topics. For this tutorial, we use standard tables so you can follow along with sample data:

CREATE TABLE ad_clicks (
    click_id VARCHAR PRIMARY KEY,
    user_id VARCHAR NOT NULL,
    campaign_id VARCHAR NOT NULL,
    channel VARCHAR NOT NULL,
    ad_group VARCHAR NOT NULL,
    click_ts TIMESTAMPTZ NOT NULL,
    landing_page VARCHAR
);

CREATE TABLE conversions (
    conversion_id VARCHAR PRIMARY KEY,
    user_id VARCHAR NOT NULL,
    conversion_type VARCHAR NOT NULL,
    revenue DOUBLE PRECISION NOT NULL,
    conversion_ts TIMESTAMPTZ NOT NULL
);

The ad_clicks table captures every ad interaction: which user clicked, on which campaign and channel, and when. The conversions table records business outcomes: purchases, signups, or any other conversion event you care about.

Loading Sample Data

Let's populate the tables with a realistic multi-touch scenario. Four users interact with ads across multiple channels before converting:

INSERT INTO ad_clicks (click_id, user_id, campaign_id, channel, ad_group, click_ts, landing_page) VALUES
('clk_001', 'user_101', 'camp_spring', 'google_ads', 'brand_keywords', '2026-03-30 10:00:00+00', '/landing/spring-sale'),
('clk_002', 'user_101', 'camp_spring', 'facebook', 'retargeting', '2026-03-30 14:30:00+00', '/landing/spring-sale'),
('clk_003', 'user_101', 'camp_spring', 'email', 'newsletter', '2026-03-31 09:00:00+00', '/landing/spring-sale'),
('clk_004', 'user_202', 'camp_saas', 'google_ads', 'competitor_keywords', '2026-03-29 08:00:00+00', '/landing/free-trial'),
('clk_005', 'user_202', 'camp_saas', 'linkedin', 'decision_makers', '2026-03-30 11:00:00+00', '/landing/demo'),
('clk_006', 'user_303', 'camp_brand', 'tiktok', 'awareness', '2026-03-31 16:00:00+00', '/landing/brand'),
('clk_007', 'user_303', 'camp_brand', 'google_ads', 'brand_keywords', '2026-03-31 20:00:00+00', '/landing/brand'),
('clk_008', 'user_303', 'camp_brand', 'email', 'promo_blast', '2026-04-01 06:00:00+00', '/landing/brand'),
('clk_009', 'user_404', 'camp_spring', 'facebook', 'lookalike', '2026-03-28 12:00:00+00', '/landing/spring-sale'),
('clk_010', 'user_404', 'camp_spring', 'google_ads', 'brand_keywords', '2026-03-31 15:00:00+00', '/landing/spring-sale');

INSERT INTO conversions (conversion_id, user_id, conversion_type, revenue, conversion_ts) VALUES
('conv_001', 'user_101', 'purchase', 149.99, '2026-03-31 11:00:00+00'),
('conv_002', 'user_202', 'signup', 0.00, '2026-03-30 15:00:00+00'),
('conv_003', 'user_303', 'purchase', 299.50, '2026-04-01 08:00:00+00'),
('conv_004', 'user_404', 'purchase', 89.00, '2026-04-01 10:00:00+00');

User 101, for example, clicked a Google Ad, then a Facebook retargeting ad, then an email, before purchasing for $149.99. Each of those touchpoints played a role in the conversion. The question is: how much credit does each one deserve?

Building Multi-Touch Attribution with Materialized Views

Step 1: Join Clicks with Conversions

The foundation of attribution is joining click events with conversion events for the same user, within a defined attribution window. Here we use a 7-day window, meaning a click gets credit only if a conversion happens within 7 days of that click:

CREATE MATERIALIZED VIEW mv_click_conversions AS
SELECT
    c.click_id,
    c.user_id,
    c.campaign_id,
    c.channel,
    c.ad_group,
    c.click_ts,
    v.conversion_id,
    v.conversion_type,
    v.revenue,
    v.conversion_ts,
    v.conversion_ts - c.click_ts AS time_to_convert
FROM
    ad_clicks c
JOIN
    conversions v
ON
    c.user_id = v.user_id
WHERE
    c.click_ts <= v.conversion_ts
    AND v.conversion_ts - c.click_ts <= INTERVAL '7 days';

This materialized view maintains a live mapping of every click that contributed to a conversion. The WHERE clause enforces two rules: the click must happen before the conversion, and the gap must be within 7 days.

Querying this view:

SELECT click_id, user_id, channel, click_ts, conversion_id, revenue, time_to_convert
FROM mv_click_conversions
ORDER BY user_id, click_ts;
 click_id | user_id  |  channel   |         click_ts          | conversion_id | revenue | time_to_convert
----------+----------+------------+---------------------------+---------------+---------+-----------------
 clk_001  | user_101 | google_ads | 2026-03-30 10:00:00+00:00 | conv_001      |  149.99 | 1 day 01:00:00
 clk_002  | user_101 | facebook   | 2026-03-30 14:30:00+00:00 | conv_001      |  149.99 | 20:30:00
 clk_003  | user_101 | email      | 2026-03-31 09:00:00+00:00 | conv_001      |  149.99 | 02:00:00
 clk_004  | user_202 | google_ads | 2026-03-29 08:00:00+00:00 | conv_002      |       0 | 1 day 07:00:00
 clk_005  | user_202 | linkedin   | 2026-03-30 11:00:00+00:00 | conv_002      |       0 | 04:00:00
 clk_006  | user_303 | tiktok     | 2026-03-31 16:00:00+00:00 | conv_003      |   299.5 | 16:00:00
 clk_007  | user_303 | google_ads | 2026-03-31 20:00:00+00:00 | conv_003      |   299.5 | 12:00:00
 clk_008  | user_303 | email      | 2026-04-01 06:00:00+00:00 | conv_003      |   299.5 | 02:00:00
 clk_009  | user_404 | facebook   | 2026-03-28 12:00:00+00:00 | conv_004      |      89 | 3 days 22:00:00
 clk_010  | user_404 | google_ads | 2026-03-31 15:00:00+00:00 | conv_004      |      89 | 19:00:00
(10 rows)

You can see each user's full click path leading to their conversion. User 303 had three touchpoints (TikTok, Google Ads, email) before a $299.50 purchase.

Step 2: Compute Linear Attribution Scores

Linear attribution divides the conversion revenue equally among all touchpoints. If a user clicked three ads before converting, each ad gets one-third of the revenue. This is the fairest starting point for multi-touch attribution:

CREATE MATERIALIZED VIEW mv_attribution_scores AS
SELECT
    cc.click_id,
    cc.user_id,
    cc.campaign_id,
    cc.channel,
    cc.ad_group,
    cc.conversion_id,
    cc.revenue,
    cc.click_ts,
    cc.conversion_ts,
    tc.touch_count,
    cc.revenue / tc.touch_count::DOUBLE PRECISION AS attributed_revenue
FROM
    mv_click_conversions cc
JOIN (
    SELECT
        conversion_id,
        COUNT(*) AS touch_count
    FROM
        mv_click_conversions
    GROUP BY
        conversion_id
) tc
ON cc.conversion_id = tc.conversion_id;

This view builds on top of mv_click_conversions. It counts how many touchpoints each conversion had, then divides the revenue equally. Because RisingWave maintains materialized views incrementally, this entire chain updates automatically when new data arrives.

SELECT click_id, user_id, channel, conversion_id, revenue,
       touch_count, ROUND(attributed_revenue::numeric, 2) AS attributed_revenue
FROM mv_attribution_scores
ORDER BY user_id, click_ts;
 click_id | user_id  |  channel   | conversion_id | revenue | touch_count | attributed_revenue
----------+----------+------------+---------------+---------+-------------+--------------------
 clk_001  | user_101 | google_ads | conv_001      |  149.99 |           3 |              50.00
 clk_002  | user_101 | facebook   | conv_001      |  149.99 |           3 |              50.00
 clk_003  | user_101 | email      | conv_001      |  149.99 |           3 |              50.00
 clk_004  | user_202 | google_ads | conv_002      |       0 |           2 |                  0
 clk_005  | user_202 | linkedin   | conv_002      |       0 |           2 |                  0
 clk_006  | user_303 | tiktok     | conv_003      |   299.5 |           3 |              99.83
 clk_007  | user_303 | google_ads | conv_003      |   299.5 |           3 |              99.83
 clk_008  | user_303 | email      | conv_003      |   299.5 |           3 |              99.83
 clk_009  | user_404 | facebook   | conv_004      |      89 |           2 |               44.5
 clk_010  | user_404 | google_ads | conv_004      |      89 |           2 |               44.5
(10 rows)

User 101's $149.99 purchase is split evenly: $50.00 each for Google Ads, Facebook, and email. This happens continuously, not in a nightly batch.

Step 3: Position-Based Attribution (First-Touch and Last-Touch)

Different attribution models tell different stories. First-touch attribution credits the channel that introduced the user. Last-touch credits the channel that closed the deal. Comparing the two reveals which channels drive awareness versus which drive conversions:

CREATE MATERIALIZED VIEW mv_position_attribution AS
SELECT
    cc.click_id,
    cc.user_id,
    cc.channel,
    cc.conversion_id,
    cc.revenue,
    cc.click_ts,
    cc.conversion_ts,
    CASE WHEN cc.click_ts = ft.first_click_ts THEN cc.revenue ELSE 0 END AS first_touch_revenue,
    CASE WHEN cc.click_ts = lt.last_click_ts THEN cc.revenue ELSE 0 END AS last_touch_revenue
FROM
    mv_click_conversions cc
JOIN (
    SELECT conversion_id, MIN(click_ts) AS first_click_ts
    FROM mv_click_conversions
    GROUP BY conversion_id
) ft ON cc.conversion_id = ft.conversion_id
JOIN (
    SELECT conversion_id, MAX(click_ts) AS last_click_ts
    FROM mv_click_conversions
    GROUP BY conversion_id
) lt ON cc.conversion_id = lt.conversion_id;
SELECT click_id, user_id, channel, conversion_id, revenue,
       ROUND(first_touch_revenue::numeric, 2) AS first_touch,
       ROUND(last_touch_revenue::numeric, 2) AS last_touch
FROM mv_position_attribution
ORDER BY user_id, click_ts;
 click_id | user_id  |  channel   | conversion_id | revenue | first_touch | last_touch
----------+----------+------------+---------------+---------+-------------+------------
 clk_001  | user_101 | google_ads | conv_001      |  149.99 |      149.99 |          0
 clk_002  | user_101 | facebook   | conv_001      |  149.99 |           0 |          0
 clk_003  | user_101 | email      | conv_001      |  149.99 |           0 |     149.99
 clk_004  | user_202 | google_ads | conv_002      |       0 |           0 |          0
 clk_005  | user_202 | linkedin   | conv_002      |       0 |           0 |          0
 clk_006  | user_303 | tiktok     | conv_003      |   299.5 |       299.5 |          0
 clk_007  | user_303 | google_ads | conv_003      |   299.5 |           0 |          0
 clk_008  | user_303 | email      | conv_003      |   299.5 |           0 |      299.5
 clk_009  | user_404 | facebook   | conv_004      |      89 |          89 |          0
 clk_010  | user_404 | google_ads | conv_004      |      89 |           0 |         89
(10 rows)

Notice the story this tells. For user 303, TikTok gets all the first-touch credit ($299.50) because it introduced the user, while email gets all the last-touch credit because it was the final interaction before purchase. Google Ads, which sat in the middle, gets zero under both models, but received $99.83 under linear attribution.

Step 4: Aggregate by Channel

Roll up attribution scores to the channel level for executive dashboards and budget allocation decisions:

CREATE MATERIALIZED VIEW mv_channel_attribution AS
SELECT
    channel,
    COUNT(DISTINCT conversion_id) AS conversions_assisted,
    COUNT(*) AS total_touchpoints,
    SUM(attributed_revenue) AS total_attributed_revenue,
    AVG(attributed_revenue) AS avg_attributed_revenue
FROM
    mv_attribution_scores
GROUP BY
    channel;
SELECT channel, conversions_assisted, total_touchpoints,
       ROUND(total_attributed_revenue::numeric, 2) AS total_attributed_revenue,
       ROUND(avg_attributed_revenue::numeric, 2) AS avg_attributed_revenue
FROM mv_channel_attribution
ORDER BY total_attributed_revenue DESC;
  channel   | conversions_assisted | total_touchpoints | total_attributed_revenue | avg_attributed_revenue
------------+----------------------+-------------------+--------------------------+------------------------
 google_ads |                    4 |                 4 |                   194.33 |                  48.58
 email      |                    2 |                 2 |                   149.83 |                  74.92
 tiktok     |                    1 |                 1 |                    99.83 |                  99.83
 facebook   |                    2 |                 2 |                    94.50 |                  47.25
 linkedin   |                    1 |                 1 |                        0 |                      0
(5 rows)

Google Ads leads in total attributed revenue ($194.33) because it appears in four conversion paths. But email has the highest average attributed revenue per touchpoint ($74.92), suggesting it is highly efficient at converting users who are already in the funnel.

Comparing Attribution Models Side by Side

To make budget decisions, you need to see how different models assign credit to the same channels. Combine the linear and position-based results:

SELECT
    la.channel,
    ROUND(la.total_attributed_revenue::numeric, 2) AS linear_revenue,
    ROUND(SUM(pa.first_touch_revenue)::numeric, 2) AS first_touch_revenue,
    ROUND(SUM(pa.last_touch_revenue)::numeric, 2) AS last_touch_revenue
FROM
    mv_channel_attribution la
JOIN
    mv_position_attribution pa ON la.channel = pa.channel
GROUP BY
    la.channel, la.total_attributed_revenue
ORDER BY
    linear_revenue DESC;
  channel   | linear_revenue | first_touch_revenue | last_touch_revenue
------------+----------------+---------------------+--------------------
 google_ads |         194.33 |              149.99 |                 89
 email      |         149.83 |                   0 |             449.49
 tiktok     |          99.83 |               299.5 |                  0
 facebook   |          94.50 |                  89 |                  0
 linkedin   |              0 |                   0 |                  0
(5 rows)

This comparison reveals critical insights:

  • Email dominates last-touch revenue ($449.49) but has zero first-touch credit. It is a closer, not an opener. Cutting email spend because it "doesn't generate traffic" would be a mistake.
  • TikTok shows the opposite pattern: strong first-touch ($299.50), zero last-touch. It drives awareness but doesn't close deals. Evaluating TikTok on last-click ROAS would dramatically undervalue it.
  • Google Ads is balanced across models, serving as both an entry point and a conversion driver.

These insights update in real time as new clicks and conversions arrive, rather than being discovered in a weekly analytics review.

From Batch to Stream: Why Streaming SQL Wins

Traditional attribution runs as a batch job, typically in Spark, dbt, or Airflow. You might schedule a nightly pipeline that:

  1. Extracts click logs from ad platform APIs
  2. Extracts conversion data from your transactional database
  3. Joins them and computes attribution in a Spark job
  4. Writes results to a data warehouse table
  5. Refreshes a BI dashboard

This approach has three problems:

ProblemBatch AttributionStreaming Attribution (RisingWave)
LatencyHours (nightly job)Seconds (incremental updates)
Wasted spendDetected next morningDetected immediately
Compute costRe-processes full datasetOnly processes new/changed data
InfrastructureSpark cluster + orchestrator + warehouseSingle streaming database

With RisingWave's materialized views, the attribution computation is always current. When a new conversion event arrives, the join with prior clicks and the revenue split are computed within seconds. If a campaign suddenly stops converting, your team can see it and react within minutes, not the next morning.

In production, you would connect RisingWave directly to your Kafka topics using CREATE SOURCE:

CREATE SOURCE ad_clicks_source (
    click_id VARCHAR,
    user_id VARCHAR,
    campaign_id VARCHAR,
    channel VARCHAR,
    ad_group VARCHAR,
    click_ts TIMESTAMPTZ,
    landing_page VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'ad-clicks',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

The materialized views built on top of these sources update automatically as events arrive. No cron jobs, no orchestration, no stale data.

What Is Multi-Touch Ad Attribution in Streaming SQL?

Multi-touch ad attribution in streaming SQL is a technique for assigning conversion credit to multiple advertising touchpoints using continuously updated SQL materialized views. Instead of running batch jobs that analyze click and conversion data overnight, a streaming database like RisingWave joins click events with conversion events as they arrive, computes attribution scores in real time, and keeps results fresh with incremental updates. This approach lets marketing teams see which channels, campaigns, and ads contribute to conversions within seconds of the event occurring.

How Does a 7-Day Attribution Window Work in Practice?

A 7-day attribution window means that a click receives credit for a conversion only if the conversion happens within 7 days after the click. In streaming SQL, you enforce this by adding a condition like v.conversion_ts - c.click_ts <= INTERVAL '7 days' to the join between clicks and conversions. The window is configurable: e-commerce businesses often use 7 to 30 days, while SaaS companies with longer sales cycles might extend it to 60 or 90 days. Clicks outside the window are excluded from attribution, preventing stale interactions from inflating channel performance.

How Does Linear Attribution Compare to First-Touch and Last-Touch Models?

Linear attribution divides conversion revenue equally among all touchpoints in the customer journey. First-touch gives 100% credit to the initial interaction, while last-touch gives 100% credit to the final interaction before conversion. Linear attribution provides the most balanced view of channel performance, but first-touch and last-touch models are useful for understanding which channels drive awareness versus which channels close deals. In practice, marketing teams benefit from running all three models side by side, which is straightforward when each model is a separate materialized view in RisingWave.

Can Streaming Attribution Handle High-Volume Ad Data?

Yes. RisingWave is designed for high-throughput streaming workloads. Materialized views process only the incremental changes (new clicks and new conversions), not the full dataset. This means attribution computations scale with the rate of incoming events, not the total volume of historical data. For ad-tech workloads with millions of clicks per day, you can scale RisingWave horizontally across multiple nodes. The SQL interface stays the same regardless of scale, so you don't need to rewrite attribution logic as your data grows.

Conclusion

Real-time ad click attribution with streaming SQL gives marketing teams the visibility they need to optimize spend as campaigns run, not the morning after. Here are the key takeaways:

  • Materialized views chain naturally for attribution: join clicks to conversions, compute per-touchpoint scores, aggregate by channel, all as incrementally maintained views.
  • Multiple attribution models (linear, first-touch, last-touch) can run simultaneously as separate materialized views on the same data, enabling side-by-side comparison.
  • The 7-day attribution window is enforced directly in SQL with interval arithmetic, and is easy to adjust for your business's sales cycle.
  • Streaming beats batch for attribution because stale data means wasted ad spend. Every hour of latency is budget burned on underperforming channels.
  • Standard SQL means your existing analytics team can build and maintain attribution pipelines without learning a new programming model.

Ready to try this yourself? Get started with RisingWave in 5 minutes. Quickstart →

Join our Slack community to ask questions and connect with other stream processing developers.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.