Real-Time Ad Click Fraud Detection with Streaming SQL

Real-time ad click fraud detection uses streaming SQL to analyze click events as they arrive, flagging anomalies—such as abnormally high click rates from a single IP or device—within seconds. RisingWave's materialized views continuously update fraud scores without batch delays, enabling ad platforms to block fraudulent traffic before budget is wasted.

Why Click Fraud Costs Advertisers Billions

Ad click fraud drains an estimated $35–$50 billion from digital advertising budgets every year. Bots, click farms, and competitor sabotage generate fake clicks that inflate costs and distort campaign performance data. Traditional batch detection systems catch fraud hours or days after the damage is done. By then, budgets are depleted and attribution models are corrupted.

Streaming SQL changes this equation. By processing every click event in real time, you can detect fraud patterns the moment they emerge—not after the fact.

Architecture Overview

A fraud detection pipeline built on RisingWave ingests click events from Kafka, joins them against reference data, and maintains continuously updated fraud scores in materialized views. Downstream systems (ad servers, DSPs) query these views to block suspicious traffic in milliseconds.

The pipeline has three layers:

Ingestion: Kafka topics carry raw click events from ad serving infrastructure
Detection: Materialized views compute rolling fraud signals per IP, device, and campaign
Action: Sinks push fraud scores to downstream enforcement systems

Setting Up the Streaming Pipeline

First, connect RisingWave to your Kafka click stream:

CREATE SOURCE click_events (
    click_id        VARCHAR,
    campaign_id     VARCHAR,
    publisher_id    VARCHAR,
    ip_address      VARCHAR,
    user_agent      VARCHAR,
    device_id       VARCHAR,
    event_time      TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'ad-click-events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

Next, create a reference table for known fraudulent IPs and thresholds:

CREATE TABLE fraud_ip_blocklist (
    ip_address      VARCHAR PRIMARY KEY,
    reason          VARCHAR,
    added_at        TIMESTAMPTZ
);

CREATE TABLE fraud_thresholds (
    threshold_name  VARCHAR PRIMARY KEY,
    threshold_value INTEGER
);

Detecting Anomalies with Materialized Views

The core detection logic uses tumbling windows to count clicks per IP in rolling 5-minute buckets. Any IP exceeding the threshold is flagged immediately:

CREATE MATERIALIZED VIEW ip_click_rate AS
SELECT
    ip_address,
    window_start,
    window_end,
    COUNT(*)                                AS click_count,
    COUNT(DISTINCT campaign_id)             AS campaigns_targeted,
    COUNT(DISTINCT device_id)               AS device_count,
    CASE
        WHEN COUNT(*) > 100 THEN 'HIGH'
        WHEN COUNT(*) > 50  THEN 'MEDIUM'
        ELSE 'LOW'
    END                                     AS fraud_risk_level
FROM TUMBLE(click_events, event_time, INTERVAL '5 MINUTES')
GROUP BY ip_address, window_start, window_end;

For more sophisticated detection, use hopping windows to catch click farms that spread traffic across overlapping time windows:

CREATE MATERIALIZED VIEW device_click_velocity AS
SELECT
    device_id,
    window_start,
    window_end,
    COUNT(*)                        AS click_count,
    COUNT(DISTINCT ip_address)      AS ip_count,
    COUNT(DISTINCT campaign_id)     AS campaign_count,
    AVG(COUNT(*)) OVER (
        PARTITION BY device_id
    )                               AS avg_clicks_per_window
FROM HOP(click_events, event_time, INTERVAL '1 MINUTE', INTERVAL '10 MINUTES')
GROUP BY device_id, window_start, window_end;

Fraud Scoring and Enforcement

Combine signals into a unified fraud score view and sink results to Kafka for downstream enforcement:

CREATE SINK fraud_scores_sink
FROM ip_click_rate
WITH (
    connector = 'kafka',
    topic = 'fraud-scores',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Comparison: Batch vs. Streaming Fraud Detection

Dimension	Batch Detection	Streaming Detection (RisingWave)
Detection latency	Hours to days	Seconds
Budget waste window	Large (full campaign flight)	Minimal (blocked in real time)
Infrastructure complexity	ETL pipelines + warehouses	Single streaming SQL layer
Fraud signal freshness	Stale by query time	Always current
Scalability	Limited by batch job size	Horizontally scalable
Operational overhead	High (job scheduling, backfills)	Low (always-on materialized views)

Deploying to Production

When running in production, consider these enhancements:

Multi-signal correlation: Join IP velocity data with device fingerprint data to reduce false positives. A high-click IP that correlates with a known legitimate user (logged-in account, conversion history) should be scored differently than an anonymous high-velocity IP.

Temporal joins for blocklist lookups: Use FOR SYSTEM_TIME AS OF to join click events against the blocklist at event time, ensuring point-in-time accuracy even when the blocklist is updated concurrently.

Session windows for behavioral analysis: SESSION() windows detect when a single device opens and closes suspicious click sessions, useful for catching manual click fraud operators who pause between bursts.

FAQ

Q: How fast does RisingWave detect fraud patterns? RisingWave processes events with sub-second latency. Materialized views update incrementally as each event arrives, so fraud signals are available within 1–3 seconds of the triggering click. The exact latency depends on window size and downstream sink configuration.

Q: Can RisingWave handle high-volume click streams from large ad networks? Yes. RisingWave is horizontally scalable and designed for high-throughput event streams. It has been deployed in production environments processing millions of events per second. For very high-volume deployments, increase the number of compute nodes and partition your Kafka topics accordingly.

Q: What happens when the blocklist is updated while events are being processed? Reference tables in RisingWave can be updated at any time. Materialized views that join against these tables will use the latest version of the table data for new computations. Use FOR SYSTEM_TIME AS OF if you need point-in-time semantics for historical replay.

Q: Does this replace my existing fraud detection vendor? RisingWave handles the real-time data pipeline and aggregation layer. It works alongside ML-based fraud scoring services—you can sink fraud features computed by RisingWave into your ML inference system, or join ML scores back into RisingWave for unified reporting.

Q: How do I tune the fraud thresholds to minimize false positives? Start with conservative thresholds and monitor false positive rates. RisingWave's FILTER WHERE clause lets you segment threshold logic by publisher, geography, or device type, enabling fine-grained tuning without rewriting the entire detection pipeline.

Get Started

Ready to build real-time fraud detection for your ad platform? RisingWave is free to try and works with your existing Kafka infrastructure.

Documentation: docs.risingwave.com/get-started
Community: risingwave.com/slack