Real-time ad click fraud detection uses streaming SQL to analyze click events as they arrive, flagging anomalies—such as abnormally high click rates from a single IP or device—within seconds. RisingWave's materialized views continuously update fraud scores without batch delays, enabling ad platforms to block fraudulent traffic before budget is wasted.
Why Click Fraud Costs Advertisers Billions
Ad click fraud drains an estimated $35–$50 billion from digital advertising budgets every year. Bots, click farms, and competitor sabotage generate fake clicks that inflate costs and distort campaign performance data. Traditional batch detection systems catch fraud hours or days after the damage is done. By then, budgets are depleted and attribution models are corrupted.
Streaming SQL changes this equation. By processing every click event in real time, you can detect fraud patterns the moment they emerge—not after the fact.
Architecture Overview
A fraud detection pipeline built on RisingWave ingests click events from Kafka, joins them against reference data, and maintains continuously updated fraud scores in materialized views. Downstream systems (ad servers, DSPs) query these views to block suspicious traffic in milliseconds.
The pipeline has three layers:
- Ingestion: Kafka topics carry raw click events from ad serving infrastructure
- Detection: Materialized views compute rolling fraud signals per IP, device, and campaign
- Action: Sinks push fraud scores to downstream enforcement systems
Setting Up the Streaming Pipeline
First, connect RisingWave to your Kafka click stream:
CREATE SOURCE click_events (
click_id VARCHAR,
campaign_id VARCHAR,
publisher_id VARCHAR,
ip_address VARCHAR,
user_agent VARCHAR,
device_id VARCHAR,
event_time TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'ad-click-events',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;
Next, create a reference table for known fraudulent IPs and thresholds:
CREATE TABLE fraud_ip_blocklist (
ip_address VARCHAR PRIMARY KEY,
reason VARCHAR,
added_at TIMESTAMPTZ
);
CREATE TABLE fraud_thresholds (
threshold_name VARCHAR PRIMARY KEY,
threshold_value INTEGER
);
Detecting Anomalies with Materialized Views
The core detection logic uses tumbling windows to count clicks per IP in rolling 5-minute buckets. Any IP exceeding the threshold is flagged immediately:
CREATE MATERIALIZED VIEW ip_click_rate AS
SELECT
ip_address,
window_start,
window_end,
COUNT(*) AS click_count,
COUNT(DISTINCT campaign_id) AS campaigns_targeted,
COUNT(DISTINCT device_id) AS device_count,
CASE
WHEN COUNT(*) > 100 THEN 'HIGH'
WHEN COUNT(*) > 50 THEN 'MEDIUM'
ELSE 'LOW'
END AS fraud_risk_level
FROM TUMBLE(click_events, event_time, INTERVAL '5 MINUTES')
GROUP BY ip_address, window_start, window_end;
For more sophisticated detection, use hopping windows to catch click farms that spread traffic across overlapping time windows:
CREATE MATERIALIZED VIEW device_click_velocity AS
SELECT
device_id,
window_start,
window_end,
COUNT(*) AS click_count,
COUNT(DISTINCT ip_address) AS ip_count,
COUNT(DISTINCT campaign_id) AS campaign_count,
AVG(COUNT(*)) OVER (
PARTITION BY device_id
) AS avg_clicks_per_window
FROM HOP(click_events, event_time, INTERVAL '1 MINUTE', INTERVAL '10 MINUTES')
GROUP BY device_id, window_start, window_end;
Fraud Scoring and Enforcement
Combine signals into a unified fraud score view and sink results to Kafka for downstream enforcement:
CREATE SINK fraud_scores_sink
FROM ip_click_rate
WITH (
connector = 'kafka',
topic = 'fraud-scores',
properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;
Comparison: Batch vs. Streaming Fraud Detection
| Dimension | Batch Detection | Streaming Detection (RisingWave) |
| Detection latency | Hours to days | Seconds |
| Budget waste window | Large (full campaign flight) | Minimal (blocked in real time) |
| Infrastructure complexity | ETL pipelines + warehouses | Single streaming SQL layer |
| Fraud signal freshness | Stale by query time | Always current |
| Scalability | Limited by batch job size | Horizontally scalable |
| Operational overhead | High (job scheduling, backfills) | Low (always-on materialized views) |
Deploying to Production
When running in production, consider these enhancements:
Multi-signal correlation: Join IP velocity data with device fingerprint data to reduce false positives. A high-click IP that correlates with a known legitimate user (logged-in account, conversion history) should be scored differently than an anonymous high-velocity IP.
Temporal joins for blocklist lookups: Use FOR SYSTEM_TIME AS OF to join click events against the blocklist at event time, ensuring point-in-time accuracy even when the blocklist is updated concurrently.
Session windows for behavioral analysis: SESSION() windows detect when a single device opens and closes suspicious click sessions, useful for catching manual click fraud operators who pause between bursts.
FAQ
Q: How fast does RisingWave detect fraud patterns? RisingWave processes events with sub-second latency. Materialized views update incrementally as each event arrives, so fraud signals are available within 1–3 seconds of the triggering click. The exact latency depends on window size and downstream sink configuration.
Q: Can RisingWave handle high-volume click streams from large ad networks? Yes. RisingWave is horizontally scalable and designed for high-throughput event streams. It has been deployed in production environments processing millions of events per second. For very high-volume deployments, increase the number of compute nodes and partition your Kafka topics accordingly.
Q: What happens when the blocklist is updated while events are being processed?
Reference tables in RisingWave can be updated at any time. Materialized views that join against these tables will use the latest version of the table data for new computations. Use FOR SYSTEM_TIME AS OF if you need point-in-time semantics for historical replay.
Q: Does this replace my existing fraud detection vendor? RisingWave handles the real-time data pipeline and aggregation layer. It works alongside ML-based fraud scoring services—you can sink fraud features computed by RisingWave into your ML inference system, or join ML scores back into RisingWave for unified reporting.
Q: How do I tune the fraud thresholds to minimize false positives?
Start with conservative thresholds and monitor false positive rates. RisingWave's FILTER WHERE clause lets you segment threshold logic by publisher, geography, or device type, enabling fine-grained tuning without rewriting the entire detection pipeline.
Get Started
Ready to build real-time fraud detection for your ad platform? RisingWave is free to try and works with your existing Kafka infrastructure.
- Documentation: docs.risingwave.com/get-started
- Community: risingwave.com/slack

