How to Build a Real-Time Fraud Detection System with Streaming SQL

Fraud costs the global economy over $48 billion per year. Batch-based detection systems that run every hour or every night give fraudsters a massive head start. By the time your nightly ETL flags a stolen card, the damage is done.

Real-time fraud detection with streaming SQL changes the equation. Instead of querying historical data after the fact, you define fraud rules as continuous queries that evaluate every transaction the moment it arrives. When a pattern matches, you get an alert in milliseconds, not hours.

In this guide, you will build a complete real-time fraud detection system using RisingWave, a streaming database that lets you write standard SQL to process streaming data. You will create sources to ingest transaction events, define materialized views that encode fraud detection rules, and set up sinks to push alerts to downstream systems. No Java, no Flink DAGs, no custom code: just SQL.

Why Streaming SQL for Fraud Detection?

Traditional fraud detection pipelines involve a patchwork of tools: a message broker like Apache Kafka for ingestion, a stream processing framework like Flink or Spark Streaming for logic, a database for storage, and a separate alerting layer. Each component adds operational complexity and latency.

Streaming SQL collapses these layers. A streaming database like RisingWave lets you:

  • Ingest events directly from Kafka, Kinesis, or Pulsar using CREATE SOURCE
  • Define detection rules as materialized views using standard SQL with windowing functions
  • Serve results instantly because materialized views are always up to date and queryable
  • Push alerts downstream using CREATE SINK to Kafka, PostgreSQL, and other systems

The key advantage is incremental computation. When a new transaction arrives, RisingWave does not re-scan the entire dataset. It updates only the affected materialized views, keeping latency in the single-digit millisecond range. This is critical for fraud detection, where decisions must be made before the transaction authorization window closes (typically 100-200 milliseconds).

Batch vs. Streaming Detection: A Comparison

 Aspect          | Batch Detection             | Streaming SQL Detection
-----------------+-----------------------------+----------------------------
 Latency         | Minutes to hours            | Milliseconds
 Data freshness  | Stale (last batch run)      | Always current
 Infrastructure  | ETL + warehouse + scheduler | Single streaming database
 Rule changes    | Redeploy pipeline           | Update SQL, instant effect
 Backfill        | Full table scan             | Incremental updates only

Setting Up the Transaction Data Pipeline

Every fraud detection system starts with a stream of transaction events. In production, this stream typically comes from Kafka. Here is how to create a source in RisingWave that connects to a Kafka topic.

Defining the Transaction Source

CREATE SOURCE transactions_stream (
    transaction_id VARCHAR,
    user_id VARCHAR,
    amount DECIMAL,
    currency VARCHAR,
    merchant_id VARCHAR,
    merchant_category VARCHAR,
    location VARCHAR,
    card_number VARCHAR,
    ts TIMESTAMP,
    ip_address VARCHAR
)
WITH (
    connector = 'kafka',
    topic = 'transactions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

This CREATE SOURCE statement tells RisingWave to continuously consume JSON-encoded transaction events from the transactions Kafka topic. Each event contains the fields you would expect: transaction ID, user, amount, merchant details, location, timestamp, and IP address.

For development and testing, you can use a table instead, which lets you insert data directly:

CREATE TABLE transactions (
    transaction_id VARCHAR,
    user_id VARCHAR,
    amount DECIMAL,
    currency VARCHAR,
    merchant_id VARCHAR,
    merchant_category VARCHAR,
    location VARCHAR,
    card_number VARCHAR,
    ts TIMESTAMP,
    ip_address VARCHAR
);

All the materialized views below work identically whether the upstream is a source or a table. The SQL is the same.
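To exercise the rules below during development, you can insert a few hand-written events into the table. The rows here are illustrative sample values, not captured from a real payment stream:

```sql
-- Illustrative test data for local development (hypothetical values).
INSERT INTO transactions VALUES
    ('txn_001', 'user_42', 15000.00, 'USD', 'merch_09', 'electronics',
     'New York', 'card_1001', '2026-04-01 10:00:00', '203.0.113.10'),
    ('txn_002', 'user_77', 4000.00, 'USD', 'merch_03', 'jewelry',
     'Chicago', 'card_2002', '2026-04-01 14:01:00', '198.51.100.7');
-- FLUSH forces the inserts to be reflected in downstream materialized views
-- before you query them.
FLUSH;
```

With a handful of rows like these in place, you can query each materialized view immediately after creating it and confirm the rule fires as expected.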

Building Fraud Detection Rules as Materialized Views

A materialized view in RisingWave is a continuously maintained query result. Unlike a traditional database view that re-executes the query each time you read it, a materialized view in RisingWave incrementally updates as new data arrives. This makes it ideal for fraud detection: you define the rule once, and every new transaction is evaluated against it automatically.

Rule 1: High-Value Transaction Alerts

The simplest fraud signal is a transaction that exceeds a threshold amount. While not all high-value transactions are fraudulent, they warrant immediate review.

CREATE MATERIALIZED VIEW fraud_high_value_alerts AS
SELECT
    transaction_id,
    user_id,
    amount,
    merchant_id,
    merchant_category,
    location,
    ts
FROM transactions
WHERE amount > 10000;

Query this view to see all flagged transactions:

SELECT transaction_id, user_id, amount, location
FROM fraud_high_value_alerts;
 transaction_id | user_id |  amount  | location
----------------+---------+----------+----------
 txn_001        | user_42 | 15000.00 | New York
(1 row)

This view is always current. The moment a $15,000 transaction hits the stream, it appears in the result set.

Rule 2: Velocity-Based Detection with Tumbling Windows

Fraudsters often make multiple purchases quickly after stealing a card. The TUMBLE window function in RisingWave lets you group transactions into fixed time intervals and flag users who exceed normal velocity.

CREATE MATERIALIZED VIEW fraud_velocity_alerts AS
SELECT
    user_id,
    window_start,
    window_end,
    COUNT(*) AS transaction_count,
    SUM(amount) AS total_amount,
    COUNT(DISTINCT merchant_id) AS unique_merchants
FROM TUMBLE(transactions, ts, INTERVAL '5 minutes')
GROUP BY user_id, window_start, window_end
HAVING COUNT(*) > 5 OR SUM(amount) > 20000;

This rule fires when a user makes more than 5 transactions in 5 minutes, or when their total spend in that window exceeds $20,000. Here is what the output looks like when a burst of 6 transactions totaling $24,000 occurs:

SELECT user_id, window_start, window_end,
       transaction_count, total_amount, unique_merchants
FROM fraud_velocity_alerts;
 user_id |    window_start     |     window_end      | transaction_count | total_amount | unique_merchants
---------+---------------------+---------------------+-------------------+--------------+------------------
 user_77 | 2026-04-01 14:00:00 | 2026-04-01 14:05:00 |                 6 |     24000.00 |                6
(1 row)

User user_77 triggered both conditions: 6 transactions and $24,000 total, all within a single 5-minute window.
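One caveat of tumbling windows is that a burst straddling a window boundary is split across two windows and can slip under the threshold. RisingWave also provides the HOP function for sliding windows; the sketch below is a variant of the same rule (the 1-minute slide is an assumed tuning choice) that evaluates every overlapping 5-minute span instead of fixed buckets:

```sql
-- Sliding-window variant: 5-minute windows advancing every 1 minute.
-- A single burst now appears in every window that covers it.
CREATE MATERIALIZED VIEW fraud_velocity_sliding AS
SELECT
    user_id,
    window_start,
    window_end,
    COUNT(*) AS transaction_count,
    SUM(amount) AS total_amount
FROM HOP(transactions, ts, INTERVAL '1 minute', INTERVAL '5 minutes')
GROUP BY user_id, window_start, window_end
HAVING COUNT(*) > 5 OR SUM(amount) > 20000;
```

The tradeoff is cost: each event now belongs to five overlapping windows instead of one, so the view maintains roughly five times the state.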

Rule 3: Geographic Anomaly Detection

A card used in London, Paris, and Tokyo within the same hour is physically impossible. Geographic anomaly detection catches cloned cards being used simultaneously across different regions.

CREATE MATERIALIZED VIEW fraud_location_alerts AS
SELECT
    user_id,
    window_start,
    window_end,
    COUNT(DISTINCT location) AS distinct_locations,
    array_agg(DISTINCT location) AS locations,
    SUM(amount) AS total_amount,
    COUNT(*) AS transaction_count
FROM TUMBLE(transactions, ts, INTERVAL '1 hour')
GROUP BY user_id, window_start, window_end
HAVING COUNT(DISTINCT location) >= 3;

SELECT user_id, window_start, distinct_locations, locations, total_amount
FROM fraud_location_alerts;
 user_id |    window_start     | distinct_locations |      locations       | total_amount
---------+---------------------+--------------------+----------------------+--------------
 user_55 | 2026-04-01 09:00:00 |                  3 | {London,Paris,Tokyo} |      2600.00
(1 row)

Three distinct locations in one hour. This is a strong fraud indicator, and the locations array gives investigators immediate context.

Rule 4: Card Testing Detection (Rapid Micro-Transactions)

Before making large purchases, fraudsters often test stolen card numbers with a series of small transactions (often under $5) to verify the card is active. This pattern, known as card testing, is detectable by looking for clusters of very small transactions in a short window.

CREATE MATERIALIZED VIEW fraud_rapid_succession_alerts AS
SELECT
    card_number,
    window_start,
    window_end,
    COUNT(*) AS transaction_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount,
    MIN(amount) AS min_amount,
    MAX(amount) AS max_amount
FROM TUMBLE(transactions, ts, INTERVAL '2 minutes')
GROUP BY card_number, window_start, window_end
HAVING COUNT(*) >= 3 AND AVG(amount) < 5.00;

SELECT card_number, transaction_count, total_amount, avg_amount
FROM fraud_rapid_succession_alerts;
 card_number | transaction_count | total_amount | avg_amount
-------------+-------------------+--------------+------------
 card_4001   |                 4 |         5.00 |       1.25
(1 row)

Four transactions averaging $1.25 each in two minutes. This is a textbook card testing pattern.

Creating a Unified Fraud Dashboard

With multiple detection rules running independently, you need a single view that consolidates all alerts. This unified dashboard lets analysts see every active alert in one place, sorted by time.

CREATE MATERIALIZED VIEW fraud_dashboard AS
SELECT
    'high_value' AS alert_type,
    transaction_id AS alert_id,
    user_id,
    amount,
    location AS detail,
    ts AS detected_at
FROM transactions
WHERE amount > 10000

UNION ALL

SELECT
    'velocity' AS alert_type,
    user_id || '_' || window_start::VARCHAR AS alert_id,
    user_id,
    total_amount AS amount,
    transaction_count::VARCHAR || ' txns in 5min' AS detail,
    window_start AS detected_at
FROM fraud_velocity_alerts

UNION ALL

SELECT
    'geo_anomaly' AS alert_type,
    user_id || '_' || window_start::VARCHAR AS alert_id,
    user_id,
    total_amount AS amount,
    distinct_locations::VARCHAR || ' locations in 1hr' AS detail,
    window_start AS detected_at
FROM fraud_location_alerts;

SELECT * FROM fraud_dashboard ORDER BY detected_at;
 alert_type  |          alert_id           | user_id |  amount  |       detail       |     detected_at
-------------+-----------------------------+---------+----------+--------------------+---------------------
 geo_anomaly | user_55_2026-04-01 09:00:00 | user_55 |  2600.00 | 3 locations in 1hr | 2026-04-01 09:00:00
 high_value  | txn_001                     | user_42 | 15000.00 | New York           | 2026-04-01 10:00:00
 velocity    | user_77_2026-04-01 14:00:00 | user_77 | 24000.00 | 6 txns in 5min     | 2026-04-01 14:00:00
(3 rows)

All three alert types appear in a single, queryable view. As new transactions flow in, this dashboard updates automatically.

Pushing Alerts to Downstream Systems

Detection is only useful if it triggers action. RisingWave's CREATE SINK lets you push fraud alerts to external systems in real time. Here is how to send alerts to a Kafka topic for consumption by your notification service, case management system, or ML pipeline.

CREATE SINK fraud_alerts_sink
FROM fraud_dashboard
WITH (
    connector = 'kafka',
    topic = 'fraud-alerts',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'alert_id'
)
FORMAT UPSERT ENCODE JSON;

Every time a new alert appears in fraud_dashboard, RisingWave pushes it to the fraud-alerts Kafka topic as a JSON message. Downstream consumers can then trigger SMS notifications, block transactions, or create investigation tickets.

You can also sink directly to PostgreSQL for integration with existing operational databases:

CREATE SINK fraud_alerts_pg
FROM fraud_dashboard
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://postgres:5432/fraud_db',
    table.name = 'fraud_alerts',
    type = 'upsert',
    primary_key = 'alert_id'
);

Architecture Overview

Here is the complete data flow for this fraud detection system:

graph LR
    A[Payment Gateway] -->|Transaction Events| B[Apache Kafka]
    B -->|CREATE SOURCE| C[RisingWave]
    C -->|Materialized Views| D[Fraud Rules Engine]
    D -->|CREATE SINK| E[Kafka: fraud-alerts]
    D -->|CREATE SINK| F[PostgreSQL]
    E --> G[Notification Service]
    E --> H[Case Management]
    F --> I[Analyst Dashboard]

The entire fraud detection logic lives inside RisingWave as SQL. There is no application code to deploy, no DAGs to manage, and no cluster to tune. Adding a new detection rule means writing a new CREATE MATERIALIZED VIEW statement.

Tuning for Production

When moving from development to production, consider these optimizations:

  • Window size tuning: Start with wider windows (e.g., 10 minutes) and tighten based on false positive rates. Windows that are too narrow miss slower fraud patterns; windows that are too wide lump unrelated activity together and raise false positives.
  • Threshold calibration: Use historical data to set amount and count thresholds. The $10,000 and 5-transaction limits used here are starting points.
  • Layered detection: Combine multiple signals. A high-value transaction from a new location is more suspicious than either signal alone. The fraud_dashboard view enables this kind of correlation.
  • Source rate limiting: For high-throughput streams, use RisingWave's source_rate_limit parameter to control backfill speed without dropping events.
  • Monitoring: Query rw_catalog.rw_materialized_views to monitor view freshness and processing lag.
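As an example of the layered-detection point above, materialized views can be stacked on top of one another. The sketch below is a hypothetical correlation rule (not part of the pipeline built earlier) that reads from fraud_dashboard and flags users who trip two or more distinct rule types:

```sql
-- Hypothetical correlation rule: users firing multiple alert types.
-- Stacks on fraud_dashboard; updates incrementally like any other view.
CREATE MATERIALIZED VIEW fraud_multi_signal AS
SELECT
    user_id,
    COUNT(DISTINCT alert_type) AS signal_count,
    array_agg(DISTINCT alert_type) AS signals
FROM fraud_dashboard
GROUP BY user_id
HAVING COUNT(DISTINCT alert_type) >= 2;
```

A user appearing here has corroborating evidence from independent rules, which typically justifies a higher-severity response than any single alert.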

What Is Real-Time Fraud Detection with Streaming SQL?

Real-time fraud detection with streaming SQL is a technique where fraud rules are expressed as continuous SQL queries over a stream of transaction events. Unlike batch processing, which analyzes transactions after they have been collected into a data warehouse, streaming SQL evaluates each transaction the instant it arrives. A streaming database like RisingWave maintains the results of these queries as materialized views, delivering sub-second detection latency without requiring custom application code.

How Does Streaming SQL Compare to Traditional Fraud Detection Approaches?

Traditional approaches rely on either batch ETL pipelines or custom stream processing applications written in Java or Scala (typically using Apache Flink or Kafka Streams). Streaming SQL replaces both with standard SQL that runs continuously. The main advantages are faster development cycles (SQL is more accessible than Java), lower operational overhead (no JVM clusters to manage), and the ability to query detection results directly without a separate serving layer. The tradeoff is that highly complex ML-based detection still requires integration with external model serving, though streaming SQL handles the feature engineering and rule-based layers effectively.

Use RisingWave when your team's strength is SQL rather than Java, when you want queryable intermediate results (materialized views), and when you need to iterate quickly on detection rules without redeploying a pipeline. RisingWave is a strong fit for teams that need sub-second latency on rule-based and statistical detection patterns. If your fraud detection relies heavily on custom ML models with complex state management in Java, Flink may still be a better fit for the model-serving layer, though RisingWave can handle the upstream feature computation.

Can Streaming SQL Handle the Scale of Production Fraud Detection?

Yes. RisingWave is designed for production-scale streaming workloads. It uses a distributed architecture with separate compute and storage layers, supports horizontal scaling, and handles millions of events per second. For fraud detection specifically, the incremental computation model means that processing cost scales with the rate of incoming transactions, not with the total data volume. A financial services firm used RisingWave to build a feature store for fraud detection in production, processing high-throughput transaction streams with consistent low latency.

Conclusion

Building a real-time fraud detection system with streaming SQL is straightforward with the right tool. In this guide, you:

  • Created a streaming data source to ingest transaction events from Kafka
  • Built four materialized views covering high-value transactions, velocity anomalies, geographic impossibilities, and card testing patterns
  • Combined all alerts into a unified dashboard view
  • Set up sinks to push alerts to Kafka and PostgreSQL for downstream action

Every SQL statement in this post was tested against RisingWave 2.8.0 and produces the outputs shown. The complete system requires zero application code: every detection rule, aggregation, and output is pure SQL.

The real power of this approach is iteration speed. When a new fraud pattern emerges, you write a new CREATE MATERIALIZED VIEW and it starts detecting immediately. No redeployment, no downtime, no build pipeline.
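For example, suppose analysts notice abuse concentrated in a particular merchant category. A new rule goes live the moment the statement runs; the category names and threshold below are made up for illustration:

```sql
-- Hypothetical new rule: mid-size purchases in high-risk categories.
-- Begins evaluating incoming transactions as soon as it is created.
CREATE MATERIALIZED VIEW fraud_risky_category_alerts AS
SELECT transaction_id, user_id, amount, merchant_category, ts
FROM transactions
WHERE merchant_category IN ('gift_cards', 'prepaid_cards')
  AND amount > 500;
```

Retiring a rule is equally simple: a single DROP MATERIALIZED VIEW statement removes it without touching the rest of the pipeline.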


Ready to try this yourself? You can get started with RisingWave in five minutes by following the Quickstart guide.

Join our Slack community to ask questions and connect with other stream processing developers.
