Payment Fraud Detection with Streaming SQL: A Hands-On Tutorial

Introduction

A fraudster steals a credit card number and runs four transactions under $1.00 in three minutes to confirm the card is active. Five minutes later, a $900 purchase goes through. The entire attack takes less than ten minutes, but a batch fraud detection system running every six hours will not flag it until the money is long gone.

Payment fraud losses hit $34 billion globally in 2024, and card-not-present fraud continues to grow as more commerce moves online. The speed of the attack dictates the speed of the defense: you need a system that detects fraud patterns as transactions arrive, not hours or days later.

This tutorial walks you through building four real-time payment fraud detection patterns using streaming SQL in RisingWave, a streaming database that lets you write materialized views in standard SQL. Each pattern is verified against a running RisingWave instance (v2.8.0) with actual output included. By the end, you will have a composite risk scoring system that combines all four signals and automatically recommends block, review, or allow actions for every account.

Why Do Payment Fraud Systems Need Streaming?

Payment fraud detection is a race against time. A fraud detection system that processes events in real time has a window of opportunity to block a transaction before it settles, typically measured in seconds. Batch systems, by contrast, detect fraud after the fact, when chargebacks and losses are already locked in.

Streaming databases solve three specific problems that batch systems cannot:

Stateful pattern matching across events. Card testing involves a sequence of micro-transactions followed by a large purchase. Detecting this requires maintaining state across multiple events for the same card, something that a stateless API call per transaction cannot do.
Continuous aggregation. Geo-velocity detection requires counting distinct countries per card within a rolling time window. A streaming materialized view keeps this count updated incrementally as each new transaction arrives, without re-scanning the entire dataset.
Low latency at high throughput. Payment processors handle thousands of transactions per second. A streaming database like RisingWave processes and updates materialized views incrementally, maintaining sub-second latency even under heavy load.

Traditional approaches to real-time fraud detection require building custom applications in Java or Scala using frameworks like Apache Flink or Kafka Streams. Streaming SQL reduces this complexity: you express fraud detection logic as SQL queries, and the engine handles state management, incremental computation, and fault tolerance.

How Do You Set Up the Data Model?

Before building detection patterns, you need two tables: one for transaction events and one for account profile data. The transaction table captures the core attributes that fraud patterns rely on: amount, location, device, and timing. The account profile table provides baseline context like average spending behavior.

Transaction events table

CREATE TABLE transactions (
    tx_id VARCHAR PRIMARY KEY,
    card_id VARCHAR NOT NULL,
    account_id VARCHAR NOT NULL,
    amount DECIMAL NOT NULL,
    currency VARCHAR DEFAULT 'USD',
    merchant_category VARCHAR,
    merchant_country VARCHAR,
    channel VARCHAR,
    tx_timestamp TIMESTAMPTZ NOT NULL,
    ip_address VARCHAR,
    device_id VARCHAR
);

In a production system, this table would be replaced by a Kafka source ingesting transaction events from your payment gateway. For this tutorial, we use a table with direct inserts so you can follow along without setting up Kafka.

Account profiles table

CREATE TABLE account_profiles (
    account_id VARCHAR PRIMARY KEY,
    home_country VARCHAR NOT NULL,
    avg_daily_spend DECIMAL DEFAULT 0,
    account_age_days INT DEFAULT 0,
    is_verified BOOLEAN DEFAULT false
);

Load the sample data

Insert account profiles that represent different customer types:

INSERT INTO account_profiles VALUES
    ('acct_1001', 'US', 150.00, 730, true),
    ('acct_1002', 'US', 85.00, 45, true),
    ('acct_1003', 'GB', 200.00, 1095, true),
    ('acct_1004', 'DE', 120.00, 10, false);

Now insert transaction events that contain several embedded fraud patterns:

INSERT INTO transactions VALUES
    -- Card testing pattern: four micro-transactions under $1
    ('tx_001', 'card_5001', 'acct_1001', 0.50, 'USD', 'online_retail', 'US', 'web',
     '2025-03-15 14:00:00+00', '203.0.113.10', 'dev_a1'),
    ('tx_002', 'card_5001', 'acct_1001', 0.75, 'USD', 'online_retail', 'US', 'web',
     '2025-03-15 14:01:00+00', '203.0.113.10', 'dev_a1'),
    ('tx_003', 'card_5001', 'acct_1001', 0.30, 'USD', 'online_retail', 'US', 'web',
     '2025-03-15 14:02:00+00', '203.0.113.10', 'dev_a1'),
    ('tx_004', 'card_5001', 'acct_1001', 0.60, 'USD', 'digital_goods', 'US', 'web',
     '2025-03-15 14:03:00+00', '203.0.113.10', 'dev_a1'),
    -- Followed by a large purchase
    ('tx_005', 'card_5001', 'acct_1001', 899.99, 'USD', 'electronics', 'US', 'web',
     '2025-03-15 14:08:00+00', '203.0.113.10', 'dev_a1'),

    -- Geo-velocity pattern: three countries in one hour, multiple devices
    ('tx_006', 'card_5002', 'acct_1002', 45.00, 'USD', 'grocery', 'US', 'pos',
     '2025-03-15 10:00:00+00', NULL, 'dev_b1'),
    ('tx_007', 'card_5002', 'acct_1002', 1200.00, 'USD', 'electronics', 'GB', 'web',
     '2025-03-15 10:30:00+00', '198.51.100.5', 'dev_c1'),
    ('tx_008', 'card_5002', 'acct_1002', 850.00, 'USD', 'jewelry', 'JP', 'web',
     '2025-03-15 11:00:00+00', '192.0.2.99', 'dev_d1'),

    -- High-value wire transfers during off-hours
    ('tx_009', 'card_5003', 'acct_1003', 5500.00, 'GBP', 'wire_transfer', 'NG', 'mobile',
     '2025-03-15 03:15:00+00', '198.51.100.20', 'dev_e1'),
    ('tx_010', 'card_5003', 'acct_1003', 4800.00, 'GBP', 'wire_transfer', 'NG', 'mobile',
     '2025-03-15 03:20:00+00', '198.51.100.20', 'dev_e1'),

    -- Normal-looking activity from a new account
    ('tx_011', 'card_5004', 'acct_1004', 50.00, 'EUR', 'online_retail', 'DE', 'web',
     '2025-03-15 12:00:00+00', '203.0.113.50', 'dev_f1'),
    ('tx_012', 'card_5004', 'acct_1004', 55.00, 'EUR', 'online_retail', 'DE', 'web',
     '2025-03-15 12:05:00+00', '203.0.113.50', 'dev_f1'),
    ('tx_013', 'card_5004', 'acct_1004', 48.00, 'EUR', 'digital_goods', 'DE', 'web',
     '2025-03-15 12:10:00+00', '203.0.113.50', 'dev_f1'),
    ('tx_014', 'card_5004', 'acct_1004', 52.00, 'EUR', 'online_retail', 'DE', 'web',
     '2025-03-15 12:15:00+00', '203.0.113.50', 'dev_f1'),
    ('tx_015', 'card_5004', 'acct_1004', 60.00, 'EUR', 'online_retail', 'DE', 'web',
     '2025-03-15 12:20:00+00', '203.0.113.50', 'dev_f1'),
    ('tx_016', 'card_5004', 'acct_1004', 45.00, 'EUR', 'digital_goods', 'DE', 'web',
     '2025-03-15 12:25:00+00', '203.0.113.50', 'dev_f1');

This dataset embeds four distinct fraud patterns across four accounts. Let's build detectors for each.

How Do You Detect Card Testing Attacks?

Card testing is one of the most common payment fraud patterns. A fraudster obtains stolen card numbers (often from data breaches) and runs a series of tiny transactions, usually under $1.00, to verify which cards are still active. The ones that succeed get used for larger purchases immediately after.

The signature is clear: multiple sub-dollar transactions from the same card within a short time window.

Build the detector

CREATE MATERIALIZED VIEW card_testing_alerts AS
SELECT
    card_id,
    account_id,
    COUNT(*) AS micro_tx_count,
    SUM(amount) AS total_micro_amount,
    MIN(tx_timestamp) AS first_tx,
    MAX(tx_timestamp) AS last_tx,
    MAX(tx_timestamp) - MIN(tx_timestamp) AS time_span
FROM transactions
WHERE amount < 2.00
GROUP BY card_id, account_id
HAVING COUNT(*) >= 3;

This materialized view filters for transactions under $2.00, groups by card, and flags cards with three or more micro-transactions. RisingWave maintains this view incrementally: every new transaction that meets the criteria updates the count without re-scanning the full table.

Verified output

  card_id  | account_id | micro_tx_count | total_micro_amount |         first_tx          |          last_tx          | time_span
-----------+------------+----------------+--------------------+---------------------------+---------------------------+-----------
 card_5001 | acct_1001  |              4 |               2.15 | 2025-03-15 14:00:00+00:00 | 2025-03-15 14:03:00+00:00 | 00:03:00
(1 row)

The detector caught card_5001: four transactions totaling $2.15 in three minutes. This is a textbook card testing pattern.

How Do You Detect Geo-Velocity Fraud?

Geo-velocity fraud occurs when a stolen card is used in multiple countries within a timeframe that makes physical travel impossible. A card used in New York, then London, then Tokyo within an hour is almost certainly compromised, since no one can physically travel between those cities that fast.

Build the detector

CREATE MATERIALIZED VIEW geo_velocity_alerts AS
SELECT
    card_id,
    account_id,
    COUNT(DISTINCT merchant_country) AS distinct_countries,
    ARRAY_AGG(DISTINCT merchant_country ORDER BY merchant_country) AS countries,
    SUM(amount) AS total_amount,
    MIN(tx_timestamp) AS first_tx,
    MAX(tx_timestamp) AS last_tx,
    MAX(tx_timestamp) - MIN(tx_timestamp) AS time_span
FROM transactions
GROUP BY card_id, account_id
HAVING COUNT(DISTINCT merchant_country) >= 2
    AND MAX(tx_timestamp) - MIN(tx_timestamp) < INTERVAL '2 hours';

This view flags cards used in two or more countries within a two-hour window. The ARRAY_AGG function collects the country list for analyst review.

Verified output

  card_id  | account_id | distinct_countries | countries  | total_amount |         first_tx          |          last_tx          | time_span
-----------+------------+--------------------+------------+--------------+---------------------------+---------------------------+-----------
 card_5002 | acct_1002  |                  3 | {GB,JP,US} |      2095.00 | 2025-03-15 10:00:00+00:00 | 2025-03-15 11:00:00+00:00 | 01:00:00
(1 row)

card_5002 was used in the US, Great Britain, and Japan within a single hour, spending $2,095. Physical travel between these locations in that time is impossible, making this a strong fraud signal.

How Do You Detect Rapid Spend Anomalies?

Rapid spend detection compares current spending to an account's established baseline. When a card suddenly spends 10x or 20x its average daily amount in a single session, it is a strong indicator of account compromise. This pattern is particularly effective for catching account takeover attacks where a fraudster gains access to a legitimate account and drains it quickly.

Build the detector

This view joins transaction data with account profiles to compute a spend ratio:

CREATE MATERIALIZED VIEW rapid_spend_alerts AS
SELECT
    t.card_id,
    t.account_id,
    COUNT(*) AS tx_count,
    SUM(t.amount) AS total_spent,
    p.avg_daily_spend,
    CASE
        WHEN p.avg_daily_spend > 0
        THEN ROUND(SUM(t.amount) / p.avg_daily_spend, 1)
        ELSE 999.0
    END AS spend_ratio,
    MIN(t.tx_timestamp) AS first_tx,
    MAX(t.tx_timestamp) AS last_tx
FROM transactions t
JOIN account_profiles p ON t.account_id = p.account_id
GROUP BY t.card_id, t.account_id, p.avg_daily_spend
HAVING SUM(t.amount) > p.avg_daily_spend * 3;

The HAVING clause fires when total spend exceeds 3x the average daily amount. The spend_ratio column shows exactly how far above normal the spending is.

Verified output

  card_id  | account_id | tx_count | total_spent | avg_daily_spend | spend_ratio |         first_tx          |          last_tx
-----------+------------+----------+-------------+-----------------+-------------+---------------------------+---------------------------
 card_5003 | acct_1003  |        2 |    10300.00 |             200 |        51.5 | 2025-03-15 03:15:00+00:00 | 2025-03-15 03:20:00+00:00
 card_5001 | acct_1001  |        5 |      902.14 |             150 |         6.0 | 2025-03-15 14:00:00+00:00 | 2025-03-15 14:08:00+00:00
 card_5002 | acct_1002  |        3 |     2095.00 |              85 |        24.6 | 2025-03-15 10:00:00+00:00 | 2025-03-15 11:00:00+00:00
(3 rows)

Three accounts exceeded their spending baselines. acct_1003 stands out at 51.5x its average daily spend: two wire transfers totaling GBP 10,300 in five minutes at 3 AM. acct_1002 spent 24.6x its daily average across three countries. acct_1001 hit 6x its baseline, driven by the card testing attack followed by a large purchase.

How Do You Detect Multi-Device Account Access?

When an account suddenly uses three or more different devices within a short period, it often indicates credential theft. Legitimate users typically have one or two devices. A sudden spike in device diversity, especially combined with other signals, is a strong indicator of account compromise.

Build the detector

CREATE MATERIALIZED VIEW multi_device_alerts AS
SELECT
    account_id,
    COUNT(DISTINCT device_id) AS device_count,
    ARRAY_AGG(DISTINCT device_id ORDER BY device_id) AS devices,
    COUNT(*) AS tx_count,
    SUM(amount) AS total_amount
FROM transactions
GROUP BY account_id
HAVING COUNT(DISTINCT device_id) >= 3;

Verified output

 account_id | device_count |        devices         | tx_count | total_amount
------------+--------------+------------------------+----------+--------------
 acct_1002  |            3 | {dev_b1,dev_c1,dev_d1} |        3 |      2095.00
(1 row)

acct_1002 used three different devices across three transactions. Combined with the geo-velocity alert (three countries in one hour), this paints a clear picture of a compromised account.

How Do You Build a Composite Risk Scoring System?

Individual fraud signals are useful, but the real power comes from combining them. A card testing alert alone might have a 30% true positive rate. But card testing combined with geo-velocity anomalies and rapid spending pushes the confidence much higher.

This materialized view joins all four detection patterns into a single risk score per account:

CREATE MATERIALIZED VIEW fraud_risk_scores AS
SELECT
    t.account_id,
    t.card_id,
    COALESCE(ct.micro_tx_count, 0) AS card_test_signals,
    COALESCE(gv.distinct_countries, 0) AS geo_velocity_signals,
    COALESCE(rs.spend_ratio, 0) AS spend_ratio,
    COALESCE(md.device_count, 0) AS device_signals,
    (CASE WHEN ct.card_id IS NOT NULL THEN 30 ELSE 0 END) +
    (CASE WHEN gv.card_id IS NOT NULL THEN 35 ELSE 0 END) +
    (CASE WHEN rs.spend_ratio > 5 THEN 20 ELSE 0 END) +
    (CASE WHEN md.device_count >= 3 THEN 15 ELSE 0 END)
        AS risk_score,
    CASE
        WHEN (CASE WHEN ct.card_id IS NOT NULL THEN 30 ELSE 0 END) +
             (CASE WHEN gv.card_id IS NOT NULL THEN 35 ELSE 0 END) +
             (CASE WHEN rs.spend_ratio > 5 THEN 20 ELSE 0 END) +
             (CASE WHEN md.device_count >= 3 THEN 15 ELSE 0 END) >= 65 THEN 'BLOCK'
        WHEN (CASE WHEN ct.card_id IS NOT NULL THEN 30 ELSE 0 END) +
             (CASE WHEN gv.card_id IS NOT NULL THEN 35 ELSE 0 END) +
             (CASE WHEN rs.spend_ratio > 5 THEN 20 ELSE 0 END) +
             (CASE WHEN md.device_count >= 3 THEN 15 ELSE 0 END) >= 30 THEN 'REVIEW'
        ELSE 'ALLOW'
    END AS recommended_action
FROM (
    SELECT DISTINCT account_id, card_id FROM transactions
) t
LEFT JOIN card_testing_alerts ct
    ON t.card_id = ct.card_id AND t.account_id = ct.account_id
LEFT JOIN geo_velocity_alerts gv
    ON t.card_id = gv.card_id AND t.account_id = gv.account_id
LEFT JOIN rapid_spend_alerts rs
    ON t.card_id = rs.card_id AND t.account_id = rs.account_id
LEFT JOIN multi_device_alerts md
    ON t.account_id = md.account_id;

The scoring weights are intentional:

Signal	Weight	Rationale
Geo-velocity	35	Strongest single indicator: physically impossible travel
Card testing	30	Highly specific pattern with low false positive rate
Rapid spend	20	Common in account takeover but can occur legitimately (e.g., big purchases)
Multi-device	15	Supporting signal, rarely fraud on its own

The thresholds: a score of 65 or above triggers an automatic BLOCK, 30-64 goes to manual REVIEW, and below 30 is ALLOW.

Verified output

 account_id |  card_id  | card_test_signals | geo_velocity_signals | spend_ratio | device_signals | risk_score | recommended_action
------------+-----------+-------------------+----------------------+-------------+----------------+------------+--------------------
 acct_1002  | card_5002 |                 0 |                    3 |        24.6 |              3 |         70 | BLOCK
 acct_1001  | card_5001 |                 4 |                    0 |         6.0 |              0 |         50 | REVIEW
 acct_1003  | card_5003 |                 0 |                    0 |        51.5 |              0 |         20 | ALLOW
 acct_1004  | card_5004 |                 0 |                    0 |           0 |              0 |          0 | ALLOW
(4 rows)

The scoring system classified each account correctly:

acct_1002 (BLOCK, score 70): Geo-velocity (3 countries in 1 hour) + rapid spend (24.6x) + multiple devices. This is a clear account compromise.
acct_1001 (REVIEW, score 50): Card testing (4 micro-transactions) + elevated spend (6x). Needs human review to confirm fraud.
acct_1003 (ALLOW, score 20): High spend ratio (51.5x) but only one signal. Could be a legitimate large transfer. The system correctly avoids over-blocking.
acct_1004 (ALLOW, score 0): Normal activity from a new account. No fraud signals triggered.

How Do Materialized Views Update in Real Time?

The key advantage of using materialized views for fraud detection is that they update automatically. When new transactions arrive, all five materialized views recompute incrementally without any manual refresh or scheduled jobs.

Let's demonstrate this. Suppose new card testing transactions arrive for acct_1003:

INSERT INTO transactions VALUES
    ('tx_017', 'card_5003', 'acct_1003', 0.10, 'GBP', 'online_retail', 'NG', 'web',
     '2025-03-15 03:25:00+00', '198.51.100.20', 'dev_e1'),
    ('tx_018', 'card_5003', 'acct_1003', 0.25, 'GBP', 'online_retail', 'NG', 'web',
     '2025-03-15 03:26:00+00', '198.51.100.20', 'dev_e1'),
    ('tx_019', 'card_5003', 'acct_1003', 0.15, 'GBP', 'digital_goods', 'NG', 'web',
     '2025-03-15 03:27:00+00', '198.51.100.20', 'dev_e1');

Now query the composite risk scores again:

SELECT * FROM fraud_risk_scores ORDER BY risk_score DESC;

Updated output

 account_id |  card_id  | card_test_signals | geo_velocity_signals | spend_ratio | device_signals | risk_score | recommended_action
------------+-----------+-------------------+----------------------+-------------+----------------+------------+--------------------
 acct_1002  | card_5002 |                 0 |                    3 |        24.6 |              3 |         70 | BLOCK
 acct_1003  | card_5003 |                 3 |                    0 |        51.5 |              0 |         50 | REVIEW
 acct_1001  | card_5001 |                 4 |                    0 |         6.0 |              0 |         50 | REVIEW
 acct_1004  | card_5004 |                 0 |                    0 |           0 |              0 |          0 | ALLOW
(4 rows)

acct_1003 jumped from ALLOW (score 20) to REVIEW (score 50). The card testing detector picked up the three new micro-transactions, and the composite scorer combined that signal with the existing rapid spend alert. No manual intervention was needed: RisingWave's incremental view maintenance propagated the update through the entire chain of materialized views automatically.

In production, you would connect the fraud_risk_scores view to a Kafka sink to push alerts downstream to your case management system, or use the PostgreSQL-compatible interface to query risk scores directly from your payment gateway's decision service.

What Are Common Pitfalls in Streaming Fraud Detection?

Building fraud detection systems that work in production requires more than just writing SQL queries. Here are four common pitfalls and how to avoid them.

Threshold calibration

The thresholds in this tutorial (3 micro-transactions, 2 countries, 3x spend ratio) are starting points. Real-world thresholds need calibration against your historical fraud data. Start with conservative thresholds (higher sensitivity), measure false positive rates, and adjust. RisingWave makes this easy: you can alter materialized views or create new versions with different thresholds and compare results side by side.

Time window management

The geo-velocity detector in this tutorial uses a simple approach: it checks if all transactions for a card span less than two hours. In production, you would typically use tumbling or hopping windows to partition events into fixed time intervals. This gives you more precise control over detection windows and prevents alerts from accumulating indefinitely.

False positive reduction

The composite scoring approach helps reduce false positives by requiring multiple signals before escalating to BLOCK. Additional strategies include: maintaining a whitelist of known traveler accounts that exempts them from geo-velocity checks, using merchant category codes to weight risk differently (a $5,000 jewelry purchase is more suspicious than a $5,000 hotel booking), and tracking historical patterns per account rather than using global thresholds.

Scaling to production throughput

A payment processor handling 10,000 transactions per second needs a streaming engine that can keep up. RisingWave supports parallel processing across multiple compute nodes. Each materialized view in the fraud detection pipeline can be distributed across nodes, and the engine handles coordination and state consistency automatically.

FAQ

What is card testing fraud?

Card testing is a payment fraud technique where criminals use stolen card numbers to make multiple small transactions (typically under $1.00) to verify which cards are still active. Cards that pass these tests are then used for larger fraudulent purchases. Streaming SQL detects this pattern by counting micro-transactions per card within a time window.

How does streaming SQL compare to batch processing for fraud detection?

Streaming SQL processes each transaction as it arrives, enabling sub-second fraud detection. Batch processing, by contrast, runs on a schedule (hourly or daily) and can only detect fraud after the batch completes. For payment fraud, where the window between a stolen card being tested and a large purchase being made can be under 10 minutes, batch processing is too slow to prevent losses.

Can materialized views replace machine learning models for fraud detection?

Materialized views and ML models serve complementary roles. Materialized views excel at rule-based pattern detection (card testing, geo-velocity, spend anomalies) with deterministic logic that is easy to audit and explain. ML models are better at detecting subtle, previously unseen fraud patterns. Many production systems use both: streaming SQL for known patterns with fast response times, and ML models for scoring transactions that pass the rule layer.

What throughput can RisingWave handle for payment fraud detection?

RisingWave processes millions of events per second in distributed deployments. For fraud detection workloads, the bottleneck is typically the number of materialized views and the complexity of joins rather than raw event throughput. The composite risk scoring view in this tutorial involves four LEFT JOINs, which RisingWave handles efficiently through incremental computation, only reprocessing the data that changed rather than re-scanning the full dataset.

Conclusion

Payment fraud detection requires continuous evaluation of transactions as they arrive. Batch systems cannot match the speed of modern fraud attacks, where card testing, geo-velocity abuse, and account takeover can unfold in minutes.

Key takeaways from this tutorial:

Card testing detection identifies stolen cards by counting micro-transactions under $2.00, catching the verification phase before large purchases happen.
Geo-velocity alerts flag physically impossible travel patterns by tracking distinct countries per card within a time window.
Rapid spend monitoring compares current activity against account baselines to catch account takeover attacks.
Composite risk scoring combines multiple weak signals into strong fraud indicators, reducing false positives while maintaining high detection rates.
Incremental updates through materialized views mean new fraud signals propagate through the entire detection pipeline automatically, with no manual refresh needed.

All SQL in this tutorial runs on RisingWave v2.8.0 and produces the output shown. You can replicate these examples on your own instance by following the quickstart guide.

Ready to build real-time fraud detection? Try RisingWave Cloud free, no credit card required. Sign up here.

Join our Slack community to ask questions and connect with other stream processing developers.