E-Commerce Fraud Detection: Real-Time Transaction Monitoring

Fraud detection in e-commerce is a race: legitimate orders should complete in milliseconds, while fraudulent patterns — velocity attacks, account takeovers, synthetic identity fraud — must be flagged before fulfillment begins. RisingWave, a PostgreSQL-compatible streaming database, enables real-time fraud signal computation using standard SQL, without building separate machine learning pipelines for each detection rule.

The Problem: Batch Fraud Analysis Misses the Window

E-commerce fraud evolves faster than batch detection systems can respond. By the time a nightly fraud report surfaces a pattern, the fraudster has already placed hundreds of orders, collected digital goods or triggered COD deliveries, and moved on.

The specific failure modes of batch fraud detection:

Velocity attacks: A fraudster tests stolen card numbers with small transactions across many accounts. Velocity checks (N transactions from the same device or IP within M minutes) require sliding-window computation that batch systems cannot provide in real time.

Account takeover: A compromised account places an unusual order — different shipping address, new payment method, atypical order value. Batch systems compare to historical patterns overnight. By then, the order has shipped.

COD fraud rings: In Southeast Asia and South Asia, cash on delivery fraud involves placing orders with false delivery addresses, then refusing delivery or providing fake IDs. Detection requires pattern matching across multiple orders from the same device fingerprint or IP range.

Chargeback pattern detection: Certain customer IDs, email domains, or IP ranges show elevated chargeback rates. Identifying these patterns in real time — before new orders from the same signals are fulfilled — requires continuously updated aggregations.

How Streaming SQL Solves This

RisingWave maintains sliding window aggregations over the transaction stream. For each new transaction event, velocity counters (transactions per IP, per device fingerprint, per customer ID) are updated in real time. Anomaly rules expressed as SQL predicates fire immediately when thresholds are crossed.

Because these computations are incremental materialized views, they add negligible latency to the transaction processing path — they run alongside the normal order flow, not in the critical path.

Building Real-Time Fraud Detection

Step 1: Data Source Setup

Fraud detection requires both order events and payment events, with device and network signals included:

CREATE SOURCE orders (
    order_id        VARCHAR,
    merchant_id     VARCHAR,
    customer_id     VARCHAR,
    channel         VARCHAR,        -- 'web', 'mobile', 'pos', 'marketplace'
    payment_method  VARCHAR,        -- 'card', 'alipay', 'paypal', 'cod'
    subtotal        NUMERIC,
    discount        NUMERIC,
    total           NUMERIC,
    status          VARCHAR,
    created_at      TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ecommerce.orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

CREATE SOURCE transaction_signals (
    signal_id           VARCHAR,
    order_id            VARCHAR,
    customer_id         VARCHAR,
    device_fingerprint  VARCHAR,
    ip_address          VARCHAR,
    ip_country          VARCHAR,
    email_domain        VARCHAR,
    shipping_address_hash VARCHAR,
    billing_country     VARCHAR,
    is_proxy            BOOLEAN,
    is_vpn              BOOLEAN,
    occurred_at         TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ecommerce.transaction_signals',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

Step 2: Core Materialized View

The velocity check view computes transaction counts per device fingerprint, IP address, and customer ID in a sliding 1-hour window. This catches card testing attacks and account reuse patterns:

-- Velocity counters: transactions per device fingerprint in 1 hour
CREATE MATERIALIZED VIEW mv_device_velocity_1h AS
SELECT
    device_fingerprint,
    WINDOW_START                                AS window_start,
    COUNT(DISTINCT order_id)                    AS order_count,
    COUNT(DISTINCT customer_id)                 AS unique_customers,
    COUNT(DISTINCT ip_address)                  AS unique_ips,
    SUM(o.total)                                AS total_transaction_value
FROM HOP(transaction_signals, occurred_at, INTERVAL '5 MINUTES', INTERVAL '1 HOUR')
JOIN orders o ON transaction_signals.order_id = o.order_id
GROUP BY device_fingerprint, WINDOW_START;

-- Velocity counters: transactions per IP in 1 hour
CREATE MATERIALIZED VIEW mv_ip_velocity_1h AS
SELECT
    ip_address,
    WINDOW_START                                AS window_start,
    COUNT(DISTINCT order_id)                    AS order_count,
    COUNT(DISTINCT customer_id)                 AS unique_customers,
    COUNT(DISTINCT device_fingerprint)          AS unique_devices
FROM HOP(transaction_signals, occurred_at, INTERVAL '5 MINUTES', INTERVAL '1 HOUR')
GROUP BY ip_address, WINDOW_START;

Step 3: Aggregations — Fraud Signal Scoring

Compute a composite fraud risk score by combining multiple signals. Flag transactions that exceed individual thresholds or combine multiple moderate signals:

-- Flags: high-velocity devices
CREATE MATERIALIZED VIEW mv_high_velocity_devices AS
SELECT
    device_fingerprint,
    window_start,
    order_count,
    unique_customers,
    'high_device_velocity'                      AS fraud_signal,
    CASE
        WHEN order_count >= 20 THEN 'critical'
        WHEN order_count >= 10 THEN 'high'
        ELSE 'medium'
    END                                         AS severity
FROM mv_device_velocity_1h
WHERE order_count >= 5;

-- Flags: shared device across multiple customers (account takeover signal)
CREATE MATERIALIZED VIEW mv_device_sharing_flags AS
SELECT
    device_fingerprint,
    window_start,
    unique_customers,
    order_count,
    'device_shared_across_customers'            AS fraud_signal,
    'high'                                      AS severity
FROM mv_device_velocity_1h
WHERE unique_customers >= 3;

-- Flags: orders with IP/billing country mismatch
CREATE MATERIALIZED VIEW mv_geolocation_mismatches AS
SELECT
    ts.order_id,
    ts.customer_id,
    ts.device_fingerprint,
    ts.ip_country,
    ts.billing_country,
    ts.is_proxy,
    ts.is_vpn,
    'geo_mismatch'                              AS fraud_signal,
    CASE WHEN ts.is_proxy OR ts.is_vpn THEN 'high' ELSE 'medium' END AS severity
FROM transaction_signals ts
WHERE ts.ip_country != ts.billing_country
   OR ts.is_proxy = TRUE
   OR ts.is_vpn = TRUE;

Track chargeback rates by email domain over a rolling 7-day window to identify synthetic identity patterns:

CREATE MATERIALIZED VIEW mv_email_domain_risk_7d AS
SELECT
    ts.email_domain,
    WINDOW_START                                AS window_start,
    COUNT(DISTINCT ts.order_id)                 AS total_orders,
    SUM(CASE WHEN o.status = 'charged_back'
             THEN 1 ELSE 0 END)               AS chargebacks,
    ROUND(
        SUM(CASE WHEN o.status = 'charged_back'
                 THEN 1 ELSE 0 END)::NUMERIC /
        NULLIF(COUNT(DISTINCT ts.order_id), 0) * 100,
        2
    )                                           AS chargeback_rate_pct
FROM HOP(transaction_signals, occurred_at,
         INTERVAL '1 HOUR', INTERVAL '7 DAYS') ts
JOIN orders o ON ts.order_id = o.order_id
GROUP BY ts.email_domain, WINDOW_START
HAVING COUNT(DISTINCT ts.order_id) >= 5;  -- minimum sample size

Step 4: Downstream / Serving

Publish fraud signals to a risk scoring API and a block list management system:

-- Real-time fraud signals to risk API
CREATE SINK sink_fraud_signals
FROM mv_high_velocity_devices
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'fraud.velocity_signals'
) FORMAT PLAIN ENCODE JSON;

-- Device sharing flags to block list topic
CREATE SINK sink_device_flags
FROM mv_device_sharing_flags
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'fraud.device_flags'
) FORMAT PLAIN ENCODE JSON;

-- Email domain risk scores to application DB for checkout pre-screening
CREATE SINK sink_domain_risk
FROM mv_email_domain_risk_7d
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://fraud-db:5432/risk',
    table.name = 'email_domain_risk',
    type = 'upsert',
    primary_key = 'email_domain,window_start'
);

Fraud Detection: Batch vs. Streaming SQL

Signal Type	Batch Detection	Streaming SQL (RisingWave)
Velocity check (device)	Daily report	1-hour sliding window, continuous
Velocity check (IP)	Daily report	1-hour sliding window, continuous
Account sharing	Overnight analysis	Real-time device × customer count
Geo mismatch	At order review	At order placement
Chargeback rate by domain	Weekly report	7-day rolling window, continuously updated
COD fraud ring	Manual investigation	Pattern match in streaming SQL
Response time	Hours after fraud occurs	Within the fraud event window

FAQ

Q: Can RisingWave replace a dedicated fraud ML platform? RisingWave computes the real-time features (velocity signals, pattern flags, risk scores) that ML models consume. The ML scoring itself typically runs in a separate inference service. RisingWave feeds that service with continuously fresh features — effectively replacing the batch feature computation pipeline.

Q: How do I handle false positives without blocking legitimate high-value customers? Add a customer reputation table (maintained via CDC from your customer database) and join it into the fraud signal views. A customer with a long, clean transaction history on the same device gets a lower fraud score even if their IP triggers a moderate flag.

Q: How do I maintain a real-time block list (banned devices, IPs) in RisingWave? Use a CDC source connected to your block list database (PostgreSQL or MySQL). RisingWave will maintain the join between the block list and incoming transaction signals in real time — any transaction from a newly added blocked device is flagged within seconds.

Key Takeaways

Velocity checks, the foundation of real-time fraud detection, require sliding window aggregations that only streaming SQL can compute continuously.
HOP windows in RisingWave update every 5 minutes while maintaining a 1-hour lookback — giving you a balance between freshness and detection window size.
Device sharing detection (multiple customers on one device fingerprint) is a native sliding window aggregation — COUNT(DISTINCT customer_id) per device per hour.
Chargeback rate by email domain, computed over a 7-day rolling window, surfaces synthetic identity fraud patterns that emerge slowly over many transactions.
Fraud signals published to Kafka enable downstream systems (risk API, block list management, order hold workflows) to act within the fraud window.

Get started | Slack