{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is real-time fraud detection in 2026?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Real-time fraud detection in 2026 means evaluating every transaction against continuously updated fraud signals -- velocity, amount deviation, device fingerprints, geographic anomalies -- as the transaction arrives, not hours later in a batch job. The target is flagging suspicious transactions in under one second. Streaming SQL databases like RisingWave make this achievable with materialized views that update incrementally on every event, without custom Java code or separate rule engines."
}
},
{
"@type": "Question",
"name": "Why does batch fraud detection miss modern attacks?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Batch fraud detection runs on a schedule -- hourly, every six hours, or daily. Modern payment fraud happens in minutes. A compromised card is typically drained within 20 to 30 minutes of first use. A batch job that runs every six hours catches the fraud long after the money has moved. Streaming detection closes the window by evaluating signals as each transaction lands."
}
},
{
"@type": "Question",
"name": "What fraud signals work well in streaming SQL?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Streaming SQL excels at rule-based signals: transaction velocity (too many transactions in a short window), amount deviation (spending far above historical baseline), new device combined with high-value transaction, and merchant category sequence anomalies. These signals can be expressed as SQL materialized views that update in real time. Deep behavioral pattern detection requiring sequence models or graph neural networks still requires separate ML infrastructure alongside the SQL layer."
}
},
{
"@type": "Question",
"name": "How does RisingWave fit into a fraud detection architecture?",
"acceptedAnswer": {
"@type": "Answer",
"text": "RisingWave ingests transaction events from Kafka and maintains fraud signal materialized views incrementally -- velocity windows, amount anomalies, device fingerprints. It serves these signals over a PostgreSQL wire protocol interface, so your fraud decision service queries them with any PostgreSQL client. High-risk transactions are sunk back to Kafka for downstream blocking or review. The total latency from transaction event to fraud alert is typically under one second."
}
},
{
"@type": "Question",
"name": "Can SQL fraud detection replace machine learning models?",
"acceptedAnswer": {
"@type": "Answer",
"text": "No -- they are complementary. SQL materialized views handle deterministic rule-based signals fast and with full auditability: velocity counts, amount ratios, device counts, impossible travel. ML models handle complex behavioral patterns that do not reduce to simple rules. The production-grade approach in 2026 is to run both: streaming SQL for the fast rule layer that blocks in real time, and ML scoring in parallel for the complex-pattern review queue."
}
}
]
}
A fraudster obtains a stolen card. They run four sub-dollar transactions in three minutes to confirm the card is live. Two minutes later, a $1,400 electronics purchase goes through. Six hours later, your batch fraud job flags the account. The money has been gone for almost six hours.
This scenario plays out thousands of times a day. Global payment fraud losses have exceeded $40 billion annually in recent years, and the core reason losses remain high is not that detection logic is wrong -- it is that detection runs on the wrong schedule. The fraud happens in minutes. The detection happens in hours.
Streaming SQL materialized views close that gap. This article walks through a complete, production-grade fraud detection pipeline built in RisingWave, explains where streaming SQL excels, and is honest about where it falls short.
How Batch Fraud Detection Fails
The standard batch architecture looks like this: transactions land in a data warehouse or data lake. An ETL job, usually scheduled hourly or every few hours, reads the accumulated transactions, applies aggregation logic, and writes fraud signals to a decision table. The fraud decision service reads from that table at transaction time.
Three problems compound when fraud is the use case.
The detection window is the attack window. Payment fraud is concentrated in the minutes immediately after a card or account is compromised. A fraudster testing a stolen card runs micro-transactions, confirms them in seconds, and moves immediately to high-value purchases. A six-hour batch window means the entire attack -- testing phase, exploitation phase, cash-out -- completes before the first signal fires.
Aggregations are stale at the moment they matter. Velocity features computed at 2:00 AM tell you nothing about what happened at 2:47 AM when a real transaction needs a fraud score. The batch pipeline produces a snapshot that ages immediately.
Rule logic is disconnected from serving. Batch pipelines compute signals, write them to a store, and then separate serving logic reads and evaluates them. This split makes it hard to keep rules consistent and impossible to add signals without a full pipeline redeployment.
A streaming approach eliminates all three problems. Signals update within milliseconds of each transaction. The velocity count a decision service reads reflects what happened two seconds ago, not two hours ago. And the signal logic lives in SQL views that the serving layer reads directly.
The Signals That Matter
Fraud detection research and production experience converge on a consistent set of high-signal features that can be computed in SQL:
Transaction velocity. The single strongest early signal. Fraudsters move fast because speed reduces detection risk. Four purchases in three minutes is unusual for a legitimate cardholder; for a compromised card being probed and drained, the pattern is nearly diagnostic.
Amount deviation from baseline. Each user has a spending pattern. A $5,000 wire transfer from an account whose average transaction is $35 is anomalous regardless of where it happens. Deviation from personal baseline is more informative than absolute amount.
Geographic anomaly (impossible travel). A transaction in London followed forty minutes later by a transaction in Tokyo is physically impossible. Card-not-present fraud frequently shows this signature because fraudsters operate from fixed locations while the compromised card's owner is elsewhere.
Device fingerprint. Legitimate users transact from one or two devices. A card suddenly appearing on four distinct devices in one hour is a device proliferation signal. Combined with a high-value transaction on the newest device, it is a strong block signal.
Merchant category sequence. A user whose transaction history shows steady grocery and coffee purchases who suddenly runs three back-to-back wire transfer transactions shows a sequence anomaly. This is weaker than the above signals but useful in composite scoring.
The SQL Pipeline
The following SQL runs on RisingWave. You can try it on RisingWave Cloud or a local instance with docker run -it --pull=always -p 4566:4566 risingwavelabs/risingwave:latest playground.
Transaction stream from Kafka
CREATE TABLE transactions (
txn_id TEXT,
user_id TEXT,
amount DECIMAL,
merchant_id TEXT,
merchant_cat TEXT,
device_id TEXT,
ip_address TEXT,
txn_time TIMESTAMPTZ
);
In production this would be a CREATE SOURCE against a Kafka topic. For local testing, the table form above accepts direct inserts.
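The production form might look like the following sketch. The topic name, broker address, and startup mode are placeholders for your environment, not fixed values:

```sql
-- Sketch of the production ingestion path: a Kafka-backed source.
-- Topic and broker values below are assumptions for illustration.
CREATE SOURCE transactions (
    txn_id TEXT,
    user_id TEXT,
    amount DECIMAL,
    merchant_id TEXT,
    merchant_cat TEXT,
    device_id TEXT,
    ip_address TEXT,
    txn_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'transactions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```

All downstream views in this article work the same way whether they read from the source or the table form.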
Signal 1: Transaction velocity windows
Velocity is the fastest-updating signal. Use a 15-minute tumbling window to count transactions and total spend per user:
CREATE MATERIALIZED VIEW user_velocity_15min AS
SELECT
user_id,
COUNT(*) AS txn_count,
SUM(amount) AS total_amount,
COUNT(DISTINCT merchant_id) AS merchant_count,
window_start,
window_end
FROM TUMBLE(transactions, txn_time, INTERVAL '15' MINUTE)
GROUP BY user_id, window_start, window_end;
The TUMBLE function partitions the transaction stream into fixed 15-minute windows. RisingWave maintains this view incrementally: each new transaction updates only the affected user's row in the current window without re-scanning historical data.
Signal 2: Anomaly detection against user baseline
Velocity alone cannot determine anomaly. A high-volume trader running 20 transactions in 15 minutes is normal for them. The signal is velocity relative to that user's historical baseline.
Store historical baselines in a reference table:
CREATE TABLE user_baselines (
user_id TEXT,
avg_txn_count DECIMAL,
avg_total_amount DECIMAL,
PRIMARY KEY (user_id)
);
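One way to populate this table is a periodic batch upsert from recent history. A sketch, assuming at least 30 days of data and relying on the primary key for overwrite-on-insert semantics; the 2880 divisor is the number of 15-minute windows in 30 days:

```sql
-- Rebuild per-user baselines from the last 30 days of transactions.
-- 30 days * 24 hours * 4 windows/hour = 2880 fifteen-minute windows.
INSERT INTO user_baselines
SELECT
user_id,
COUNT(*)::DECIMAL / 2880 AS avg_txn_count,
SUM(amount) / 2880 AS avg_total_amount
FROM transactions
WHERE txn_time >= NOW() - INTERVAL '30 days'
GROUP BY user_id;
```

How often you refresh baselines is a business choice; daily is a common starting point.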
Join the live velocity view against baselines to detect deviations:
CREATE MATERIALIZED VIEW velocity_anomalies AS
SELECT
v.user_id,
v.txn_count,
v.total_amount,
b.avg_txn_count,
b.avg_total_amount,
v.txn_count::FLOAT / NULLIF(b.avg_txn_count, 0) AS velocity_ratio,
v.window_start,
v.window_end
FROM user_velocity_15min v
JOIN user_baselines b ON v.user_id = b.user_id
WHERE v.txn_count > b.avg_txn_count * 3
OR v.total_amount > b.avg_total_amount * 5;
A user whose baseline is 2 transactions per 15-minute window and who hits 7 in the current window has a velocity_ratio of 3.5 -- well above the 3x threshold. A user whose baseline is 15 transactions per window hitting 20 would not trigger, because for them this is within normal range.
The multipliers (3x count, 5x amount) are starting points. Calibrate these against your historical fraud data. RisingWave makes iteration cheap: drop and recreate the view with adjusted thresholds in seconds.
Signal 3: Device fingerprint anomaly
New device plus high amount is one of the most reliable composite signals. The pattern: a card that has been used on the same two devices for months suddenly appears on a device never seen before, immediately making a large purchase.
CREATE MATERIALIZED VIEW new_device_high_amount AS
SELECT
t.txn_id,
t.user_id,
t.amount,
t.device_id,
t.txn_time
FROM transactions t
LEFT JOIN (
-- Earliest sighting of each (user, device) pair in the last 30 days.
SELECT user_id, device_id, MIN(txn_time) AS first_seen
FROM transactions
WHERE txn_time >= NOW() - INTERVAL '30 days'
GROUP BY user_id, device_id
) known_devices
ON t.user_id = known_devices.user_id
AND t.device_id = known_devices.device_id
AND known_devices.first_seen < t.txn_time
WHERE known_devices.device_id IS NULL
AND t.amount > 500;
This view identifies transactions where the device has not been seen for this user in the past 30 days and the amount exceeds $500. Both thresholds are configurable. Adjust the dollar threshold based on your user base's spending patterns -- a $500 threshold is appropriate for consumer cards; a $5,000 threshold might be right for business accounts.
Signal 4: Merchant category sequence anomaly
A user's merchant category history reflects their lifestyle. Grocery stores, coffee shops, streaming subscriptions. When the sequence suddenly shifts to wire transfers, jewelry, or foreign currency exchanges, it is worth flagging.
CREATE MATERIALIZED VIEW recent_merchant_categories AS
SELECT
user_id,
array_agg(merchant_cat ORDER BY txn_time DESC) AS recent_cats,
window_start,
window_end
FROM TUMBLE(transactions, txn_time, INTERVAL '1' HOUR)
GROUP BY user_id, window_start, window_end;
This view captures the sequence of merchant categories in the most recent hour. Downstream logic -- either a join against a risk category table or application-layer scoring -- can evaluate whether the sequence is anomalous for that user. Merchant category sequence is best used as a supporting signal in composite scoring rather than a standalone trigger.
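The impossible-travel signal listed earlier can be sketched in the same style. This assumes a hypothetical ip_geo reference table mapping IP addresses to coordinates; a production version would use a proper haversine distance and a vetted geolocation feed rather than the crude degree-distance check below:

```sql
-- Pairs of transactions by the same user within one hour whose IP
-- geolocations are implausibly far apart. ip_geo(ip_address, lat, lon)
-- is an assumed reference table; the 20-degree cutoff is a placeholder.
CREATE MATERIALIZED VIEW impossible_travel AS
SELECT
cur.user_id,
cur.txn_id,
prev.txn_id AS prev_txn_id,
cur.txn_time
FROM transactions cur
JOIN transactions prev
ON cur.user_id = prev.user_id
AND prev.txn_time < cur.txn_time
AND prev.txn_time > cur.txn_time - INTERVAL '1 hour'
JOIN ip_geo g1 ON cur.ip_address = g1.ip_address
JOIN ip_geo g2 ON prev.ip_address = g2.ip_address
WHERE sqrt(power(g1.lat - g2.lat, 2) + power(g1.lon - g2.lon, 2)) > 20;
```

Like the other signals, this is most useful as an input to composite scoring rather than a standalone block rule.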
Composite risk scoring
Individual signals have false positive rates. A composite score combining multiple weak signals produces a stronger, more specific fraud indicator.
CREATE MATERIALIZED VIEW transaction_risk_scores AS
SELECT
t.txn_id,
t.user_id,
t.amount,
t.txn_time,
CASE WHEN va.user_id IS NOT NULL THEN 50 ELSE 0 END +
CASE WHEN nd.txn_id IS NOT NULL THEN 30 ELSE 0 END AS risk_score
FROM transactions t
LEFT JOIN velocity_anomalies va
ON va.user_id = t.user_id
AND t.txn_time >= va.window_start AND t.txn_time < va.window_end
LEFT JOIN new_device_high_amount nd ON nd.txn_id = t.txn_id;
Scoring weights here: velocity anomaly (50 points), new device with high amount (30 points). A transaction triggering both signals scores 80 and should route to an automatic block or immediate review queue. A transaction with only the new device signal scores 30 and might route to step-up authentication.
The weights are illustrative. Your production weights should reflect your empirical false positive and false negative rates. What the SQL structure enables is rapid iteration: adjust a weight, recreate the view, and the new scoring applies to all subsequent transactions immediately.
Sinking Alerts Downstream
The risk score view is live and queryable. But for high-risk transactions, you also want to push alerts to downstream systems -- a case management tool, a payment gateway decision API, or a Slack alert channel.
RisingWave sinks handle this:
CREATE SINK fraud_alerts_sink AS
SELECT txn_id, user_id, amount, risk_score, txn_time
FROM transaction_risk_scores
WHERE risk_score >= 50
WITH (
connector = 'kafka',
topic = 'fraud-alerts',
properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON (
-- The scoring query can emit updates; force an append-only alert stream.
force_append_only = 'true'
);
Every time a transaction's risk score reaches or exceeds 50, RisingWave emits a record to the fraud-alerts Kafka topic. Downstream consumers can take whatever action is appropriate: call the payment gateway's block API, open a case in your fraud management platform, or notify the cardholder.
What Changed in 2026
Several forces have reshaped fraud detection in the past two years.
AI agents can now investigate flagged transactions. When a high-risk transaction lands in the review queue, an AI agent can query the materialized views directly -- pulling the user's full velocity history, their device history, their merchant category sequence -- and produce a structured investigation summary for a human analyst. This shortens review time from minutes to seconds. RisingWave's PostgreSQL wire protocol makes this straightforward: agents connect with any PostgreSQL client library and query views by user ID.
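For example, an investigation agent, or any PostgreSQL client, can pull a user's recent velocity profile with an ordinary query; the user ID here is a placeholder:

```sql
-- Last hour of 15-minute velocity windows for one user.
SELECT window_start, txn_count, total_amount, merchant_count
FROM user_velocity_15min
WHERE user_id = 'user_123'
ORDER BY window_start DESC
LIMIT 4;
```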
Vector similarity enables pattern matching against known fraud signatures. RisingWave includes a native vector(n) type, cosine distance operator (<=>), and openai_embedding() function. In principle, you can embed transaction patterns and compare them against a library of known fraud patterns using HNSW index lookups. This is a newer pattern in 2026 and still being refined in production, but the infrastructure is available within the same system that handles your streaming SQL.
MCP allows natural language access to streaming views. The RisingWave MCP server lets AI agents query real-time materialized views through natural language. A fraud investigator can ask "what was the velocity profile for user X in the hour before the flagged transaction?" and get an answer from the live streaming view, not a static dashboard.
Where SQL Fraud Detection Falls Short
Being honest about limitations is as important as showcasing capabilities.
Sophisticated ML fraud detection is a separate system. Graph neural networks that detect fraud rings by analyzing relationships between accounts, cards, and devices are not expressible in SQL. Sequence models that learn behavioral fingerprints require training pipelines, model registries, and inference infrastructure. The SQL patterns in this article handle a meaningful fraction of fraud volume -- velocity-based attacks, new device attacks, amount anomalies -- but they do not replace purpose-built ML fraud detection.
Cold-start problem for new users. Baseline-relative scoring requires a baseline. New users have no transaction history. For accounts with fewer than 30 days of history, consider applying stricter absolute thresholds rather than baseline-relative ones, or routing all transactions above a certain amount to manual review regardless of velocity.
Window boundary edge cases. Tumbling windows create seams. A fraudster running transactions across a 15-minute window boundary will have their activity split between two windows, potentially falling below the count threshold in each while exceeding it in aggregate. Hopping windows mitigate this by overlapping windows:
CREATE MATERIALIZED VIEW user_velocity_hop AS
SELECT
user_id,
COUNT(*) AS txn_count,
SUM(amount) AS total_amount,
window_start,
window_end
FROM HOP(transactions, txn_time, INTERVAL '5' MINUTE, INTERVAL '15' MINUTE)
GROUP BY user_id, window_start, window_end;
Here the hop size is 5 minutes and the window size is 15 minutes. Every 5 minutes, a new 15-minute window opens. A burst of transactions will be captured in at least one window regardless of when it starts relative to window boundaries.
False positive management requires continuous calibration. A velocity threshold that works in January may generate excessive false positives in December when holiday shopping spikes are legitimate. Thresholds need periodic review against false positive rates and business context.
The Hybrid Architecture
The production approach in 2026 is not streaming SQL or ML -- it is streaming SQL and ML operating in parallel.
The streaming SQL layer handles the fast path: transactions arrive, materialized views update in milliseconds, high-confidence rule-based signals block or flag transactions immediately. This layer is auditable, deterministic, and explainable. When a transaction is blocked, you can show the customer exactly which rule triggered and why.
The ML layer handles the complex path: transactions that clear the rule layer are scored by a model that evaluates behavioral patterns too subtle for rules. The ML score feeds into a review queue where human analysts and AI investigation agents work through the cases that require judgment.
The two layers share infrastructure: the streaming SQL materialized views that feed real-time fraud signals also serve as the feature store for ML model inference. The ML model queries the same velocity views, the same baseline tables, and the same device history that the rule layer uses. This eliminates training-serving skew -- the model sees the same features in production that it trained on.
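Concretely, the feature fetch for model inference is just another point query over the same views, shown here for a hypothetical user ID:

```sql
-- Same features the rule layer uses, fetched for model scoring.
SELECT
v.txn_count,
v.total_amount,
b.avg_txn_count,
b.avg_total_amount
FROM user_velocity_15min v
JOIN user_baselines b ON v.user_id = b.user_id
WHERE v.user_id = 'user_123'
ORDER BY v.window_start DESC
LIMIT 1;
```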
Monitoring the Pipeline
Once the pipeline is running, you need visibility into its health. RisingWave provides system tables for this:
-- Check lag on the Kafka source
SELECT * FROM rw_kafka_job_lag;
-- Inspect running streaming jobs
SELECT id, name, definition, created_at
FROM rw_streaming_jobs
ORDER BY created_at DESC;
A rising consumer_lag on your transactions topic means RisingWave is falling behind the event stream. This will degrade detection latency. Monitor it alongside standard infrastructure metrics.
For fraud-specific monitoring, track the signal trigger rate over time: if velocity_anomalies suddenly starts generating 10x the normal rows, either you have a fraud wave or a threshold calibration issue. Either way, you want to know.
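The trigger rate itself can be a small materialized view; a sketch, with the hourly granularity as an arbitrary choice:

```sql
-- Count anomaly windows per hour to surface sudden spikes.
CREATE MATERIALIZED VIEW anomaly_trigger_rate AS
SELECT
date_trunc('hour', window_start) AS hour,
COUNT(*) AS anomaly_windows
FROM velocity_anomalies
GROUP BY date_trunc('hour', window_start);
```

Alert on large deviations from this view's recent values the same way you alert on infrastructure metrics.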
Latency: What to Expect
The end-to-end latency chain from transaction event to fraud alert looks like this:
- Transaction event produced to Kafka by payment gateway: near-zero
- RisingWave consumes from Kafka and updates materialized views: typically 100-500 milliseconds
- Sink emits the fraud alert to the fraud-alerts Kafka topic: triggered by the materialized view update
- Downstream consumer picks up the alert: depends on the consumer polling interval
Total from transaction event to fraud alert in Kafka: typically under one second under normal load. This is well within the window to take action before a payment settles, particularly for card-not-present transactions where settlement windows are measured in seconds to minutes.
The latency depends on Kafka consumer group configuration, RisingWave cluster sizing, and the complexity of the materialized view graph. For very low-latency requirements, profile your specific pipeline on representative data volumes before going to production.
Getting Started
The quickest path to running this pipeline:
- Start RisingWave locally: docker run -it --pull=always -p 4566:4566 risingwavelabs/risingwave:latest playground
- Connect with psql: psql -h localhost -p 4566 -d dev -U root
- Create the tables and materialized views from this article
- Insert test transactions and observe the views update
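For the last step, a few hand-written rows are enough to watch the velocity view change. The values are made up; FLUSH forces pending changes to become visible before the read:

```sql
-- Three quick transactions for one user, then read the live view.
INSERT INTO transactions VALUES
('t1', 'u1', 0.50, 'm1', 'grocery', 'd1', '203.0.113.5', NOW()),
('t2', 'u1', 0.75, 'm2', 'grocery', 'd1', '203.0.113.5', NOW()),
('t3', 'u1', 1400.00, 'm3', 'electronics', 'd2', '203.0.113.5', NOW());
FLUSH;
SELECT * FROM user_velocity_15min WHERE user_id = 'u1';
```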
For production, RisingWave Cloud provides a managed deployment with no infrastructure to operate. Connect with the same psql client and the same SQL.
The fraud detection pipeline described here -- velocity windows, baseline anomaly detection, device fingerprinting, composite scoring, Kafka sink -- is a starting point. Adapt the window sizes, score weights, and thresholds to your transaction patterns. The SQL structure makes iteration fast: change a threshold, recreate the view, and the new logic applies immediately.
Real-time fraud detection is not a solved problem in 2026. But the infrastructure gap -- the reason detection ran on batch schedules while fraud ran in real time -- has closed. The tools are here. The patterns work. The question now is calibration and coverage, not architecture.
Try RisingWave Cloud free, no credit card required. Sign up here.
Join the RisingWave community on Slack to ask questions and share what you build.

