How to Build a Real-Time Fraud Detection System with SQL

You can build a real-time fraud detection system with SQL by connecting RisingWave to a Kafka transaction stream, defining materialized views that run velocity checks and stream-table joins against blacklists, and querying those views from your payment gateway in milliseconds. No Java, no separate compute cluster, no rule redeployment for each change.

What makes real-time fraud detection different from batch detection?

Batch fraud detection runs at the end of the day or the end of an hour. By the time a fraudulent card is flagged, dozens of transactions may have cleared. Card fraud operates in seconds: a stolen card number sold on the dark web is tested with a small transaction, then drained within minutes if the test passes.

The latency problem is not academic. A fraud detection pipeline that takes 500 milliseconds to respond can still block a transaction before the authorization response goes back to the merchant terminal. A pipeline that takes two hours cannot. The difference between real-time and batch is the difference between preventing fraud and detecting it after the damage is done.

Real-time fraud detection requires continuous computation: every transaction must be evaluated against historical patterns for that card, that merchant category, that IP address, and that device fingerprint as it arrives. Batch systems cannot do this because the state is always stale. Stream processing systems can, because they maintain aggregations continuously and return results from pre-computed materialized views rather than scanning raw events on demand.

How does RisingWave detect fraud in real time?

RisingWave ingests transaction events directly from Kafka as a streaming source and maintains SQL materialized views that are incrementally updated as each event arrives. When your payment gateway queries a materialized view, it reads a pre-computed result, not a live scan, so latency stays in the single-digit millisecond range even at high transaction volumes.

The two core mechanisms are velocity checks and stream-table joins.

A velocity check aggregates transaction counts or amounts over a sliding time window for a given card or account. If a card completes more than five transactions in sixty seconds, that pattern is flagged. Because RisingWave maintains the aggregation state continuously in S3-compatible object storage, the view always reflects the most recent window without recomputing from scratch on every query.

A stream-table join enriches each incoming transaction event with data from a static or slowly changing reference table, such as a blacklist of compromised card numbers, a list of high-risk merchant category codes, or a customer risk profile table. RisingWave supports stream-table joins natively in SQL, so you write a standard JOIN between your source and the reference table, and RisingWave handles the incremental maintenance.

What fraud patterns can SQL window aggregations detect?

Velocity fraud is the most direct pattern. A legitimate cardholder makes a few transactions per day. A stolen card is used repeatedly in a short window before the card is canceled. A tumbling or sliding window aggregation over transaction count and total amount per card ID, grouped by short time intervals, surfaces this pattern immediately.

Geographic anomalies require lat/lon coordinates already present in your transaction event payload. If your transaction schema includes merchant_lat and merchant_lon, you can use SQL arithmetic to compute approximate distances between consecutive transactions for the same card. A card used in New York and then Tokyo within two hours is flagged by a self-join or by comparing the current event against a latest-transaction materialized view. Note that RisingWave provides standard SQL math functions for this calculation; it does not include a built-in geo distance function, so you compute the Haversine formula in SQL or pre-compute distances in your event enrichment layer.

Multi-signal correlation combines several weak signals into a stronger fraud score. A single transaction at an unusual hour is not suspicious on its own. A transaction at an unusual hour, from a new device, in a new country, above the card's 30-day average amount, all at once, is a high-confidence fraud signal. RisingWave lets you join multiple materialized views in a single query, each tracking one dimension of behavior, to produce a combined score.

Account takeover patterns look different from card-present fraud. They show password reset events followed by profile changes followed by large transfers, all within a short session. A session window aggregation groups events by inactivity gap rather than fixed clock time, which matches the natural rhythm of a user session.

How does RisingWave compare to Apache Flink for fraud detection?

Apache Flink is a powerful stream processing engine, but it is not designed for fraud detection teams whose primary language is SQL.

Writing fraud rules in Flink requires Java or Scala. Each rule is a DataStream or Table API program that must be compiled, packaged, and deployed to a Flink cluster. When a rule changes because a new fraud pattern is identified, the job must be redeployed. Redeployment in Flink means stopping the current job (or starting a savepoint), deploying the new job, and restoring state from the savepoint. This cycle typically takes minutes to tens of minutes, which means your fraud team cannot respond quickly to an emerging attack pattern.

RisingWave stores state in S3-compatible object storage rather than on local disks. Flink uses RocksDB on local disk for hot state, which requires careful checkpoint configuration to guarantee durability. If a Flink task manager node fails before a checkpoint completes, recent state can be lost. RisingWave's storage model separates compute from storage: compute nodes are stateless, and state is always in object storage, so node failures do not risk state loss.

The performance difference is meaningful. On the Nexmark streaming benchmark, RisingWave runs 22 of 27 queries faster than Flink. The Nexmark benchmark uses realistic e-commerce event patterns that closely resemble fraud detection workloads: bid velocity, item tracking, and auction aggregations map directly to transaction velocity, merchant tracking, and account aggregations.

For fraud detection specifically, the practical advantage is rule agility. A RisingWave fraud rule is a CREATE MATERIALIZED VIEW statement. Updating a rule is an ALTER or a DROP followed by a CREATE. The change takes effect in seconds, not minutes, and does not require a Java engineer or a cluster restart.

How do you implement a fraud detection pipeline step by step?

Step 1: Create the Kafka source

Connect RisingWave to your transaction Kafka topic. Each message is a JSON object with transaction fields.

CREATE SOURCE transactions (
    transaction_id  VARCHAR,
    card_id         VARCHAR,
    merchant_id     VARCHAR,
    merchant_cat    VARCHAR,
    amount          DECIMAL,
    currency        VARCHAR,
    merchant_lat    DOUBLE,
    merchant_lon    DOUBLE,
    device_id       VARCHAR,
    ip_address      VARCHAR,
    event_time      TIMESTAMPTZ
)
WITH (
    connector        = 'kafka',
    topic            = 'payment.transactions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

Step 2: Create a velocity check materialized view

Count transactions and sum amounts per card over a one-minute tumbling window.

CREATE MATERIALIZED VIEW card_velocity_1m AS
SELECT
    card_id,
    window_start,
    window_end,
    COUNT(*)                    AS txn_count,
    SUM(amount)                 AS total_amount,
    COUNT(DISTINCT merchant_id) AS distinct_merchants
FROM TUMBLE(transactions, event_time, INTERVAL '1 minute')
GROUP BY
    card_id,
    window_start,
    window_end;

A card with txn_count > 5 or total_amount > 2000 within one minute is a candidate for automatic decline or step-up authentication.

Step 3: Join the stream with a blacklist table

Create a reference table of compromised card numbers and join it to the live stream.

CREATE TABLE compromised_cards (
    card_id      VARCHAR PRIMARY KEY,
    flagged_at   TIMESTAMPTZ,
    reason       VARCHAR
);

CREATE MATERIALIZED VIEW flagged_transactions AS
SELECT
    t.transaction_id,
    t.card_id,
    t.amount,
    t.merchant_id,
    t.event_time,
    c.reason        AS flag_reason
FROM transactions t
JOIN compromised_cards c ON t.card_id = c.card_id;

When your fraud operations team adds a card to compromised_cards, the join result updates immediately. No job restart required.

Step 4: Query from your payment gateway

Your authorization service queries the materialized views synchronously during the authorization flow.

SELECT
    v.txn_count,
    v.total_amount,
    f.flag_reason
FROM card_velocity_1m v
LEFT JOIN flagged_transactions f
    ON v.card_id = f.card_id
WHERE v.card_id = $1
  AND v.window_end = date_trunc('minute', now());

Because RisingWave is PostgreSQL-compatible, any language with a Postgres driver can connect directly using the standard wire protocol. No custom SDK, no REST API layer.

Step 5: Add alerting and sinks

Materialize a high-risk alert view and sink it to another Kafka topic or to an Apache Iceberg table for long-term storage and audit.

CREATE SINK fraud_alerts_sink
FROM flagged_transactions
WITH (
    connector = 'kafka',
    topic     = 'fraud.alerts',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Frequently asked questions

Can RisingWave process transactions with sub-second latency?

Yes. Materialized views in RisingWave are incrementally maintained as each event arrives from Kafka. Query latency is typically in the single-digit millisecond range because the authorization service reads pre-computed results rather than triggering a fresh computation on each query.

How does RisingWave handle state durability?

RisingWave stores all materialized view state in S3-compatible object storage. Compute nodes are stateless and can be restarted or scaled without risk of state loss. This is different from Apache Flink, where hot state lives in RocksDB on local disk and requires checkpoint configuration to ensure durability.

Can I update a fraud rule without downtime?

Yes. Because each fraud rule is a SQL materialized view, you can drop and recreate it, or add new views, while the system continues processing. There is no job redeployment, no cluster restart, and no savepoint management.

What data sources does RisingWave support beyond Kafka?

RisingWave supports Kafka, Apache Pulsar, Amazon Kinesis, PostgreSQL CDC (via logical replication), and MySQL CDC as native streaming sources. This means you can ingest transaction events from any of these systems and join them with reference data replicated from your operational database using CDC.

If you want to run a fraud detection pipeline without managing infrastructure, RisingWave Cloud provides a fully managed service with a free tier. You can connect your Kafka cluster, define your materialized views in the SQL editor, and have a working pipeline in under an hour.