How to Build Real-Time Fraud Detection with RisingWave

Introduction

A stolen credit card is used three times in two minutes across three different countries. By the time a nightly batch job flags the anomaly, the fraudster has already cashed out. According to the FTC, consumers reported losing over $10 billion to fraud in 2023 alone, and the number keeps climbing.

The core problem is latency. Traditional fraud detection systems rely on batch pipelines that run every few hours or overnight. That gap between a fraudulent transaction and detection is where the damage happens. Real-time fraud detection closes that gap by evaluating every transaction as it arrives, applying rules and statistical models within milliseconds.

Building such a system used to require deep expertise in Java-based stream processors like Apache Flink or Kafka Streams. But what if you could express your fraud detection logic in standard SQL, against a system that processes streams incrementally? That is exactly what RisingWave enables. In this guide, you will build a complete real-time fraud detection pipeline using nothing but SQL: ingesting transaction events, scoring risk with materialized views, and routing alerts downstream.

What Is Real-Time Fraud Detection?

Real-time fraud detection is the practice of evaluating transactions and user actions for fraudulent patterns the moment they occur, rather than in periodic batches. A real-time system ingests events continuously, applies detection rules against each event (and its surrounding context), and produces a risk decision before the transaction settles.

The typical architecture has three layers:

  1. Event ingestion: transaction events flow in from payment gateways, mobile apps, or point-of-sale terminals via a message broker like Apache Kafka.
  2. Stream processing: a stateful engine evaluates each event against detection rules, aggregating context (velocity, geolocation, amount patterns) in real time.
  3. Action: flagged transactions are routed to a case management system, trigger automated blocks, or feed back into a machine learning scoring service.

RisingWave sits in the stream processing layer. It connects directly to Kafka (or Kinesis, Pulsar, and other sources), maintains state through materialized views, and exposes results via PostgreSQL-compatible queries or downstream sinks.

graph LR
    A[Payment Gateway] -->|transactions| B[Apache Kafka]
    B -->|streaming| C[RisingWave]

    subgraph RisingWave
        D[Source: transactions] --> E[MV: velocity check]
        D --> F[MV: geo anomaly]
        E --> G[MV: risk score]
        F --> G
    end

    G -->|high-risk alerts| H[Kafka: fraud.alerts]
    G -->|SQL queries| I[Dashboard / Grafana]
    H --> J[Auto-block Service]
    H --> K[Case Management]

Figure 1: High-level architecture. Transaction events flow from Kafka into RisingWave, where materialized views evaluate fraud rules continuously. Alerts are routed to downstream systems via sinks.

Setting Up the Transaction Pipeline

This section walks through building the pipeline step by step. All SQL is compatible with RisingWave v2.2+.

Step 1: Create the Transaction Source

First, connect RisingWave to a Kafka topic that carries raw transaction events. Each event includes the transaction amount, merchant, location, and a timestamp.

CREATE SOURCE transactions_source (
    transaction_id VARCHAR,
    user_id VARCHAR,
    amount DECIMAL,
    currency VARCHAR,
    merchant_id VARCHAR,
    merchant_category VARCHAR,
    card_number VARCHAR,
    transaction_country VARCHAR,
    ip_address VARCHAR,
    event_time TIMESTAMP
)
WITH (
    connector = 'kafka',
    topic = 'payment.transactions',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

This creates a streaming source. RisingWave consumes messages from the payment.transactions topic and makes them available for downstream queries. The scan.startup.mode = 'latest' setting means it starts reading from the newest offset, which is useful in production to avoid reprocessing history on restart.

For a quick local test without Kafka, you can use a regular table instead:

CREATE TABLE transactions (
    transaction_id VARCHAR PRIMARY KEY,
    user_id VARCHAR NOT NULL,
    amount DECIMAL NOT NULL,
    currency VARCHAR,
    merchant_id VARCHAR,
    merchant_category VARCHAR,
    card_number VARCHAR,
    transaction_country VARCHAR,
    ip_address VARCHAR,
    event_time TIMESTAMP NOT NULL
);

Step 2: Seed Test Data

Insert realistic test data to validate the pipeline before connecting to live Kafka:

INSERT INTO transactions VALUES
    ('txn_001', 'user_42', 12.50, 'USD', 'merch_amazon', 'retail', '4111-XXXX-1234', 'US', '192.168.1.10', '2025-01-15 10:00:00'),
    ('txn_002', 'user_42', 8500.00, 'USD', 'merch_electronics', 'electronics', '4111-XXXX-1234', 'US', '192.168.1.10', '2025-01-15 10:00:30'),
    ('txn_003', 'user_42', 9200.00, 'USD', 'merch_jewelry', 'luxury', '4111-XXXX-1234', 'RO', '45.89.12.3', '2025-01-15 10:01:15'),
    ('txn_004', 'user_42', 75.00, 'USD', 'merch_gas', 'fuel', '4111-XXXX-1234', 'RO', '45.89.12.3', '2025-01-15 10:01:45'),
    ('txn_005', 'user_42', 6800.00, 'USD', 'merch_wire', 'transfer', '4111-XXXX-1234', 'NG', '102.22.44.5', '2025-01-15 10:02:10'),
    ('txn_006', 'user_99', 15.00, 'USD', 'merch_coffee', 'food', '5222-XXXX-5678', 'US', '10.0.0.5', '2025-01-15 10:00:00'),
    ('txn_007', 'user_99', 42.00, 'USD', 'merch_grocery', 'grocery', '5222-XXXX-5678', 'US', '10.0.0.5', '2025-01-15 10:15:00'),
    ('txn_008', 'user_99', 28.00, 'USD', 'merch_pharmacy', 'health', '5222-XXXX-5678', 'US', '10.0.0.5', '2025-01-15 10:30:00');

Notice the pattern: user_42 exhibits a classic fraud signature, a small test transaction followed by large purchases across multiple countries in under three minutes, while user_99 shows normal, steady behavior.
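One practical note for testing: RisingWave commits ingested data at checkpoint boundaries, so a just-inserted batch may take up to a barrier interval (one second by default) to become visible in materialized views. In a test script you can force an immediate checkpoint with FLUSH:

```sql
-- Force a checkpoint so the inserted rows are immediately
-- visible to queries and materialized views
FLUSH;
```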

Detecting Fraud Patterns with Materialized Views

Materialized views in RisingWave are incrementally maintained -they update automatically as new data arrives without requiring a full recomputation. This makes them ideal for continuous fraud scoring. For a deeper look at how stateful stream processing works, see how RisingWave gives stream processing a "memory".

Pattern 1: High-Velocity Spending

The first rule catches users who spend an unusually high amount within a short time window. We use the TUMBLE window function to bucket transactions into 5-minute intervals:

CREATE MATERIALIZED VIEW mv_high_velocity_spending AS
SELECT
    user_id,
    card_number,
    window_start,
    window_end,
    COUNT(*) AS txn_count,
    SUM(amount) AS total_amount,
    MAX(amount) AS max_single_txn,
    COUNT(DISTINCT transaction_country) AS distinct_countries
FROM TUMBLE(transactions, event_time, INTERVAL '5 MINUTES')
GROUP BY user_id, card_number, window_start, window_end
HAVING SUM(amount) > 5000 OR COUNT(*) > 5;

This view continuously tracks spending velocity. Any 5-minute window where a user either spends more than $5,000 total or makes more than 5 transactions gets flagged. The distinct_countries column also captures geographic dispersion, a strong fraud signal.

Query the results:

SELECT * FROM mv_high_velocity_spending;

Expected output:

 user_id | card_number     | window_start        | window_end          | txn_count | total_amount | max_single_txn | distinct_countries
---------+-----------------+---------------------+---------------------+-----------+--------------+----------------+--------------------
 user_42 | 4111-XXXX-1234  | 2025-01-15 10:00:00 | 2025-01-15 10:05:00 |         5 |     24587.50 |        9200.00 |                  3

user_42 trips the spending threshold: $24,587.50 total across 3 countries in one window (the 5 transactions sit right at, but not above, the count threshold). user_99 does not appear because their spending is well under the limits.

Pattern 2: Rapid Geographic Shifts

Transactions from multiple countries within minutes are a strong indicator of card-not-present fraud. This pattern uses LAG() to compare consecutive transactions per user:

CREATE MATERIALIZED VIEW mv_geo_anomalies AS
SELECT
    transaction_id,
    user_id,
    card_number,
    amount,
    transaction_country,
    event_time,
    prev_country,
    prev_event_time,
    event_time - prev_event_time AS time_gap
FROM (
    SELECT
        transaction_id,
        user_id,
        card_number,
        amount,
        transaction_country,
        event_time,
        LAG(transaction_country) OVER (
            PARTITION BY user_id ORDER BY event_time
        ) AS prev_country,
        LAG(event_time) OVER (
            PARTITION BY user_id ORDER BY event_time
        ) AS prev_event_time
    FROM transactions
)
WHERE prev_country IS NOT NULL
  AND transaction_country <> prev_country
  AND event_time - prev_event_time < INTERVAL '10 MINUTES';

This flags any transaction where the user's country changed from the previous transaction within a 10-minute window.

SELECT transaction_id, user_id, transaction_country, prev_country, time_gap
FROM mv_geo_anomalies;

Expected output:

 transaction_id | user_id | transaction_country | prev_country | time_gap
----------------+---------+---------------------+--------------+----------
 txn_003        | user_42 | RO                  | US           | 00:00:45
 txn_005        | user_42 | NG                  | RO           | 00:00:25

Two rapid geographic shifts in under two minutes: a clear fraud signal.

Pattern 3: Composite Risk Score

Production systems combine multiple signals. This materialized view merges the velocity and geographic signals into a single risk score:

CREATE MATERIALIZED VIEW mv_fraud_risk_score AS
SELECT
    t.user_id,
    t.card_number,
    t.transaction_id,
    t.amount,
    t.transaction_country,
    t.event_time,
    COALESCE(v.total_amount, 0) AS window_total,
    COALESCE(v.txn_count, 0) AS window_txn_count,
    COALESCE(v.distinct_countries, 0) AS window_countries,
    CASE WHEN g.transaction_id IS NOT NULL THEN TRUE ELSE FALSE END AS has_geo_anomaly,
    -- Risk score: 0-100
    LEAST(100,
        (CASE WHEN t.amount > 3000 THEN 30 ELSE 0 END) +
        (CASE WHEN COALESCE(v.total_amount, 0) > 10000 THEN 25 ELSE 0 END) +
        (CASE WHEN COALESCE(v.txn_count, 0) > 4 THEN 15 ELSE 0 END) +
        (CASE WHEN COALESCE(v.distinct_countries, 0) > 1 THEN 20 ELSE 0 END) +
        (CASE WHEN g.transaction_id IS NOT NULL THEN 30 ELSE 0 END)
    ) AS risk_score
FROM transactions t
LEFT JOIN mv_high_velocity_spending v
    ON t.user_id = v.user_id
    AND t.card_number = v.card_number
    AND t.event_time >= v.window_start
    AND t.event_time < v.window_end
LEFT JOIN mv_geo_anomalies g
    ON t.transaction_id = g.transaction_id;

Each rule contributes points to a 0-100 risk score:

 Signal                             | Points | Rationale
------------------------------------+--------+---------------------------------------------
 Single transaction > $3,000        |     30 | High-value transactions carry inherent risk
 Window total > $10,000             |     25 | Rapid burn-through of a card limit
 More than 4 transactions in 5 min  |     15 | Automated card testing pattern
 Multiple countries in window       |     20 | Physical impossibility for card-present
 Country changed from prior txn     |     30 | Strongest individual fraud signal

Query the scores:

SELECT transaction_id, user_id, amount, risk_score, has_geo_anomaly
FROM mv_fraud_risk_score
WHERE risk_score > 0
ORDER BY risk_score DESC;

Expected output:

 transaction_id | user_id | amount  | risk_score | has_geo_anomaly
----------------+---------+---------+------------+-----------------
 txn_003        | user_42 | 9200.00 |        100 | t
 txn_005        | user_42 | 6800.00 |        100 | t
 txn_002        | user_42 | 8500.00 |         90 | f
 txn_004        | user_42 |   75.00 |         60 | f
 txn_001        | user_42 |   12.50 |         60 | f

user_99's transactions all score 0 (and are therefore filtered out of this query), correctly identified as legitimate.

Routing Alerts Downstream

Detecting fraud is only half the job. You need to route high-risk transactions to downstream systems for action: a case management dashboard, an automated card-blocking service, or a Slack channel for the fraud operations team.

RisingWave supports sinking data to 18+ destinations, including Kafka, PostgreSQL, Elasticsearch, and Redis.
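As one example, alerts can land directly in an operational PostgreSQL database for the case-management team via the JDBC sink. This is a sketch: the connection URL, database name, and target table below are placeholders you would replace with your own, and the exact sink options may vary by RisingWave version:

```sql
-- Hypothetical JDBC sink into an operational Postgres database;
-- 'upsert' keeps one row per transaction, keyed on transaction_id
CREATE SINK sink_alerts_pg
FROM mv_fraud_risk_score
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://db:5432/fraud?user=app&password=secret',
    table.name = 'fraud_alerts',
    type = 'upsert',
    primary_key = 'transaction_id'
);
```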

Sink High-Risk Alerts to Kafka

CREATE SINK sink_fraud_alerts
FROM mv_fraud_risk_score
WITH (
    connector = 'kafka',
    topic = 'fraud.alerts',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;

Every time a new transaction flows in and produces a risk score, the updated result is pushed to the fraud.alerts Kafka topic. Downstream consumers (an automated blocking service, a dashboard, or an ML re-scoring pipeline) can subscribe independently.
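One refinement worth considering: the sink above emits every scored transaction, including low-risk ones. To push only actionable alerts, you can layer a filtered materialized view and sink from that instead. The 70-point threshold and the fraud.alerts.high topic name here are illustrative choices, not defaults:

```sql
-- Keep only transactions at or above the alert threshold
CREATE MATERIALIZED VIEW mv_high_risk_alerts AS
SELECT transaction_id, user_id, card_number, amount,
       transaction_country, risk_score, event_time
FROM mv_fraud_risk_score
WHERE risk_score >= 70;

-- Sink only the filtered alerts to a dedicated topic
CREATE SINK sink_high_risk_alerts
FROM mv_high_risk_alerts
WITH (
    connector = 'kafka',
    topic = 'fraud.alerts.high',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;
```

Filtering before the sink keeps downstream consumers simple and cuts topic volume to just the events that require action.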

Query High-Risk Transactions Directly

Because RisingWave is PostgreSQL-compatible, you can also query materialized views directly from any application using a standard PostgreSQL driver:

-- Find all high-risk transactions in the last hour
SELECT transaction_id, user_id, amount, risk_score, event_time
FROM mv_fraud_risk_score
WHERE risk_score >= 70
ORDER BY event_time DESC;

This is useful for building a fraud analyst dashboard with tools like Grafana, Metabase, or any BI tool that speaks PostgreSQL.

Why RisingWave for Fraud Detection?

Fraud detection is a core use case for stream processing, and several tools can handle it. Here is how RisingWave compares to alternatives:

 Capability                | RisingWave                 | Apache Flink                   | ksqlDB                 | Materialize
---------------------------+----------------------------+--------------------------------+------------------------+----------------------------
 Query language            | PostgreSQL-compatible SQL  | SQL + Java/Scala DataStream API| KSQL (Kafka-only)      | PostgreSQL-compatible SQL
 State management          | Automatic, built-in        | Manual checkpoint config       | Limited                | Automatic
 Kafka integration         | Native source/sink         | Native                         | Native (Kafka-only)    | Native source/sink
 Additional infrastructure | None (self-contained)      | Requires ZooKeeper/JobManager  | Requires Kafka cluster | None (self-contained)
 Materialized view joins   | Full SQL joins             | Requires custom code           | Limited                | Full SQL joins
 Learning curve            | Low (just SQL)             | High (Java/Scala + SQL)        | Medium (KSQL dialect)  | Low (just SQL)

The key differentiator: RisingWave lets you express complex fraud rules (multi-table joins, window aggregations, layered materialized views) in standard SQL. There is no secondary language, no JAR deployment, no cluster manager to configure. If you know PostgreSQL, you already know how to use RisingWave.

For a deeper comparison of stream processing architectures, see From Lambda to Kappa: The Evolution of Stream Processing Systems.

Production Considerations

Tuning Latency and Throughput

RisingWave's barrier-based checkpointing controls the trade-off between latency and throughput. The default barrier interval is 1 second, meaning materialized views update at most once per second. For fraud detection, this is usually sufficient. If you need sub-second latency, you can reduce the barrier interval at the cost of higher overhead.
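As a sketch, the barrier interval is exposed as a system parameter and, in recent RisingWave versions, can be adjusted at runtime; verify the parameter name against your version's documentation before relying on this:

```sql
-- Inspect the current system parameters
SHOW PARAMETERS;

-- Lower the checkpoint barrier interval from 1000 ms to 500 ms
-- (fresher materialized views, at the cost of more checkpoint overhead)
ALTER SYSTEM SET barrier_interval_ms TO 500;
```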

Scaling the Pipeline

RisingWave distributes computation across parallel workers. As transaction volume grows, you can add compute nodes and RisingWave automatically redistributes the workload. The materialized views in this guide use user_id and card_number as partition keys, which provides natural parallelism since different users are processed independently.

Evolving Rules

One of the strongest advantages of a SQL-based approach is that fraud analysts can modify rules without redeploying application code. Adding a new detection pattern is as simple as creating a new materialized view:

-- New rule: flag transactions at merchants with high chargeback rates
CREATE MATERIALIZED VIEW mv_risky_merchants AS
SELECT
    t.transaction_id,
    t.user_id,
    t.merchant_id,
    t.amount,
    t.event_time,
    m.chargeback_rate
FROM transactions t
JOIN merchant_risk_profiles m ON t.merchant_id = m.merchant_id
WHERE m.chargeback_rate > 0.05;

No redeploy. No build pipeline. The new view starts producing results as soon as it is created.
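Retiring a rule is equally lightweight. Dropping the view stops its computation and releases its state; adding CASCADE also removes any sinks or views built on top of it:

```sql
-- Decommission a rule that is no longer needed, along with
-- any dependent sinks or downstream views
DROP MATERIALIZED VIEW mv_risky_merchants CASCADE;
```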

FAQ

What is real-time fraud detection?

Real-time fraud detection is the process of analyzing financial transactions and user actions for fraudulent patterns as they occur, typically within milliseconds to seconds. Unlike batch systems that run on a schedule (hourly or daily), a real-time system evaluates each event the moment it arrives, enabling immediate blocking or flagging before the transaction settles.

How does RisingWave compare to Apache Flink for fraud detection?

RisingWave and Apache Flink both support stateful stream processing, but they differ in developer experience. Flink requires Java or Scala for complex logic and involves managing job clusters, checkpoints, and savepoints. RisingWave expresses the same logic in PostgreSQL-compatible SQL with automatic state management. For teams that already know SQL but lack JVM expertise, RisingWave significantly reduces time-to-production. For a hands-on comparison, see Flink's fraud detection tutorial, which requires roughly 200 lines of Java, versus the ~40 lines of SQL shown in this post.

When should I use streaming SQL instead of a batch ML model for fraud detection?

Use streaming SQL for rule-based detection that needs sub-second response times: velocity checks, geographic anomalies, amount thresholds, and pattern matching. Use batch ML models for complex behavioral analysis that tolerates higher latency (minutes to hours). Most production systems combine both: streaming SQL handles the first line of defense (catching obvious fraud patterns instantly), while ML models run in a separate pipeline for deeper behavioral scoring and feed results back as enrichment data.
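That feedback loop can be sketched in SQL. Assuming the ML pipeline writes its scores into a table such as ml_user_scores (a hypothetical name, populated by a separate batch job or CDC source, as is the 50-point weighting), a materialized view can blend both signals:

```sql
-- Hypothetical enrichment table maintained by a batch ML pipeline
CREATE TABLE ml_user_scores (
    user_id VARCHAR PRIMARY KEY,
    behavioral_score DECIMAL,   -- 0.0 (benign) to 1.0 (suspicious)
    updated_at TIMESTAMP
);

-- Blend the rule-based score with the ML behavioral score,
-- still capped at 100
CREATE MATERIALIZED VIEW mv_blended_score AS
SELECT
    r.transaction_id,
    r.user_id,
    r.risk_score,
    COALESCE(m.behavioral_score, 0) AS behavioral_score,
    LEAST(100,
        r.risk_score + CAST(COALESCE(m.behavioral_score, 0) * 50 AS INT)
    ) AS blended_score
FROM mv_fraud_risk_score r
LEFT JOIN ml_user_scores m ON r.user_id = m.user_id;
```

Because the join is incrementally maintained, a refreshed ML score immediately re-ranks that user's in-flight transactions without any re-run.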

Can RisingWave handle the transaction volume of a large payment processor?

Yes. RisingWave scales horizontally by adding compute nodes, and it distributes materialized view computation across workers using partition keys. The stateful operators use a shared storage backend (S3 or compatible object storage), which decouples compute from storage. Organizations processing millions of events per second can scale the compute tier independently. See the RisingWave architecture documentation for details on deployment options.

Conclusion

Key takeaways from this guide:

  • Real-time fraud detection requires stream processing: batch pipelines are too slow to prevent losses before they occur.
  • RisingWave turns fraud rules into SQL: materialized views express velocity checks, geographic anomalies, and composite risk scores without Java or specialized DSLs.
  • Layered detection is powerful: combining multiple materialized views (velocity, geolocation, composite scoring) produces high-quality alerts with few false positives.
  • Downstream routing is built in: Kafka sinks, PostgreSQL compatibility, and 18+ connectors let you plug fraud alerts into any operational system.
  • Rules evolve without deploys: new fraud patterns become new materialized views, created with a single SQL statement.

Ready to try this yourself? Try RisingWave Cloud free, no credit card required. Sign up →

Join our Slack community to ask questions and connect with other stream processing developers.
