Build a Real-Time Fraud Detection System Using RisingWave

Overview

The need for real-time fraud detection

As online transactions continue to rise, merchants and payment service providers face a growing challenge: detecting and preventing fraudulent activities in real-time. Traditional fraud detection systems, which analyze transactions hours or even days after they occur, are no longer sufficient. With the increasing velocity of online transactions, the need for real-time fraud detection has become more pressing than ever.

The complexity of building real-time fraud-detection systems

However, building a real-time fraud detection system has historically been a complex and costly endeavor. Typically, organizations rely on a patchwork of multiple technologies, including data collection and ingestion, data transformation, detection mechanisms, and response and alerting logic. This multi-component architecture can be a nightmare to integrate, requires specialized expertise to maintain, and often struggles to scale as transaction volumes surge.

Simplifying real-time data processing with RisingWave

RisingWave, a streaming database, offers a simpler approach to real-time data processing. By providing a unified platform, developers can use familiar SQL to ingest transaction streams in real-time, define fraud detection logic, write alerting rules, and output suspicious transaction data to downstream applications. This streamlined approach eliminates the complexity and integration headaches associated with traditional architectures.

In this article, we'll take a step-by-step approach to building a real-time fraud detection system using RisingWave.

Architecture

The system include the following components:

A Kafka topic that contains a real-time transaction stream.
RisingWave, a streaming database that performs these functions:
- Ingests data from the Kafka topic, processing transactions as they occur.
- Filters and aggregates data in real-time, applying rules to identify suspicious patterns.
- Store the results of its analysis, providing a historical record of transactions and detected fraud.
- Stream identified fraudulent transactions to another Kafka topic.
Another Kafka topic that received the stream of fraudulent transactions from RisingWave. This topic can trigger downstream actions such as stopping the fraud transactions and/or sending alerts.

With this architecture in place, we can now explore a concrete example of how to build a real-time fraud detection system using RisingWave.

Step 1: Ingesting transaction stream into RisingWave

Assume that all transaction events are stored in the transaction_stream Kafka topic. The data in this topic looks like this:

[
  {
    "user_id": "user123",
    "transaction_id": "tx001",
    "amount": 500,
    "location": "New York",
    "timestamp": "2023-05-01T10:00:00Z"
  },
  {
    "user_id": "user123",
    "transaction_id": "tx002",
    "amount": 15000,
    "location": "New York",
    "timestamp": "2023-05-01T10:05:00Z"
  },
  {
    "user_id": "user456",
    "transaction_id": "tx003",
    "amount": 100,
    "location": "Los Angeles",
    "timestamp": "2023-05-01T10:10:00Z"
  },
  {
    "user_id": "user456",
    "transaction_id": "tx004",
    "amount": 200,
    "location": "New York",
    "timestamp": "2023-05-01T10:15:00Z"
  }
]

Now let’s connect RisingWave to this Kafka topic as a consumer via RisingWave’s built-in Kafka source connector. In RisingWave, data can be ingested via either CREATE SOURCE or CREATE TABLE command. The difference is the full data is persisted in RisingWave if CREATE TABLE is used. In our scenario, we don’t want to persist the full data. Therefore, we’ll just use CREATE SOURCE.

CREATE SOURCE transaction_stream (
    user_id VARCHAR,
    transaction_id VARCHAR,
    amount DECIMAL,
    location VARCHAR,
    timestamp TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'transaction_stream',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest',
    format = 'json'
);

Step 2: Identifying fraud transactions with SQL

Next, we’ll create materialized views in RisingWave to identify fraudulent transactions. Materialized views in RisingWave employ incremental computation and always maintain the latest results. These latest results are immediately queryable. We can stream these results to other systems for further action.

We’ll define four materialized views to identify four common patterns of fraudulent activities. These materialized views will serve as the foundation for our real-time fraud detection system, allowing us to monitor and respond to suspicious transactions as they occur.

Pattern 1: Multiple transactions in a short time

This pattern detects users who perform an unusually high number of transactions within a short time frame.

We use the tumbling time window function to segment data into 5-minute intervals, and then filter out users who have:

Made more than 5 transactions within a single 5-minute window
Exceeded a total transaction volume of 20,000 within a single 5-minute window

CREATE MATERIALIZED VIEW velocity_fraud_alerts AS
SELECT
    user_id,
    COUNT(*) as tx_count,
    SUM(amount) as total_amount,
    MIN(transaction_id) as first_tx_id,
    MAX(timestamp) as latest_timestamp
FROM TUMBLE(transaction_stream, timestamp, INTERVAL '5 minutes')
GROUP BY
    user_id
HAVING COUNT(*) > 5 OR SUM(amount) > 20000;

Pattern 2: Impossible travel: Multiple locations in a short time frame

This pattern identifies users who have made transactions from multiple locations within a short period, indicating potentially suspicious travel patterns. We use the tumbling time window function to analyze transactions within a 2-hour window, and flag users who have made transactions from 3 or more locations within a 2-hour window.

CREATE MATERIALIZED VIEW location_fraud_alerts AS
SELECT
    user_id,
    COUNT(DISTINCT location) as location_count,
    array_agg(transaction_id) as transaction_ids,
    array_agg(location) as locations
FROM TUMBLE(transaction_stream, timestamp, INTERVAL '2 hours')
GROUP BY
    user_id
HAVING COUNT(DISTINCT location) >= 3;

Pattern 3: Unusual hour transactions with high amounts

This pattern detects transactions with unusually high amounts that occur during non-peak hours. We flag transactions that meet the following criteria:

Have an amount greater than 5,000
Occur between 1 am and 5 am

CREATE MATERIALIZED VIEW time_pattern_fraud_alerts AS
SELECT
    transaction_id,
    user_id,
    amount,
    timestamp,
    location
FROM transaction_stream
WHERE
    EXTRACT(HOUR FROM timestamp) BETWEEN 1 AND 5
    AND amount > 5000;

Pattern 4: Split transactions (multiple small amounts)

This pattern identifies transactions that are small in individual amount but significant in total, and occur within a short time frame. We flag transactions that meet the following criteria:

Consist of five or more small transactions (each less than 2000) within a 15-minute window
Have a total amount that exceeds a suspicious threshold (8000)

CREATE MATERIALIZED VIEW split_transaction_alerts AS
WITH time_window AS (
    SELECT
        user_id,
        COUNT(*) as tx_count,
        SUM(amount) as total_amount,
        array_agg(transaction_id) as transaction_ids,
        MAX(amount) as max_amount,
        AVG(amount) as avg_amount
    FROM TUMBLE(transaction_stream, timestamp, INTERVAL '15 minutes')
    GROUP BY
        user_id
)
SELECT *
FROM time_window
WHERE
    -- Many small transactions
    tx_count >= 5
    -- Each transaction is relatively small
    AND max_amount < 2000
    -- But total amount is significant
    AND total_amount > 8000
    -- Average transaction size is suspiciously uniform
    AND ABS(avg_amount - (total_amount/tx_count)) < 100;

After these materialized views are created, we can insert some data to mock up the patterns described above.

-- Normal transaction
('user_1', 'tx_001', 100.00, 'New York', '2024-11-06 10:00:00'),

-- Velocity fraud pattern: multiple transactions in 5 minutes
('user_2', 'tx_002', 5000.00, 'Chicago', '2024-11-06 11:00:00'),
('user_2', 'tx_003', 4500.00, 'Chicago', '2024-11-06 11:02:00'),
('user_2', 'tx_004', 6000.00, 'Chicago', '2024-11-06 11:04:00'),

-- Location fraud pattern: impossible travel
('user_3', 'tx_005', 3000.00, 'New York', '2024-11-06 12:00:00'),
('user_3', 'tx_006', 2500.00, 'Los Angeles', '2024-11-06 12:30:00'),
('user_3', 'tx_007', 4000.00, 'Miami', '2024-11-06 13:00:00'),

-- Split transaction pattern
('user_4', 'tx_008', 1900.00, 'Boston', '2024-11-06 14:00:00'),
('user_4', 'tx_009', 1850.00, 'Boston', '2024-11-06 14:05:00'),
('user_4', 'tx_010', 1900.00, 'Boston', '2024-11-06 14:10:00'),
('user_4', 'tx_011', 1850.00, 'Boston', '2024-11-06 14:12:00'),
('user_4', 'tx_012', 1900.00, 'Boston', '2024-11-06 14:14:00'),

-- Late night high amount pattern
('user_5', 'tx_013', 7500.00, 'Seattle', '2024-11-06 03:30:00')

When we query the materialized views, we’ll get data matching to these patterns. For example, for split_transaction_alerts, we’ll get the following result.

SELECT * from split_transaction_alerts;
------
transaction_id  user_id   amount  timestamp              location
tx_013          user_5    7500    2024-11-06T03:30:00Z   Seattle

Step 3: Outputting fraud data to Kafka topics

For each of the materialized views, you can export the data to a Kafka topic, which can then be used to trigger follow-up actions and send alerts. Alternatively, you can define a UDF (User-Defined Function) in Python or Java in RisingWave to send alerts to related accounts.

CREATE SINK velocity_fraud_alerts_sink FROM velocity_fraud_alerts
WITH (
   connector='kafka',
   properties.bootstrap.server='localhost:9092',
   topic='velocity_alerts',
   properties.message.max.bytes = 2000
)
FORMAT PLAIN ENCODE JSON;

Improving your fraud detection system

As fraud patterns continue to evolve, it's essential to adapt your detection algorithms to stay ahead of potential threats. One approach is to leverage machine learning to identify and flag suspicious transactions. By using machine learning models, you can dynamically update your detection logic to capture new patterns and dismiss outdated ones. RisingWave is well-suited to support this advanced scenario, in which it can perform complex streaming aggregations and serve as an online feature store. We'll explore this topic in greater depth in a future blog post.

In this article, we've demonstrated how to build a core fraud detection system using RisingWave. With minimal setup, you can easily integrate these components into your existing technical stack and have a functional fraud detection solution up and running. > >

>

Ready to give RisingWave a try? The easiest way to get started is by signing up for RisingWave Cloud. We'd love to hear about the fraud detection systems you build and any feedback you may have. > >

>

If you would like to stay up to date on what RisingWave is up to, sign up for our monthly newsletter. Follow us on Twitter and LinkedIn, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts. > >

RisingWave Use Case: Real-Time Fraud Detection

Table of contents