CDC for Real-Time Fraud Detection: From Debezium to Streaming SQL

Real-time fraud detection requires watching transaction tables for suspicious patterns the moment they occur. The classic stack — Debezium → Kafka → Flink feature computation → alert — works well but carries significant operational weight. RisingWave can replace that entire pipeline with CDC source tables, streaming materialized views for velocity checks, and a Kafka sink for alerts.


Why CDC is the Right Foundation for Fraud Detection

Fraud detection cannot rely on batch jobs. A compromised card can make dozens of transactions in minutes. By the time a nightly batch runs, the damage is done.

CDC captures every transaction insert at the PostgreSQL WAL level — sub-second after commit. This gives fraud models the freshest possible view of transaction velocity, merchant patterns, and account behavior, without the complexity of application-level event publishing.
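WAL-level capture assumes logical replication is enabled on the source PostgreSQL instance. A minimal setup sketch — exact settings vary by deployment, and the role and grant below reuse the rwuser credentials from the examples that follow:

```sql
-- Requires a PostgreSQL restart to take effect
ALTER SYSTEM SET wal_level = 'logical';

-- The CDC user needs replication privileges plus read access to the tables it captures
CREATE USER rwuser WITH REPLICATION PASSWORD 'secret';
GRANT SELECT ON transactions TO rwuser;
```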


PostgreSQL transactions table (WAL)
     │
     ▼
Debezium (Kafka Connect Source)
     │
     ▼
Kafka: cdc.transactions
     │
     ▼
Apache Flink (velocity feature computation)
  ├── Count transactions per card, last 1 min / 5 min / 1 hour
  ├── Sum amounts per card, last window
  ├── Distinct merchant count per card
  └── Geographic velocity (>2 countries in 10 min)
     │
     ▼
Kafka: fraud.alerts
     │
     ▼
Alert Service (blocks card, notifies user)

This pipeline is proven at scale. Flink provides exactly-once semantics, watermark-based windowing for out-of-order events, and a rich ecosystem of ML feature serving integrations.

The downside: you are operating Kafka, Kafka Connect, and a Flink cluster. Each Flink job requires Java development, checkpoint tuning, and a dedicated operations team.


The RisingWave Pattern: CDC + Streaming SQL for Fraud

RisingWave replaces the Debezium + Flink layers with a single streaming SQL pipeline. The Debezium Embedded Engine runs internally — you write SQL, not connector configs.

Step 1: Create the Transactions CDC Source

CREATE TABLE transactions (
    id              BIGINT PRIMARY KEY,
    card_id         VARCHAR,
    merchant_id     VARCHAR,
    merchant_country VARCHAR,
    amount          NUMERIC(12,2),
    status          VARCHAR,
    txn_at          TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname  = 'postgres',
    port      = '5432',
    username  = 'rwuser',
    password  = 'secret',
    database.name = 'payments',
    schema.name   = 'public',
    table.name    = 'transactions'
);

Step 2: Velocity Check — Transaction Count per Card

These materialized views count how many transactions each card has made in the last 1 minute and 5 minutes. RisingWave maintains the counts incrementally — every new transaction updates the relevant card's row in milliseconds.

CREATE MATERIALIZED VIEW card_velocity_1min AS
SELECT
    card_id,
    COUNT(*)        AS txn_count_1min,
    SUM(amount)     AS total_amount_1min,
    MAX(txn_at)     AS last_txn_at
FROM transactions
WHERE txn_at >= NOW() - INTERVAL '1 minute'
  AND status != 'declined'
GROUP BY card_id;

CREATE MATERIALIZED VIEW card_velocity_5min AS
SELECT
    card_id,
    COUNT(*)        AS txn_count_5min,
    SUM(amount)     AS total_amount_5min,
    COUNT(DISTINCT merchant_id) AS distinct_merchants_5min
FROM transactions
WHERE txn_at >= NOW() - INTERVAL '5 minutes'
  AND status != 'declined'
GROUP BY card_id;

Step 3: Geographic Velocity Check

Transactions in multiple countries within a short window are a strong fraud signal.

CREATE MATERIALIZED VIEW card_geo_velocity AS
SELECT
    card_id,
    COUNT(DISTINCT merchant_country)   AS country_count_10min,
    ARRAY_AGG(DISTINCT merchant_country) AS countries_seen,
    MAX(txn_at)                        AS last_seen_at
FROM transactions
WHERE txn_at >= NOW() - INTERVAL '10 minutes'
GROUP BY card_id;

Step 4: Combine Signals into a Fraud Score View

CREATE MATERIALIZED VIEW fraud_signals AS
SELECT
    t.id           AS txn_id,
    t.card_id,
    t.merchant_id,
    t.amount,
    t.txn_at,
    -- Velocity features
    COALESCE(v1.txn_count_1min, 0)         AS txn_count_1min,
    COALESCE(v1.total_amount_1min, 0)      AS total_amount_1min,
    COALESCE(v5.txn_count_5min, 0)         AS txn_count_5min,
    COALESCE(v5.distinct_merchants_5min, 0) AS distinct_merchants_5min,
    COALESCE(g.country_count_10min, 1)     AS country_count_10min,
    -- Rule-based fraud score (0-100)
    LEAST(100,
        CASE WHEN v1.txn_count_1min > 5   THEN 40 ELSE 0 END +
        CASE WHEN v1.total_amount_1min > 2000 THEN 30 ELSE 0 END +
        CASE WHEN v5.distinct_merchants_5min > 4 THEN 20 ELSE 0 END +
        CASE WHEN g.country_count_10min > 1 THEN 50 ELSE 0 END
    ) AS fraud_score
FROM transactions t
LEFT JOIN card_velocity_1min  v1 ON t.card_id = v1.card_id
LEFT JOIN card_velocity_5min  v5 ON t.card_id = v5.card_id
LEFT JOIN card_geo_velocity    g ON t.card_id = g.card_id
WHERE t.txn_at >= NOW() - INTERVAL '30 seconds';

Step 5: Publish High-Score Alerts to Kafka

CREATE SINK fraud_alerts_kafka
FROM (
    SELECT
        txn_id,
        card_id,
        merchant_id,
        amount,
        txn_at,
        fraud_score,
        txn_count_1min,
        total_amount_1min,
        country_count_10min
    FROM fraud_signals
    WHERE fraud_score >= 70
)
WITH (
    connector = 'kafka',
    topic     = 'fraud.high_risk_alerts',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'txn_id'
) FORMAT UPSERT ENCODE JSON;

Your alert service subscribes to fraud.high_risk_alerts. High-confidence fraud signals arrive within 1–2 seconds of the suspicious transaction committing to PostgreSQL.
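On the consuming side, each Kafka record is a JSON object carrying the fields selected in the sink. A hedged sketch of the decode-and-decide step — BLOCK_THRESHOLD and the returned action strings are illustrative choices, not part of the pipeline above:

```python
import json

BLOCK_THRESHOLD = 90  # assumed policy: act automatically only on the strongest signals

def handle_alert(raw: bytes) -> str:
    """Decode one fraud.high_risk_alerts record and pick an action."""
    alert = json.loads(raw)
    if alert["fraud_score"] >= BLOCK_THRESHOLD:
        return f"block:{alert['card_id']}"   # e.g. write to a blocked_cards table
    return f"review:{alert['card_id']}"      # queue for manual review
```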


Adding a Blacklist Check

Real fraud pipelines also check against known bad actors. RisingWave can join the live transaction stream against a blacklist table maintained in PostgreSQL:

-- Blacklist table also synced via CDC
CREATE TABLE card_blacklist (
    card_id        VARCHAR PRIMARY KEY,
    reason         VARCHAR,
    blocked_at     TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname  = 'postgres',
    port      = '5432',
    username  = 'rwuser',
    password  = 'secret',
    database.name = 'payments',
    schema.name   = 'public',
    table.name    = 'card_blacklist'
);

-- Enrich fraud signals with blacklist status
CREATE MATERIALIZED VIEW fraud_signals_enriched AS
SELECT
    fs.*,
    bl.reason IS NOT NULL AS is_blacklisted,
    bl.reason             AS blacklist_reason
FROM fraud_signals fs
LEFT JOIN card_blacklist bl ON fs.card_id = bl.card_id;

This join is maintained incrementally — adding a card to the blacklist immediately affects all open fraud signal rows for that card.
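To see the propagation end to end, add a card on the PostgreSQL side and observe the change in RisingWave. A sketch (the card id is a placeholder):

```sql
-- In PostgreSQL:
INSERT INTO card_blacklist (card_id, reason, blocked_at)
VALUES ('card_0000', 'reported stolen', NOW());

-- In RisingWave, moments later:
SELECT txn_id, fraud_score, is_blacklisted, blacklist_reason
FROM fraud_signals_enriched
WHERE card_id = 'card_0000';
```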


Comparison Table

Feature                        | Debezium + Kafka + Flink                 | RisingWave Streaming SQL
Velocity windows (1 min, 5 min)| Flink event-time windows                 | Materialized views with WHERE txn_at >= NOW() - INTERVAL
Multi-table enrichment (joins) | Flink stream-stream or broadcast join    | Standard SQL JOIN
Alert routing                  | Kafka Streams or Flink sink              | Kafka sink with WHERE filter
Rule changes                   | Redeploy Flink job                       | Drop and recreate materialized view
ML feature serving             | Flink → feature store                    | Query materialized view directly
Operational complexity         | High (Flink cluster, jobs, checkpoints)  | Low (SQL DDL)
Out-of-order event handling    | Flink watermarks                         | Append-only; late data handled by window bounds

Stick with Flink-based fraud pipelines when:

  • Your fraud models are complex ML models that require Flink's ML integration or external feature store writes.
  • You need precise event-time windowing with watermarks for out-of-order transaction events arriving from distributed payment processors.
  • The team already runs Flink at scale and has Java expertise.
  • You need millisecond-level latency targets (Flink's low-level API can beat RisingWave's materialized view refresh in extreme cases).

FAQ

Q: How accurate are RisingWave's time windows for velocity checks? RisingWave uses processing time for NOW() comparisons. For most fraud use cases — where transactions are committed within seconds of occurring — this is sufficient. If transactions can arrive significantly delayed (minutes to hours after the actual event time), Flink's event-time windowing with watermarks provides more accurate window semantics.

Q: Can I integrate RisingWave fraud signals with an ML model? Yes, in two ways. First, query the fraud_signals materialized view from a Python scoring service — it returns current feature values with millisecond latency. Second, use RisingWave's HTTP sink to push signals to an ML inference endpoint as they are generated.

Q: What happens to velocity counts if RisingWave restarts? RisingWave persists materialized view state to S3. On restart, it recovers from the last checkpoint and replays recent WAL events. Velocity windows recover fully — there is no cold-start data loss.

Q: Can RisingWave block a card in real time, not just alert? RisingWave is a streaming analytics system, not a transaction interceptor. The correct pattern is: RisingWave publishes to fraud.alerts → alert service consumes and writes to a blocked_cards table → application code checks that table before authorizing transactions.
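That pattern can be sketched in plain SQL — the blocked_cards table lives in PostgreSQL alongside the application, and the names here are illustrative:

```sql
-- Written by the alert service on a high-confidence signal
CREATE TABLE blocked_cards (
    card_id    VARCHAR PRIMARY KEY,
    blocked_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    source     VARCHAR  -- e.g. 'fraud.high_risk_alerts'
);

-- Checked by the authorization path before approving a transaction
SELECT 1 FROM blocked_cards WHERE card_id = $1;
```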

Q: How do I tune the fraud score thresholds? Because the scoring logic lives in a SQL materialized view, threshold changes are a simple drop-and-recreate of the view — no Flink job redeployment needed. The new view backfills from existing table state and then resumes incremental updates.
