CDC for Real-Time Fraud Detection: From Debezium to Streaming SQL
Real-time fraud detection requires watching transaction tables for suspicious patterns the moment they occur. The classic stack — Debezium → Kafka → Flink feature computation → alert — works well but carries significant operational weight. RisingWave can replace that entire pipeline with CDC source tables, streaming materialized views for velocity checks, and a Kafka sink for alerts.
Why CDC is the Right Foundation for Fraud Detection
Fraud detection cannot rely on batch jobs. A compromised card can make dozens of transactions in minutes. By the time a nightly batch runs, the damage is done.
CDC captures every transaction insert at the PostgreSQL WAL level — sub-second after commit. This gives fraud models the freshest possible view of transaction velocity, merchant patterns, and account behavior, without the complexity of application-level event publishing.
The Classic Pipeline: Debezium → Kafka → Flink → Alerts
PostgreSQL transactions table (WAL)
│
▼
Debezium (Kafka Connect Source)
│
▼
Kafka: cdc.transactions
│
▼
Apache Flink (velocity feature computation)
├── Count transactions per card, last 1 min / 5 min / 1 hour
├── Sum amounts per card, last window
├── Distinct merchant count per card
└── Geographic velocity (>2 countries in 10 min)
│
▼
Kafka: fraud.alerts
│
▼
Alert Service (blocks card, notifies user)
This pipeline is proven at scale. Flink provides exactly-once semantics, watermark-based windowing for out-of-order events, and a rich ecosystem of ML feature serving integrations.
The downside: you are operating Kafka, Kafka Connect, and a Flink cluster. Each Flink job requires Java development, checkpoint tuning, and a dedicated operations team.
The RisingWave Pattern: CDC + Streaming SQL for Fraud
RisingWave replaces the Debezium + Flink layers with a single streaming SQL pipeline. The Debezium Embedded Engine runs internally — you write SQL, not connector configs.
Step 1: Create the Transactions CDC Source
CREATE TABLE transactions (
id BIGINT PRIMARY KEY,
card_id VARCHAR,
merchant_id VARCHAR,
merchant_country VARCHAR,
amount NUMERIC(12,2),
status VARCHAR,
txn_at TIMESTAMPTZ
) WITH (
connector = 'postgres-cdc',
hostname = 'postgres',
port = '5432',
username = 'rwuser',
password = 'secret',
database.name = 'payments',
schema.name = 'public',
table.name = 'transactions'
);
Step 2: Velocity Check — Transaction Count per Card
These materialized views count each card's transactions over the trailing one-minute and five-minute windows. RisingWave maintains the counts incrementally: every new transaction updates the relevant card's row in milliseconds.
CREATE MATERIALIZED VIEW card_velocity_1min AS
SELECT
card_id,
COUNT(*) AS txn_count_1min,
SUM(amount) AS total_amount_1min,
MAX(txn_at) AS last_txn_at
FROM transactions
WHERE txn_at >= NOW() - INTERVAL '1 minute'
AND status != 'declined'
GROUP BY card_id;
CREATE MATERIALIZED VIEW card_velocity_5min AS
SELECT
card_id,
COUNT(*) AS txn_count_5min,
SUM(amount) AS total_amount_5min,
COUNT(DISTINCT merchant_id) AS distinct_merchants_5min
FROM transactions
WHERE txn_at >= NOW() - INTERVAL '5 minutes'
AND status != 'declined'
GROUP BY card_id;
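The window semantics can be sketched in Python (an illustrative model, not RisingWave internals): compute per-card count and amount over the trailing minute from a list of (card_id, amount, status, txn_at) tuples, excluding declined transactions, just as the view's WHERE clause does.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def card_velocity_1min(transactions, now):
    """Mirror of the card_velocity_1min view: per-card count and total
    amount over the trailing minute, excluding declined transactions."""
    window_start = now - timedelta(minutes=1)
    stats = defaultdict(lambda: {"txn_count_1min": 0, "total_amount_1min": 0.0})
    for card_id, amount, status, txn_at in transactions:
        if txn_at >= window_start and status != "declined":
            row = stats[card_id]
            row["txn_count_1min"] += 1
            row["total_amount_1min"] += amount
    return dict(stats)

now = datetime(2024, 1, 1, 12, 0, 0)
txns = [
    ("card-1", 50.0, "approved", now - timedelta(seconds=10)),
    ("card-1", 75.0, "approved", now - timedelta(seconds=40)),
    ("card-1", 20.0, "declined", now - timedelta(seconds=5)),   # excluded by status filter
    ("card-2", 10.0, "approved", now - timedelta(minutes=3)),   # outside the 1-min window
]
print(card_velocity_1min(txns, now))
# {'card-1': {'txn_count_1min': 2, 'total_amount_1min': 125.0}}
```

The difference in production is that RisingWave maintains this result incrementally as rows arrive, rather than rescanning the input.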
Step 3: Geographic Velocity Check
Transactions in multiple countries within a short window are a strong fraud signal.
CREATE MATERIALIZED VIEW card_geo_velocity AS
SELECT
card_id,
COUNT(DISTINCT merchant_country) AS country_count_10min,
ARRAY_AGG(DISTINCT merchant_country) AS countries_seen,
MAX(txn_at) AS last_seen_at
FROM transactions
WHERE txn_at >= NOW() - INTERVAL '10 minutes'
GROUP BY card_id;
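The same idea for the geographic check, as a Python sketch (illustrative only): collect the distinct merchant countries each card has touched in the trailing ten minutes and flag cards that span more than one.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def card_geo_velocity(transactions, now, window_minutes=10):
    """Distinct merchant countries per card within the trailing window,
    mirroring the card_geo_velocity view."""
    cutoff = now - timedelta(minutes=window_minutes)
    countries = defaultdict(set)
    for card_id, country, txn_at in transactions:
        if txn_at >= cutoff:
            countries[card_id].add(country)
    return {card: sorted(seen) for card, seen in countries.items()}

now = datetime(2024, 1, 1, 12, 0)
txns = [
    ("card-1", "US", now - timedelta(minutes=2)),
    ("card-1", "RO", now - timedelta(minutes=6)),   # second country inside 10 min
    ("card-1", "US", now - timedelta(minutes=30)),  # outside the window
]
seen = card_geo_velocity(txns, now)
print(seen["card-1"])           # ['RO', 'US']
print(len(seen["card-1"]) > 1)  # True -> strong fraud signal
```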
Step 4: Combine Signals into a Fraud Score View
CREATE MATERIALIZED VIEW fraud_signals AS
SELECT
t.id AS txn_id,
t.card_id,
t.merchant_id,
t.amount,
t.txn_at,
-- Velocity features
COALESCE(v1.txn_count_1min, 0) AS txn_count_1min,
COALESCE(v1.total_amount_1min, 0) AS total_amount_1min,
COALESCE(v5.txn_count_5min, 0) AS txn_count_5min,
COALESCE(v5.distinct_merchants_5min, 0) AS distinct_merchants_5min,
COALESCE(g.country_count_10min, 1) AS country_count_10min,
-- Rule-based fraud score (0-100)
LEAST(100,
CASE WHEN v1.txn_count_1min > 5 THEN 40 ELSE 0 END +
CASE WHEN v1.total_amount_1min > 2000 THEN 30 ELSE 0 END +
CASE WHEN v5.distinct_merchants_5min > 4 THEN 20 ELSE 0 END +
CASE WHEN g.country_count_10min > 1 THEN 50 ELSE 0 END
) AS fraud_score
FROM transactions t
LEFT JOIN card_velocity_1min v1 ON t.card_id = v1.card_id
LEFT JOIN card_velocity_5min v5 ON t.card_id = v5.card_id
LEFT JOIN card_geo_velocity g ON t.card_id = g.card_id
WHERE t.txn_at >= NOW() - INTERVAL '30 seconds';
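The rule-based score is easy to replicate offline for threshold tuning and unit testing. Here is a Python mirror of the CASE/LEAST logic above, with the same COALESCE defaults (0 for counts and amounts, 1 for country count):

```python
def fraud_score(txn_count_1min=0, total_amount_1min=0,
                distinct_merchants_5min=0, country_count_10min=1):
    """Python mirror of the rule-based score in fraud_signals,
    including the COALESCE defaults and the LEAST(100, ...) cap."""
    score = 0
    if txn_count_1min > 5:
        score += 40
    if total_amount_1min > 2000:
        score += 30
    if distinct_merchants_5min > 4:
        score += 20
    if country_count_10min > 1:
        score += 50
    return min(score, 100)

print(fraud_score())                  # 0  (no signals)
print(fraud_score(txn_count_1min=6))  # 40 (velocity rule only)
print(fraud_score(txn_count_1min=6, total_amount_1min=2500,
                  distinct_merchants_5min=5, country_count_10min=2))  # 100 (capped)
```

Because the rules live in one place (the view definition), keeping a mirror like this in your test suite lets you validate threshold changes against labeled historical transactions before touching the pipeline.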
Step 5: Publish High-Score Alerts to Kafka
CREATE SINK fraud_alerts_kafka AS
SELECT
txn_id,
card_id,
merchant_id,
amount,
txn_at,
fraud_score,
txn_count_1min,
total_amount_1min,
country_count_10min
FROM fraud_signals
WHERE fraud_score >= 70
WITH (
connector = 'kafka',
topic = 'fraud.high_risk_alerts',
properties.bootstrap.server = 'kafka:9092',
primary_key = 'txn_id'
) FORMAT UPSERT ENCODE JSON;
Your alert service subscribes to fraud.high_risk_alerts. High-confidence fraud signals arrive within 1–2 seconds of the suspicious transaction committing to PostgreSQL.
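A minimal sketch of the consumer side's decision step, assuming a JSON payload with the fields selected above; BLOCK_THRESHOLD and handle_alert are hypothetical names for the alert service's logic, not part of RisingWave:

```python
import json

BLOCK_THRESHOLD = 90  # assumed escalation policy, not defined by the pipeline above

def handle_alert(raw):
    """Decide the action for one fraud.high_risk_alerts message.
    Every message on this topic already has fraud_score >= 70; the very
    highest scores trigger an automatic block, the rest go to manual review."""
    alert = json.loads(raw)
    action = "block_card" if alert["fraud_score"] >= BLOCK_THRESHOLD else "manual_review"
    return {"txn_id": alert["txn_id"], "card_id": alert["card_id"], "action": action}

msg = '{"txn_id": 42, "card_id": "card-1", "merchant_id": "m-9", "amount": 1200.0, "fraud_score": 95}'
print(handle_alert(msg))
# {'txn_id': 42, 'card_id': 'card-1', 'action': 'block_card'}
```

In production this function would sit inside a Kafka consumer loop; the decision logic itself stays a pure, testable function.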
Adding a Blacklist Check
Real fraud pipelines also check against known bad actors. RisingWave can join the live transaction stream against a blacklist table maintained in PostgreSQL:
-- Blacklist table also synced via CDC
CREATE TABLE card_blacklist (
card_id VARCHAR PRIMARY KEY,
reason VARCHAR,
blocked_at TIMESTAMPTZ
) WITH (
connector = 'postgres-cdc',
hostname = 'postgres',
port = '5432',
username = 'rwuser',
password = 'secret',
database.name = 'payments',
schema.name = 'public',
table.name = 'card_blacklist'
);
-- Enrich fraud signals with blacklist status
CREATE MATERIALIZED VIEW fraud_signals_enriched AS
SELECT
fs.*,
bl.reason IS NOT NULL AS is_blacklisted,
bl.reason AS blacklist_reason
FROM fraud_signals fs
LEFT JOIN card_blacklist bl ON fs.card_id = bl.card_id;
This join is maintained incrementally — adding a card to the blacklist immediately affects all open fraud signal rows for that card.
Comparison Table
| Feature | Debezium + Kafka + Flink | RisingWave Streaming SQL |
| --- | --- | --- |
| Velocity windows (1min, 5min) | Flink event-time windows | Materialized views with WHERE txn_at >= NOW() - INTERVAL |
| Multi-table enrichment (joins) | Flink stream-stream or broadcast join | Standard SQL JOIN |
| Alert routing | Kafka Streams or Flink sink | Kafka sink with WHERE filter |
| Rule changes | Redeploy Flink job | Drop and recreate the materialized view |
| ML feature serving | Flink → feature store | Query materialized view directly |
| Operational complexity | High (Flink cluster, jobs, checkpoints) | Low (SQL DDL) |
| Out-of-order event handling | Flink watermarks | Append-only; late data handled by window bounds |
When to Keep Debezium + Flink for Fraud
Stick with Flink-based fraud pipelines when:
- Your fraud models are complex ML models that require Flink's ML integration or external feature store writes.
- You need precise event-time windowing with watermarks for out-of-order transaction events arriving from distributed payment processors.
- The team already runs Flink at scale and has Java expertise.
- You need millisecond-level latency targets (Flink's low-level API can beat RisingWave's materialized view refresh in extreme cases).
FAQ
Q: How accurate are RisingWave's time windows for velocity checks?
RisingWave uses processing time for NOW() comparisons. For most fraud use cases — where transactions are committed within seconds of occurring — this is sufficient. If transactions can arrive significantly delayed (minutes to hours after the actual event time), Flink's event-time windowing with watermarks provides more accurate window semantics.
Q: Can I integrate RisingWave fraud signals with an ML model?
Yes, in two ways. First, query the fraud_signals materialized view from a Python scoring service; it returns current feature values with millisecond latency. Second, sink the signals to a Kafka topic and have an ML inference consumer score each record as it arrives.
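For the first option, the glue code is thin. A hypothetical feature_vector helper that turns one fraud_signals row (fetched as a dict by any Postgres client, since RisingWave speaks the Postgres wire protocol) into an ordered input for a model's predict call:

```python
# Stable feature ordering shared between the scoring service and model training.
FEATURE_ORDER = ["txn_count_1min", "total_amount_1min", "txn_count_5min",
                 "distinct_merchants_5min", "country_count_10min"]

def feature_vector(row):
    """Assemble an ordered feature list from one fraud_signals row
    (e.g., a dict row returned by a SELECT over the view)."""
    return [float(row[name]) for name in FEATURE_ORDER]

row = {"txn_id": 42, "card_id": "card-1", "txn_count_1min": 3,
       "total_amount_1min": 480.0, "txn_count_5min": 7,
       "distinct_merchants_5min": 2, "country_count_10min": 1}
print(feature_vector(row))  # [3.0, 480.0, 7.0, 2.0, 1.0]
```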
Q: What happens to velocity counts if RisingWave restarts?
RisingWave persists materialized view state to S3. On restart, it recovers from the last checkpoint and replays recent WAL events. Velocity windows recover fully — there is no cold-start data loss.
Q: Can RisingWave block a card in real time, not just alert?
RisingWave is a streaming analytics system, not a transaction interceptor. The correct pattern is: RisingWave publishes to fraud.alerts → alert service consumes and writes to a blocked_cards table → application code checks that table before authorizing transactions.
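That pattern can be sketched end to end; CardAuthorizer and its in-memory set are hypothetical stand-ins for the application's check against the blocked_cards table:

```python
class CardAuthorizer:
    """Authorization-path check: consult the blocked-card state that the
    alert service maintains before approving a transaction. An in-memory
    set stands in for the real blocked_cards table."""

    def __init__(self):
        self.blocked_cards = set()

    def block(self, card_id, reason):
        # Called by the alert service when it consumes a fraud.high_risk_alerts message.
        self.blocked_cards.add(card_id)

    def authorize(self, card_id, amount):
        # Application code checks blocked state before approving the transaction.
        return card_id not in self.blocked_cards

auth = CardAuthorizer()
print(auth.authorize("card-1", 25.0))            # True (not yet blocked)
auth.block("card-1", "velocity + geo signals")
print(auth.authorize("card-1", 25.0))            # False (blocked)
```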
Q: How do I tune the fraud score thresholds?
Because the scoring logic lives in a SQL materialized view, a threshold change means dropping and recreating the view (or creating a replacement view alongside it and switching consumers over). No Flink job redeployment is needed; the new view backfills automatically from the underlying tables and serves fresh results once it catches up.