Credit models are only as good as the features feeding them. And the uncomfortable truth in most lending operations is that those features are hours old by the time they influence a decision.
A borrower who opened three payday loans yesterday afternoon looks unchanged in yesterday's batch features. A customer who doubled their monthly spend in the past 72 hours registers no differently from someone who has held a steady pattern for six months. Your model is making decisions on a snapshot of a world that has already changed.
This article walks through how to build a real-time feature pipeline for credit risk using streaming SQL, specifically with RisingWave. The target audience is engineers at fintech companies who are already running a credit decisioning system and are tired of maintaining Python-based feature computation jobs, Spark clusters, or hourly Airflow DAGs that can't keep pace with borrower behavior.
The Credit Risk Feature Problem
Credit bureaus have trained the industry to think in terms of monthly refresh cycles. VantageScore and FICO pull tradeline data that might be 30 to 45 days stale before it influences a score. For indirect lending or buy-now-pay-later products, that lag is acceptable. For direct-to-consumer credit lines with real-time decisioning, it is not.
The features that actually predict near-term default are the ones that change fastest:
Credit utilization -- A borrower at 45% utilization yesterday who jumps to 92% today is sending a signal. Monthly batch pipelines will not catch this until next cycle. If you are approving a new credit line today, you are doing so without knowing about that shift.
Payment velocity -- It is not just whether payments are on time; it is the pattern of behavior around due dates. Late-but-consistent is very different from erratic. A borrower who started missing payments in the past two weeks is a different risk profile than one who has been late for six months.
Inquiry recency -- Hard inquiries are a well-known risk signal, but their timing is everything. Three inquiries spread over six months is different from three inquiries in 72 hours. Batch systems that aggregate inquiry counts over a fixed window miss the velocity entirely.
Spending pattern deviation -- A sudden change in merchant category distribution or transaction size is a behavioral signal. A borrower who was spending primarily on groceries and utilities but shifted to cash advances and high-interest lenders this week is worth flagging.
None of these require exotic ML. They are straightforward aggregate features. What they require is data that is seconds or minutes old, not hours.
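To make the point concrete, the utilization jump described above reduces to a two-threshold check. A minimal Python sketch; the threshold values here are illustrative assumptions, not calibrated policy:

```python
def utilization_spike(prev_util: float, curr_util: float,
                      jump_threshold: float = 0.25,
                      high_level: float = 0.80) -> bool:
    """Flag a sudden utilization shift, e.g. 0.45 -> 0.92.

    Fires only when the jump is large AND the new level is high.
    Both thresholds are illustrative, not calibrated values.
    """
    return (curr_util - prev_util) >= jump_threshold and curr_util >= high_level
```

The same logic is expressible as a streaming SQL predicate; the point is that the feature itself is trivial -- the hard part is having today's utilization available today.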
Data Sources
A real-time credit feature pipeline needs three primary event streams:
Transaction events -- Card and ACH transactions, including amount, merchant category code, timestamp, and account identifier. These arrive continuously and need to feed spending velocity and pattern features.
Payment events -- On-time payments, late payments, missed payments, and partial payments against existing credit accounts. These drive payment behavior features.
Credit inquiry events -- Hard and soft pull requests, with timestamp and requesting lender type. These drive inquiry velocity features.
In most production environments these land in Kafka topics. RisingWave can read directly from Kafka, which means you define your source once and let streaming SQL maintain the derived feature tables continuously.
-- Transaction source
CREATE SOURCE transactions (
    account_id VARCHAR,
    txn_id VARCHAR,
    amount DECIMAL(12, 2),
    merchant_category_code VARCHAR,
    txn_time TIMESTAMPTZ,
    txn_type VARCHAR -- 'purchase', 'cash_advance', 'balance_transfer'
)
WITH (
    connector = 'kafka',
    topic = 'credit.transactions',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;
-- Payment event source
CREATE SOURCE payment_events (
    account_id VARCHAR,
    payment_id VARCHAR,
    amount DECIMAL(12, 2),
    due_amount DECIMAL(12, 2),
    payment_date TIMESTAMPTZ, -- due date of the payment
    payment_time TIMESTAMPTZ, -- when the payment was actually made
    status VARCHAR, -- 'on_time', 'late', 'missed', 'partial'
    days_late INTEGER
)
WITH (
    connector = 'kafka',
    topic = 'credit.payments',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;
-- Credit inquiry source
CREATE SOURCE credit_inquiries (
    account_id VARCHAR,
    inquiry_id VARCHAR,
    inquiry_time TIMESTAMPTZ,
    inquiry_type VARCHAR, -- 'hard', 'soft'
    lender_type VARCHAR -- 'bank', 'payday', 'bnpl', 'auto', 'mortgage'
)
WITH (
    connector = 'kafka',
    topic = 'credit.inquiries',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;
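For reference, each topic carries JSON objects whose keys match the corresponding source schema. A minimal Python sketch of the payload a producer might emit to credit.transactions (the helper function and its defaults are illustrative, not part of any producer library):

```python
import json
from datetime import datetime, timezone

def make_txn_event(account_id: str, txn_id: str, amount: float,
                   mcc: str, txn_type: str = "purchase") -> str:
    """Serialize one event matching the `transactions` source schema."""
    event = {
        "account_id": account_id,
        "txn_id": txn_id,
        "amount": round(amount, 2),
        "merchant_category_code": mcc,
        "txn_time": datetime.now(timezone.utc).isoformat(),
        "txn_type": txn_type,  # 'purchase', 'cash_advance', 'balance_transfer'
    }
    return json.dumps(event)

# In production this string would be published to the Kafka topic,
# e.g. producer.produce("credit.transactions", value=make_txn_event(...)).
```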
Computing Credit Features with Streaming SQL
Each materialized view below is maintained incrementally by RisingWave. When a new event arrives, only the affected rows are recomputed -- not a full table scan. This is what makes sub-second feature freshness practical without overprovisioning compute.
Payment Behavior Features
Payment history is the single most predictive feature cluster in most credit models. The 12-month window is standard, but you can tune it based on your portfolio characteristics.
CREATE MATERIALIZED VIEW payment_behavior_features AS
SELECT
    account_id,
    COUNT(*) AS total_payments_12m,
    COUNT(*) FILTER (WHERE status = 'on_time') AS on_time_payments_12m,
    COUNT(*) FILTER (WHERE status = 'late') AS late_payments_12m,
    COUNT(*) FILTER (WHERE status = 'missed') AS missed_payments_12m,
    COUNT(*) FILTER (WHERE status = 'partial') AS partial_payments_12m,
    AVG(days_late) FILTER (WHERE days_late > 0) AS avg_days_late_12m,
    MAX(days_late) AS max_days_late_12m,
    MAX(payment_time) AS last_payment_at,
    EXTRACT(DAY FROM (NOW() - MAX(payment_time))) AS days_since_last_payment,
    -- Payment ratio: lower is worse
    ROUND(
        COUNT(*) FILTER (WHERE status = 'on_time')::DECIMAL /
            NULLIF(COUNT(*), 0),
        4
    ) AS on_time_payment_ratio,
    -- Missed payments in last 90 days specifically
    COUNT(*) FILTER (
        WHERE status = 'missed'
        AND payment_date >= NOW() - INTERVAL '90 days'
    ) AS missed_payments_90d
FROM payment_events
WHERE payment_date >= NOW() - INTERVAL '12 months'
GROUP BY account_id;
Spending Velocity Features
Velocity features compare behavior across multiple time windows. This is where you catch the borrower who is fine over 30 days but has dramatically changed behavior in the last week.
CREATE MATERIALIZED VIEW spending_velocity AS
SELECT
    account_id,
    -- 7-day window
    SUM(amount) FILTER (
        WHERE txn_time >= NOW() - INTERVAL '7 days'
    ) AS spend_7d,
    COUNT(*) FILTER (
        WHERE txn_time >= NOW() - INTERVAL '7 days'
    ) AS txn_count_7d,
    -- 30-day window
    SUM(amount) FILTER (
        WHERE txn_time >= NOW() - INTERVAL '30 days'
    ) AS spend_30d,
    COUNT(*) FILTER (
        WHERE txn_time >= NOW() - INTERVAL '30 days'
    ) AS txn_count_30d,
    -- Category diversity: how many distinct MCCs in last 30 days
    COUNT(DISTINCT merchant_category_code) FILTER (
        WHERE txn_time >= NOW() - INTERVAL '30 days'
    ) AS category_diversity_30d,
    -- Large transaction signal
    COUNT(*) FILTER (
        WHERE amount > 500
        AND txn_time >= NOW() - INTERVAL '30 days'
    ) AS large_txn_count_30d,
    -- Cash advance signal (high-risk behavior indicator)
    COUNT(*) FILTER (
        WHERE txn_type = 'cash_advance'
        AND txn_time >= NOW() - INTERVAL '30 days'
    ) AS cash_advance_count_30d,
    SUM(amount) FILTER (
        WHERE txn_type = 'cash_advance'
        AND txn_time >= NOW() - INTERVAL '30 days'
    ) AS cash_advance_amount_30d
FROM transactions
WHERE txn_time >= NOW() - INTERVAL '30 days'
GROUP BY account_id;
Inquiry Velocity Features
Hard inquiries are a known risk signal, but most batch systems only report total counts. Velocity within a short window matters more.
CREATE MATERIALIZED VIEW inquiry_features AS
SELECT
    account_id,
    -- Total hard pulls across windows
    COUNT(*) FILTER (
        WHERE inquiry_type = 'hard'
        AND inquiry_time >= NOW() - INTERVAL '7 days'
    ) AS hard_inquiries_7d,
    COUNT(*) FILTER (
        WHERE inquiry_type = 'hard'
        AND inquiry_time >= NOW() - INTERVAL '30 days'
    ) AS hard_inquiries_30d,
    COUNT(*) FILTER (
        WHERE inquiry_type = 'hard'
        AND inquiry_time >= NOW() - INTERVAL '90 days'
    ) AS hard_inquiries_90d,
    -- Payday lender inquiries specifically (high-risk signal)
    COUNT(*) FILTER (
        WHERE inquiry_type = 'hard'
        AND lender_type = 'payday'
        AND inquiry_time >= NOW() - INTERVAL '30 days'
    ) AS payday_inquiries_30d,
    -- Days since most recent hard inquiry
    EXTRACT(DAY FROM (NOW() - MAX(inquiry_time) FILTER (
        WHERE inquiry_type = 'hard'
    ))) AS days_since_last_hard_inquiry,
    MAX(inquiry_time) FILTER (
        WHERE inquiry_type = 'hard'
    ) AS last_hard_inquiry_at
FROM credit_inquiries
WHERE inquiry_time >= NOW() - INTERVAL '12 months'
GROUP BY account_id;
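The "three inquiries in 72 hours" burst pattern mentioned earlier can also be checked downstream from raw inquiry timestamps. A sketch using a sliding check over sorted event times; the window and threshold defaults are illustrative:

```python
from datetime import datetime, timedelta

def has_inquiry_burst(inquiry_times: list[datetime],
                      window: timedelta = timedelta(hours=72),
                      threshold: int = 3) -> bool:
    """True if `threshold` or more inquiries fall inside any `window` span."""
    times = sorted(inquiry_times)
    # Slide a window of `threshold` consecutive inquiries; if the first and
    # last are within `window` of each other, the burst condition is met.
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False
```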
Unified Feature View
In practice, the decisioning engine wants a single row per account with all features merged. A view joining the materialized views accomplishes this without duplicating storage.
CREATE VIEW credit_features AS
SELECT
    COALESCE(p.account_id, s.account_id, i.account_id) AS account_id,
    -- Payment features
    p.on_time_payments_12m,
    p.late_payments_12m,
    p.missed_payments_12m,
    p.missed_payments_90d,
    p.avg_days_late_12m,
    p.on_time_payment_ratio,
    p.days_since_last_payment,
    -- Spending velocity
    s.spend_7d,
    s.spend_30d,
    s.txn_count_7d,
    s.txn_count_30d,
    s.category_diversity_30d,
    s.large_txn_count_30d,
    s.cash_advance_count_30d,
    s.cash_advance_amount_30d,
    -- Inquiry features
    i.hard_inquiries_7d,
    i.hard_inquiries_30d,
    i.hard_inquiries_90d,
    i.payday_inquiries_30d,
    i.days_since_last_hard_inquiry,
    -- Feature freshness timestamp
    NOW() AS features_as_of
FROM payment_behavior_features p
FULL OUTER JOIN spending_velocity s ON p.account_id = s.account_id
FULL OUTER JOIN inquiry_features i ON COALESCE(p.account_id, s.account_id) = i.account_id;
Serving Features to the Decisioning Engine
RisingWave exposes a PostgreSQL-compatible wire protocol, which means your decisioning engine can query it directly with any Postgres client. A point lookup for a single account is fast -- sub-10ms in most deployments -- because the materialized views are pre-computed and indexed.
import psycopg2

conn = psycopg2.connect(
    host="risingwave-host",
    port=4566,
    dbname="dev",
    user="root"
)

def get_credit_features(account_id: str) -> dict:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM credit_features WHERE account_id = %s",
            (account_id,)
        )
        row = cur.fetchone()
        if row is None:
            return {}
        columns = [desc[0] for desc in cur.description]
        return dict(zip(columns, row))
For high-throughput decisioning (thousands of applications per second), you can also push features downstream to Redis or a low-latency key-value store using RisingWave's sink connectors, and have the decisioning engine read from there.
-- Sink features to Redis for ultra-low latency reads
CREATE SINK credit_features_redis
FROM credit_features
WITH (
    connector = 'redis',
    redis.url = 'redis://cache:6379',
    primary_key = 'account_id'
);
Handling Edge Cases
Thin-File Users
New-to-credit borrowers will have no rows in the payment or inquiry materialized views. Your decisioning logic needs to distinguish between "absent because new" and "absent because something is wrong with the pipeline."
A practical pattern is to JOIN against a customer account table and use COALESCE with explicit defaults:
CREATE VIEW credit_features_with_defaults AS
SELECT
    a.account_id,
    COALESCE(p.on_time_payment_ratio, -1) AS on_time_payment_ratio,
    COALESCE(p.missed_payments_12m, -1) AS missed_payments_12m,
    COALESCE(s.spend_30d, 0) AS spend_30d,
    COALESCE(i.hard_inquiries_30d, 0) AS hard_inquiries_30d,
    -- Flag that this is a thin-file account
    CASE
        WHEN p.account_id IS NULL THEN TRUE
        ELSE FALSE
    END AS is_thin_file,
    EXTRACT(DAY FROM (NOW() - a.account_open_date)) AS account_age_days
FROM accounts a
LEFT JOIN payment_behavior_features p ON a.account_id = p.account_id
LEFT JOIN spending_velocity s ON a.account_id = s.account_id
LEFT JOIN inquiry_features i ON a.account_id = i.account_id;
Pass is_thin_file = TRUE to a separate model segment or to a thin-file underwriting policy rather than forcing a standard score.
New Accounts with No Transaction History
For accounts under 30 days old, 30-day velocity features are meaningless -- the window is longer than the account lifetime. Use account_age_days to conditionally suppress or weight these features in the model input vector.
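One way to implement that suppression on the decisioning side is to null out any window feature whose lookback exceeds the account's age. A sketch; the feature names match the unified view, but the masking policy itself is an assumption to be tuned per model:

```python
# Lookback length in days implied by each velocity feature's suffix.
WINDOW_DAYS = {
    "spend_7d": 7, "txn_count_7d": 7,
    "spend_30d": 30, "txn_count_30d": 30,
    "category_diversity_30d": 30,
}

def mask_young_account(features: dict, account_age_days: int) -> dict:
    """Null out velocity features whose window is longer than the account age."""
    masked = dict(features)
    for name, days in WINDOW_DAYS.items():
        if name in masked and account_age_days < days:
            masked[name] = None
    return masked
```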
Missing Data Handling
Streaming systems occasionally produce out-of-order or duplicate events. For credit features, duplicates are the bigger concern -- a duplicate payment event will inflate on-time counts. Ensure your Kafka producers use idempotent delivery and that your source schema includes a natural deduplication key (payment_id, txn_id) that you can reference in downstream transformations.
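Where events are consumed directly in application code, the same natural-key deduplication can be expressed as a first-seen filter. A minimal sketch (in SQL you would instead group or deduplicate on the key column):

```python
def dedupe_events(events: list[dict], key: str) -> list[dict]:
    """Keep the first occurrence of each natural key (e.g. payment_id, txn_id)."""
    seen: set[str] = set()
    out = []
    for event in events:
        k = event[key]
        if k not in seen:
            seen.add(k)
            out.append(event)
    return out
```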
Compliance Considerations
Real-time features introduce audit complexity that batch pipelines handle more naturally. When a regulator asks "what features drove this denial," a batch pipeline can replay the exact feature table snapshot used at decision time. With streaming features, you need to be more deliberate.
Feature logging: At decision time, log the full feature vector alongside the decision and a timestamp. This is the ground truth for adverse action notices and model audits. Do not rely on re-querying the live feature store after the fact -- the values will have changed.
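A minimal decision-log record along those lines might look like the sketch below. The record schema is an assumption; in production this would be written to an append-only, immutable store:

```python
import json
from datetime import datetime, timezone

def log_decision(features: dict, decision: str, model_version: str) -> str:
    """Serialize the exact feature vector used at decision time."""
    record = {
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "decision": decision,   # e.g. 'approve', 'decline'
        "features": features,   # the vector as served, never re-queried later
    }
    # default=str covers Decimal/timestamp values coming off the wire
    return json.dumps(record, default=str)
```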
Explainability: Streaming features derived from SQL are inherently more interpretable than features engineered in Python. The SQL definition is the feature definition -- there is no code to reverse-engineer. This is a genuine advantage for model documentation requirements under the Equal Credit Opportunity Act (ECOA).
Feature drift monitoring: Because features update continuously, drift can be harder to detect than in a batch system where you can compare yesterday's distribution to last week's. Set up monitoring on the distribution of key features (utilization, payment ratio) and alert when distributions shift outside expected bounds. RisingWave can feed these distribution metrics to a monitoring system in real time.
Model governance with streaming features: One underappreciated challenge is that streaming features make it harder to do offline model validation. When you train on historical data, you need to reconstruct what the feature values were at each training label's decision time -- not the current values. This requires either logging features at training time or running a replay of your event streams through the same SQL logic. Both approaches are viable but add infrastructure overhead that teams should budget for.
FAQ
How fresh are the features in practice? End-to-end latency from event to updated materialized view is typically under one second for well-tuned RisingWave deployments. For most credit decisioning use cases, this is more than sufficient. The bottleneck is usually event publication latency (time from transaction processing to Kafka), not the streaming SQL computation.
Can I use this alongside an existing batch pipeline? Yes. A common migration pattern is to run the streaming pipeline in shadow mode alongside the batch pipeline, compare feature values, and gradually shift weight to the streaming features as confidence builds.
What happens if RisingWave goes down? RisingWave checkpoints state to object storage (S3 or compatible). On restart, it replays from the last checkpoint. Feature values will be slightly stale during recovery but will catch up quickly. For decisioning systems, the standard practice is to fall back to cached feature values or to more conservative credit policies during any unavailability.
How does this compare to a dedicated feature store like Feast or Tecton? Feast and Tecton add a feature registry, versioning, and training/serving skew detection on top of the compute layer. RisingWave handles the compute and low-latency serving but does not provide a feature registry out of the box. For teams that need a registry (especially for larger model inventories), combining RisingWave with a lightweight registry is reasonable. For teams with a small number of credit models and SQL-native feature definitions, the added complexity of a full feature platform is often not worth it.
Are window-based features affected by event time vs. processing time differences? Yes -- this is a real concern. The SQL examples above use NOW() as the window boundary, which is processing time. If your event source has significant latency (events arriving minutes or hours late), you should use event-time watermarks and time-based windows anchored to the event timestamp rather than processing time. RisingWave supports event-time windowing via its temporal filter and watermark capabilities.
Credit risk feature engineering does not need to be a Python-and-Spark problem. The features that matter most -- payment behavior, spending velocity, inquiry recency -- are aggregate SQL operations over time windows. Streaming SQL makes those operations continuous rather than batch, and RisingWave makes them easy to operate without a dedicated platform team.
The honest caveat: streaming features add complexity in model governance and offline training pipelines. That cost is real. But for lending products where decision speed and feature freshness directly affect portfolio quality, it is a cost worth paying.