Real-Time Risk Scoring for Agentic Transactions: Composite Signals with Streaming SQL

Introduction

An autonomous shopping agent fires twelve checkout requests in eighty seconds, all under the user's mandate cap. A booking agent stays well under its budget but suddenly tries to charge a crypto exchange. A travel agent operating for months without incident now sends a single charge from a country its user has never visited. Each one might be benign. Each one might be the first signal of a compromised mandate, a hijacked agent, or a prompt injection that flipped the agent's behavior.

No single rule catches all three. Velocity caps miss the crypto charge. Merchant blocklists miss the rapid-fire micro-purchases. Geographic checks miss the booking agent. The only way to make a defensible allow, review, or block decision is to combine multiple signals into one weighted risk score, recompute it on every transaction, and act on it before settlement.

This tutorial walks through building exactly that pipeline using streaming SQL in RisingWave, a streaming database that lets you express risk subscores and the composite as plain materialized views. Every query is verified against a running RisingWave instance (v2.8.0) and the actual output is included so you can compare your own results. By the end, you will have a three-signal risk model (velocity, mandate scope, and merchant risk, drawn from a six-signal framework) that produces a 0 to 100 composite score and an ALLOW / REVIEW / BLOCK action for every agentic transaction in real time.

Why Agentic Transactions Need a New Risk Model

Card-present and card-not-present fraud models built over the last twenty years assume a human is the principal. The human pulls out their wallet, taps to pay, and the system asks: does this match the human's pattern? AI agent transactions break this assumption in three concrete ways.

First, the principal of the transaction is the agent, not the human. A single user might delegate to a shopping agent, a travel agent, a developer-tools agent, and a finance agent in the same week. Each agent has its own velocity profile, merchant set, and amount distribution. Lumping them together into one user profile masks every interesting pattern. You need scoring at the agent level.

Second, the user issues a mandate, not a transaction. A mandate is a structured authorization: "this travel agent may spend up to $1500 on flights and hotels for the next 14 days." The mandate is the source of truth. A transaction that respects the mandate is legitimate by construction; one that violates it is a violation of explicit user intent, which is a far stronger fraud signal than any statistical anomaly. Risk models that ignore mandate scope miss the most reliable signal in the system.

Third, agents cluster their actions. A human shopper might buy two things in a day. A shopping agent comparison-shops across ten merchants in two minutes, then commits to one. Velocity caps tuned for humans flag every legitimate agent run. Caps tuned for agents are too loose to catch a human account behaving abnormally. The same threshold cannot serve both. You need agent-aware velocity, computed within a peer group of similar agents.

These three properties (agent-level identity, mandate as ground truth, and agent-clustered behavior) are why batch fraud pipelines built around human cardholders consistently miss agentic abuse. Building a real-time, multi-signal score on top of an operational data layer is the more durable approach.

Two RisingWave customers have already moved their risk scoring to streaming SQL for exactly these reasons. Atome, a buy-now-pay-later provider operating across Southeast Asia, uses RisingWave to compute composite credit and fraud features in real time as installment requests arrive. An anonymous broker we work with built their entire fraud feature store on RisingWave, joining velocity windows, peer benchmarks, and merchant risk tiers into a single feature vector that feeds their downstream model. Both started from the same observation: a model that scores an event five minutes after the event has already lost.

Building Block 1: Per-Signal Subscores

A composite risk score is only as useful as its inputs. Six signal families cover the bulk of agentic abuse patterns.

Velocity

How many transactions has the agent fired in the last 60 seconds, 5 minutes, and 1 hour? How does this compare to the agent's normal cadence? Velocity is the cheapest signal to compute and the easiest to misjudge if used alone, because legitimate agents also burst.

Behavioral drift

Compared to the agent's last 30 days, is the current transaction unusual in amount, merchant category, or time of day? Drift detection assumes you have a reasonable baseline, which you only get after the agent has run for a while. New agents need a different treatment, often a tighter peer comparison.

Mandate scope

Did the user authorize this exact merchant category? Is the amount within the per-charge cap? Has the cumulative spend on the mandate stayed under the period limit? Mandate signals are deterministic, not statistical, which makes them the most defensible input to the score.

Identity validation

Does the agent's claimed identity (DPoP key, signed mandate, attestation) check out? Has the user's account shown signs of takeover (recent password reset, new device)? Identity is a binary signal in most cases, but you can soften it into a graded subscore by counting how many of the identity checks failed.

Peer comparison

Among agents that serve similar users (same country, similar mandate scope, similar merchant set), is this transaction within the normal envelope? Peer comparison is what makes agentic risk scoring work for new agents that have no individual baseline yet.

Merchant risk

Tier the merchant catalog by historical chargeback rate, regulatory profile, and country of establishment. A tier-1 merchant like Amazon adds nothing; a tier-5 merchant in a jurisdiction with weak consumer protection adds significant risk. Country of the originating IP also matters, especially for charges from sanctioned jurisdictions.

For the rest of this article we focus on velocity, mandate, and merchant risk because they cover the most common attack surface and are the cheapest to demonstrate end-to-end. The same pattern extends naturally to drift, identity, and peer signals.

Building Block 2: Normalization to 0-100 Scale

Subscores have to be on the same scale before you can combine them. The convention used here, and in most production scoring systems, is 0 to 100 where 0 means "no concern" and 100 means "this signal alone is enough to act."

Three patterns cover most subscores.

Linear ramp from a threshold

Pick a baseline (below this is fine) and a saturation point (above this is maxed out), and linearly interpolate. Velocity is the canonical example. Two transactions per minute is fine for most agents; ten per minute is not. With the 18-point step used here, the score ramps from 0 at one transaction per minute to the cap of 100 at seven or more (six transactions score 90).

LEAST(100, GREATEST(0, (tx_count_60s - 1) * 18)) AS velocity_score
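To sanity-check a ramp before committing it to a view, the same clamp can be mirrored in a few lines of Python. This is a minimal sketch for offline checking, not part of the RisingWave pipeline; the names `ramp_score` and `step_for_saturation` and their parameters are illustrative assumptions.

```python
def ramp_score(tx_count_60s: int, baseline: int = 1, step: float = 18.0) -> float:
    """Mirror of LEAST(100, GREATEST(0, (tx_count_60s - baseline) * step))."""
    raw = (tx_count_60s - baseline) * step
    return max(0.0, min(100.0, raw))


def step_for_saturation(baseline: int, saturation: int) -> float:
    """Step size that makes the ramp reach exactly 100 at `saturation`."""
    if saturation <= baseline:
        raise ValueError("saturation must be greater than baseline")
    return 100.0 / (saturation - baseline)


if __name__ == "__main__":
    for count in (1, 2, 5, 6, 7, 10):
        print(count, ramp_score(count))
    # Step you would need if the ramp should only max out at ten per minute.
    print(round(step_for_saturation(1, 10), 2))
```

With the 18-point step the clamp takes effect at seven transactions in the window; a step of roughly 11.1 would move that point out to ten.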

Discrete tier mapping

When the input is already categorical (merchant tier, mandate scope match), map each tier to a fixed score with a CASE statement. This avoids spurious precision and keeps the tier definitions auditable.

CASE merchant_risk_tier
    WHEN 1 THEN 0
    WHEN 2 THEN 25
    WHEN 3 THEN 50
    WHEN 4 THEN 75
    WHEN 5 THEN 100
END AS merchant_score

Ratio with cap

When a signal naturally produces a ratio (overage above mandate cap, deviation from baseline), normalize as a percentage and cap at 100. Mandate overage is the canonical case: 50% over the cap is bad, 200% over the cap is no worse for scoring purposes than 1000% over.

LEAST(100, GREATEST(0, ((amount - mandate_max_amount) / mandate_max_amount) * 100)) AS overage_score

A few rules of thumb. Subscores should be monotonic in their input: more bad signal, higher score, never decreasing. They should be bounded; you do not want one runaway signal pushing the composite to 10000. And they should be re-computable from raw data; if you cannot replay your subscore from the raw transaction stream you cannot debug a false positive when the support ticket lands.
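The monotonic and bounded rules are easy to turn into an automated check. The sketch below is one hypothetical way to do it in Python against a list of sample inputs; `overage_score` and `is_bounded_and_monotonic` are illustrative names, and the sample amounts are made up for the example.

```python
from typing import Callable, Sequence


def overage_score(amount: float, mandate_max_amount: float) -> float:
    """Percent over the cap, clamped to the 0-100 range."""
    pct_over = (amount - mandate_max_amount) / mandate_max_amount * 100
    return max(0.0, min(100.0, pct_over))


def is_bounded_and_monotonic(score: Callable[[float], float],
                             inputs: Sequence[float]) -> bool:
    """True if score stays in [0, 100] and never decreases as the input grows."""
    values = [score(x) for x in sorted(inputs)]
    bounded = all(0 <= v <= 100 for v in values)
    monotonic = all(a <= b for a, b in zip(values, values[1:]))
    return bounded and monotonic


if __name__ == "__main__":
    amounts = [0, 100, 200, 210, 300, 400, 2000]  # illustrative amounts, cap = 200
    clamped = lambda amt: overage_score(amt, 200)
    no_floor = lambda amt: min(100.0, (amt - 200) / 200 * 100)  # upper cap only
    print(is_bounded_and_monotonic(clamped, amounts))   # True
    print(is_bounded_and_monotonic(no_floor, amounts))  # False: negative below the cap
```

The second case shows why a ratio needs a floor as well as a cap: without one, any amount under the mandate limit produces a negative subscore.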

Building Block 3: Weighted Composite Score

Once subscores are normalized, the composite is a weighted sum. Weights reflect how much you trust each signal and how costly each error mode is.

A typical starting point for agentic transactions:

 Signal        | Weight
---------------+--------
 Mandate scope |   0.45
 Merchant risk |   0.30
 Velocity      |   0.25

Mandate gets the largest weight because it represents explicit user intent. A mandate violation is not a statistical anomaly; it is an act outside what the user authorized. Merchant risk is next because it captures the destination of the funds, which determines how recoverable a fraud loss will be. Velocity is real but easily faked by an attacker who paces their requests, so it gets less weight. Drift, identity, and peer comparison would slot in alongside these in a full implementation.

Weights should sum to 1.0 so the composite stays on the 0 to 100 scale. The composite is then:

ROUND(
    velocity_score * 0.25 +
    mandate_score  * 0.45 +
    merchant_score * 0.30
, 1) AS composite_score
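If you want to unit test the weighting outside the database, a reference implementation along these lines can help. It is a sketch under stated assumptions: the `WEIGHTS` dictionary and `composite_score` function are hypothetical names, and half-up rounding is an assumption about how ROUND treats ties on decimal values.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights from the table above; kept as Decimal to avoid float drift.
WEIGHTS = {
    "velocity": Decimal("0.25"),
    "mandate": Decimal("0.45"),
    "merchant": Decimal("0.30"),
}


def composite_score(subscores: dict) -> Decimal:
    """Weighted sum of 0-100 subscores, rounded to one decimal place."""
    if sum(WEIGHTS.values()) != Decimal("1"):
        raise ValueError("weights must sum to 1.0")
    if set(subscores) != set(WEIGHTS):
        raise ValueError("subscores must match the weighted signals exactly")
    total = sum(WEIGHTS[name] * Decimal(str(subscores[name])) for name in WEIGHTS)
    # Assumption: ties round away from zero, as with a decimal ROUND(x, 1).
    return total.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(composite_score({"velocity": 72, "mandate": 0, "merchant": 25}))    # 25.5
    print(composite_score({"velocity": 0, "mandate": 100, "merchant": 100}))  # 75.0
```

Checking that the weights sum to 1.0 at call time is a cheap guard against a configuration edit that silently stretches or shrinks the 0 to 100 scale.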

Tuning weights is an empirical exercise. Run the pipeline on labeled historical data, sweep the weight grid, and pick the combination that maximizes recall on confirmed-fraud cases at a fixed false positive rate you can live with. Most teams settle within a few percentage points of these defaults.

Action Thresholds: Allow, Review, Block

The score by itself does nothing. The decision logic mapping score to action is what actually defends the system. Three bands cover the typical operational range.

  • 0 to 39: ALLOW. The transaction proceeds with no extra friction. This is the desired outcome for the overwhelming majority of legitimate agent traffic.
  • 40 to 69: REVIEW. The transaction is held briefly while a step-up check runs. This might mean a push notification to the user asking for confirmation, a synchronous identity re-check, or a manual queue for high-value cases.
  • 70 to 100: BLOCK. The transaction is rejected outright. The agent receives an explicit error, the user is notified, and the event is logged for follow-up.

The numeric bands are starting points. Calibrate them to your false-positive tolerance. A consumer-facing payments platform usually wants the BLOCK band tight because user friction is expensive; a B2B treasury tool can afford a wider BLOCK band because every transaction is high-value. As a general rule of thumb, fewer than 5% of transactions should land in REVIEW and fewer than 0.5% in BLOCK in a healthy system.
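The band logic and the share-of-traffic rule of thumb can both be expressed as a small helper. The following is a minimal Python sketch, assuming the default cut-offs of 40 and 70; `action_for_score` and `band_shares` are illustrative names rather than anything defined by RisingWave.

```python
from collections import Counter


def action_for_score(score: float, review_at: float = 40.0,
                     block_at: float = 70.0) -> str:
    """Map a 0-100 composite score to ALLOW, REVIEW, or BLOCK."""
    if review_at >= block_at:
        raise ValueError("review_at must be below block_at")
    if score >= block_at:
        return "BLOCK"
    if score >= review_at:
        return "REVIEW"
    return "ALLOW"


def band_shares(scores: list) -> dict:
    """Fraction of transactions in each band, for comparing against targets."""
    if not scores:
        return {}
    actions = Counter(action_for_score(s) for s in scores)
    return {band: actions[band] / len(scores)
            for band in ("ALLOW", "REVIEW", "BLOCK")}


if __name__ == "__main__":
    print(action_for_score(39.9))           # ALLOW
    print(action_for_score(40.0))           # REVIEW
    print(band_shares([10, 45, 75, 20]))    # half ALLOW, a quarter each REVIEW and BLOCK
```

Because the comparison is on the raw score, a fractional value such as 39.9 stays in ALLOW; the bands are effectively "below 40", "40 to below 70", and "70 and above".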

End-to-End Pipeline in RisingWave

We are now ready to assemble the pipeline. Connect to RisingWave at localhost:4566, database dev, user root, no password. The full pipeline below was verified on RisingWave v2.8.0 with the prefix aap12_ so you can drop everything cleanly when you are done.

Step 1: Define the transaction table

The base table holds the raw agentic transaction stream. In production this is fed by a Kafka source from your payments gateway; for the tutorial we use direct inserts.

CREATE TABLE aap12_agent_tx (
    tx_id VARCHAR PRIMARY KEY,
    agent_id VARCHAR NOT NULL,
    user_id VARCHAR NOT NULL,
    mandate_id VARCHAR NOT NULL,
    merchant VARCHAR NOT NULL,
    merchant_risk_tier INT NOT NULL,
    amount DECIMAL NOT NULL,
    ip_country VARCHAR,
    mandate_max_amount DECIMAL NOT NULL,
    mandate_merchant_scope VARCHAR,
    tx_time TIMESTAMPTZ NOT NULL
);

Step 2: Load representative transactions

The dataset covers six agents with deliberately different risk profiles. Agents A and E are clean; B fires a high-velocity burst; C violates its mandate (amount overage and crypto-merchant scope mismatch); D charges medium-risk merchants from high-risk countries; F sits at the borderline.

INSERT INTO aap12_agent_tx VALUES
-- Agent A: clean shopping pattern
('tx_001','agent_A','user_1','m_A1','amazon.com',1,49.50,'US',500.00,'retail','2026-05-06 10:00:00+00'),
('tx_002','agent_A','user_1','m_A1','target.com',1,32.00,'US',500.00,'retail','2026-05-06 10:15:00+00'),
('tx_003','agent_A','user_1','m_A1','walmart.com',1,71.25,'US',500.00,'retail','2026-05-06 10:42:00+00'),

-- Agent B: high velocity attack (9 transactions in 2 minutes)
('tx_010','agent_B','user_2','m_B1','shop1.io',2,120.00,'US',1000.00,'retail','2026-05-06 11:00:00+00'),
('tx_011','agent_B','user_2','m_B1','shop2.io',2,118.00,'US',1000.00,'retail','2026-05-06 11:00:15+00'),
('tx_012','agent_B','user_2','m_B1','shop3.io',2,131.00,'US',1000.00,'retail','2026-05-06 11:00:30+00'),
('tx_013','agent_B','user_2','m_B1','shop4.io',2, 99.00,'US',1000.00,'retail','2026-05-06 11:00:45+00'),
('tx_014','agent_B','user_2','m_B1','shop5.io',2,140.00,'US',1000.00,'retail','2026-05-06 11:01:00+00'),
('tx_015','agent_B','user_2','m_B1','shop6.io',2,109.00,'US',1000.00,'retail','2026-05-06 11:01:15+00'),
('tx_016','agent_B','user_2','m_B1','shop7.io',2,155.00,'US',1000.00,'retail','2026-05-06 11:01:30+00'),
('tx_017','agent_B','user_2','m_B1','shop8.io',2,128.00,'US',1000.00,'retail','2026-05-06 11:01:45+00'),
('tx_018','agent_B','user_2','m_B1','shop9.io',2,135.00,'US',1000.00,'retail','2026-05-06 11:02:00+00'),

-- Agent C: mandate violations (amount overage, crypto scope mismatch)
('tx_020','agent_C','user_3','m_C1','luxurycars.io',     3,8500.00,'US',1000.00,'retail','2026-05-06 12:00:00+00'),
('tx_021','agent_C','user_3','m_C1','crypto-exchange.io',5, 499.00,'US',1000.00,'retail','2026-05-06 12:05:00+00'),
('tx_022','agent_C','user_3','m_C1','crypto-exchange.io',5,2500.00,'RU',1000.00,'retail','2026-05-06 12:10:00+00'),

-- Agent D: medium-risk merchants from high-risk countries
('tx_030','agent_D','user_4','m_D1','offshore-bet.io',4,250.00,'MT',300.00,'gaming','2026-05-06 13:00:00+00'),
('tx_031','agent_D','user_4','m_D1','offshore-bet.io',4,310.00,'MT',300.00,'gaming','2026-05-06 13:30:00+00'),
('tx_032','agent_D','user_4','m_D1','vpn-store.io',   3, 89.00,'RU',300.00,'gaming','2026-05-06 13:45:00+00'),

-- Agent E: clean B2B agent
('tx_040','agent_E','user_5','m_E1','aws.amazon.com',1,1200.00,'US',5000.00,'cloud','2026-05-06 14:00:00+00'),
('tx_041','agent_E','user_5','m_E1','datadoghq.com', 1, 450.00,'US',5000.00,'cloud','2026-05-06 14:30:00+00'),
('tx_042','agent_E','user_5','m_E1','snowflake.com', 1, 890.00,'US',5000.00,'cloud','2026-05-06 15:00:00+00'),

-- Agent F: borderline, slight overage
('tx_050','agent_F','user_6','m_F1','etsy.com',2, 75.00,'US',200.00,'retail','2026-05-06 16:00:00+00'),
('tx_051','agent_F','user_6','m_F1','wish.com',2,210.00,'US',200.00,'retail','2026-05-06 16:10:00+00');

Step 3: Velocity subscore

Per-agent transaction count over the 60 seconds up to and including each event, ramped to a 0 to 100 score. A self-join on agent_id with a 60-second range gives the count, which includes the current transaction; the ramp (tx_count_60s - 1) * 18 reaches the cap of 100 at seven transactions per minute.

CREATE MATERIALIZED VIEW aap12_velocity_score_mv AS
WITH tx_window AS (
    SELECT
        a.tx_id,
        a.agent_id,
        a.tx_time,
        COUNT(b.tx_id) AS tx_count_60s,
        COALESCE(SUM(b.amount), 0) AS amount_60s
    FROM aap12_agent_tx a
    LEFT JOIN aap12_agent_tx b
        ON b.agent_id = a.agent_id
       AND b.tx_time >= a.tx_time - INTERVAL '60 seconds'
       AND b.tx_time <= a.tx_time
    GROUP BY a.tx_id, a.agent_id, a.tx_time
)
SELECT
    tx_id,
    agent_id,
    tx_count_60s,
    amount_60s,
    LEAST(100, GREATEST(0, (tx_count_60s - 1) * 18)) AS velocity_score
FROM tx_window;

Sample output for agent B's burst:

 tx_id  | agent_id | tx_count_60s | velocity_score
--------+----------+--------------+----------------
 tx_010 | agent_B  |            1 |              0
 tx_011 | agent_B  |            2 |             18
 tx_012 | agent_B  |            3 |             36
 tx_013 | agent_B  |            4 |             54
 tx_014 | agent_B  |            5 |             72
 tx_015 | agent_B  |            5 |             72
 tx_016 | agent_B  |            5 |             72
 tx_017 | agent_B  |            5 |             72
 tx_018 | agent_B  |            5 |             72

The score climbs as the burst ramps up and plateaus at 72, because at one request every 15 seconds the rolling window never holds more than five transactions. Note that 72 alone is not enough to block; velocity is a contributing signal, not a verdict.
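To see where those counts come from, the window logic of the self-join can be replayed in plain Python. This is an offline sketch, not RisingWave code; `tx_count_60s` and `velocity_score` are illustrative helper names, and the timestamps reproduce agent B's nine inserts at 15-second spacing.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=60)


def tx_count_60s(times: list) -> list:
    """For each event, count events in [t - 60s, t], both ends inclusive."""
    return [sum(1 for other in times if t - WINDOW <= other <= t) for t in times]


def velocity_score(count: int) -> int:
    """Mirror of LEAST(100, GREATEST(0, (tx_count_60s - 1) * 18))."""
    return max(0, min(100, (count - 1) * 18))


if __name__ == "__main__":
    start = datetime(2026, 5, 6, 11, 0, 0, tzinfo=timezone.utc)
    agent_b = [start + timedelta(seconds=15 * i) for i in range(9)]
    counts = tx_count_60s(agent_b)
    print(counts)                                # [1, 2, 3, 4, 5, 5, 5, 5, 5]
    print([velocity_score(c) for c in counts])   # [0, 18, 36, 54, 72, 72, 72, 72, 72]
```

Both window bounds are inclusive, so an event exactly 60 seconds old still counts. That is why the fifth transaction at 11:01:00 sees five events rather than four.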

Step 4: Mandate subscore

Two failure modes are folded into a single mandate score: the amount exceeds the per-charge cap, or the merchant falls outside the scope the user authorized. We take the maximum so any one violation is enough to push the score up.

CREATE MATERIALIZED VIEW aap12_mandate_score_mv AS
SELECT
    tx_id,
    agent_id,
    amount,
    mandate_max_amount,
    merchant,
    mandate_merchant_scope,
    CASE
        WHEN amount <= mandate_max_amount THEN 0
        ELSE LEAST(100, ((amount - mandate_max_amount) / mandate_max_amount) * 100)
    END AS overage_score,
    CASE
        WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%crypto%'      THEN 80
        WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%bet%'         THEN 70
        WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%vpn%'         THEN 60
        WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%luxurycars%'  THEN 40
        WHEN mandate_merchant_scope = 'gaming' AND merchant NOT ILIKE '%bet%'
                                                AND merchant NOT ILIKE '%casino%'
                                                AND merchant NOT ILIKE '%vpn%'    THEN 30
        ELSE 0
    END AS scope_score,
    GREATEST(
        CASE
            WHEN amount <= mandate_max_amount THEN 0
            ELSE LEAST(100, ((amount - mandate_max_amount) / mandate_max_amount) * 100)
        END,
        CASE
            WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%crypto%'      THEN 80
            WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%bet%'         THEN 70
            WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%vpn%'         THEN 60
            WHEN mandate_merchant_scope = 'retail' AND merchant ILIKE '%luxurycars%'  THEN 40
            WHEN mandate_merchant_scope = 'gaming' AND merchant NOT ILIKE '%bet%'
                                                    AND merchant NOT ILIKE '%casino%'
                                                    AND merchant NOT ILIKE '%vpn%'    THEN 30
            ELSE 0
        END
    ) AS mandate_score
FROM aap12_agent_tx;

Sample output for agent C, the mandate violator:

 tx_id  | agent_id | amount  | mandate_max_amount |      merchant      | mandate_merchant_scope | mandate_score
--------+----------+---------+--------------------+--------------------+------------------------+---------------
 tx_020 | agent_C  | 8500.00 |            1000.00 | luxurycars.io      | retail                 |           100
 tx_021 | agent_C  |  499.00 |            1000.00 | crypto-exchange.io | retail                 |            80
 tx_022 | agent_C  | 2500.00 |            1000.00 | crypto-exchange.io | retail                 |           100

tx_020 is 750% over the cap, which saturates the overage score at 100. tx_021 is under cap but charges a crypto exchange against a retail mandate, so the scope mismatch produces 80. tx_022 does both, so the mandate score is 100.
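The same rules can be replayed offline to confirm the three values above. The sketch below mirrors the CASE branches in Python under the assumption that ILIKE '%x%' behaves like a case-insensitive substring test; the function names are illustrative and nothing here runs inside RisingWave.

```python
def overage_score(amount: float, cap: float) -> float:
    """0 at or under the cap, otherwise percent over the cap, capped at 100."""
    if amount <= cap:
        return 0.0
    return min(100.0, (amount - cap) / cap * 100)


def scope_score(scope: str, merchant: str) -> int:
    """First matching rule wins, in the same order as the CASE expression."""
    name = merchant.lower()
    if scope == "retail":
        for needle, score in (("crypto", 80), ("bet", 70),
                              ("vpn", 60), ("luxurycars", 40)):
            if needle in name:
                return score
        return 0
    if scope == "gaming" and not any(n in name for n in ("bet", "casino", "vpn")):
        return 30
    return 0


def mandate_score(amount: float, cap: float, scope: str, merchant: str) -> float:
    """Either violation alone is enough, so take the larger of the two."""
    return max(overage_score(amount, cap), scope_score(scope, merchant))


if __name__ == "__main__":
    print(mandate_score(8500, 1000, "retail", "luxurycars.io"))       # 100.0
    print(mandate_score(499, 1000, "retail", "crypto-exchange.io"))   # 80
    print(mandate_score(2500, 1000, "retail", "crypto-exchange.io"))  # 100.0
```

One thing the replay makes visible is that substring rules are broad: a hypothetical retail merchant named betterhomes.com would match the '%bet%' rule and score 70, so a production version would more likely key on a merchant category code than on the domain name.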

Step 5: Merchant risk subscore

Tier-based merchant scoring with a country adder for known high-risk jurisdictions. Tier 1 merchants contribute zero, tier 5 contribute 100, with intermediate tiers in 25-point increments. A 20-point bump applies for traffic from the listed countries.

CREATE MATERIALIZED VIEW aap12_merchant_risk_score_mv AS
SELECT
    tx_id,
    agent_id,
    merchant,
    merchant_risk_tier,
    ip_country,
    LEAST(100,
        CASE merchant_risk_tier
            WHEN 1 THEN 0
            WHEN 2 THEN 25
            WHEN 3 THEN 50
            WHEN 4 THEN 75
            WHEN 5 THEN 100
            ELSE 50
        END
        +
        CASE
            WHEN ip_country IN ('RU','MT','IR','KP') THEN 20
            ELSE 0
        END
    ) AS merchant_score
FROM aap12_agent_tx;

Sample output for agent D:

 tx_id  |    merchant     | merchant_risk_tier | ip_country | merchant_score
--------+-----------------+--------------------+------------+----------------
 tx_030 | offshore-bet.io |                  4 | MT         |             95
 tx_031 | offshore-bet.io |                  4 | MT         |             95
 tx_032 | vpn-store.io    |                  3 | RU         |             70

Tier-4 merchant (75 base) plus Malta (20) gives 95 for the betting site; tier-3 merchant (50) plus Russia (20) gives 70 for the VPN store.
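For completeness, here is the same tier-plus-country arithmetic as a small Python reference, useful when you want to test threshold changes without touching the view. It is a sketch; `TIER_BASE`, `HIGH_RISK_COUNTRIES`, and `merchant_score` are illustrative names that simply restate the values in the SQL above.

```python
from typing import Optional

TIER_BASE = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100}
HIGH_RISK_COUNTRIES = {"RU", "MT", "IR", "KP"}
UNKNOWN_TIER_SCORE = 50   # matches the ELSE 50 branch
COUNTRY_ADDER = 20


def merchant_score(tier: int, ip_country: Optional[str]) -> int:
    """Tier base score plus a country adder, capped at 100."""
    base = TIER_BASE.get(tier, UNKNOWN_TIER_SCORE)
    # A missing country (NULL in SQL) falls through to no adder.
    adder = COUNTRY_ADDER if ip_country in HIGH_RISK_COUNTRIES else 0
    return min(100, base + adder)


if __name__ == "__main__":
    print(merchant_score(4, "MT"))   # 95
    print(merchant_score(3, "RU"))   # 70
    print(merchant_score(5, "RU"))   # 100, capped
```

The cap matters for tier-5 merchants: the adder cannot push the subscore past 100, so the country signal only differentiates transactions at tiers 1 through 4.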

Step 6: Composite risk score and action

Join the three subscores by tx_id, apply the weights, and emit the action band.

CREATE MATERIALIZED VIEW aap12_composite_risk_mv AS
SELECT
    t.tx_id,
    t.agent_id,
    t.amount,
    t.merchant,
    v.velocity_score,
    m.mandate_score,
    r.merchant_score,
    ROUND(
        (v.velocity_score * 0.25)
      + (m.mandate_score  * 0.45)
      + (r.merchant_score * 0.30)
    , 1) AS composite_score,
    CASE
        WHEN ((v.velocity_score * 0.25) + (m.mandate_score * 0.45) + (r.merchant_score * 0.30)) >= 70 THEN 'BLOCK'
        WHEN ((v.velocity_score * 0.25) + (m.mandate_score * 0.45) + (r.merchant_score * 0.30)) >= 40 THEN 'REVIEW'
        ELSE 'ALLOW'
    END AS action
FROM aap12_agent_tx t
JOIN aap12_velocity_score_mv      v ON v.tx_id = t.tx_id
JOIN aap12_mandate_score_mv       m ON m.tx_id = t.tx_id
JOIN aap12_merchant_risk_score_mv r ON r.tx_id = t.tx_id;

The full output across all 23 transactions:

 tx_id  | agent_id | velocity_score | mandate | merchant_score | composite_score | action
--------+----------+----------------+---------+----------------+-----------------+--------
 tx_001 | agent_A  |              0 |       0 |              0 |             0.0 | ALLOW
 tx_002 | agent_A  |              0 |       0 |              0 |             0.0 | ALLOW
 tx_003 | agent_A  |              0 |       0 |              0 |             0.0 | ALLOW
 tx_010 | agent_B  |              0 |       0 |             25 |             7.5 | ALLOW
 tx_011 | agent_B  |             18 |       0 |             25 |            12.0 | ALLOW
 tx_012 | agent_B  |             36 |       0 |             25 |            16.5 | ALLOW
 tx_013 | agent_B  |             54 |       0 |             25 |            21.0 | ALLOW
 tx_014 | agent_B  |             72 |       0 |             25 |            25.5 | ALLOW
 tx_015 | agent_B  |             72 |       0 |             25 |            25.5 | ALLOW
 tx_016 | agent_B  |             72 |       0 |             25 |            25.5 | ALLOW
 tx_017 | agent_B  |             72 |       0 |             25 |            25.5 | ALLOW
 tx_018 | agent_B  |             72 |       0 |             25 |            25.5 | ALLOW
 tx_020 | agent_C  |              0 |     100 |             50 |            60.0 | REVIEW
 tx_021 | agent_C  |              0 |      80 |            100 |            66.0 | REVIEW
 tx_022 | agent_C  |              0 |     100 |            100 |            75.0 | BLOCK
 tx_030 | agent_D  |              0 |       0 |             95 |            28.5 | ALLOW
 tx_031 | agent_D  |              0 |     3.3 |             95 |            30.0 | ALLOW
 tx_032 | agent_D  |              0 |       0 |             70 |            21.0 | ALLOW
 tx_040 | agent_E  |              0 |       0 |              0 |             0.0 | ALLOW
 tx_041 | agent_E  |              0 |       0 |              0 |             0.0 | ALLOW
 tx_042 | agent_E  |              0 |       0 |              0 |             0.0 | ALLOW
 tx_050 | agent_F  |              0 |       0 |             25 |             7.5 | ALLOW
 tx_051 | agent_F  |              0 |     5.0 |             25 |             9.8 | ALLOW

Action distribution across the dataset:

 action | tx_count
--------+----------
 ALLOW  |       20
 BLOCK  |        1
 REVIEW |        2

The pipeline behaves as designed. Agents A and E pass cleanly. Agent B's velocity burst raises eyebrows but never reaches the REVIEW band because no other signal corroborates it. Agent C's three transactions show the full spectrum: a 750% mandate overage at a tier-3 merchant (60.0, REVIEW), a scope mismatch to a tier-5 merchant (66.0, REVIEW), and the worst case combining both with a high-risk country (75.0, BLOCK). Agent D's merchant scores of 70 to 95 never tip it into REVIEW, because merchant risk alone contributes at most 30 points at a 0.30 weight. Agent F's mild overage barely registers.

This is exactly the property you want from a composite score. No single signal can both pass legitimate traffic and block real attacks on its own. The combination does both.

Tuning the Model From Production Data

A model that ships with default weights is the start, not the finish. The interesting work is the feedback loop that adjusts weights, thresholds, and subscore definitions as you learn what real abuse looks like in your traffic.

A few practical tuning patterns work well on top of a streaming SQL stack like RisingWave.

Replay against labeled outcomes. Maintain a separate table of confirmed-fraud and confirmed-good transaction IDs from your fraud ops team. Join it with the composite view to get a confusion matrix, then sweep weights in a notebook to find the Pareto frontier of precision and recall. Because RisingWave keeps the materialized views fresh, the joined view is always current; you do not need a separate batch job to recompute.
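The notebook side of that replay can start very small. The sketch below assumes you have exported two mappings, one from transaction ID to action and one from transaction ID to a confirmed-fraud flag; the function names and the labels in the example are hypothetical and exist only to show the mechanics.

```python
from collections import Counter

FLAGGED = {"REVIEW", "BLOCK"}


def confusion_counts(actions: dict, labels: dict) -> Counter:
    """Count tp/fp/fn/tn, treating REVIEW and BLOCK as a positive prediction."""
    counts = Counter()
    for tx_id, is_fraud in labels.items():
        action = actions.get(tx_id)
        if action is None:
            continue  # labeled transaction not present in the scored set
        flagged = action in FLAGGED
        if flagged and is_fraud:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1
        elif is_fraud:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts: Counter) -> tuple:
    """Precision and recall, returning 0.0 when a denominator is empty."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    actions = {"tx_001": "ALLOW", "tx_014": "ALLOW", "tx_020": "REVIEW",
               "tx_021": "REVIEW", "tx_022": "BLOCK"}
    # Hypothetical labels: suppose fraud ops later confirmed the burst as fraud.
    labels = {"tx_001": False, "tx_014": True, "tx_020": True,
              "tx_021": True, "tx_022": True}
    print(precision_recall(confusion_counts(actions, labels)))  # (1.0, 0.75)
```

Under those made-up labels the missed burst shows up as a false negative, which is the kind of evidence that would justify raising the velocity weight in a sweep.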

Per-agent threshold overrides. Some agents (regulated treasury agents, audited B2B integrations) deserve looser thresholds because their false positive cost is higher. Others (consumer agents handling small ticket sizes) can run tighter. Add a per-agent threshold table joined into the composite view, with a default fallback for agents not listed.
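The override-with-fallback lookup behaves like a LEFT JOIN followed by COALESCE. A minimal Python sketch of the decision logic follows; the override values for agent_E are invented for illustration, and `action_with_overrides` is a hypothetical name.

```python
DEFAULT_THRESHOLDS = (40.0, 70.0)   # (review_at, block_at)

# Hypothetical override table: agent_id -> (review_at, block_at).
OVERRIDES = {"agent_E": (50.0, 80.0)}


def action_with_overrides(agent_id: str, score: float,
                          overrides: dict = OVERRIDES,
                          default: tuple = DEFAULT_THRESHOLDS) -> str:
    """Use the agent's own thresholds if listed, otherwise the defaults."""
    review_at, block_at = overrides.get(agent_id, default)
    if score >= block_at:
        return "BLOCK"
    if score >= review_at:
        return "REVIEW"
    return "ALLOW"


if __name__ == "__main__":
    print(action_with_overrides("agent_E", 45.0))   # ALLOW under the looser override
    print(action_with_overrides("agent_A", 45.0))   # REVIEW under the defaults
```

Keeping the defaults in one place means an agent removed from the override table falls back to standard behavior without any change to the scoring logic.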

Rolling window benchmarking. Compute a rolling 7-day distribution of composite scores per agent class. When the median or P95 shifts more than a few points week over week, page the team. This is the cheapest early warning that someone has changed an agent's behavior or that an attacker has discovered a new pattern. The same machinery that powers real-time anomaly detection on transactional data fits naturally here.

Subscore-level monitoring. Track the contribution of each subscore to the composite over time. If one subscore is responsible for 90% of all REVIEW actions, that is a signal you have miscalibrated either the subscore or the weight. Often the right fix is to tighten a CASE expression at the subscore level rather than re-weight at the composite.
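One way to quantify each subscore's contribution is its share of the weighted sum. The sketch below uses the three weights from this article; `contribution_shares` is an illustrative name, and the example input is tx_021 from the composite output.

```python
WEIGHTS = {"velocity": 0.25, "mandate": 0.45, "merchant": 0.30}


def contribution_shares(subscores: dict) -> dict:
    """Fraction of the composite that each weighted subscore accounts for."""
    weighted = {name: WEIGHTS[name] * subscores[name] for name in WEIGHTS}
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in WEIGHTS}
    return {name: value / total for name, value in weighted.items()}


if __name__ == "__main__":
    # tx_021: velocity 0, mandate 80, merchant 100 (composite 66.0).
    shares = contribution_shares({"velocity": 0, "mandate": 80, "merchant": 100})
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

For tx_021 the mandate subscore accounts for 36 of the 66 points and merchant risk for the other 30, so neither dominates. Averaging these shares over all REVIEW decisions gives the monitoring figure described above.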

Dual-stream evaluation. Run a shadow scoring pipeline alongside production for new weight or threshold changes. Both pipelines write to materialized views; you compare aggregate decisions before promoting the new version. Because RisingWave maintains views incrementally, the shadow pipeline adds only the cost of keeping its own views up to date, with no separate batch recomputation. The pattern is the same one teams use for real-time A/B testing.

A common worry is that streaming SQL cannot host the model logic of a "real" risk system. In practice the inverse is true. Production risk teams find that 80% of what they need (subscore composition, weighted aggregation, action thresholds) is naturally a SQL pipeline, and the remaining 20% (gradient-boosted scoring, embedding-based anomaly detection) plugs in via a UDF or an external service called from the composite view. For background on the mandate primitives this model assumes, see the Agentic Commerce Protocol specification and the W3C Verifiable Credentials data model for the identity primitives that feed the identity subscore.

FAQ

How is risk scored for AI agent transactions?

Agentic transaction risk scoring combines multiple weighted signals into a single 0 to 100 composite score and maps the score to an action band. Typical inputs are velocity, behavioral drift, mandate scope compliance, identity validation, peer comparison, and merchant risk. Each signal is independently normalized to a 0 to 100 subscore, multiplied by a weight that reflects its importance, and summed into the composite. Action thresholds (commonly 0 to 39 ALLOW, 40 to 69 REVIEW, 70 to 100 BLOCK) translate the score into an enforcement decision.

What signals go into an agentic transaction risk score?

Six families cover the bulk of agentic abuse. Velocity captures how many transactions an agent fires in a rolling window. Behavioral drift compares the current transaction to the agent's historical baseline. Mandate scope checks amount, merchant category, and cumulative spend against the user-issued authorization. Identity validation confirms the agent and user credentials are consistent and that the user account is not in a takeover state. Peer comparison benchmarks the agent against similar agents serving similar users, which is essential for new agents with no individual baseline. Merchant risk reflects the trust tier and country of the destination merchant.

How are signals weighted in a composite score?

Weights should reflect the cost of being wrong on each signal. Mandate scope is usually weighted highest, around 0.40 to 0.50, because it represents explicit user intent being violated. Merchant risk is typically 0.25 to 0.35 because it determines how recoverable a fraud loss is. Velocity is 0.15 to 0.25; it is a real signal but easy to evade by pacing requests. Behavioral drift, identity, and peer comparison fill in the remainder. Weights should sum to 1.0 so the composite stays on the 0 to 100 scale. Tune them by replaying historical data against confirmed-fraud and confirmed-good labels and picking the combination that maximizes recall at a fixed false positive rate.

How does RisingWave compute risk scores in real time?

RisingWave is a streaming database that maintains materialized views incrementally as new events arrive. You define each subscore as its own materialized view in standard SQL (velocity, mandate, merchant, and so on), then define a composite view that joins them and applies weights. New transactions, whether they arrive on a Kafka topic or via INSERT, cause only the affected rows in each view to update, so the composite score stays current with sub-second latency. Action thresholds are expressed as CASE statements on the composite score, so the ALLOW / REVIEW / BLOCK decision is queryable as a regular column. The same view can feed an enforcement service via a Postgres-compatible read or be sunk into Kafka for downstream consumers.

Conclusion

A composite risk score is the only defense that scales to the variety of agentic transaction patterns you will see in production. Velocity catches the burst. Mandate catches the violation. Merchant catches the destination. Each signal alone is wrong too often; combined and weighted, they produce a defensible decision in real time.

Streaming SQL on RisingWave makes the implementation tractable. Each subscore is a materialized view that updates as new transactions arrive. The composite is one more view that joins them and applies weights. Action thresholds are CASE expressions. There is no Java service to deploy, no separate state store to manage, and no batch job to wait on. The same engine that ingests the transaction stream serves the scoring queries.

The pipeline shown here is a starting point. In production you would extend it with behavioral drift, identity, and peer comparison subscores, push the weights into a configuration table for runtime tuning, and wire the composite view to your enforcement service. The shape of the code stays the same.

Ready to build composite risk scores in real time? Try RisingWave Cloud free.

Join our Slack community to compare notes with other teams shipping agentic-payment risk pipelines on streaming SQL.
