Real-Time Observability for Agentic Payment Flows

Real-Time Observability for Agentic Payment Flows

AI agents are starting to spend money. Whether it is a shopping agent buying groceries, a travel agent booking a flight, or a finance agent paying a vendor, the payment is no longer a single click by a human. It is a multi-stage handshake between the agent, the user's mandate, the payment network, and the merchant. Every stage can fail. Every stage can slow down. And when something goes wrong at 3am, you need to know which stage broke and why, before users start complaining.

Traditional payment observability stacks were not built for this. They assume one transaction with one approval. Agentic payment flows look more like distributed traces: agent intent → mandate issuance → mandate validation → payment authorization → settlement → outcome. Each hop has its own latency budget, its own error surface, and its own SLO. If you only watch the final outcome, you miss the early-stage rot that eventually causes outages.

This article shows how to build a real-time observability layer for agentic payment flows using streaming SQL on RisingWave. You will learn how to model the stage events, build end-to-end tracing with materialized views, track p50/p95/p99 latency per stage, compute funnel drop-off, and feed a live dashboard that any BI tool can query. Every SQL example has been verified against RisingWave 2.8.0.

The Stages of an Agentic Payment Flow

Before you can observe an agentic payment, you have to agree on what its anatomy looks like. While exact details vary across protocols (such as the AP2 mandate framework, OpenAI's agent commerce protocol, or proprietary issuer flows), the canonical stages are remarkably consistent:

  1. Intent: the agent decides to initiate a payment on behalf of a user. This is where policy engines, spending limits, and merchant allowlists are checked.
  2. Mandate issuance: the user-signed mandate (or a delegation token) is generated. This is the artifact that proves the agent is authorized.
  3. Mandate validation: the payment network or a verification service checks the mandate's signature, expiry, scope, and freshness.
  4. Payment authorization: the issuer or processor reserves the funds. This is the equivalent of a card auth in the legacy world.
  5. Settlement: the money actually moves from the buyer to the merchant, either instantly (RTP, FedNow, UPI) or in a batch (ACH, card capture).
  6. Outcome: a final state is returned: settled, failed, or refunded.

A flow that completes cleanly fires one event per stage. A flow that fails fires events up to the failure point, then a failed event. A flow that gets stuck fires events up to the last successful stage, then nothing. And "nothing" is itself a signal.

What "Observability" Means for Agentic Payments

Observability for agentic payments is not about counting transactions. It is about answering questions like:

  • Where in the pipeline are flows failing? Is the validator rejecting more mandates than usual, or is the issuer denying authorizations?
  • Where in the pipeline are flows slowing down? Did mandate validation just jump from 90ms to 300ms p95?
  • Which agents and which users are affected? A spike concentrated on one agent_id is a different problem from a spike across the whole platform.
  • Are flows getting stuck? A flow that reached mandate_issued but never advanced is invisible to outcome-based metrics, but very visible to a stage-aware funnel.
  • What is the drop-off between stages? If 13 flows enter intent but only 8 reach settled, where did the other 5 go?

Answering all of this in real time means you cannot rely on nightly batch jobs or hourly aggregates. You need a streaming layer that ingests stage events as they happen, maintains derived state continuously, and exposes the result through a query interface a dashboard can poll. That is what we are going to build.

Capturing the Stages as Events

The first step is to emit a structured event at every stage transition. Whatever your payment platform looks like internally, you want every service that touches a flow to publish an event with a consistent schema.

CREATE TABLE aap16_flow_events (
  event_id   VARCHAR PRIMARY KEY,
  flow_id    VARCHAR,
  agent_id   VARCHAR,
  user_id    VARCHAR,
  stage      VARCHAR,         -- 'intent' | 'mandate_issued' | 'mandate_validated'
                              -- | 'payment_authorized' | 'settled' | 'failed'
  success    BOOLEAN,
  latency_ms INT,             -- time spent inside this stage
  error_code VARCHAR,
  event_time TIMESTAMPTZ
);

A few notes on the schema choices:

  • flow_id is the trace correlation key. Every event for the same payment flow shares it. This is the single most important field; without it, you cannot reconstruct a flow.
  • stage is a string enum. Strings are easier to extend than integers when you add new stages later (for example, a risk_review stage between mandate_validated and payment_authorized).
  • latency_ms is the time inside this stage, not cumulative. Cumulative latency is computed downstream by summing.
  • success BOOLEAN plus error_code VARCHAR lets you separate the outcome of a stage from the reason it failed. A mandate_validated event with success = false and error_code = 'SIG_MISMATCH' is much more useful than a generic failed event.

Now feed in some events. The data below covers 13 flows: most complete cleanly, some fail at different stages, two are stuck mid-pipeline, and latencies are varied so the percentile math has something to chew on.

INSERT INTO aap16_flow_events VALUES
-- Flow 1: clean success
('e001','f-001','agt-A','u-100','intent',             TRUE,  45, NULL,                '2026-05-06 09:00:00+00'),
('e002','f-001','agt-A','u-100','mandate_issued',     TRUE, 120, NULL,                '2026-05-06 09:00:00+00'),
('e003','f-001','agt-A','u-100','mandate_validated',  TRUE,  85, NULL,                '2026-05-06 09:00:01+00'),
('e004','f-001','agt-A','u-100','payment_authorized', TRUE, 240, NULL,                '2026-05-06 09:00:01+00'),
('e005','f-001','agt-A','u-100','settled',            TRUE, 310, NULL,                '2026-05-06 09:00:02+00'),

-- Flow 3: fails at mandate_validated
('e011','f-003','agt-A','u-102','intent',             TRUE,  38, NULL,                '2026-05-06 09:02:00+00'),
('e012','f-003','agt-A','u-102','mandate_issued',     TRUE, 140, NULL,                '2026-05-06 09:02:00+00'),
('e013','f-003','agt-A','u-102','mandate_validated',  FALSE, 92, 'SIG_MISMATCH',      '2026-05-06 09:02:01+00'),
('e014','f-003','agt-A','u-102','failed',             FALSE,  0, 'SIG_MISMATCH',      '2026-05-06 09:02:01+00'),

-- Flow 4: insufficient funds at authorization
('e015','f-004','agt-C','u-103','intent',             TRUE,  55, NULL,                '2026-05-06 09:03:00+00'),
('e016','f-004','agt-C','u-103','mandate_issued',     TRUE, 130, NULL,                '2026-05-06 09:03:00+00'),
('e017','f-004','agt-C','u-103','mandate_validated',  TRUE,  95, NULL,                '2026-05-06 09:03:01+00'),
('e018','f-004','agt-C','u-103','payment_authorized', FALSE,510, 'INSUFFICIENT_FUNDS','2026-05-06 09:03:01+00'),
('e019','f-004','agt-C','u-103','failed',             FALSE,  0, 'INSUFFICIENT_FUNDS','2026-05-06 09:03:01+00'),

-- Flow 6: stuck after mandate_issued (no further events)
('e025','f-006','agt-B','u-105','intent',             TRUE,  48, NULL,                '2026-05-06 09:05:00+00'),
('e026','f-006','agt-B','u-105','mandate_issued',     TRUE, 155, NULL,                '2026-05-06 09:05:00+00'),

-- Flow 8: blocked at intent by policy
('e032','f-008','agt-D','u-107','intent',             FALSE, 60, 'POLICY_BLOCK',      '2026-05-06 09:07:00+00'),
('e033','f-008','agt-D','u-107','failed',             FALSE,  0, 'POLICY_BLOCK',      '2026-05-06 09:07:00+00'),

-- Flow 10: settlement timeout
('e039','f-010','agt-B','u-109','intent',             TRUE,  50, NULL,                '2026-05-06 09:09:00+00'),
('e040','f-010','agt-B','u-109','mandate_issued',     TRUE, 138, NULL,                '2026-05-06 09:09:00+00'),
('e041','f-010','agt-B','u-109','mandate_validated',  TRUE,  91, NULL,                '2026-05-06 09:09:01+00'),
('e042','f-010','agt-B','u-109','payment_authorized', TRUE, 305, NULL,                '2026-05-06 09:09:01+00'),
('e043','f-010','agt-B','u-109','settled',            FALSE,880, 'SETTLEMENT_TIMEOUT','2026-05-06 09:09:03+00'),
('e044','f-010','agt-B','u-109','failed',             FALSE,  0, 'SETTLEMENT_TIMEOUT','2026-05-06 09:09:03+00');
-- (... plus seven more clean and one stuck flow, 55 events total across 13 flows)

In production, these events come from your payment services through Kafka, Pulsar, or CDC streams. RisingWave ingests them with a CREATE SOURCE statement; the rest of the pipeline is identical to what we are about to build on this in-memory table. For a deeper look at moving CDC traffic into a streaming SQL engine, see CDC vs dual writes.

End-to-End Tracing with Streaming SQL

The first materialized view we want is a per-flow status row: where is each flow right now, what is its cumulative latency, has it failed, has it settled?

CREATE MATERIALIZED VIEW aap16_flow_status_mv AS
SELECT
  flow_id,
  MAX(agent_id) AS agent_id,
  MAX(user_id)  AS user_id,
  COUNT(*)      AS event_count,
  SUM(latency_ms) AS cumulative_latency_ms,
  MAX(CASE stage
        WHEN 'failed'             THEN 99
        WHEN 'settled'             THEN 5
        WHEN 'payment_authorized'  THEN 4
        WHEN 'mandate_validated'   THEN 3
        WHEN 'mandate_issued'      THEN 2
        WHEN 'intent'              THEN 1
        ELSE 0 END) AS max_stage_rank,
  BOOL_OR(stage = 'failed') AS has_failed,
  BOOL_OR(stage = 'settled' AND success) AS has_settled_ok,
  MAX(event_time) AS last_event_time
FROM aap16_flow_events
GROUP BY flow_id;

The trick is the CASE expression that assigns a numeric rank to each stage. failed gets rank 99 so the view also captures terminal failure as a sentinel. BOOL_OR lets us cheaply summarize whether a flow has hit a terminal state. Querying it gives you a complete trace-level snapshot of every flow:

 flow_id | agent_id | user_id | event_count | cumulative_latency_ms | max_stage_rank | has_failed | has_settled_ok
---------+----------+---------+-------------+-----------------------+----------------+------------+----------------
 f-001   | agt-A    | u-100   |           5 |                   800 |              5 | f          | t
 f-002   | agt-B    | u-101   |           5 |                  1137 |              5 | f          | t
 f-003   | agt-A    | u-102   |           4 |                   270 |             99 | t          | f
 f-004   | agt-C    | u-103   |           5 |                   790 |             99 | t          | f
 f-005   | agt-A    | u-104   |           5 |                   857 |              5 | f          | t
 f-006   | agt-B    | u-105   |           2 |                   203 |              2 | f          | f
 f-007   | agt-C    | u-106   |           5 |                   721 |              5 | f          | t
 f-008   | agt-D    | u-107   |           2 |                    60 |             99 | t          | f
 f-009   | agt-A    | u-108   |           5 |                   844 |              5 | f          | t
 f-010   | agt-B    | u-109   |           6 |                  1464 |             99 | t          | f
 f-011   | agt-C    | u-110   |           5 |                   817 |              5 | f          | t
 f-012   | agt-D    | u-111   |           1 |                    55 |              1 | f          | f
 f-013   | agt-A    | u-112   |           5 |                  1127 |              5 | f          | t

Read that table top to bottom and you can see exactly what kind of platform you have. Flow f-006 reached mandate_issued (rank 2) but never advanced, it is stuck. Flow f-012 is stuck even earlier, at intent. Flow f-010 has the highest cumulative latency (1464ms) and ended in failure, which is exactly the kind of slow-failure case worth alerting on.

To drill into a specific flow, ask the raw event table directly:

SELECT flow_id, stage, success, latency_ms,
       COALESCE(error_code,'-') AS error_code, event_time
FROM aap16_flow_events
WHERE flow_id = 'f-010'
ORDER BY event_time, stage;
 flow_id |       stage        | success | latency_ms |     error_code     |        event_time
---------+--------------------+---------+------------+--------------------+---------------------------
 f-010   | intent             | t       |         50 | -                  | 2026-05-06 09:09:00+00:00
 f-010   | mandate_issued     | t       |        138 | -                  | 2026-05-06 09:09:00+00:00
 f-010   | mandate_validated  | t       |         91 | -                  | 2026-05-06 09:09:01+00:00
 f-010   | payment_authorized | t       |        305 | -                  | 2026-05-06 09:09:01+00:00
 f-010   | failed             | f       |          0 | SETTLEMENT_TIMEOUT | 2026-05-06 09:09:03+00:00
 f-010   | settled            | f       |        880 | SETTLEMENT_TIMEOUT | 2026-05-06 09:09:03+00:00

This is your end-to-end trace. The settlement step took 880ms (way above the typical ~330ms) and ended in SETTLEMENT_TIMEOUT. With this view in place, on-call engineers can paste any flow_id from a customer ticket and see the full lifecycle in a single query.

You can also surface the "stuck" set as its own filter:

SELECT flow_id, agent_id, max_stage_rank, cumulative_latency_ms, last_event_time
FROM aap16_flow_status_mv
WHERE NOT has_settled_ok AND NOT has_failed
ORDER BY flow_id;
 flow_id | agent_id | max_stage_rank | cumulative_latency_ms |      last_event_time
---------+----------+----------------+-----------------------+---------------------------
 f-006   | agt-B    |              2 |                   203 | 2026-05-06 09:05:00+00:00
 f-012   | agt-D    |              1 |                    55 | 2026-05-06 09:11:00+00:00

Two flows are sitting in limbo. In production you would join this against now() to expire flows that have been idle for too long.

Stage Latency Tracking (p50 / p95 / p99 per stage)

Latency at the median is comfortable lying. The tail is where customer pain lives. RisingWave's approx_percentile aggregate computes streaming percentiles incrementally, which means a single materialized view can deliver p50/p95/p99 per stage that updates as each event arrives.

CREATE MATERIALIZED VIEW aap16_stage_latency_mv AS
SELECT
  stage,
  COUNT(*)                          AS sample_count,
  AVG(latency_ms)::INT              AS avg_latency_ms,
  approx_percentile(0.50) WITHIN GROUP (ORDER BY latency_ms)::INT AS p50_latency_ms,
  approx_percentile(0.95) WITHIN GROUP (ORDER BY latency_ms)::INT AS p95_latency_ms,
  approx_percentile(0.99) WITHIN GROUP (ORDER BY latency_ms)::INT AS p99_latency_ms
FROM aap16_flow_events
WHERE stage IN ('intent','mandate_issued','mandate_validated','payment_authorized','settled')
GROUP BY stage;

We exclude the failed events because their latency_ms = 0 is a sentinel, not a real measurement. Querying the view gives you a clean per-stage latency profile:

       stage        | sample_count | avg_latency_ms | p50_latency_ms | p95_latency_ms | p99_latency_ms
--------------------+--------------+----------------+----------------+----------------+----------------
 intent             |           13 |             47 |             46 |             55 |             55
 mandate_issued     |           11 |            132 |            130 |            156 |            156
 mandate_validated  |           10 |             89 |             87 |             95 |             95
 payment_authorized |            9 |            324 |            279 |            441 |            441
 settled            |            8 |            410 |            327 |            424 |            424

The story this table tells:

  • Intent and validation are cheap (sub-100ms p99). Good, these are local checks.
  • Mandate issuance is consistent (~130ms median, tight tail). Good, signing infrastructure is stable.
  • Payment authorization has a fat tail. Median is 279ms but p95 is 441ms. That is your biggest tail risk; it reflects the issuer round trip.
  • Settlement is slow on average (410ms) and dominated by the timeout case in flow f-010. In production you would slice this by settlement provider to see if one rail is dragging the average.

The standard alerting recipe: page when p95_latency_ms for any stage exceeds an SLO for N consecutive minutes. Because the materialized view updates continuously, any external metrics scrape (Prometheus, Datadog) just queries it and exports the latest row.

For more on streaming percentiles and the underlying t-digest algorithm in RisingWave, see Real-time anomaly detection with streaming SQL.

Drop-Off Funnel Analysis

The funnel view counts how many distinct flows ever reached each stage. This is the most compact view of platform health, because a healthy pipeline has roughly equal counts at every stage.

CREATE MATERIALIZED VIEW aap16_funnel_mv AS
SELECT
  stage,
  COUNT(DISTINCT flow_id) AS flows_reached
FROM aap16_flow_events
WHERE stage IN ('intent','mandate_issued','mandate_validated','payment_authorized','settled')
GROUP BY stage;
       stage        | flows_reached
--------------------+---------------
 intent             |            13
 mandate_issued     |            11
 mandate_validated  |            10
 payment_authorized |             9
 settled            |             8

13 flows entered, 8 settled. Where did the other 5 go? A simple self-join exposes the per-step drop-off:

WITH f AS (
  SELECT stage, flows_reached,
    CASE stage
      WHEN 'intent' THEN 1
      WHEN 'mandate_issued' THEN 2
      WHEN 'mandate_validated' THEN 3
      WHEN 'payment_authorized' THEN 4
      WHEN 'settled' THEN 5
    END AS rk
  FROM aap16_funnel_mv
)
SELECT
  a.stage AS from_stage,
  b.stage AS to_stage,
  a.flows_reached AS from_count,
  b.flows_reached AS to_count,
  (a.flows_reached - b.flows_reached) AS dropped,
  ROUND(100.0 * (a.flows_reached - b.flows_reached) / a.flows_reached, 2) AS drop_pct
FROM f a
JOIN f b ON b.rk = a.rk + 1
ORDER BY a.rk;
     from_stage     |      to_stage      | from_count | to_count | dropped | drop_pct
--------------------+--------------------+------------+----------+---------+----------
 intent             | mandate_issued     |         13 |       11 |       2 |    15.38
 mandate_issued     | mandate_validated  |         11 |       10 |       1 |     9.09
 mandate_validated  | payment_authorized |         10 |        9 |       1 |    10.0
 payment_authorized | settled            |          9 |        8 |       1 |    11.11

Now you have a per-edge attrition map. The biggest leak is intent → mandate_issued (15.38%), those are the flows that died in policy checks or got stuck before mandate issuance. The remaining stages each lose around 10% of the flows that reach them, which roughly matches the per-stage error rate we are about to compute.

The drop_pct values would be much cleaner with thousands of flows. With 13 sample flows, a single drop registers as 7-15%; with real volume the same query gives you sharp percentages you can alert on.

A complementary view: per-stage error rate, computed across attempts (not flows).

CREATE MATERIALIZED VIEW aap16_error_rates_mv AS
SELECT
  stage,
  COUNT(*) AS attempts,
  SUM(CASE WHEN success THEN 0 ELSE 1 END) AS failures,
  ROUND(100.0 * SUM(CASE WHEN success THEN 0 ELSE 1 END) / COUNT(*), 2) AS error_rate_pct
FROM aap16_flow_events
WHERE stage IN ('intent','mandate_issued','mandate_validated','payment_authorized','settled')
GROUP BY stage;
       stage        | attempts | failures | error_rate_pct
--------------------+----------+----------+----------------
 intent             |       13 |        1 |           7.69
 mandate_issued     |       11 |        0 |              0
 mandate_validated  |       10 |        1 |           10.0
 payment_authorized |        9 |        1 |          11.11
 settled            |        8 |        1 |           12.5

Read this alongside the funnel: the funnel shows where flows are lost, the error_rate_pct shows how the lost flows died. mandate_issued has 0% errors but the funnel still loses one flow there, those are the stuck flows, not the failed ones. Distinguishing "stuck" from "failed" is exactly the kind of nuance that batch dashboards miss and streaming dashboards catch.

For a related operational pattern, see Real-time dashboards with Kafka, Grafana, and RisingWave.

Powering a Live Dashboard (Postgres-Compatible Queries from Any BI)

Materialized views in RisingWave are queryable like ordinary Postgres tables. That is the unlock for dashboards. Whatever your BI tool of choice (Grafana, Metabase, Apache Superset, or Tableau), point its Postgres driver at RisingWave and the stage-level metrics show up as first-class data sources. No ETL, no batch pipeline, no waiting an hour for the next refresh.

A typical observability dashboard for agentic payments has four panels, each backed by a single query against the views we just built:

Panel 1, Stage latency strip chart

SELECT stage, p50_latency_ms, p95_latency_ms, p99_latency_ms
FROM aap16_stage_latency_mv
ORDER BY CASE stage
  WHEN 'intent' THEN 1 WHEN 'mandate_issued' THEN 2
  WHEN 'mandate_validated' THEN 3 WHEN 'payment_authorized' THEN 4
  WHEN 'settled' THEN 5 END;

Panel 2, Funnel bar chart

SELECT stage, flows_reached FROM aap16_funnel_mv
ORDER BY CASE stage
  WHEN 'intent' THEN 1 WHEN 'mandate_issued' THEN 2
  WHEN 'mandate_validated' THEN 3 WHEN 'payment_authorized' THEN 4
  WHEN 'settled' THEN 5 END;

Panel 3, Per-stage error rate

SELECT stage, error_rate_pct FROM aap16_error_rates_mv;

Panel 4, Currently stuck flows

SELECT flow_id, agent_id, max_stage_rank, cumulative_latency_ms, last_event_time
FROM aap16_flow_status_mv
WHERE NOT has_settled_ok AND NOT has_failed
ORDER BY last_event_time;

Set the dashboard refresh to whatever cadence makes sense (every 5s for a war-room view, every 30s for a normal-day view). RisingWave handles the continuous computation; the dashboard just polls.

When something does go wrong, the on-call engineer's flow looks like this:

  1. The "stuck flows" panel suddenly grows from 2 to 50.
  2. The funnel shows a new gap between mandate_issued and mandate_validated.
  3. The error-rate panel confirms a spike at mandate_validated, say, 40%.
  4. A drilldown query slices that error rate by agent_id to see if the spike is platform-wide or scoped to one agent.

That entire diagnostic path takes seconds, not minutes, because every metric is already materialized. For more on the broader streaming SQL operational toolkit, see How to monitor and debug streaming SQL pipelines in production. If you are just getting started with materialized views as a building block, the PostgreSQL materialized views documentation is a good conceptual primer.

FAQ

1. What are the stages of an agentic payment flow?

A canonical agentic payment flow has six stages: intent (the agent decides to pay), mandate issuance (a signed authorization is generated), mandate validation (the network verifies the mandate), payment authorization (the issuer reserves funds), settlement (the money moves), and outcome (final state). A flow can fail or stall at any stage, which is why per-stage observability matters more than outcome-only metrics.

2. How is end-to-end tracing implemented for agent payments?

Each service that touches a flow emits a structured event with a shared flow_id correlation key, plus the stage, success flag, latency, error code, and timestamp. A streaming engine like RisingWave ingests these events and maintains materialized views that group by flow_id so any flow's complete journey can be reconstructed in real time. This is conceptually identical to OpenTelemetry tracing, but expressed in SQL and queryable from any Postgres client.

3. What metrics matter for agentic payment observability?

Four metric families: per-stage latency (p50/p95/p99), per-stage error rate (broken out by error code), funnel drop-off (counts of distinct flows reaching each stage), and time-in-stage for incomplete flows. Together they answer "is the platform healthy", "where is it degrading", and "which flows are stuck right now".

4. How does RisingWave power live observability dashboards?

RisingWave ingests payment-flow events from Kafka, Pulsar, or CDC sources, then continuously updates materialized views aggregating latency, error rates, and funnel counts. Because RisingWave is Postgres wire-compatible, any BI tool (Grafana, Metabase, Superset, Tableau) can query these views with standard SQL. Dashboards refresh as fast as you poll, and the underlying state is always sub-second fresh, with no batch jobs or external ETL.

Conclusion

Agentic payments break the assumptions that traditional payment observability was built around. A single flow now spans intent, mandate, validation, authorization, settlement, and outcome, and every stage has its own latency and error budget. If you only watch the final outcome, you discover problems hours after they started; if you only watch service-level metrics, you cannot tell which flows are affected.

Streaming SQL gives you a clean separation: emit one event per stage transition, ingest into RisingWave, and let materialized views maintain the per-flow trace, per-stage latency percentiles, funnel counts, and error rates continuously. The result is a Postgres-compatible observability layer that any dashboard can query, with sub-second freshness and no external pipelines to maintain.

Ready to build a live observability layer for agentic payments? Try RisingWave Cloud free →

Join our Slack community to compare notes with other teams building real-time payment infrastructure.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.