Real-Time Feature Store for AI Agents: Beyond Batch ML Pipelines

Background: Why Feature Stores Existed

The feature store concept emerged from a specific problem in production machine learning. Data scientists compute features for model training in notebooks or Spark jobs. Production engineers re-implement those same features in application code for inference. The two implementations diverge over time. The training data reflects what the model was optimized on; the inference data reflects what the model actually sees. This gap, training-serving skew, is one of the most common silent failures in production ML.

Feature stores solved this by creating a shared repository for feature definitions and values. Tecton, which became part of Databricks in 2025, built a managed platform where data engineers define features once using Spark or Flink transforms, and those feature values flow into both an offline store (for training) and an online store (for low-latency inference). Feast offered an open-source alternative that coordinated between existing compute engines and storage systems.

These tools were built for a specific world: batch model training, periodic feature refreshes, and inference at human-scale latency tolerances. A fraud model that can take 30 seconds to respond can afford to read features computed in the last hourly batch. A recommendation model with a budget of a few hundred milliseconds can tolerate features that are a few minutes stale.

That world still exists. But it is no longer the only world.

The Agentic Era Changes the Requirements

AI agents operate on a fundamentally different cadence. An agent is not a model waiting passively for a request. It is a running process that observes the environment, reasons about it, and takes actions, repeatedly, in a loop. At each observation step, the agent retrieves context and features to inform its next decision.

The difference from traditional ML inference is not just latency. It is the decision frequency and the consequence of staleness. A recommendation model runs once per page load. A fraud detection agent might evaluate dozens of signals per transaction and take immediate action: block the transaction, request additional verification, or pass it through. If the features it reads are stale, every decision downstream is compromised.

Three specific properties separate agent features from ML training features:

Freshness is measured in seconds, not hours. A user's transaction velocity in the last five minutes is a completely different signal from their average hourly rate over the past month. Both are valid features. Only one is useful for an agent making a decision about a transaction happening right now.

Features are queried per decision, not per batch. A training pipeline can afford to compute features for ten million training examples overnight. An agent calls SELECT * FROM user_spend_features WHERE user_id = $1 before every decision. Point lookup performance and concurrency matter more than bulk throughput.

Feature definitions change with agent behavior. Agents are developed iteratively. When the agent's decision logic changes, the features it needs change too. A system that requires coordinating multiple Spark jobs, Airflow DAGs, and Redis cache invalidation to update a feature definition is too slow to keep up.

Streaming SQL materialized views address all three requirements. You define features as SQL queries over continuous event streams. The database maintains the results incrementally as new events arrive. Agents query fresh, pre-computed feature values with standard SQL.

Three Categories of Agent Features

Agent features cluster into three categories, each with a natural SQL implementation pattern.

Category 1: User Behavior Features (Sliding Windows)

Behavioral features capture what a user has done recently. They answer questions like: How many transactions has this user made in the last hour? What is their total spend in the last 30 minutes? How many distinct merchants have they visited today?

The TUMBLE window function in RisingWave is the right tool for computing these efficiently. TUMBLE assigns each event to a fixed, non-overlapping time bucket. Once a bucket closes at its end time, its aggregate is never updated again, so incremental maintenance only touches the currently open bucket, and the per-event work stays bounded no matter how long the stream runs.

-- Tested against RisingWave v2.8.0

-- Source: transaction events
CREATE TABLE transactions (
    transaction_id  VARCHAR,
    user_id         VARCHAR,
    account_id      VARCHAR,
    amount          DECIMAL,
    merchant_id     VARCHAR,
    merchant_cat    VARCHAR,
    tx_type         VARCHAR,   -- 'debit', 'credit', 'transfer'
    is_international BOOLEAN,
    event_time      TIMESTAMPTZ
);

-- User behavior: rolling 1-hour spend and transaction velocity
CREATE MATERIALIZED VIEW user_spend_features AS
SELECT
    user_id,
    window_start,
    window_end,
    SUM(amount)                                     AS spend_1h,
    COUNT(*)                                        AS tx_count_1h,
    COUNT(DISTINCT merchant_id)                     AS unique_merchants_1h,
    COUNT(*) FILTER (WHERE is_international = TRUE) AS intl_tx_count_1h,
    MAX(amount)                                     AS max_single_tx_1h
FROM TUMBLE(transactions, event_time, INTERVAL '1' HOUR)
GROUP BY user_id, window_start, window_end;

An agent querying this view before a fraud decision gets all spending behavior signals for the current hour window in a single round trip:

SELECT
    spend_1h,
    tx_count_1h,
    unique_merchants_1h,
    intl_tx_count_1h,
    max_single_tx_1h
FROM user_spend_features
WHERE user_id = 'user-4812'
  AND window_end > NOW() - INTERVAL '1' HOUR
ORDER BY window_end DESC
LIMIT 1;

Category 2: Entity State Features (Latest Value Per Entity)

State features capture the current condition of an entity: an account balance, a user's current risk tier, whether a merchant account is flagged. Unlike behavioral features, which aggregate over a time window, state features reflect the cumulative result of all history.

-- Entity state: current account balance derived from all credit and debit events
CREATE MATERIALIZED VIEW account_state AS
SELECT
    account_id,
    SUM(
        CASE
            WHEN tx_type = 'credit' THEN amount
            ELSE -amount
        END
    )                                               AS balance,
    COUNT(*)                                        AS total_tx_count,
    MAX(event_time)                                 AS last_activity_at,
    COUNT(DISTINCT user_id)                         AS linked_users
FROM transactions
GROUP BY account_id;

This view maintains one row per account and updates incrementally with each new transaction. An agent querying balance before authorizing a withdrawal gets a result that already reflects every prior transaction, with no aggregation cost at query time.
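
The agent-side lookup is a single-row point query, shown here with the example account id used later in this post:

-- Point lookup before authorizing a withdrawal
SELECT balance, total_tx_count, last_activity_at
FROM account_state
WHERE account_id = 'acct-0081';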

For user-level state features that combine behavioral signals with profile attributes, you can join the state view with a profile table:

-- User baseline: historical spending patterns used for comparison
CREATE TABLE user_baselines (
    user_id             VARCHAR PRIMARY KEY,
    avg_hourly_spend    DECIMAL,
    baseline_tx_count   INT,
    computed_at         TIMESTAMPTZ
);
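
How user_baselines gets refreshed is a separate concern. One option, sketched below under stated assumptions, is a periodic batch insert over retained transaction history. The sketch assumes the table was created with RisingWave's ON CONFLICT OVERWRITE clause, so that re-inserting an existing user_id replaces the prior row, and the 30-day lookback normalized to hourly rates is an illustrative choice, not a recommendation.

-- Sketch: periodic baseline refresh from retained history.
-- Assumes user_baselines was created with ON CONFLICT OVERWRITE, so
-- re-inserting an existing user_id acts as an upsert.
-- The 30-day lookback and hourly normalization are illustrative.
INSERT INTO user_baselines
SELECT
    user_id,
    SUM(amount) / (30 * 24)       AS avg_hourly_spend,
    (COUNT(*) / (30 * 24))::INT   AS baseline_tx_count,
    NOW()                         AS computed_at
FROM transactions
WHERE event_time > NOW() - INTERVAL '30 days'
GROUP BY user_id;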

Category 3: Anomaly and Risk Features (Deviation from Baseline)

Risk features flag unusual behavior by comparing current activity against a stored baseline. This is where streaming SQL shows its clearest advantage over batch feature pipelines: the anomaly is detected as it happens, not in the next batch run.

-- Risk: velocity spike detection
-- Flags users whose current-hour transaction count is 3x their baseline
CREATE MATERIALIZED VIEW velocity_anomalies AS
SELECT
    u.user_id,
    u.tx_count_1h,
    u.spend_1h,
    b.baseline_tx_count,
    u.tx_count_1h::DECIMAL / NULLIF(b.baseline_tx_count, 0) AS spike_ratio,
    u.window_start,
    u.window_end
FROM user_spend_features u
JOIN user_baselines b ON u.user_id = b.user_id
WHERE u.tx_count_1h > b.baseline_tx_count * 3
  AND u.window_end > NOW() - INTERVAL '1' HOUR;

An agent can query this view to check whether the current user is already in a flagged spike state before evaluating a new transaction:

SELECT
    spike_ratio,
    tx_count_1h,
    baseline_tx_count
FROM velocity_anomalies
WHERE user_id = 'user-4812';
-- Returns a row only if the user is currently in a velocity spike
-- Returns no rows if behavior is within normal range

This is a pre-filter, not the full fraud decision. But knowing that a user has already triggered a velocity anomaly before evaluating the new transaction changes the weight the agent gives to every other signal.

Agent Integration Pattern

The integration is deliberately simple. The agent calls a standard PostgreSQL query before each decision. No SDK, no proprietary API, no in-memory cache to manage.

import psycopg2

# RisingWave speaks the PostgreSQL wire protocol
# psycopg2, asyncpg, SQLAlchemy, and any JDBC driver work unchanged
conn = psycopg2.connect(
    host="localhost",
    port=4566,
    dbname="dev",
    user="root"
)

def get_agent_features(user_id: str, account_id: str) -> dict:
    """
    Fetch the features needed for a fraud decision with three point lookups.
    Each query hits a pre-computed materialized view; no aggregation at query time.
    """
    with conn.cursor() as cur:
        # User spend features: behavioral signals from current hour
        cur.execute(
            """
            SELECT spend_1h, tx_count_1h, unique_merchants_1h,
                   intl_tx_count_1h, max_single_tx_1h
            FROM user_spend_features
            WHERE user_id = %s
              AND window_end > NOW() - INTERVAL '1 hour'
            ORDER BY window_end DESC
            LIMIT 1
            """,
            (user_id,)
        )
        spend_row = cur.fetchone()

        # Account state: current balance and activity
        cur.execute(
            "SELECT balance, total_tx_count, last_activity_at FROM account_state WHERE account_id = %s",
            (account_id,)
        )
        account_row = cur.fetchone()

        # Velocity anomaly: is this user currently in a spike?
        cur.execute(
            "SELECT spike_ratio FROM velocity_anomalies WHERE user_id = %s",
            (user_id,)
        )
        anomaly_row = cur.fetchone()

    return {
        "spend_1h":           spend_row[0] if spend_row else 0,
        "tx_count_1h":        spend_row[1] if spend_row else 0,
        "unique_merchants_1h": spend_row[2] if spend_row else 0,
        "intl_tx_count_1h":   spend_row[3] if spend_row else 0,
        "max_single_tx_1h":   spend_row[4] if spend_row else 0,
        "account_balance":    account_row[0] if account_row else None,
        "total_tx_count":     account_row[1] if account_row else 0,
        "velocity_spike_ratio": float(anomaly_row[0]) if anomaly_row else 0.0
    }

# The agent calls this before each fraud evaluation
features = get_agent_features(user_id="user-4812", account_id="acct-0081")

# Features are fresh to within seconds of the latest transaction event
# The agent passes them to its decision logic or LLM context

The three queries above are issued sequentially for clarity. If round-trip latency is a concern, they can be combined into a single statement or pipelined with a driver that supports batching (for example, psycopg 3's pipeline mode). In practice, materialized view point lookups return in single-digit milliseconds, so three sequential queries typically complete well under 20 milliseconds.
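
One hedged way to collapse the lookups into a single statement is scalar subqueries. Each subquery returns NULL when no row matches, which the caller maps to the same defaults used in get_agent_features above.

-- Sketch: the three lookups folded into one statement via scalar subqueries
SELECT
    (SELECT spend_1h
       FROM user_spend_features
      WHERE user_id = 'user-4812'
        AND window_end > NOW() - INTERVAL '1 hour'
      ORDER BY window_end DESC
      LIMIT 1)                            AS spend_1h,
    (SELECT balance
       FROM account_state
      WHERE account_id = 'acct-0081')     AS account_balance,
    (SELECT spike_ratio
       FROM velocity_anomalies
      WHERE user_id = 'user-4812')        AS velocity_spike_ratio;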

Why This Is Better Than a Traditional Feature Store

Traditional feature store architectures separate computation from storage. Spark or Flink computes features and writes them to an offline store (Iceberg, BigQuery) and an online store (Redis, DynamoDB). The online store handles low-latency serving. The offline store handles training data generation. Feast or a similar registry coordinates metadata.

This separation creates three operational burdens that the streaming SQL approach eliminates.

Training-serving skew. When offline and online compute pipelines are separate systems, subtle differences in how they handle nulls, timezones, or edge cases cause the feature values to diverge. An ML model trained on offline features then served with different online values produces worse results in production than in evaluation. With streaming SQL, you define one materialized view. The same SQL expression produces the online feature value and the value that gets written to an Iceberg training dataset via a sink; a sketch of that sink appears after the third point below.

Infrastructure sprawl. A traditional real-time feature store requires: a Kafka cluster, a Flink or Spark cluster, a Redis or DynamoDB instance for online serving, a data warehouse for offline storage, and an orchestrator (Airflow) to coordinate batch refreshes. Each component has its own failure modes, scaling characteristics, and operational overhead. RisingWave consolidates the streaming compute layer and the online serving layer into a single PostgreSQL-compatible system.

Feature update latency. Updating a feature definition in Tecton or Feast requires modifying a pipeline, testing it, backfilling historical values, and deploying to production. In a streaming SQL setup, you DROP MATERIALIZED VIEW and re-create it with the updated definition. RisingWave begins recomputing the view immediately. For agents that need to iterate on feature logic quickly, this difference in development velocity is material.
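
On the skew point, the offline path can be a sink from the same view. The statement below is a sketch, not a drop-in config: the Iceberg connector options (warehouse path, catalog and table names) are placeholders, and the exact parameters depend on your catalog setup, so check the RisingWave Iceberg sink documentation before using it.

-- Sketch: write the same feature view to Iceberg for offline training data.
-- All connector options below are placeholders for your own catalog setup.
CREATE SINK user_spend_features_offline
FROM user_spend_features
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id,window_start',
    warehouse.path = 's3://your-bucket/warehouse',  -- placeholder
    database.name = 'features',                     -- placeholder
    table.name = 'user_spend_features'              -- placeholder
);

And on update latency, redefining a feature is a two-statement operation. The sketch below keeps the existing columns and adds a hypothetical avg_tx_amount_1h signal; dependent views such as velocity_anomalies must be re-created afterward, since the CASCADE drops them too.

-- Sketch: updating a feature definition in place.
-- velocity_anomalies depends on this view, so CASCADE drops it as well;
-- re-create it after the new definition is in place.
DROP MATERIALIZED VIEW IF EXISTS user_spend_features CASCADE;

CREATE MATERIALIZED VIEW user_spend_features AS
SELECT
    user_id,
    window_start,
    window_end,
    SUM(amount)                                     AS spend_1h,
    COUNT(*)                                        AS tx_count_1h,
    COUNT(DISTINCT merchant_id)                     AS unique_merchants_1h,
    COUNT(*) FILTER (WHERE is_international = TRUE) AS intl_tx_count_1h,
    MAX(amount)                                     AS max_single_tx_1h,
    AVG(amount)                                     AS avg_tx_amount_1h   -- hypothetical new signal
FROM TUMBLE(transactions, event_time, INTERVAL '1' HOUR)
GROUP BY user_id, window_start, window_end;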

The trade-off is that streaming SQL materialized views do not come with a built-in feature registry for governance and discovery, and they do not handle point-in-time joins for training data generation the way purpose-built feature stores do. For teams with mature MLOps practices and strict governance requirements, pairing RisingWave with Feast as a metadata catalog makes sense. For most agentic applications, where the primary need is fresh, reliable feature serving at low latency, streaming SQL is the simpler and faster path.

FAQ

What is a real-time feature store for AI agents?

A real-time feature store for AI agents computes and serves feature values with sub-second freshness so agents can retrieve accurate context before each decision. Unlike traditional ML feature stores that refresh on batch schedules, it maintains feature values continuously using streaming computation. RisingWave implements this with streaming SQL materialized views served over a PostgreSQL-compatible interface.

How does TUMBLE differ from a regular GROUP BY?

A regular GROUP BY scans raw event data at query time, which is expensive. TUMBLE in RisingWave partitions a stream into fixed, non-overlapping time windows and pre-computes aggregates for each bucket incrementally as events arrive. Once a window closes, its aggregate result is stored and never recomputed. Queries read pre-computed results, returning in milliseconds regardless of historical event volume.

What is training-serving skew and how does streaming SQL prevent it?

Training-serving skew occurs when the feature computation logic used during model training differs from the logic at inference time. Streaming SQL prevents it by using a single materialized view definition for both paths: the live view serves online inference, and a sink from the same view writes to an offline Iceberg table for training. The SQL expression is identical, so there is no skew by construction.

Can I use this approach with LangChain, CrewAI, or AutoGen?

Yes. All three frameworks support custom tool definitions. Wrap the feature queries in a tool function using psycopg2 or asyncpg and register it with your agent. RisingWave's PostgreSQL wire protocol compatibility means the connection code is identical to any PostgreSQL integration.
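
As a minimal sketch with LangChain, assuming the get_agent_features function defined in the integration section above and the langchain-core package (registration details differ across frameworks and versions):

from langchain_core.tools import tool

@tool
def fraud_features(user_id: str, account_id: str) -> dict:
    """Fetch fresh fraud features for a user and account from RisingWave."""
    # Delegates to the point-lookup helper defined in the integration section
    return get_agent_features(user_id=user_id, account_id=account_id)

# Register with your agent, e.g. llm.bind_tools([fraud_features]) on a chat model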

How does RisingWave handle feature freshness?

RisingWave propagates changes through materialized views using incremental computation: only the delta from a new event is processed, not the entire view. End-to-end freshness is typically under one second for most workloads. For TUMBLE-based window features, events within the current open window are reflected immediately. For state features like account balances, each new transaction updates the result within milliseconds.

Conclusion

Traditional feature stores were engineered for a specific workload: batch model training with periodic refreshes and inference tolerances measured in seconds. AI agents require something different. They query features before every decision, they need freshness measured in seconds or less, and they need a serving interface simple enough to call from a tool function without specialized SDKs.

Streaming SQL materialized views are the right primitive for this workload. Three patterns cover the majority of agent feature needs: TUMBLE windows for behavioral velocity features, simple aggregations for entity state features, and join-based views for anomaly and risk features. All three are defined in SQL, maintained continuously by RisingWave, and served over the PostgreSQL wire protocol.

The key insight is that the feature store for the agentic era does not need to be a separate system. When the streaming database and the serving layer are the same PostgreSQL-compatible system, the feature pipeline becomes as simple as a CREATE MATERIALIZED VIEW statement. Agents query it like any other database table. Features are always fresh. No batch jobs, no Redis cache, no synchronization logic.


Try it on RisingWave Cloud. Free tier available, no credit card required. Sign up here.

Have questions about real-time feature engineering? Join our Slack community to connect with other data and ML engineers.
