How to Reduce LLM Token Costs with Pre-Computed Materialized Views

You can reduce LLM token costs by 10x to 100x by replacing raw event streams in your prompts with pre-computed materialized views. Instead of querying and formatting thousands of raw rows at inference time, RisingWave maintains compact, always-current summaries that your application can inject into prompts with a single row lookup.

The LLM Context Stuffing Problem

Every time your application calls an LLM, it spends tokens on context. For user-facing AI features, that context typically includes information about who the user is: their purchase history, behavior patterns, account status, open support tickets, and more. The naive approach is to query the raw tables and include the results directly in the prompt.

Here is what that looks like in practice. Suppose you want to give an AI assistant context about a customer before it drafts a support reply:

-- Naive approach: send all raw rows to the LLM
SELECT purchase_id, user_id, product_id, product_name,
       category, amount, quantity, purchased_at
FROM purchases
WHERE user_id = 'user_42';

 purchase_id | user_id | product_id |      product_name       | category  | amount | quantity |       purchased_at
-------------+---------+------------+-------------------------+-----------+--------+----------+---------------------------
 O001        | user_42 | P001       | AI Pro Subscription     | software  |  49.99 |        1 | 2026-03-28 09:15:00+00:00
 O003        | user_42 | P003       | Stream Processing Book  | education |  39.99 |        2 | 2026-03-30 11:05:00+00:00
 O002        | user_42 | P002       | Data Engineering Course | education | 199.00 |        1 | 2026-03-29 14:22:00+00:00

This user has three orders. Multiply that by hundreds of fields across purchases, pageviews, sessions, support tickets, and account events, and you are pasting thousands of tokens of raw tabular data into every prompt. The LLM does not need the raw rows. It needs the answer to: "What kind of customer is this, and what do they need right now?"

A materialized view computes that answer once and keeps it current. Your prompt then becomes a single row lookup instead of a multi-table join executed at inference time.

What Are Pre-Computed Materialized Views?

A materialized view is a query whose result is stored and maintained automatically. In a traditional database, materialized views are refreshed on a schedule. In RisingWave, they are maintained incrementally and continuously: when a source row changes, only the affected output rows are recomputed and updated. The view is always current.

For LLM cost reduction, the value is straightforward: you pay to compute the aggregation once (in the streaming layer), not once per prompt (at inference time). Your application retrieves a compact, pre-formatted summary and injects it into the context window. The LLM gets exactly the signal it needs, with no wasted tokens.

Setting Up the Example Schema

The examples below use three source tables representing an e-commerce and SaaS platform. In production these would be tables backed by a Kafka or CDC source, but for demonstration purposes we create them directly:

CREATE TABLE llm_purchases (
    purchase_id   VARCHAR,
    user_id       VARCHAR,
    product_id    VARCHAR,
    product_name  VARCHAR,
    category      VARCHAR,
    amount        DECIMAL,
    quantity      INT,
    purchased_at  TIMESTAMPTZ,
    PRIMARY KEY (purchase_id)
);

CREATE TABLE llm_pageviews (
    view_id          VARCHAR,
    user_id          VARCHAR,
    page_url         VARCHAR,
    page_category    VARCHAR,
    session_id       VARCHAR,
    viewed_at        TIMESTAMPTZ,
    time_on_page_sec INT,
    PRIMARY KEY (view_id)
);

CREATE TABLE llm_support_tickets (
    ticket_id   VARCHAR,
    user_id     VARCHAR,
    subject     VARCHAR,
    status      VARCHAR,
    priority    VARCHAR,
    created_at  TIMESTAMPTZ,
    resolved_at TIMESTAMPTZ,
    PRIMARY KEY (ticket_id)
);

In production these tables would be created with a connector clause pointing to Kafka or a CDC source. For this walkthrough, we insert sample rows directly.

Example 1: Customer Purchase Summary

The goal here is to collapse a user's entire purchase history into a single compact row that tells the LLM who this customer is as a buyer.

CREATE MATERIALIZED VIEW llm_mv_purchase_summary AS
SELECT
    user_id,
    COUNT(purchase_id)                                       AS total_orders,
    SUM(amount * quantity)                                   AS total_spent,
    MAX(purchased_at)                                        AS last_purchase_at,
    COUNT(DISTINCT category)                                 AS categories_purchased,
    STRING_AGG(DISTINCT category, ', ' ORDER BY category)   AS categories
FROM llm_purchases
GROUP BY user_id;

After inserting seven sample purchases across three users, the view holds:

 user_id | total_orders | total_spent |     last_purchase_at      | categories_purchased |     categories
---------+--------------+-------------+---------------------------+----------------------+---------------------
 user_77 |            2 |     3048.99 | 2026-03-31 08:10:00+00:00 |                    1 | software
 user_42 |            3 |      328.97 | 2026-03-30 11:05:00+00:00 |                    2 | education, software
 user_15 |            2 |      298.00 | 2026-03-31 11:20:00+00:00 |                    2 | education, software

Three orders for user_42 collapsed into one row with five summary columns. That row contains everything an LLM needs to understand this customer's purchase behavior: no raw timestamps, no redundant product IDs, no repeated user_id values. The three raw rows cost approximately 120 tokens; the materialized view summary is approximately 12 tokens of pure signal. That is a 10x reduction for a user with just three purchases, and the ratio improves dramatically as purchase history grows.
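These token figures can be sanity-checked with the common rule of thumb of roughly four characters per token for English text. This is an approximation, not a real tokenizer; exact counts depend on the model:

```python
# Rule-of-thumb token estimate: ~4 characters per token for English text.
# An approximation only; exact counts depend on the model's tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# The three raw purchase rows for user_42, roughly as rendered above.
raw_rows = "\n".join([
    "O001 | user_42 | P001 | AI Pro Subscription | software | 49.99 | 1 | 2026-03-28 09:15:00+00:00",
    "O003 | user_42 | P003 | Stream Processing Book | education | 39.99 | 2 | 2026-03-30 11:05:00+00:00",
    "O002 | user_42 | P002 | Data Engineering Course | education | 199.00 | 1 | 2026-03-29 14:22:00+00:00",
])
# The single materialized-view summary row for the same user.
summary = "user_42 | 3 orders | $328.97 | education, software | last: 2026-03-30"

print(estimate_tokens(raw_rows), "vs", estimate_tokens(summary))
```

Even with only three purchases, the raw rows cost several times more tokens than the summary, and the gap widens linearly as history grows.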

Example 2: User Behavior Profile

Behavioral signals are even more expensive to include raw. A user with 100 page views generates 100 rows of tabular data. A materialized view reduces that to a single row describing their engagement pattern:

CREATE MATERIALIZED VIEW llm_mv_user_behavior AS
SELECT
    user_id,
    COUNT(view_id)                                              AS total_pageviews,
    COUNT(DISTINCT session_id)                                  AS sessions,
    COUNT(DISTINCT page_category)                               AS categories_visited,
    SUM(time_on_page_sec)                                       AS total_time_sec,
    STRING_AGG(DISTINCT page_category, ', ' ORDER BY page_category) AS page_categories,
    MAX(viewed_at)                                              AS last_active_at
FROM llm_pageviews
GROUP BY user_id;

With ten sample pageviews across three users:

 user_id | total_pageviews | sessions | categories_visited | total_time_sec |        page_categories         |      last_active_at
---------+-----------------+----------+--------------------+----------------+--------------------------------+---------------------------
 user_42 |               5 |        2 |                  4 |           1080 | blog, docs, pricing, use-cases | 2026-03-29 14:10:00+00:00
 user_77 |               3 |        1 |                  3 |            930 | contact, docs, pricing         | 2026-03-30 15:30:00+00:00
 user_15 |               2 |        1 |                  2 |            570 | blog, docs                     | 2026-03-31 10:45:00+00:00

For user_77, the view captures everything meaningful: three sessions, 930 seconds of engagement, and the specific page categories visited (including /contact and /pricing, which suggests high purchase intent). An LLM can use this to tailor its response without processing any raw event log.

Example 3: Entity State View (Support Tickets)

For customer-facing AI features like chatbots and support copilots, the LLM needs to know whether a user has open issues. Querying the raw ticket table at inference time gives you one row per ticket. A materialized view gives you one row per user:

CREATE MATERIALIZED VIEW llm_mv_support_context AS
SELECT
    user_id,
    COUNT(ticket_id)                                           AS total_tickets,
    COUNT(CASE WHEN status = 'open' THEN 1 END)               AS open_tickets,
    COUNT(CASE WHEN status = 'resolved' THEN 1 END)           AS resolved_tickets,
    STRING_AGG(
        CASE WHEN status = 'open' THEN subject ELSE NULL END,
        '; '
    )                                                          AS open_issues,
    MAX(created_at)                                            AS latest_ticket_at
FROM llm_support_tickets
GROUP BY user_id;

Output:

 user_id | total_tickets | open_tickets | resolved_tickets |              open_issues              |     latest_ticket_at
---------+---------------+--------------+------------------+---------------------------------------+---------------------------
 user_42 |             2 |            0 |                2 |                                       | 2026-03-29 09:00:00+00:00
 user_77 |             1 |            1 |                0 | Enterprise SSO configuration help     | 2026-03-30 17:00:00+00:00
 user_15 |             1 |            1 |                0 | UDF execution error on Python runtime | 2026-03-31 12:00:00+00:00

The LLM now knows in one row that user_77 has an open ticket about SSO configuration. If the user asks anything about logging in, the AI can acknowledge that issue proactively.

Example 4: Combined Context View

You can join the per-user materialized views into a single context row that covers purchases, behavior, and support state together:

CREATE MATERIALIZED VIEW llm_mv_user_context AS
SELECT
    ps.user_id,
    ps.total_orders,
    ps.total_spent,
    ps.categories                AS purchase_categories,
    ps.last_purchase_at,
    ub.total_pageviews,
    ub.sessions,
    ub.page_categories,
    ub.last_active_at
FROM llm_mv_purchase_summary ps
JOIN llm_mv_user_behavior ub ON ps.user_id = ub.user_id;

 user_id | total_orders | total_spent | purchase_categories |     last_purchase_at      | total_pageviews | sessions |        page_categories         |      last_active_at
---------+--------------+-------------+---------------------+---------------------------+-----------------+----------+--------------------------------+---------------------------
 user_15 |            2 |      298.00 | education, software | 2026-03-31 11:20:00+00:00 |               2 |        1 | blog, docs                     | 2026-03-31 10:45:00+00:00
 user_77 |            2 |     3048.99 | software            | 2026-03-31 08:10:00+00:00 |               3 |        1 | contact, docs, pricing         | 2026-03-30 15:30:00+00:00
 user_42 |            3 |      328.97 | education, software | 2026-03-30 11:05:00+00:00 |               5 |        2 | blog, docs, pricing, use-cases | 2026-03-29 14:10:00+00:00

At inference time, your application does a single point lookup:

SELECT *
FROM llm_mv_user_context
JOIN llm_mv_support_context USING (user_id)
WHERE user_id = $1;

This returns one row in single-digit milliseconds. You format it as a short string and inject it into your prompt:

User: user_77. Orders: 2. Total spent: $3048.99. Interests: software.
Sessions: 1. Pages visited: contact, docs, pricing. Open tickets: 1.
Issues: Enterprise SSO configuration help

That entire context snippet is approximately 40 tokens. The equivalent raw data spanning purchases, pageviews, and tickets would be hundreds of rows and thousands of tokens.

Token Reduction at Scale

The table below shows how token consumption changes as event volume grows, comparing raw row injection against a materialized view lookup:

 User events (raw rows) | Raw context tokens (est.)         | MV context tokens | Reduction
------------------------+-----------------------------------+-------------------+----------------------
 10 rows                | ~250                              | ~30               | 8x
 100 rows               | ~2,500                            | ~30               | 83x
 1,000 rows             | ~25,000                           | ~30               | 833x
 10,000 rows            | ~250,000 (exceeds context window) | ~30               | impossible otherwise

Token estimates assume roughly 25 tokens per structured row (column names, values, whitespace). The materialized view output stays constant regardless of event history depth because it is already aggregated.

For a production system serving 10,000 daily active users with GPT-4o at $2.50 per million input tokens: reducing context from 2,500 tokens to 30 tokens per call saves approximately $0.006 per call. At 5 AI calls per user per day, that is about $0.03 per user per day, or roughly $309 per day across the full user base, which works out to more than $110,000 per year. The streaming infrastructure to maintain these materialized views costs a fraction of that.
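As a sanity check, here is the savings arithmetic spelled out. The pricing and call volumes are the assumptions stated above, not universal figures:

```python
# Assumptions from the text: GPT-4o input pricing of $2.50 per million
# tokens, 2,500-token raw context vs a 30-token MV summary per call.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000

raw_tokens, mv_tokens = 2_500, 30
calls_per_user_per_day = 5
daily_active_users = 10_000

saved_per_call = (raw_tokens - mv_tokens) * PRICE_PER_INPUT_TOKEN
saved_per_user_per_day = saved_per_call * calls_per_user_per_day
saved_fleet_per_day = saved_per_user_per_day * daily_active_users

print(f"per call: ${saved_per_call:.4f}")
print(f"per day:  ${saved_fleet_per_day:,.2f}")
print(f"per year: ${saved_fleet_per_day * 365:,.0f}")
```

Swap in your own pricing and traffic numbers; the structure of the calculation stays the same.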

How RisingWave Keeps Views Current

The key difference between a traditional materialized view and a RisingWave materialized view is the update mechanism. In PostgreSQL, a materialized view is a snapshot refreshed with REFRESH MATERIALIZED VIEW. In RisingWave, the view is a continuously maintained incremental computation: when a new purchase row arrives, only the affected user's summary row is updated. No full recomputation. No scheduled refresh job.

This matters for LLM applications because context accuracy is a freshness problem. If a user opens a support ticket and your context view still shows open_tickets: 0, your AI gives incorrect guidance. With RisingWave, the view reflects the ticket within seconds of it being created.

Source insert (new support ticket for user_77)
    --> RisingWave detects row change
    --> Updates llm_mv_support_context for user_77 only
    --> View row is ready within seconds
    --> Next prompt call reads fresh state

This is the streaming database model: compute once, serve many. The aggregation work happens in the background, incrementally, not at query time.
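To make the model concrete, here is a toy in-memory version of incremental per-user aggregation. It is a deliberate simplification of what RisingWave's streaming engine does; real state is persisted, distributed, and fault-tolerant:

```python
from collections import defaultdict

# Toy incremental aggregation: one running summary row per user,
# updated as each event arrives (never recomputed from scratch).
summaries = defaultdict(lambda: {"total_orders": 0, "total_spent": 0.0})

def on_purchase(user_id: str, amount: float, quantity: int) -> None:
    # Only the affected user's summary row is touched -- no full rescan.
    s = summaries[user_id]
    s["total_orders"] += 1
    s["total_spent"] += amount * quantity

on_purchase("user_42", 49.99, 1)
on_purchase("user_42", 39.99, 2)
on_purchase("user_77", 2999.00, 1)

# Serving is a point lookup, analogous to the single-row MV read.
print(summaries["user_42"])
```

The cost of each update is proportional to the change, not to the history, which is why the view stays cheap to maintain no matter how deep the event log grows.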

Integrating with Your LLM Application

Once your materialized views are defined, connecting them to your application is straightforward. Here is a Python example using psycopg2 (RisingWave is PostgreSQL-compatible):

import psycopg2
import openai

conn = psycopg2.connect(
    host="localhost", port=4566,
    user="root", dbname="dev"
)

def build_user_context(user_id: str) -> str:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT
                uc.total_orders,
                uc.total_spent,
                uc.purchase_categories,
                uc.sessions,
                uc.page_categories,
                COALESCE(sc.open_tickets, 0)  AS open_tickets,
                sc.open_issues
            FROM llm_mv_user_context uc
            LEFT JOIN llm_mv_support_context sc
                   ON uc.user_id = sc.user_id
            WHERE uc.user_id = %s
        """, (user_id,))
        row = cur.fetchone()

    if not row:
        return "No user history found."

    orders, spent, categories, sessions, pages, open_tickets, issues = row

    context = (
        f"Orders: {orders}. Total spent: ${spent:.2f}. "
        f"Interested in: {categories}. "
        f"Sessions: {sessions}. Pages visited: {pages}. "
        f"Open support tickets: {open_tickets}."
    )
    if issues:
        context += f" Current issues: {issues}."
    return context

def respond_to_user(user_id: str, user_message: str) -> str:
    context = build_user_context(user_id)

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a helpful support assistant for a SaaS platform. "
                f"Customer context: {context}"
            )},
            {"role": "user", "content": user_message}
        ]
    )
    return response.choices[0].message.content

The build_user_context function executes a single indexed query. It returns in under 5ms. The context string is typically 30 to 60 tokens. There is no aggregation happening at inference time because RisingWave already did that work when the events arrived.

Designing Materialized Views for LLM Context

A few principles help when designing these views:

Prefer aggregates over lists. total_orders: 3 is more token-efficient than listing all three order IDs. The LLM rarely needs individual event identifiers; it needs the patterns they represent.

Use STRING_AGG for category signals. Converting a set of category values to a comma-separated string is compact and readable. The LLM understands education, software immediately.

Keep the view narrow. Each view should answer one question: "What has this user bought?" "How are they engaging?" "Do they have issues?" Joining views at query time keeps individual views fast to maintain.

Avoid injecting raw timestamps where possible. Instead of 2026-03-30 11:05:00+00:00, derive a human-readable signal such as last_purchase_days_ago: 2. Because NOW() is non-deterministic, streaming engines generally restrict it inside continuously maintained views, so compute the derived value at query time instead, for example EXTRACT(epoch FROM NOW() - last_purchase_at) / 86400 over the view's stored timestamp.

Version your views with a prefix. Using a consistent prefix like llm_mv_ makes it easy to identify views that feed LLM context, separate from operational views used by dashboards or APIs.

Comparison with Alternative Approaches

 Approach                      | Freshness             | Inference latency             | Token cost | Complexity
-------------------------------+-----------------------+-------------------------------+------------+-------------
 Raw SQL query at inference    | Always fresh          | 50-500ms (grows with history) | Very high  | Low
 Batch-computed summary table  | Hours old             | 1-5ms                         | Low        | Medium
 Redis cache of summaries      | Minutes to hours old  | 1ms                           | Low        | Medium-high
 RisingWave materialized views | Seconds               | 1-5ms                         | Low        | Low
 Vector search over events     | Approximate freshness | 20-100ms                      | Medium     | High

Redis caching is a common alternative. The tradeoff is that a cache requires explicit invalidation logic: when a new purchase arrives, your application code must know to update the cache. RisingWave materialized views handle this automatically as part of the streaming computation graph. There is no application-side cache invalidation to maintain.

Batch-computed summaries (a scheduled Spark or dbt job) are cheaper to operate but introduce staleness that can degrade AI response quality. A user who just made a high-value purchase still appears as a low-value customer until the next batch run.

What About Product-Level Context?

The same pattern applies to product and inventory state. If your LLM answers questions about product availability or recent purchase trends, a product-level materialized view keeps that context compact. This example joins the llm_purchases table from earlier with an llm_products catalog table (product_id, product_name, category, unit_price, stock_qty) that is assumed to exist:

CREATE MATERIALIZED VIEW llm_mv_product_state AS
SELECT
    p.product_id,
    p.product_name,
    p.category,
    p.unit_price,
    p.stock_qty,
    COUNT(DISTINCT pu.user_id)    AS unique_buyers,
    SUM(pu.quantity)              AS units_sold,
    MAX(pu.purchased_at)          AS last_sold_at
FROM llm_products p
LEFT JOIN llm_purchases pu ON p.product_id = pu.product_id
GROUP BY p.product_id, p.product_name, p.category, p.unit_price, p.stock_qty;

Output:

 product_id |         product_name          | category  | unit_price | stock_qty | unique_buyers | units_sold |       last_sold_at
------------+-------------------------------+-----------+------------+-----------+---------------+------------+---------------------------
 P002       | Data Engineering Course       | education |        199 |       500 |             2 |          2 | 2026-03-31 11:20:00+00:00
 P001       | AI Pro Subscription           | software  |      49.99 |      9999 |             2 |          2 | 2026-03-31 08:10:00+00:00
 P004       | RisingWave Enterprise License | software  |       2999 |        50 |             1 |          1 | 2026-03-30 16:40:00+00:00
 P005       | Vector DB Addon               | software  |         99 |       200 |             1 |          1 | 2026-03-31 10:55:00+00:00
 P003       | Stream Processing Book        | education |      39.99 |       300 |             1 |          2 | 2026-03-30 11:05:00+00:00

When a user asks "Is the RisingWave Enterprise License still available?", the LLM can retrieve the product row in milliseconds and answer with current stock and sales velocity. No scan over the full order history is needed.

Getting Started with RisingWave

RisingWave is open source under Apache 2.0. You can run it locally with Docker:

docker run -it --pull=always -p 4566:4566 -p 5691:5691 \
  risingwavelabs/risingwave:latest playground

Then connect with any PostgreSQL-compatible client:

psql -h localhost -p 4566 -U root -d dev

From there, the SQL in this article runs as written. See the RisingWave quickstart guide to connect Kafka sources, set up CDC from PostgreSQL or MySQL, and configure sinks to downstream systems.

For production deployments, RisingWave Cloud offers a fully managed option with auto-scaling, monitoring, and SLA-backed uptime. The same SQL works in both environments.

FAQ

How much can materialized views actually reduce my LLM token costs?

The reduction depends on how many raw rows you would otherwise send to the LLM. For a user with 10 purchases, you might see an 8x to 10x reduction. For a user with 1,000 historical events across purchases, pageviews, and sessions, the reduction reaches 100x or more. The materialized view summary stays at roughly the same token count regardless of how deep the event history goes, because it is an aggregation. The raw data grows linearly with events; the summary does not.

Are RisingWave materialized views always current, or do they lag?

RisingWave materialized views are maintained incrementally as new data arrives. In practice, the view reflects new events within one to five seconds under normal load. This is not the same as a traditional database REFRESH MATERIALIZED VIEW, which runs on a schedule. RisingWave updates only the affected rows in the view when source data changes, making the update both fast and continuous. For LLM applications, this means context is nearly always current.

What is the difference between using a materialized view and caching the query result in Redis?

A Redis cache requires your application to manage invalidation. When a new event arrives, your code must know which cache keys to evict and when to regenerate them. This adds coupling between your event processing logic and your cache management logic. A RisingWave materialized view handles this automatically: the streaming computation graph keeps the view updated as part of the same pipeline that processes incoming events. There is no application-side cache logic to write or maintain. The view is always the right answer.

Can I use this pattern with any LLM provider?

Yes. The materialized views are plain SQL tables queryable over PostgreSQL wire protocol. Your application retrieves a row and formats it as a string to inject into the prompt. That string can go into any LLM API: OpenAI, Anthropic, Google Gemini, Mistral, or any open model served via a compatible API. The RisingWave side is provider-agnostic.

Do I need to change how I write prompts?

You do not need to change your prompt templates fundamentally. The difference is what you inject into them. Instead of formatting a list of raw rows as a markdown table and pasting it into the system prompt, you read one row from the materialized view and format it as a short descriptive sentence. The LLM handles both formats, but the short sentence version uses far fewer tokens and often produces better results because the signal-to-noise ratio is higher.

Conclusion

Context stuffing raw event rows into LLM prompts is one of the most common and most expensive mistakes in AI application development. The solution is to shift aggregation work out of the prompt assembly path and into the streaming layer, where it runs once and stays current.

The core pattern is straightforward:

  • Define materialized views in RisingWave that aggregate raw events into compact user, product, or entity summaries
  • Connect your LLM application to RisingWave over the standard PostgreSQL wire protocol
  • At inference time, retrieve one row per entity and format it as a short context snippet
  • Inject that snippet into your system prompt instead of raw tabular data

The views stay accurate because RisingWave maintains them incrementally as new events arrive. Your application sees fresh context with every request, at the cost of a single indexed row lookup.

The economics are significant. Reducing context from thousands of tokens to dozens of tokens per call cuts costs by an order of magnitude or more. At production scale, that difference compounds into meaningful infrastructure savings while also improving response quality by reducing noise in the context window.


Ready to try this in your own environment? The RisingWave quickstart guide gets you running in five minutes. For more on streaming SQL and incremental computation, see how RisingWave materialized views work and the RisingWave blog on real-time data pipelines for AI applications.

Join the RisingWave Slack community to discuss your use case with the team and other AI engineers building on streaming infrastructure.
