AI Infrastructure

Reduce LLM Token Costs with Pre-Computed Context

Stop sending raw logs to your LLM. Pre-compute summaries, entity states, and aggregations with RisingWave, then inject compact facts into the prompt and cut input-token spend by up to 90%.

Up to 90%
Fewer input tokens

Replace verbose logs with compact pre-computed summaries

Real-time
Always-fresh context

Materialized views update continuously as events arrive

SQL
No custom pipelines

Define pre-computation logic in standard SQL, not code

< 10ms
Context retrieval

Query pre-computed views at Postgres speed, no aggregation at runtime

Raw Data in Prompts Is Expensive

Every token you send to an LLM costs money. Teams that pass raw logs, unstructured histories, and full table dumps into prompts can pay 10x or more for information that a compact summary would carry just as well.

Before
Raw event log in prompt
System: You are a customer assistant.
User history:
2026-03-01 09:12 - viewed product #4821
2026-03-01 09:13 - viewed product #4821
2026-03-01 09:14 - added to cart #4821
2026-03-01 09:18 - removed from cart
2026-03-01 09:20 - searched "noise cancel"
2026-03-01 09:22 - viewed product #5102
... (200 more lines)

User: What should I buy?
~2,400 tokens at $0.003/1K = $0.0072 per call
After
Pre-computed context
System: You are a customer assistant.
User profile (pre-computed):
- Top interest: noise-canceling headphones
- Viewed #5102 (Sony WH-1000XM5) 3x today
- Cart: empty (removed $299 item 2h ago)
- Purchase history: 2 audio products
- Price sensitivity: mid-range ($150-$350)

User: What should I buy?
~120 tokens at $0.003/1K = $0.00036 per call

Same information. 20x fewer tokens. Better LLM response because the signal-to-noise ratio is higher.
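
The per-call figures above are simple arithmetic. A minimal Python sketch, assuming the flat rate of $0.003 per 1K input tokens used in the example (real pricing varies by model and provider); the function name `input_cost` is hypothetical:

```python
# Rate taken from the example above; an illustration, not a quote for any specific model.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # USD


def input_cost(tokens: int, price_per_1k: float = PRICE_PER_1K_INPUT_TOKENS) -> float:
    """Return the input-token cost in USD for a single call."""
    return tokens / 1000 * price_per_1k


raw_cost = input_cost(2400)      # prompt carrying the raw event log
compact_cost = input_cost(120)   # prompt carrying the pre-computed profile
ratio = raw_cost / compact_cost  # how many times cheaper the compact prompt is

print(f"raw: ${raw_cost:.5f}  compact: ${compact_cost:.5f}  ratio: {ratio:.0f}x")
```

Output tokens are billed separately and are not affected by this change, so the saving applies to the input side of the bill only.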

How RisingWave Pre-Computes LLM Context

RisingWave maintains materialized views that summarize user behavior, entity state, and aggregated signals in real time. Your application queries a view instead of aggregating raw data at prompt-build time.

1. Ingest Events

Stream clickstream, transactions, and interaction events from Kafka into RisingWave in real time.

2. Maintain Summaries

SQL materialized views continuously update user profiles, entity states, and aggregated metrics as new events arrive.

3. Serve Compact Context

At prompt-build time, query the materialized view for a single compact row per user or entity. Inject it into the LLM prompt.
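
The serving step can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: it assumes a view shaped like the `user_profile` example on this page, and a DB-API connection that uses `%s` placeholders (RisingWave speaks the Postgres wire protocol, so a Postgres driver is one option). `fetch_profile` and `build_prompt` are hypothetical helper names.

```python
from typing import Any, Mapping, Optional

# Assumed query; column names follow the user_profile example view.
PROFILE_SQL = (
    "SELECT top_interest, views_today, max_cart_price, last_viewed "
    "FROM user_profile WHERE user_id = %s"
)


def fetch_profile(conn: Any, user_id: int) -> Optional[Mapping[str, Any]]:
    """Read one pre-computed row; returns None if the user has no profile yet."""
    cur = conn.cursor()
    try:
        cur.execute(PROFILE_SQL, (user_id,))
        row = cur.fetchone()
        if row is None:
            return None
        columns = [d[0] for d in cur.description]
        return dict(zip(columns, row))
    finally:
        cur.close()


def build_prompt(profile: Optional[Mapping[str, Any]], question: str) -> str:
    """Render the profile as a short bullet list and append the user question."""
    lines = ["System: You are a customer assistant."]
    if profile:
        lines.append("User profile (pre-computed):")
        for key, value in profile.items():
            if value is not None:  # skip empty fields instead of spending tokens on them
                lines.append(f"- {key}: {value}")
    lines.append("")
    lines.append(f"User: {question}")
    return "\n".join(lines)
```

No aggregation happens in application code: the lookup is a single-row read, and the prompt size is bounded by the number of columns in the view rather than by the length of the user's history.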

Pre-Computation Patterns

Each pattern replaces a class of verbose raw data with a compact SQL-defined summary.

User Behavior Summary

Replace 200-line clickstream logs with a per-user interest and intent profile updated every second.

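CREATE MATERIALIZED VIEW user_profile AS
SELECT user_id,
  -- most frequent category in the window
  MODE() WITHIN GROUP (ORDER BY category) AS top_interest,
  -- "today" means the rolling 24-hour window defined below
  COUNT(*) FILTER (WHERE action = 'view') AS views_today,
  MAX(price) FILTER (WHERE action = 'add_cart') AS max_cart_price,
  -- explicit ordering makes "last" well defined; only views count
  LAST_VALUE(product_id ORDER BY event_time)
    FILTER (WHERE action = 'view') AS last_viewed
FROM events
WHERE event_time > NOW() - INTERVAL '24' HOUR
GROUP BY user_id;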

Entity State Snapshot

Maintain current state of accounts, orders, or sessions instead of re-aggregating from raw events every time.

CREATE MATERIALIZED VIEW account_state AS
SELECT account_id,
  -- net movement over the window, not the all-time balance
  SUM(amount) AS net_amount_30d,
  COUNT(*) AS tx_count_30d,
  MAX(created_at) AS last_activity,
  BOOL_OR(flagged) AS has_suspicious_activity
FROM transactions
WHERE created_at > NOW() - INTERVAL '30' DAY
GROUP BY account_id;

Session Context

Summarize the current session in real time so the LLM receives only what happened in the last few minutes.

CREATE MATERIALIZED VIEW session_summary AS
SELECT session_id, user_id,
  array_agg(query ORDER BY ts) AS recent_queries,
  COUNT(DISTINCT page) AS pages_visited,
  EXTRACT(EPOCH FROM (MAX(ts) - MIN(ts))) AS duration_sec
FROM page_events
-- session_id already identifies the session, so no window clause is needed
GROUP BY session_id, user_id;
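
Because `recent_queries` grows with the session, it helps to cap it when the prompt is built. A minimal Python sketch, assuming a row from `session_summary` has already been read into a dict; `session_context` and the cap of 5 are hypothetical choices to tune against your token budget:

```python
from typing import Any, Mapping

MAX_QUERIES = 5  # hypothetical cap on how many searches reach the prompt


def session_context(row: Mapping[str, Any], max_queries: int = MAX_QUERIES) -> str:
    """Render a session_summary row as a few bounded context lines."""
    # Drop NULL/empty entries; the array is ordered by ts, so the tail is the newest.
    queries = [q for q in (row.get("recent_queries") or []) if q]
    recent = queries[-max_queries:] if max_queries > 0 else []
    minutes = round(float(row.get("duration_sec") or 0) / 60)
    return "\n".join([
        "Session (pre-computed):",
        f"- Recent searches: {', '.join(recent) if recent else 'none'}",
        f"- Pages visited: {row.get('pages_visited', 0)}",
        f"- Duration: {minutes} min",
    ])
```

The output stays at four lines regardless of how long the session runs, which keeps the per-call token count predictable.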

Aggregated Signals for RAG

Pre-compute document frequency, recency scores, and popularity signals to rank retrieval candidates before LLM augmentation.

CREATE MATERIALIZED VIEW doc_signals AS
SELECT doc_id,
  COUNT(*) AS views_7d,
  AVG(rating) AS avg_rating,
  MAX(viewed_at) AS last_viewed,
  SUM(CASE WHEN clicked THEN 1 ELSE 0 END)::float
    / NULLIF(COUNT(*),0) AS ctr
FROM doc_events
WHERE viewed_at > NOW() - INTERVAL '7' DAY
GROUP BY doc_id;
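
One way to use these signals is to re-rank retrieval candidates before they are added to the prompt. The Python sketch below is illustrative only: the weights, the 3-day recency half-life, the 1-5 rating scale, and the sample values are all assumptions, and `signal_score` and `rank_candidates` are hypothetical names.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class DocSignals:
    """One row of doc_signals (fields mirror the view's columns)."""
    views_7d: int
    avg_rating: Optional[float]
    last_viewed: datetime
    ctr: Optional[float]


def signal_score(s: DocSignals, now: datetime, half_life_days: float = 3.0) -> float:
    """Blend recency, CTR, rating, and popularity with assumed weights."""
    age_days = max((now - s.last_viewed).total_seconds(), 0.0) / 86400
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 if just viewed, halves every half-life
    popularity = math.log1p(s.views_7d)           # log-dampen very popular documents
    rating = (s.avg_rating or 0.0) / 5.0          # assumes a 1-5 rating scale
    ctr = s.ctr or 0.0
    return 0.4 * recency + 0.3 * ctr + 0.2 * rating + 0.1 * popularity


def rank_candidates(doc_ids: List[str], signals: Dict[str, DocSignals],
                    now: datetime, top_k: int = 3) -> List[str]:
    """Return the top_k candidates by score; documents without signals score 0."""
    def score(doc_id: str) -> float:
        s = signals.get(doc_id)
        return signal_score(s, now) if s else 0.0
    return sorted(doc_ids, key=score, reverse=True)[:top_k]


# Made-up sample rows for illustration.
now = datetime(2026, 3, 8, tzinfo=timezone.utc)
signals = {
    "a": DocSignals(100, 4.5, now, 0.30),
    "b": DocSignals(10, 3.0, now - timedelta(days=6), 0.05),
}
top = rank_candidates(["b", "c", "a"], signals, now, top_k=2)
```

In practice this score would usually be combined with the retriever's own similarity score rather than replace it.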

Token Cost Impact

Estimated savings across common LLM application patterns using pre-computed context.

Application Pattern            Raw Data Tokens   Pre-Computed Tokens   Token Reduction
E-commerce recommendation      2,400             120                   95%
Customer support assistant     3,200             250                   92%
Financial fraud explanation    1,800             200                   89%
RAG document ranking           5,000             400                   92%
IoT anomaly explanation        4,500             150                   97%
Personalized content feed      2,000             180                   91%

Estimates based on typical application prompt structures. Actual savings depend on data verbosity and summary design.
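
The reduction column can be reproduced from the two token columns, and the same numbers translate into dollars at a given rate. A small Python sketch, assuming the $0.003 per 1K input tokens from the earlier example; the per-million-calls figure is only a unit for comparison, not a measured workload:

```python
# (raw tokens, pre-computed tokens) per pattern, copied from the table above.
PATTERNS = {
    "E-commerce recommendation": (2400, 120),
    "Customer support assistant": (3200, 250),
    "Financial fraud explanation": (1800, 200),
    "RAG document ranking": (5000, 400),
    "IoT anomaly explanation": (4500, 150),
    "Personalized content feed": (2000, 180),
}


def reduction_pct(raw: int, compact: int) -> int:
    """Percentage of input tokens removed, rounded to the nearest integer."""
    return round(100 * (raw - compact) / raw)


def savings_per_million_calls(raw: int, compact: int, price_per_1k: float = 0.003) -> float:
    """USD saved on input tokens over one million calls at the assumed rate."""
    return (raw - compact) / 1000 * price_per_1k * 1_000_000


for name, (raw, compact) in PATTERNS.items():
    print(f"{name}: {reduction_pct(raw, compact)}% fewer tokens, "
          f"${savings_per_million_calls(raw, compact):,.0f} saved per 1M calls")
```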

Frequently Asked Questions

How do you reduce LLM token costs with pre-computed data?

Pre-compute summaries, aggregations, and entity states using a streaming database like RisingWave. Instead of sending raw event logs into prompts, inject compact pre-computed context with the same information in far fewer tokens.

What is the biggest hidden cost in LLM-powered applications?

Sending raw, verbose data in prompts. Raw event logs, unstructured customer histories, and full database dumps often carry the same information as a concise summary but can use 10x or more tokens. Input token costs scale linearly with prompt size.

How can a streaming database help reduce AI inference costs?

A streaming database continuously maintains pre-computed summaries as materialized views. When an LLM request arrives, you query the view for a compact snapshot in under 10ms instead of aggregating raw data at runtime.

What data should I pre-compute before sending to an LLM?

Pre-compute user behavior summaries (activity over the last 30 days instead of raw event logs), entity states (current cart contents, account status), aggregated metrics (session counts, purchase frequency), and semantic embeddings of recent interactions.

Does pre-computing context affect LLM response quality?

Properly pre-computed context improves response quality. The LLM receives clean, structured facts instead of noisy raw data. The key is designing summaries that preserve semantically relevant signals and discard irrelevant verbosity.

Cut Your AI Inference Bill with Smarter Context

RisingWave is open source and free to use. Define a materialized view over your event stream and start serving compact, always-fresh context to your LLM in minutes.
