AI Infrastructure

Reduce LLM Token Costs with Pre-Computed Context

Q: How do you reduce LLM token costs with pre-computed data?

Reduce LLM token costs by pre-computing summaries, aggregations, and entity states using a streaming database like RisingWave. Instead of sending raw event logs or large data dumps to the LLM, inject compact, pre-computed context that contains the same information in far fewer tokens.

Q: What is the biggest hidden cost of LLM-powered applications?

The biggest hidden cost is sending raw, verbose data in prompts. Raw event logs, unstructured customer histories, and full database dumps contain the same information as a concise summary but use 10-100x more tokens. Token costs scale linearly with input size.

Q: How can a streaming database help reduce AI inference costs?

A streaming database like RisingWave continuously maintains pre-computed summaries, aggregations, and entity profiles as materialized views. When an LLM request arrives, you query the materialized view for a compact snapshot instead of aggregating raw data at query time, reducing both tokens and latency.

Q: What data should I pre-compute before sending to an LLM?

Pre-compute: user behavior summaries (last 30-day activity instead of raw event logs), entity states (current cart contents, account status), aggregated metrics (session counts, purchase frequency), and semantic embeddings of recent interactions.

Q: Does pre-computing context affect LLM response quality?

Properly pre-computed context improves response quality because the LLM receives clean, structured facts instead of noisy raw data. The key is designing summaries that preserve the semantically relevant signals and discard irrelevant verbosity.

Stop sending raw event logs to your LLM. Pre-compute summaries, entity states, and aggregations with RisingWave, then inject compact facts and cut token spend by up to 90%.


Up to 90%

Fewer input tokens

Replace verbose logs with compact pre-computed summaries

Real-time

Always-fresh context

Materialized views update continuously as events arrive

SQL

No custom pipelines

Define pre-computation logic in standard SQL, not code

< 10ms

Context retrieval

Query pre-computed views at Postgres speed, no aggregation at runtime

Raw Data in Prompts Is Expensive

Every token you send to an LLM costs money. Teams that pass raw logs, unstructured histories, and full table dumps into prompts can pay 10x or more than they need to for the same information.

Before

Raw event log in prompt

System: You are a customer assistant.
User history:
2026-03-01 09:12 - viewed product #4821
2026-03-01 09:13 - viewed product #4821
2026-03-01 09:14 - added to cart #4821
2026-03-01 09:18 - removed from cart
2026-03-01 09:20 - searched "noise cancel"
2026-03-01 09:22 - viewed product #5102
... (200 more lines)

User: What should I buy?

~2,400 tokens at $0.003/1K = $0.0072 per call

After

Pre-computed context

System: You are a customer assistant.
User profile (pre-computed):
- Top interest: noise-canceling headphones
- Viewed #5102 (Sony WH-1000XM5) 3x today
- Cart: empty (removed $299 item 2h ago)
- Purchase history: 2 audio products
- Price sensitivity: mid-range ($150-$350)

User: What should I buy?

~120 tokens at $0.003/1K = $0.00036 per call

Same information. 20x fewer tokens. Better LLM response because the signal-to-noise ratio is higher.
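The arithmetic behind the two per-call figures can be checked with a short script. This is a minimal sketch: the function name `prompt_cost` is hypothetical, and the $0.003 per 1K input tokens rate is the illustrative price used in the comparison, not a quote for any particular model.

```python
def prompt_cost(tokens: int, usd_per_1k: float = 0.003) -> float:
    """Input cost in USD for one call, assuming linear per-token pricing."""
    return tokens / 1000 * usd_per_1k

raw_tokens = 2400      # raw event log prompt (the "Before" example)
compact_tokens = 120   # pre-computed context prompt (the "After" example)

raw_cost = prompt_cost(raw_tokens)
compact_cost = prompt_cost(compact_tokens)
ratio = raw_tokens / compact_tokens   # how many times smaller the prompt is

print(f"raw: ${raw_cost:.4f} per call, compact: ${compact_cost:.5f} per call")
print(f"{ratio:.0f}x fewer input tokens")
```

Because input pricing is linear in token count, the cost ratio equals the token ratio, so the per-call saving carries over unchanged to any call volume.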

How RisingWave Pre-Computes LLM Context

RisingWave maintains materialized views that summarize user behavior, entity state, and aggregated signals in real time. Your application queries a view instead of aggregating raw data at prompt-build time.

Ingest Events

Stream clickstream, transactions, and interaction events from Kafka into RisingWave in real time.

Maintain Summaries

SQL materialized views continuously update user profiles, entity states, and aggregated metrics as new events arrive.

Serve Compact Context

At prompt-build time, query the materialized view for a single compact row per user or entity. Inject it into the LLM prompt.
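The serving step can be sketched on the application side as follows. It assumes the view row has already been fetched (RisingWave speaks the PostgreSQL wire protocol, so a standard PostgreSQL client would typically do this) and is available as a dictionary. The function `build_prompt`, the column names, and the sample row are hypothetical and only illustrate the shape of the injected context.

```python
def build_prompt(profile: dict, question: str) -> str:
    """Render one pre-computed view row as a compact prompt."""
    # One short "- key: value" line per non-null column; the key itself is skipped.
    facts = [
        f"- {key}: {value}"
        for key, value in profile.items()
        if key != "user_id" and value is not None
    ]
    return "\n".join([
        "System: You are a customer assistant.",
        "User profile (pre-computed):",
        *facts,
        "",
        f"User: {question}",
    ])

# Hypothetical row, shaped like the result of:
#   SELECT * FROM user_profile WHERE user_id = 42;
row = {
    "user_id": 42,
    "top_interest": "noise-canceling headphones",
    "views_today": 3,
    "max_cart_price": 299,
    "last_viewed": 5102,
}

prompt = build_prompt(row, "What should I buy?")
print(prompt)
```

Keeping the rendering in one small function makes the prompt size a direct function of the view's column count, which is what keeps token usage bounded per request.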

Pre-Computation Patterns

Each pattern replaces a class of verbose raw data with a compact SQL-defined summary.

User Behavior Summary

Replace 200-line clickstream logs with a per-user interest and intent profile updated every second.

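CREATE MATERIALIZED VIEW user_profile AS
SELECT user_id,
  -- most frequent category in the window
  MODE() WITHIN GROUP (ORDER BY category) AS top_interest,
  COUNT(*) FILTER (WHERE action='view') AS views_today,
  MAX(price) FILTER (WHERE action='add_cart') AS max_cart_price,
  -- explicit ordering makes the result deterministic; count view events only
  LAST_VALUE(product_id ORDER BY event_time)
    FILTER (WHERE action='view') AS last_viewed
FROM events
-- rolling window: "today" here means the last 24 hours
WHERE event_time > NOW() - INTERVAL '24' HOUR
GROUP BY user_id;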

Entity State Snapshot

Maintain current state of accounts, orders, or sessions instead of re-aggregating from raw events every time.

CREATE MATERIALIZED VIEW account_state AS
SELECT account_id,
  -- net amount over the 30-day window, not a lifetime balance
  SUM(amount) AS net_amount_30d,
  COUNT(*) AS tx_count_30d,
  MAX(created_at) AS last_activity,
  -- true if any transaction in the window was flagged
  BOOL_OR(flagged) AS has_suspicious_activity
FROM transactions
WHERE created_at > NOW() - INTERVAL '30' DAY
GROUP BY account_id;

Session Context

Summarize the current session in real time so the LLM receives only what happened in the last few minutes.

CREATE MATERIALIZED VIEW session_summary AS
SELECT session_id, user_id,
  -- queries in chronological order
  array_agg(query ORDER BY ts) AS recent_queries,
  COUNT(DISTINCT page) AS pages_visited,
  EXTRACT(EPOCH FROM (MAX(ts) - MIN(ts))) AS duration_sec
FROM page_events
-- assumes session_id is assigned upstream, so it already delimits the session
GROUP BY session_id, user_id;

Aggregated Signals for RAG

Pre-compute document frequency, recency scores, and popularity signals to rank retrieval candidates before LLM augmentation.

CREATE MATERIALIZED VIEW doc_signals AS
SELECT doc_id,
  COUNT(*) AS views_7d,
  AVG(rating) AS avg_rating,
  MAX(viewed_at) AS last_viewed,
  SUM(CASE WHEN clicked THEN 1 ELSE 0 END)::float
    / NULLIF(COUNT(*),0) AS ctr
FROM doc_events
WHERE viewed_at > NOW() - INTERVAL '7' DAY
GROUP BY doc_id;
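One way the pre-computed signals could feed retrieval ranking is sketched below. The scoring formula, its weights, the 7-day recency decay, the 1-5 rating scale, and the sample rows are all assumptions made for illustration; they are not part of RisingWave or of any specific RAG framework.

```python
from datetime import datetime, timedelta

def signal_score(row: dict, now: datetime) -> float:
    """Blend ctr, rating, and recency into one score (hypothetical weights)."""
    age_days = (now - row["last_viewed"]).total_seconds() / 86400
    recency = max(0.0, 1.0 - age_days / 7)      # 1.0 = just viewed, 0.0 = 7+ days ago
    rating = (row["avg_rating"] or 0.0) / 5.0   # assumes ratings on a 1-5 scale
    ctr = row["ctr"] or 0.0                     # ctr can be NULL in the view
    return 0.5 * ctr + 0.3 * rating + 0.2 * recency

def rank_candidates(rows: list[dict], now: datetime, top_k: int = 2) -> list[str]:
    """Return the doc_ids of the top_k highest-scoring candidates."""
    ranked = sorted(rows, key=lambda r: signal_score(r, now), reverse=True)
    return [r["doc_id"] for r in ranked[:top_k]]

now = datetime(2026, 3, 1, 12, 0)
rows = [  # hypothetical rows shaped like doc_signals
    {"doc_id": "a", "avg_rating": 4.5, "ctr": 0.40, "last_viewed": now - timedelta(hours=1)},
    {"doc_id": "b", "avg_rating": 3.0, "ctr": 0.10, "last_viewed": now - timedelta(days=6)},
    {"doc_id": "c", "avg_rating": 4.0, "ctr": 0.25, "last_viewed": now - timedelta(days=2)},
]

top = rank_candidates(rows, now)
print(top)
```

Only the documents that survive this ranking would be passed on for augmentation, so the LLM sees fewer, better candidates.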

Token Cost Impact

Estimated savings across common LLM application patterns using pre-computed context.

Application Pattern	Raw Data Tokens	Pre-Computed Tokens	Token Reduction
E-commerce recommendation	2,400	120	95%
Customer support assistant	3,200	250	92%
Financial fraud explanation	1,800	200	89%
RAG document ranking	5,000	400	92%
IoT anomaly explanation	4,500	150	97%
Personalized content feed	2,000	180	91%

Estimates based on typical application prompt structures. Actual savings depend on data verbosity and summary design.
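The reduction column follows directly from the two token counts in each row. A minimal sketch of that calculation, using the figures from the table (the helper name `token_reduction_pct` is hypothetical):

```python
def token_reduction_pct(raw: int, precomputed: int) -> int:
    """Percent of input tokens removed, rounded to the nearest whole percent."""
    return round((1 - precomputed / raw) * 100)

# (raw data tokens, pre-computed tokens) per application pattern, from the table
patterns = {
    "E-commerce recommendation": (2400, 120),
    "Customer support assistant": (3200, 250),
    "Financial fraud explanation": (1800, 200),
    "RAG document ranking": (5000, 400),
    "IoT anomaly explanation": (4500, 150),
    "Personalized content feed": (2000, 180),
}

reductions = {
    name: token_reduction_pct(raw, pre) for name, (raw, pre) in patterns.items()
}
for name, pct in reductions.items():
    print(f"{name}: {pct}%")
```

Running the same helper on token counts measured from your own prompts gives a like-for-like comparison with these estimates.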

Frequently Asked Questions

How do you reduce LLM token costs with pre-computed data?

Pre-compute summaries, aggregations, and entity states using a streaming database like RisingWave. Instead of sending raw event logs into prompts, inject compact pre-computed context with the same information in far fewer tokens.

What is the biggest hidden cost in LLM-powered applications?

Sending raw, verbose data in prompts. Raw event logs, unstructured customer histories, and full database dumps contain the same information as a concise summary but use 10-100x more tokens. Token costs scale linearly with input size.

How can a streaming database help reduce AI inference costs?

A streaming database continuously maintains pre-computed summaries as materialized views. When an LLM request arrives, you query the view for a compact snapshot in under 10ms instead of aggregating raw data at runtime.

What data should I pre-compute before sending to an LLM?