# How to Build a Real-Time Knowledge Base for AI
A real-time knowledge base continuously updates its content as source data changes — ensuring AI agents and RAG systems always retrieve current information. Unlike batch-refreshed knowledge bases that go stale between updates, a streaming knowledge base powered by RisingWave stays fresh via CDC and streaming materialized views.
| Approach | Freshness | Complexity | Use Case |
| --- | --- | --- | --- |
| Batch KB (nightly refresh) | Hours | Low | Static documentation |
| Incremental KB (scheduled) | Minutes | Medium | Moderately changing data |
| Streaming KB (continuous) | Sub-second | Medium | Policies, pricing, inventory |
## Architecture
```
Source DBs → CDC → RisingWave → Materialized Views (structured KB)
                              → Vector Index (semantic search)
                                        ↓
                          AI Agents query via PG / MCP
```
## Implementation
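The table definition below assumes a CDC source named `cms_cdc_source` already exists. A minimal sketch of creating one with RisingWave's `postgres-cdc` connector (the hostname, credentials, and database name are placeholders for your environment):

```sql
-- Shared CDC source pointing at the upstream CMS Postgres database.
-- Connection details below are illustrative placeholders.
CREATE SOURCE cms_cdc_source WITH (
    connector = 'postgres-cdc',
    hostname = 'cms-db.internal',
    port = '5432',
    username = 'rw_user',
    password = 'rw_password',
    database.name = 'cms'
);
```

With a shared CDC source, multiple tables from the same upstream database can be ingested without opening a separate replication connection per table.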
```sql
-- Real-time article index replicated from the CMS database via CDC.
-- Note: the status column must be declared here for the filter below to work.
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title VARCHAR,
    content TEXT,
    category VARCHAR,
    status VARCHAR,
    updated_at TIMESTAMP
)
FROM cms_cdc_source TABLE 'public.articles';

-- Always-current knowledge base, incrementally maintained as articles change
CREATE MATERIALIZED VIEW knowledge_base AS
SELECT id, title, category,
       SUBSTRING(content, 1, 500) AS summary,
       updated_at
FROM articles
WHERE status = 'published';
```
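Because RisingWave speaks the Postgres wire protocol, any Postgres client (or an MCP tool wrapping one) can query the view directly. A sketch of a lookup an agent might run; the `'pricing'` category is illustrative:

```sql
-- Fetch the five most recently updated published articles in a category.
-- Results reflect upstream CMS changes with sub-second freshness.
SELECT title, summary, updated_at
FROM knowledge_base
WHERE category = 'pricing'
ORDER BY updated_at DESC
LIMIT 5;
```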
## Frequently Asked Questions
### How is a real-time knowledge base different from RAG?
RAG retrieves from a pre-built index (often stale). A real-time knowledge base continuously updates its index via streaming, ensuring retrieval always returns current information. It's the foundation for accurate RAG.
### Do I need a vector database for a real-time knowledge base?
Not always. For structured queries (lookup by category, keyword match), streaming SQL views are faster and more precise. For semantic similarity search, add vector embeddings. RisingWave supports both.
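To make the structured-query case concrete, here is a sketch of a keyword match against the view defined earlier, using plain SQL rather than embeddings (the search term is illustrative):

```sql
-- Exact keyword match over the always-fresh view: no embedding model,
-- no index rebuild, and no stale results.
SELECT title, summary
FROM knowledge_base
WHERE title ILIKE '%refund policy%'
   OR summary ILIKE '%refund policy%';
```

A query like this is deterministic and cheap; semantic embeddings earn their cost only when the question and the source text do not share vocabulary.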