Streaming Embeddings: Keeping Vector Stores Fresh in Real Time
Vector stores in RAG systems go stale when source documents change but their embeddings aren't recomputed. Streaming embeddings address this by re-embedding documents as they change: change data capture (CDC) detects modifications, and a streaming pipeline updates the affected vectors continuously. This sharply narrows the staleness window that causes a RAG system to retrieve outdated context and answer from stale information.
| Approach | Embedding Freshness | Re-embedding Cost | Complexity |
|---|---|---|---|
| Batch re-embed (nightly) | Hours | High (full corpus each run) | Low |
| Incremental re-embed (triggered) | Minutes | Medium | Medium |
| Streaming re-embed (CDC-driven) | Seconds | Low (only changed docs) | Medium |
How Streaming Embeddings Work
Source DB → CDC → Streaming Processor → Embedding API → Vector Store
                                        (OpenAI, etc.)
When a document changes in the source database, CDC captures the change. The streaming processor sends the changed content to an embedding API and updates the vector store. Only changed documents are re-embedded.
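The per-change flow can be sketched as a small handler. This is a minimal sketch, not RisingWave code: the `embed` stub and the in-memory dict stand in for a real embedding API and vector store, and `ChangeEvent` is a simplified stand-in for a CDC record.

```python
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    """Simplified CDC record: which document changed and its new content."""
    doc_id: int
    content: str
    op: str  # "insert", "update", or "delete"

def embed(text: str) -> list[float]:
    """Stub embedding; a real pipeline would call an embedding API here."""
    return [float(len(text)), float(sum(map(ord, text)) % 1000)]

vector_store: dict[int, list[float]] = {}  # stand-in for a real vector DB

def handle_change(event: ChangeEvent) -> None:
    """Apply one CDC event to the vector store: upsert on change, drop on delete."""
    if event.op == "delete":
        vector_store.pop(event.doc_id, None)
    else:
        vector_store[event.doc_id] = embed(event.content)

# Only the changed document is re-embedded; the rest of the corpus is untouched.
handle_change(ChangeEvent(doc_id=1, content="hello world", op="insert"))
handle_change(ChangeEvent(doc_id=1, content="hello world, updated", op="update"))
```

The key property is that each event touches exactly one vector, so the cost of keeping the store fresh scales with the change rate, not the corpus size.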
Implementation with RisingWave
-- CDC from the document database
-- (assumes a CDC source was created first, e.g.:
--  CREATE SOURCE pg_cdc_source WITH (connector = 'postgresql-cdc', ...);)
CREATE TABLE documents (id INT PRIMARY KEY, content TEXT, updated_at TIMESTAMP)
FROM pg_cdc_source TABLE 'public.documents';
-- Track which documents changed (for the external embedding pipeline).
-- NOW() in the WHERE clause acts as a temporal filter: rows age out of
-- the view roughly 5 minutes after their last update.
CREATE MATERIALIZED VIEW docs_to_embed AS
SELECT id, content, updated_at FROM documents
WHERE updated_at > NOW() - INTERVAL '5 minutes';
An external service polls docs_to_embed, re-embeds changed documents, and updates the vector store.
Frequently Asked Questions
How often should I re-embed documents?
With streaming embeddings, re-embed on every change. CDC captures changes as they happen, so only modified documents are re-embedded — not the entire corpus.
Can RisingWave generate embeddings?
RisingWave doesn't generate embeddings directly, but it can trigger embedding pipelines via UDFs (Python) or by tracking changed documents for external embedding services to process.