{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Why do RAG embeddings go stale?",
"acceptedAnswer": {
"@type": "Answer",
"text": "RAG embeddings go stale because most pipelines generate them in scheduled batch jobs: nightly, hourly, or on a fixed interval. In between runs, the source documents change but the embeddings do not. When a user query arrives, the retrieval layer returns context that reflects the old version of the document, not the current one. The LLM reasons over that outdated context and produces inaccurate answers."
}
},
{
"@type": "Question",
"name": "How often should RAG embeddings be updated?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Embeddings should update within seconds of the source document changing. The right answer is not 'nightly' or 'hourly' but 'as soon as the source changes.' Streaming databases achieve this by maintaining materialized views incrementally: when a document row is updated in the source database, only that row flows through the embedding pipeline and its embedding is regenerated automatically."
}
},
{
"@type": "Question",
"name": "Can RisingWave generate embeddings inside a materialized view?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Yes. RisingWave has a built-in openai_embedding(api_key, model, input) function that calls the OpenAI embeddings API and returns a REAL[] value. You can use this function directly inside a CREATE MATERIALIZED VIEW statement. When a source row changes, RisingWave re-executes the function for that row and stores the updated embedding automatically."
}
},
{
"@type": "Question",
"name": "Does RisingWave require Kafka or Debezium for CDC?",
"acceptedAnswer": {
"@type": "Answer",
"text": "No. RisingWave has native CDC connectors for PostgreSQL and MySQL that read directly from the database transaction log (WAL for PostgreSQL, binlog for MySQL). You do not need Kafka, Debezium, or any middleware. RisingWave connects directly to your source database and captures every INSERT, UPDATE, and DELETE as it happens, which significantly simplifies the architecture."
}
},
{
"@type": "Question",
"name": "Where does RisingWave store the embeddings for vector search?",
"acceptedAnswer": {
"@type": "Answer",
"text": "RisingWave stores embeddings natively using its built-in vector(n) type and HNSW index. Vector similarity search happens inside RisingWave using operators like <-> (L2 distance) or the cosine_distance() function. You do not need a separate vector database like Pinecone or Milvus. The retrieval query runs directly against the RisingWave materialized view."
}
}
]
}
Introduction
Your support bot confidently recommends the StellarPro X200 wireless headphones: sleek specs, great reviews, $189. The customer adds it to their cart and discovers it was discontinued three weeks ago. The product page is gone. The support bot is still selling it.
This is not a prompt engineering failure. The LLM received stale context, reasoned correctly over it, and produced a plausible but wrong answer. The embedding for that product was generated three weeks ago, before the discontinuation. The nightly re-index job has not touched it since.
Knowledge decay is the most silent failure mode in production RAG systems. It is silent because the system keeps responding (fluently, confidently, helpfully) with information that stopped being true days ago.
This guide shows how to eliminate knowledge decay by replacing the scheduled re-index with a streaming materialized view. When a document changes in your source database, its embedding updates automatically within seconds, with no cron jobs and no manual triggers.
The Problem: Knowledge Decay in RAG Systems
A RAG system retrieves context from a vector index and passes it to an LLM, so answer quality is bounded by the freshness of what gets retrieved: stale context produces stale answers, no matter how good the model is.
Most RAG implementations follow a three-step offline pipeline:
- Extract documents from the source database
- Generate embeddings via an API call
- Upsert the embeddings into the vector store
This pipeline runs on a schedule. Nightly is common. Hourly is considered ambitious. But documents do not change on a schedule. They change whenever someone updates them.
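A minimal sketch makes the cost structure of that batch pipeline visible. The embed() function below is a hypothetical stand-in for a real embeddings API call; it counts invocations instead of calling out, to show that the job touches every document on every run regardless of how many actually changed:

```python
# Minimal sketch of a scheduled batch re-index. embed() is a stand-in
# for a real embeddings API call; it counts invocations to show the
# cost driver: every document is re-embedded on every run.
api_calls = 0

def embed(text: str) -> list[float]:
    """Stand-in for an embeddings API call; counts invocations."""
    global api_calls
    api_calls += 1
    return [float(len(text))]  # placeholder vector

def batch_reindex(documents: dict[int, str]) -> dict[int, list[float]]:
    """Re-embed the entire corpus, as a nightly cron job would."""
    return {doc_id: embed(text) for doc_id, text in documents.items()}

corpus = {i: f"document {i}" for i in range(50_000)}
corpus[42] = "document 42 (updated)"  # only one document changed...
index = batch_reindex(corpus)
print(api_calls)  # ...but every document was re-embedded anyway
```

The loop has no notion of "changed since last run", which is exactly the structural problem described below.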
What a three-week-old embedding does to your users
Consider a product knowledge base with 50,000 items. Your nightly job generates embeddings for all of them and loads them into your vector store. The next day, the pricing team runs a sale: 600 products get new prices. The operations team discontinues 40 SKUs. The marketing team updates descriptions for 120 items.
That night's batch job picks up those 760 changes, but only after 12 to 24 hours of users querying outdated information. In a support context, that means:
- Customers quoted the wrong price
- Customers directed to purchase discontinued products
- Support agents reading outdated policy language
And you will not know it is happening, because the bot keeps responding.
Why running the batch job more frequently is not the answer
Running the job every hour instead of every night reduces the staleness window from 24 hours to 60 minutes. That helps, but it does not solve the structural problem.
A full re-index regenerates embeddings for the entire corpus on every run. If 760 documents changed out of 50,000, you pay for 50,000 embedding API calls to process 760 changes. At $0.02 per million tokens with a 512-token average chunk, that is roughly $0.50 per run. That seems fine hourly, but the cost scales with corpus size, not change rate. When your corpus grows to 500,000 documents, an hourly full re-index costs $5 per run, $3,600 per month, for a problem that could be solved by processing only the documents that actually changed.
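The arithmetic behind those figures, using the per-million-token rate and average chunk size assumed above, works out as follows:

```python
# Cost of a full re-index scales with corpus size, not change rate.
# Rates are the assumptions stated in the text, not current price quotes.
PRICE_PER_MILLION_TOKENS = 0.02  # assumed embedding API rate
TOKENS_PER_CHUNK = 512           # assumed average chunk length

def full_reindex_cost(corpus_size: int) -> float:
    """Dollar cost of embedding every document in the corpus once."""
    return corpus_size * TOKENS_PER_CHUNK * PRICE_PER_MILLION_TOKENS / 1_000_000

print(round(full_reindex_cost(50_000), 2))             # per-run cost at 50k docs
print(round(full_reindex_cost(500_000), 2))            # per-run cost at 500k docs
print(round(full_reindex_cost(500_000) * 24 * 30, 2))  # monthly, at hourly cadence
```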
More importantly, if a document is marked discontinued at 9:00 AM, an hourly job running at 9:45 AM still means 45 minutes of users receiving wrong information.
The Streaming RAG Architecture
The alternative is to make the embedding pipeline event-driven rather than schedule-driven. When a row changes in the source database, that change immediately triggers an embedding update for that row only. No full scans, no scheduled jobs, no lag beyond processing time.
The architecture has three stages:
PostgreSQL (source docs) → CDC → RisingWave materialized view (openai_embedding) → vector search inside RisingWave
RisingWave handles all three stages. It reads change events from PostgreSQL via native CDC, maintains a materialized view that calls openai_embedding() on each changed document, and stores the results in its native vector(n) type, which supports similarity search with an HNSW index.
The entire pipeline is SQL. There is no Python embedding script to schedule, no Kafka cluster to maintain, and no vector database to synchronize with.
Building the Pipeline in SQL
Step 1: Configure PostgreSQL for logical replication
RisingWave reads from the PostgreSQL WAL via the logical replication protocol. Enable it:
-- On your PostgreSQL instance
ALTER SYSTEM SET wal_level = logical;
-- Restart PostgreSQL after this change
-- Create a dedicated replication user
CREATE USER rw_cdc_user REPLICATION LOGIN PASSWORD 'your_password';
GRANT SELECT ON public.documents TO rw_cdc_user;
-- Create a publication for the documents table
CREATE PUBLICATION risingwave_pub FOR TABLE public.documents;
Step 2: Create the CDC source in RisingWave
Connect RisingWave to your PostgreSQL instance:
-- Tested against RisingWave v2.8.0
CREATE SOURCE pg_source WITH (
connector = 'postgres-cdc',
hostname = 'prod-postgres.internal',
port = '5432',
username = 'rw_cdc_user',
password = 'your_password',
database.name = 'knowledge_base',
slot.name = 'rw_embedding_slot',
publication.name = 'risingwave_pub'
);
Step 3: Create the CDC table
Declare the schema for the documents table, bound to the CDC source:
CREATE TABLE documents (
doc_id BIGINT PRIMARY KEY,
title VARCHAR,
content TEXT,
category VARCHAR,
status VARCHAR, -- 'active', 'discontinued', 'draft'
updated_at TIMESTAMPTZ
) FROM pg_source TABLE 'public.documents';
RisingWave now receives every INSERT, UPDATE, and DELETE from public.documents in real time. The documents table in RisingWave stays in sync with PostgreSQL continuously.
Step 4: Create the materialized view with automatic embeddings
This is the core of the streaming RAG architecture. The materialized view calls openai_embedding() on each document's content and stores the result as a vector column:
CREATE MATERIALIZED VIEW document_embeddings AS
SELECT
doc_id,
title,
category,
status,
content,
updated_at,
-- openai_embedding returns REAL[], cast to vector(1536) for text-embedding-3-small
openai_embedding(
'sk-your-openai-api-key',
'text-embedding-3-small',
-- Combine title and content for richer embeddings
title || '. ' || content
)::vector(1536) AS embedding
FROM documents
WHERE status = 'active'; -- Only embed active documents; discontinued rows are excluded automatically
When a row is inserted or updated in documents, RisingWave propagates only that row through the materialized view. The openai_embedding() call fires for the changed row, the new embedding is stored, and the view reflects the current document state within seconds.
When a document's status changes to 'discontinued', the row no longer satisfies the WHERE status = 'active' predicate, so RisingWave removes it from the materialized view. The discontinued product vanishes from the retrieval index automatically.
Step 5: Create an HNSW index for fast similarity search
RisingWave's native vector(n) type supports HNSW indexing for approximate nearest-neighbor search:
-- Create an HNSW index on the embedding column
-- This enables sub-millisecond similarity search at scale
CREATE INDEX document_embeddings_hnsw
ON document_embeddings
USING hnsw (embedding vector_cosine_ops);
The HNSW index updates incrementally as new embeddings arrive in the materialized view.
Step 6: Query for similar documents at retrieval time
At query time, your RAG application generates an embedding for the user's question and runs a similarity search directly against the materialized view:
-- Find the 5 most relevant active documents for a user query
-- Replace the vector literal with the embedding from your application
SELECT
doc_id,
title,
category,
content,
-- cosine_distance returns 0 (identical) to 2 (opposite)
cosine_distance(embedding, '[0.021, -0.043, 0.017, ...]'::vector(1536)) AS distance
FROM document_embeddings
WHERE status = 'active'
ORDER BY embedding <=> '[0.021, -0.043, 0.017, ...]'::vector(1536)
LIMIT 5;
The <=> operator computes cosine distance and matches the vector_cosine_ops HNSW index created above, so the index can serve this query. The <-> operator computes L2 distance instead; always order by the same distance measure your HNSW index was built with, or the index will not be used.
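In practice the choice matters less than it might seem: for unit-length vectors, which OpenAI's embedding models are documented to return, squared L2 distance equals twice the cosine distance, so both measures rank neighbors identically. A quick check:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; ranges from 0 (identical) to 2 (opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# For unit-length vectors: l2^2 == 2 * cosine_distance,
# so nearest-neighbor rankings under both measures agree.
u = [1.0, 0.0]
v = [math.cos(0.5), math.sin(0.5)]  # unit vector at 0.5 rad from u
assert abs(l2_distance(u, v) ** 2 - 2 * cosine_distance(u, v)) < 1e-9
```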
A complete retrieval query from your application looks like this:
import psycopg2
import openai

def retrieve_context(user_query: str, top_k: int = 5) -> list[dict]:
    """Retrieve fresh document context from RisingWave for a user query."""
    # Generate the query embedding via OpenAI
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=user_query,
    )
    query_vec = response.data[0].embedding
    vec_str = "[" + ",".join(str(x) for x in query_vec) + "]"

    # Query RisingWave directly using a standard PostgreSQL driver
    conn = psycopg2.connect(
        host="risingwave.internal",
        port=4566,
        dbname="dev",
        user="root",
    )
    try:
        cur = conn.cursor()
        # <=> matches the vector_cosine_ops HNSW index created earlier
        cur.execute("""
            SELECT doc_id, title, category, content
            FROM document_embeddings
            ORDER BY embedding <=> %s::vector(1536)
            LIMIT %s
        """, (vec_str, top_k))
        rows = cur.fetchall()
    finally:
        conn.close()
    return [
        {"doc_id": r[0], "title": r[1], "category": r[2], "content": r[3]}
        for r in rows
    ]
Note that only active documents are in the materialized view. A discontinued product is invisible to this query the moment its status changes in PostgreSQL, not after the next scheduled re-index.
Why This Approach Eliminates Knowledge Decay
Incremental processing, not full recomputation
When one document changes in PostgreSQL, one row flows through the RisingWave materialized view. The openai_embedding() call fires once. The HNSW index updates for one entry.
Compare that to a nightly batch job that calls the embedding API 50,000 times to process 760 changes.
| Metric | Nightly batch | Streaming materialized view |
| --- | --- | --- |
| Embedding API calls per day | 50,000 | ~760 (only changed docs) |
| Latency from change to index update | Up to 24 hours | Seconds |
| Discontinued product exposure | Up to 24 hours | Seconds |
| Code to maintain | Cron job + ETL script + scheduler | One CREATE MATERIALIZED VIEW |
| Cost scales with corpus size | Yes (cost grows with corpus) | No (cost scales with change rate) |
No synchronization problem
In a traditional architecture, you have two systems: the source database and the vector store. They drift apart the moment a document changes. Keeping them in sync requires a background job, a change feed, or an application-level hook.
With the streaming materialized view approach, there is only one system: RisingWave. The source data and the embeddings live together. There is no synchronization gap because there is no second system to synchronize.
MCP integration for AI agent access
If you are building AI agents that need to query document embeddings, RisingWave has an official MCP server at risingwavelabs/risingwave-mcp. Agents can query document_embeddings using natural language tool calls that the MCP server translates to SQL, without any custom integration code.
FAQ
Why do RAG embeddings go stale?
RAG embeddings go stale because most pipelines generate them in scheduled batch jobs: nightly, hourly, or on a fixed interval. In between runs, the source documents change but the embeddings do not. When a user query arrives, the retrieval layer returns context that reflects the old version of the document, not the current one. The LLM reasons over that outdated context and produces inaccurate answers.
How often should RAG embeddings be updated?
Embeddings should update within seconds of the source document changing. The right answer is not "nightly" or "hourly" but "as soon as the source changes." Streaming databases achieve this by maintaining materialized views incrementally: when a document row is updated in the source database, only that row flows through the embedding pipeline and its embedding is regenerated automatically.
Can RisingWave generate embeddings inside a materialized view?
Yes. RisingWave has a built-in openai_embedding(api_key, model, input) function that calls the OpenAI embeddings API and returns a REAL[] value. You can use this function directly inside a CREATE MATERIALIZED VIEW statement. When a source row changes, RisingWave re-executes the function for that row and stores the updated embedding automatically.
Does RisingWave require Kafka or Debezium for CDC?
No. RisingWave has native CDC connectors for PostgreSQL and MySQL that read directly from the database transaction log (WAL for PostgreSQL, binlog for MySQL). You do not need Kafka, Debezium, or any middleware. RisingWave connects directly to your source database and captures every INSERT, UPDATE, and DELETE as it happens, which significantly simplifies the architecture.
Where does RisingWave store the embeddings for vector search?
RisingWave stores embeddings natively using its built-in vector(n) type and HNSW index. Vector similarity search happens inside RisingWave using operators like <-> (L2 distance) or the cosine_distance() function. You do not need a separate vector database like Pinecone or Milvus. The retrieval query runs directly against the RisingWave materialized view.
Conclusion
Stale embeddings are a correctness problem masquerading as an infrastructure problem. The root cause is not a bad embedding model or a slow API. It is treating document indexing as a scheduled job instead of a continuous process.
The fix is straightforward: replace the batch re-index with a materialized view.
- CDC captures every document change the moment it happens, with no middleware required
- openai_embedding() inside the materialized view fires automatically for each changed row
- RisingWave's native vector(n) type and HNSW index serve the retrieval queries directly
- The WHERE status = 'active' filter removes discontinued documents from the index automatically, with no manual intervention
- Cost scales with change rate, not corpus size: a 50,000-document corpus with 1% daily changes incurs 500 API calls per day, not 50,000
The discontinued product your support bot was recommending three weeks after discontinuation disappears from the index within seconds of the status update. No cron job needed.
Ready to try this yourself? Try RisingWave Cloud free, no credit card required. Sign up here.
Join our Slack community to ask questions and connect with other developers building streaming AI applications.

