SQL That Thinks: Real-Time Embeddings with RisingWave and OpenAI

Most embedding pipelines are quietly stale. The model is fresh, the vector index is fast, but the embeddings themselves were written hours ago by a batch job that ran at 3 a.m. By the time a user types a question, the support ticket they filed two minutes ago is invisible to the retrieval system. The product feels intelligent and dated at the same time.

We have been thinking about what changes when the database itself can call OpenAI. Not as an offline ETL step. Not as a separate Python service that polls a queue. As a SQL function inside an incrementally maintained materialized view. The row arrives, the embedding is computed, the vector lands in the index, and a query running a second later already sees it. There is no pipeline, because the pipeline is a CREATE MATERIALIZED VIEW.

This post walks through that shift. We will show the architecture, the SQL that makes it work in RisingWave 2.8, and what becomes possible once embeddings stop being a batch artifact.

What embeddings actually buy you

An embedding is a way to teach a database to find things by meaning instead of by string match. Two customer messages that share no keywords ("my card got declined" and "payment failed at checkout") collapse to vectors that sit near each other in 1,536-dimensional space. Cosine distance becomes a stand-in for "are these about the same thing." Everything downstream that needs semantic recall, recommendations, retrieval-augmented generation, deduplication, intent routing, leans on this trick.

The mechanics are simple. You feed text to an embedding model, you get back a fixed-length vector of floats, you store it somewhere that can do nearest-neighbor search. The model is the part that has gotten dramatically better. The "store it somewhere" part is where most architectures are still living in 2018.

The staleness problem nobody talks about

Walk through how a typical RAG system gets built. Someone writes a script that reads documents from a source, calls openai.embeddings.create, writes the resulting vectors into Pinecone or pgvector. They schedule it. It runs nightly. Some teams get fancier and use a queue: Kafka publishes new documents, a Python consumer batches them, calls the OpenAI API, writes to the vector store. The consumer is a separate service. It has its own deploy, its own retry logic, its own dead-letter queue, its own dashboard that nobody looks at.

This works. It is also where the staleness comes from. Every additional hop between "new data exists" and "new embedding is queryable" is a place where latency accumulates and where things break silently. We have watched teams discover that their consumer was three hours behind and had been for a week, because the alerting threshold was set when the volume was lower.

The deeper issue is that the embedding logic lives outside the database. The database knows the row exists. The database knows when it changed. But the database has no opinion on whether the embedding is current, because the embedding is something a different system is supposed to take care of. That separation is the bug.

How RisingWave changes this

RisingWave is a Postgres-compatible streaming database. Its central idea is that materialized views are incrementally maintained: when a row arrives in a base table, every view that depends on that row updates immediately, with the same correctness guarantees you would get from a full recomputation but at the cost of a small delta.

In version 2.8, RisingWave ships an openai_embedding function as a first-class SQL primitive. It takes a JSONB config and a text input and calls OpenAI's embeddings API:

SELECT openai_embedding(
  '{"api_key": "sk-...", "model": "text-embedding-3-small"}'::jsonb,
  'my password reset email never arrived'
);

The return value is a 1,536-element real[] (for text-embedding-3-small) that you can cast to vector(1536). We tested this on a clean RisingWave 2.8.0 instance against the live OpenAI API; it returns in roughly a second per call.

The interesting part is not the function itself. It is what happens when you put it inside a materialized view.

The architecture, in one diagram

   ┌──────────────┐     ┌───────────────────────────┐     ┌──────────────┐
   │   Postgres   │     │        RisingWave         │     │  Application │
   │   / Kafka    │ ──> │                           │ ──> │              │
   │  / app rows  │ CDC │  ┌────────────────────┐   │ PG  │  similarity  │
   └──────────────┘     │  │ base table         │   │     │  search via  │
                        │  │ (raw text)         │   │     │  SELECT      │
                        │  └─────────┬──────────┘   │     │              │
                        │            │              │     └──────────────┘
                        │            │ incremental  │
                        │            ▼              │
                        │  ┌────────────────────┐   │
                        │  │ MATERIALIZED VIEW  │   │
                        │  │ openai_embedding() │   │
                        │  │ vector(1536)       │   │
                        │  └────────────────────┘   │
                        │            │              │
                        │            ▼              │
                        │  ┌────────────────────┐   │
                        │  │ HNSW index         │   │
                        │  │ (append-only tbl)  │   │
                        │  └────────────────────┘   │
                        └───────────────────────────┘

Data arrives from Postgres CDC, a Kafka topic, or a direct INSERT. A materialized view computes the embedding the moment the row lands. A second view (or the same one, projected onto an append-only table) feeds an HNSW index for fast nearest-neighbor lookups. The application queries everything over the Postgres wire protocol. There is no embedding service to operate. There is no batch job to monitor. There is one database.

The SQL

We will build a working version against a support ticket stream. Everything below was run on RisingWave 2.8.0.

Step 1: a base table and a UDF wrapper

We hide the API key in a SQL UDF so the embedding call reads cleanly elsewhere.

CREATE TABLE support_tickets (
  ticket_id BIGINT PRIMARY KEY,
  subject   TEXT,
  body      TEXT,
  created_at TIMESTAMPTZ
);

CREATE FUNCTION embed(t VARCHAR) RETURNS real[] LANGUAGE sql AS $$
  SELECT openai_embedding(
    '{"api_key": "sk-...", "model": "text-embedding-3-small"}'::jsonb,
    t
  )
$$;

In production you would pull the API key from a secret manager and inject it at deploy time. For the post we are inlining it.

Step 2: a materialized view that embeds every row

CREATE MATERIALIZED VIEW ticket_embeddings AS
SELECT
  ticket_id,
  subject,
  embed(subject || ' ' || body)::vector(1536) AS embedding,
  created_at
FROM support_tickets;

That is the entire pipeline. When a row is inserted into support_tickets, RisingWave incrementally evaluates this view, calls OpenAI for that row only, and writes the resulting vector. It does not recompute embeddings for rows that have not changed. It does not block writers on the embedding API call.

Step 3: similarity search

The <=> operator computes cosine distance between two vectors. To find tickets similar to a freeform query, we embed the query on the fly:

WITH q AS (
  SELECT embed('password reset not working')::vector(1536) AS qv
)
SELECT ticket_id, subject, embedding <=> q.qv AS cos_dist
FROM ticket_embeddings, q
ORDER BY cos_dist
LIMIT 3;

On our test data, this returns:

 ticket_id |   subject    |      cos_dist
-----------+--------------+---------------------
         4 | Login broken | 0.37750204852353675
         1 | Cannot login | 0.42122274397606296
         3 | API timeout  | 0.69365826986316

Ticket 4 was inserted thirty seconds before this query ran. It is already in the view, already embedded, already ranked correctly. There was no separate process that needed to wake up.

Step 4: HNSW for scale

Linear cosine scans are fine up to a few hundred thousand vectors. Past that, we want an index. RisingWave supports HNSW indexes on append-only tables:

CREATE INDEX idx_ticket_embeddings_hnsw
ON ticket_embeddings_append_only
USING hnsw (embedding)
WITH (distance_type = 'cosine');

The append-only constraint is a real one. HNSW does not support updates or deletes cleanly in any system; RisingWave is honest about this rather than pretending otherwise. The usual pattern is to keep the canonical embedding view and project it into an append-only table for indexed lookup.

What this actually enables

The change from batch to streaming embeddings sounds incremental, but the second-order effects are larger than they look.

Same-session retrieval. A user files a support ticket. Two seconds later they ask the assistant a question about it. With a batch pipeline, the assistant cannot find the ticket; it has not been embedded yet. With this architecture, the ticket is in the index before the user finishes their next sentence. The system stops feeling like it has memory loss.

Joinable embeddings. Because the embeddings live in the same database as the original rows, you can join them against anything else in your warehouse. "Find tickets similar to this query, filter to enterprise customers, group by account manager." That query is one SELECT against materialized views. With an external vector store you would need to fetch candidate IDs, round-trip them to Postgres, and reconstruct the join in application code.

Recomputable history. If you swap embedding models, you do not need to write a backfill script. You change the model in the UDF, drop and recreate the materialized view, and RisingWave re-embeds everything against the new model. The pipeline is the definition; the definition is the pipeline.

No glue code. The thing that should be obvious but rarely is: a Python service that consumes from Kafka, calls OpenAI, writes to Pinecone, and handles retries is a maintenance burden. It is also code that nobody wants to be on call for. Folding it into a materialized view does not just remove latency; it removes a system.

Where this is heading

We think the right way to read what RisingWave is doing here is as part of a longer trend, where the boundary between "data system" and "AI infrastructure" stops making sense. The vector store, the embedding service, the feature store, the cache; these were all separate systems because the databases of the 2010s could not host them. Once a database can do incremental view maintenance over external function calls, the case for keeping them separate gets thinner every year.

The interesting question is not "can my database call OpenAI." It is: what does an application look like when every embedding, every classification, every LLM-derived feature is just a column in a view that updates as data arrives. We do not have the full answer yet. But the SQL is short, the latency is real, and the architecture diagram fits on one page. That is usually how things start.