Real-Time Search Indexing with CDC: Debezium to Elasticsearch vs RisingWave
Keeping a search index synchronized with your operational database is one of the most common CDC use cases. The traditional route is Debezium → Kafka → Kafka Connect Elasticsearch Sink. RisingWave offers a shorter path: CDC source → materialized view → Elasticsearch sink — with SQL transformations in the middle.
Why Search Indexes Need CDC
Full re-index jobs that run nightly are no longer acceptable for most search experiences. Users expect results to reflect changes within seconds — a product going out of stock, a document being edited, a customer record being updated.
Polling the source database for changes is expensive and slow. CDC captures every committed change at the transaction log level, giving you sub-second propagation without added load on the primary.
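Both paths depend on PostgreSQL logical decoding being enabled on the primary. A minimal prerequisite sketch (the role and publication names here are illustrative, not required by either tool):

```sql
-- wal_level change requires a server restart
ALTER SYSTEM SET wal_level = 'logical';

-- A replication-capable role for the CDC client to connect as
CREATE ROLE cdc_reader WITH LOGIN REPLICATION PASSWORD 'secret';
GRANT SELECT ON public.products TO cdc_reader;

-- The pgoutput plugin reads from a publication; scope it to the captured table
CREATE PUBLICATION products_pub FOR TABLE public.products;
```

Without `wal_level = logical`, neither Debezium nor RisingWave can create a replication slot against the database.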
The Classic Pattern: Debezium → Kafka → Elasticsearch
This is the most widely deployed architecture for real-time search indexing.
PostgreSQL (WAL)
        │
        ▼
Debezium (Kafka Connect Source)
        │
        ▼
Kafka Topic: postgres.public.products
        │
        ▼
Kafka Connect Elasticsearch Sink
        │
        ▼
Elasticsearch Index: products
Debezium Source Connector Configuration
{
  "name": "postgres-products-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "shop",
    "table.include.list": "public.products",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_products",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.unwrap.delete.handling.mode": "rewrite"
  }
}
Kafka Connect Elasticsearch Sink Configuration
{
  "name": "elasticsearch-products-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "postgres.public.products",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "_doc",
    "key.ignore": "false",
    "schema.ignore": "true",
    "behavior.on.null.values": "delete",
    "transforms": "castKey",
    "transforms.castKey.type": "org.apache.kafka.connect.transforms.Cast$Key",
    "transforms.castKey.spec": "string"
  }
}
This gets changes into Elasticsearch in near-real-time. Deletes are handled via tombstone records. Kafka acts as a buffer, providing replay and fan-out to additional consumers.
The RisingWave Pattern: CDC Source → SQL → Elasticsearch Sink
RisingWave embeds the Debezium engine internally. You connect directly to PostgreSQL's WAL without deploying a separate Debezium cluster or Kafka.
Step 1: Create the CDC Source Table
CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    name VARCHAR,
    description VARCHAR,
    price NUMERIC(10,2),
    category VARCHAR,
    in_stock BOOLEAN,
    updated_at TIMESTAMPTZ
) WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres',
    port = '5432',
    username = 'rwuser',
    password = 'secret',
    database.name = 'shop',
    schema.name = 'public',
    table.name = 'products'
);
Step 2: Create a Materialized View for the Search Document
This is where RisingWave diverges from raw row replication. You can reshape the data into exactly the document structure Elasticsearch expects — using SQL.
CREATE MATERIALIZED VIEW products_search_doc AS
SELECT
    id,
    name,
    description,
    price,
    category,
    in_stock,
    updated_at,
    -- Denormalize a search-friendly price tier
    CASE
        WHEN price < 25 THEN 'budget'
        WHEN price < 100 THEN 'mid-range'
        ELSE 'premium'
    END AS price_tier,
    -- Full-text field combining name and description
    name || ' ' || COALESCE(description, '') AS search_text
FROM products
WHERE in_stock = true OR updated_at > NOW() - INTERVAL '7 days';
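Materialized views in RisingWave are queryable like ordinary tables, so you can inspect the document shape before wiring the sink. A quick sanity check against the view defined above:

```sql
SELECT id, name, price_tier, search_text
FROM products_search_doc
ORDER BY updated_at DESC
LIMIT 10;
```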
Step 3: Write to Elasticsearch via Sink
CREATE SINK products_to_elasticsearch
FROM products_search_doc
WITH (
    connector = 'elasticsearch',
    index = 'products',
    url = 'http://elasticsearch:9200',
    primary_key = 'id'
);
Every insert, update, or delete on the products table flows through the materialized view and lands in Elasticsearch within seconds — no Kafka, no separate connector cluster.
Handling Deletes
Both patterns handle deletes, but differently.
With Debezium + Kafka Connect, you configure behavior.on.null.values=delete on the sink connector. Debezium emits a tombstone record (null value) for deleted rows, and the Elasticsearch connector issues a DELETE request.
With RisingWave, deletes in the source table are automatically propagated through the materialized view. The Elasticsearch sink issues a DELETE request when a row disappears from the view's result set.
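A concrete trace of the RisingWave path, assuming the tables defined above (the Elasticsearch request shown is the expected effect, not something you issue yourself):

```sql
-- Executed on PostgreSQL (the source database):
DELETE FROM products WHERE id = 42;

-- Effect in RisingWave: row 42 is removed from products, which retracts it
-- from products_search_doc; the sink then deletes document 42 from the
-- 'products' index (the document id comes from primary_key = 'id').
```

Note that the view's WHERE clause has the same effect: a row that stops matching the filter (for example, out of stock and not updated for seven days) leaves the view's result set and is deleted from the index, even though it still exists in PostgreSQL.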
Comparison Table
| Dimension | Debezium + Kafka + ES Connector | RisingWave + ES Sink |
|---|---|---|
| Infrastructure | Kafka, Schema Registry, Connect cluster | RisingWave only |
| Index transformation | SMTs (limited) or stream processor | Full SQL |
| Multiple ES indexes | Fan-out via separate sink connectors | One sink per materialized view |
| Replay / reprocessing | Kafka retention enables replay | Re-create materialized view |
| Multi-consumer fan-out | Yes — Kafka serves many downstream systems | No native Kafka fan-out |
| Enrichment with joins | Requires Kafka Streams or Flink | Native SQL JOINs |
| Operational complexity | High | Low |
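The join row is where the difference is most visible. In RisingWave, enriching the search document means one more CDC table and a JOIN in the view; achieving the same with the Kafka pattern requires a separate stream processor. A sketch, assuming a hypothetical `categories` table captured the same way as `products`:

```sql
CREATE TABLE categories (
    id BIGINT PRIMARY KEY,
    label VARCHAR,
    parent VARCHAR
) WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres',
    port = '5432',
    username = 'rwuser',
    password = 'secret',
    database.name = 'shop',
    schema.name = 'public',
    table.name = 'categories'
);

-- Each product document now carries its category hierarchy
CREATE MATERIALIZED VIEW products_enriched AS
SELECT p.id, p.name, p.price, c.label AS category_label, c.parent AS category_parent
FROM products p
LEFT JOIN categories c ON p.category = c.label;
```

The join is incrementally maintained: a rename in `categories` updates every affected product document without a re-index.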
When to Use Debezium + Kafka
Use the Debezium → Kafka → Elasticsearch pattern when:
- Multiple downstream systems consume the same change stream (e.g., one team reads the Kafka topic for audit, another for analytics).
- You need complex routing — different document types going to different indexes based on event content.
- Your organization already operates a Kafka platform and wants consistency.
- You need long-term event replay capability that outlasts any single system's lifecycle.
When RisingWave Is the Better Fit
Use RisingWave when:
- Elasticsearch is the primary or only consumer of the CDC stream.
- You need SQL-level transformations before indexing — joining with other tables, computing derived fields, filtering rows.
- You want to reduce infrastructure footprint — fewer moving parts, fewer deployment units.
- The team is more comfortable with SQL than with Kafka connector configuration.
Hybrid Architecture
These approaches are not mutually exclusive. Many teams run both:
- Debezium publishes changes to Kafka for microservices and audit systems.
- RisingWave reads from Kafka (or directly from PostgreSQL) and maintains materialized views that feed Elasticsearch.
This gives you the fan-out benefits of Kafka without burdening Elasticsearch with raw Debezium event shapes.
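In the hybrid layout, RisingWave ingests the Debezium topic instead of connecting to the WAL itself. A sketch using RisingWave's Kafka connector with the Debezium JSON format (topic and broker address match the examples above; the column list is a subset for brevity):

```sql
CREATE TABLE products_from_kafka (
    id BIGINT PRIMARY KEY,
    name VARCHAR,
    price NUMERIC(10,2),
    in_stock BOOLEAN
) WITH (
    connector = 'kafka',
    topic = 'postgres.public.products',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT DEBEZIUM ENCODE JSON;
```

Because the format is declared as DEBEZIUM, RisingWave interprets inserts, updates, and deletes from the event envelope, so downstream materialized views and sinks behave exactly as in the direct-CDC setup.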
FAQ
Q: Does RisingWave's Elasticsearch sink support index aliases and routing?
Yes. You can configure the index parameter to point to an alias, and routing keys can be derived from materialized view columns. Check the RisingWave sink documentation for the full parameter list.
Q: What happens if Elasticsearch is temporarily unavailable? With Debezium + Kafka, messages accumulate in Kafka and are delivered when Elasticsearch recovers. With RisingWave, the sink retries and changes accumulate in RisingWave's internal state, but if Elasticsearch outages are frequent or prolonged, Kafka's configurable retention gives stronger durability and replay guarantees for this scenario.
Q: Can RisingWave write to multiple Elasticsearch indexes? Yes — create one sink per materialized view. Each view can filter or transform data differently and target a different index. There is no single-connector fan-out like Kafka, but the SQL model makes multi-index writes straightforward.
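A sketch of the per-index pattern from the answer above, reusing the `products` table from earlier (view, sink, and index names are illustrative):

```sql
-- One filtered view per index
CREATE MATERIALIZED VIEW in_stock_products AS
SELECT id, name, price FROM products WHERE in_stock = true;

-- One sink per view, each targeting its own index
CREATE SINK in_stock_to_es FROM in_stock_products WITH (
    connector = 'elasticsearch',
    index = 'products_in_stock',
    url = 'http://elasticsearch:9200',
    primary_key = 'id'
);
```

Repeat the view-plus-sink pair for each additional index, varying the filter or projection as needed.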
Q: How does schema evolution work?
With Debezium + Schema Registry, schema changes are tracked automatically. With RisingWave, you alter or recreate the source table definition; because materialized views are incrementally maintained rather than periodically refreshed, a view must be recreated to expose newly added columns. Adding nullable columns upstream is the least disruptive kind of change.
Q: Is there latency overhead from the materialized view layer? Materialized views in RisingWave are incrementally maintained — they do not re-execute from scratch on every change. Latency from WAL capture to Elasticsearch is typically under one second under normal load.

