Building an Event-Driven Architecture with CDC: Debezium vs RisingWave
CDC turns every database commit into an event. For event-driven architectures where multiple microservices react to those events, Debezium publishing to Kafka is the right foundation. When those same events need to power analytics, dashboards, and materialized aggregates, RisingWave fits better. Most mature systems use both — and they complement each other well.
CDC as the Backbone of Event-Driven Design
Hand-rolled event publishing is fragile. Dual writes from application code risk losing events when one write fails, and polling-based publishing introduces lag and can miss intermediate updates. Even the transactional outbox pattern still needs a reliable way to move outbox rows into the event stream.
CDC solves this at the infrastructure level. Every INSERT, UPDATE, and DELETE on a watched table becomes a durable event stream. Services that need to react to order changes, inventory movements, or user state transitions can subscribe to that stream without touching application code.
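Both tools discussed below assume PostgreSQL is configured for logical decoding. A minimal sketch (the publication name is illustrative):

```sql
-- Enable logical decoding (requires a server restart)
ALTER SYSTEM SET wal_level = 'logical';

-- Scope which tables emit change events
CREATE PUBLICATION commerce_cdc
    FOR TABLE public.orders, public.inventory, public.customers;
```

With `wal_level = 'logical'`, each CDC client then opens a replication slot against this publication.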
Pattern 1: Debezium for Event Publishing to Microservices
When multiple microservices need to consume database change events, Debezium → Kafka is the correct tool. Kafka's consumer group model allows each service to maintain its own offset and consume at its own pace.
PostgreSQL (WAL)
│
▼
Debezium (Kafka Connect)
│
▼
Kafka Topics
├── cdc.orders → Order Service, Fulfillment Service
├── cdc.inventory → Inventory Service, Purchasing Service
└── cdc.customers → CRM Service, Marketing Service
Debezium Configuration for Multi-Topic Event Publishing
{
  "name": "postgres-eda-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "commerce",
    "table.include.list": "public.orders,public.inventory,public.customers",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_eda",
    "topic.prefix": "cdc",
    "transforms": "route,unwrap",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "cdc\\.public\\.(.*)",
    "transforms.route.replacement": "cdc.$1",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.add.fields": "op,ts_ms,source.ts_ms",
    "transforms.unwrap.add.headers": "op"
  }
}
Downstream microservices subscribe to specific topics. The RegexRouter transform strips the schema segment so topics match the diagram above (cdc.orders rather than cdc.public.orders), and because the unwrap transform flattens the Debezium envelope, the operation code (c for create, u for update, d for delete) arrives in the added __op field (and __op header), letting each service route its event handling accordingly.
Consumer Service Example (Java/Spring)
@KafkaListener(topics = "cdc.orders", groupId = "fulfillment-service")
public void handleOrderEvent(OrderEvent event) {
    // After ExtractNewRecordState, the message body is the flattened row
    // state plus the added __op field ("c", "u", or "d").
    if ("c".equals(event.getOp())) {
        fulfillmentService.createShipment(event);
    } else if ("u".equals(event.getOp())
            && "cancelled".equals(event.getStatus())) {
        fulfillmentService.cancelShipment(event.getId());
    }
}
This pattern scales to dozens of microservices. Each service owns its consumer group, processes events independently, and can replay from any offset in Kafka.
Pattern 2: RisingWave for Analytical Event Processing
Microservices typically react to individual events rather than aggregated metrics. But dashboards, fraud detection, and operational analytics do need aggregates. That's where RisingWave fits.
RisingWave can consume the same CDC events from two sources:
- Directly from PostgreSQL WAL (bypassing Kafka entirely for simpler setups)
- From the Kafka topics that Debezium already publishes
Option A: RisingWave Reading Directly from PostgreSQL
-- Orders CDC source
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
customer_id BIGINT,
status VARCHAR,
total NUMERIC(10,2),
region VARCHAR,
created_at TIMESTAMPTZ
) WITH (
connector = 'postgres-cdc',
hostname = 'postgres',
port = '5432',
username = 'rwuser',
password = 'secret',
database.name = 'commerce',
schema.name = 'public',
table.name = 'orders'
);
Option B: RisingWave Reading from Debezium Kafka Topics
CREATE TABLE orders (
id BIGINT,
customer_id BIGINT,
status VARCHAR,
total NUMERIC(10,2),
region VARCHAR,
created_at TIMESTAMPTZ,
PRIMARY KEY (id)
) WITH (
connector = 'kafka',
topic = 'cdc.orders',
properties.bootstrap.server = 'kafka:9092'
) FORMAT DEBEZIUM ENCODE JSON;
Both approaches land the same data. Option B is preferable when Debezium is already running for microservices — you avoid opening a second WAL replication slot.
Materialized Views for Real-Time Analytics
Once orders flow into RisingWave, you build materialized views that stay continuously up-to-date:
-- Revenue by region, updated in real time
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT
region,
COUNT(*) AS order_count,
SUM(total) AS total_revenue,
AVG(total) AS avg_order_value,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled_orders,
MAX(created_at) AS last_order_at
FROM orders
WHERE status != 'draft'
GROUP BY region;
-- Hourly order volume trend
CREATE MATERIALIZED VIEW hourly_order_volume AS
SELECT
DATE_TRUNC('hour', created_at) AS hour,
region,
COUNT(*) AS orders,
SUM(total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('hour', created_at), region;
-- High-value order alert stream
CREATE MATERIALIZED VIEW high_value_orders AS
SELECT id, customer_id, total, region, created_at
FROM orders
WHERE total > 5000 AND status = 'confirmed';
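A windowed variant can be sketched with RisingWave's TUMBLE table function, reusing the orders table defined earlier:

```sql
-- One row per (5-minute window, region), maintained incrementally
CREATE MATERIALIZED VIEW order_volume_5m AS
SELECT window_start, region, COUNT(*) AS orders, SUM(total) AS revenue
FROM TUMBLE(orders, created_at, INTERVAL '5 minutes')
GROUP BY window_start, region;
```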
Query these views like ordinary tables. Every query returns current results — no batch jobs, no stale caches.
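For example, a dashboard backend can hit a view with an ordinary SELECT over the Postgres wire protocol:

```sql
-- Latest revenue figures, computed incrementally; no batch job behind it
SELECT region, order_count, total_revenue, avg_order_value
FROM revenue_by_region
ORDER BY total_revenue DESC
LIMIT 10;
```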
The Hybrid Architecture: Both Tools Working Together
The most robust event-driven architectures use Debezium and RisingWave together:
PostgreSQL (WAL)
│
├── Debezium (Kafka Connect)
│ │
│ ▼
│ Kafka Topics
│ ├── Microservice A (fulfillment)
│ ├── Microservice B (inventory)
│ ├── Microservice C (notifications)
│ └── RisingWave (reads Debezium topics)
│ │
│ ▼
│ Materialized Views
│ ├── Dashboards
│ ├── Fraud Alerts → Kafka sink
│ └── Iceberg Sink → Data Lakehouse
│
└── RisingWave (direct CDC, if no Kafka in use)
Debezium handles the fan-out to microservices. RisingWave handles the aggregation and analytics layer. There is no resource conflict — they use separate replication slots, or RisingWave reads from Kafka (consuming Debezium's output).
Routing Events: Debezium SMTs vs RisingWave SQL
Event routing — sending different types of changes to different destinations — works differently in each tool.
With Debezium, you use Single Message Transforms (SMTs) to route, filter, and reshape events before they reach Kafka. SMTs are chainable but limited — complex logic requires a stream processor like Flink downstream.
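As a sketch of such an SMT, Debezium ships a scripting-based Filter transform. It requires the optional debezium-scripting module and a JSR-223 engine such as Groovy on the Connect classpath; the condition below is illustrative:

```json
"transforms": "filterHighValue",
"transforms.filterHighValue.type": "io.debezium.transforms.Filter",
"transforms.filterHighValue.language": "jsr223.groovy",
"transforms.filterHighValue.condition": "value.op == 'c' && value.after.total > 5000"
```

Anything beyond per-record predicates like this (joins, aggregations, stateful logic) is where a downstream stream processor becomes necessary.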
With RisingWave, routing is pure SQL. Write different materialized views with different WHERE clauses and different sinks:
-- Route high-value orders to a dedicated Kafka topic
CREATE SINK high_value_order_alerts
FROM high_value_orders
WITH (
    connector = 'kafka',
    topic = 'alerts.high_value_orders',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'id'
) FORMAT UPSERT ENCODE JSON;

-- Route cancelled orders to a separate notification stream
CREATE SINK cancelled_order_events AS
SELECT id, customer_id, total, created_at
FROM orders
WHERE status = 'cancelled'
WITH (
    connector = 'kafka',
    topic = 'events.order_cancelled',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'id'
) FORMAT UPSERT ENCODE JSON;
Comparison Table
| Concern | Debezium + Kafka | RisingWave |
|---|---|---|
| Multi-service fan-out | Native (consumer groups) | Not designed for this |
| Aggregated metrics | Requires Flink or ksqlDB | Native materialized views |
| Event replay | Kafka retention | Recreate view from CDC source |
| Complex routing | SMTs (limited) or Flink | SQL WHERE / multiple sinks |
| Dashboard queries | Requires OLAP store | Query materialized views directly |
| Operational overhead | High (Kafka + Connect + optional Flink) | Low |
| Best for | Event distribution to services | Analytics on change streams |
When to Use Each
Use Debezium → Kafka when:
- Multiple independent services need to consume the same change stream.
- Services need to process at different speeds or replay from arbitrary points.
- You need durable event storage that outlives any single downstream system.
Use RisingWave when:
- You need aggregations, counts, sums, and windowed analytics over the change stream.
- You want to power dashboards or APIs directly from live CDC data.
- You want fewer infrastructure components for analytics-only pipelines.
Use both when:
- Microservices need real-time events AND the business needs operational analytics from the same data.
- This is the most common production pattern for teams at scale.
FAQ
Q: If I already run Debezium for microservices, do I need a second replication slot for RisingWave?
Not necessarily. Configure RisingWave to read from the Kafka topics Debezium already publishes using FORMAT DEBEZIUM ENCODE JSON. This avoids a second slot and reduces WAL load on PostgreSQL.
Q: Can RisingWave publish events back to Kafka for microservices?
Yes. RisingWave has a Kafka sink. You can build materialized views that compute alerts or derived events, then publish them to Kafka topics where microservices subscribe.
Q: Is there a risk of data divergence between what Debezium sends to Kafka and what RisingWave computes?
If RisingWave reads from the same Kafka topics as the microservices, it sees exactly the same events. If RisingWave uses a direct CDC connection, there is a brief lag between the two replication slots, but both reflect the same committed data, just at slightly different offsets.
Q: How does RisingWave handle event ordering for aggregations?
RisingWave processes events in commit order from the WAL. For aggregations that depend on event sequence (such as running totals), this ordering guarantee is preserved within a single table's change stream.
Q: What about the transactional outbox pattern — is CDC a replacement?
CDC is complementary. The outbox pattern ensures atomic write-plus-event-creation within the application; CDC captures those outbox rows and publishes them without polling. They solve different layers of the same reliability problem.
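For concreteness, a minimal outbox table that Debezium can capture might look like this. The column names follow the convention expected by Debezium's outbox event router SMT (io.debezium.transforms.outbox.EventRouter); the overall shape is illustrative:

```sql
CREATE TABLE outbox (
    id UUID PRIMARY KEY,
    aggregatetype VARCHAR NOT NULL,  -- routing key, e.g. 'order'
    aggregateid VARCHAR NOT NULL,    -- id of the entity the event concerns
    type VARCHAR NOT NULL,           -- event type, e.g. 'OrderCancelled'
    payload JSONB NOT NULL,          -- event body, written in the same transaction
    created_at TIMESTAMPTZ DEFAULT now()
);
```

The application inserts an outbox row in the same transaction as the business write; Debezium streams those rows out, and the event router directs them to topics by aggregatetype.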

