Real-Time Microservices Sync with Debezium Event Sourcing

Debezium enables the outbox pattern—a proven approach to reliable event sourcing in microservices—by treating your database's transaction log as the authoritative event stream. Every committed change becomes an event, guaranteeing that your microservice state and the events it emits are always in sync without distributed transactions.

The Problem: Dual Writes in Microservices

In a microservice architecture, a common bug looks like this:

  1. Service saves an order to its database.
  2. Service publishes an OrderCreated event to Kafka.

If the publish fails after the save, or the save fails after the publish, your services drift out of sync. Compensating logic is fragile. Distributed transactions (2PC) are slow and couple services together.
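
The failure window is easy to reproduce. Below is a minimal Python sketch of the dual write, using throwaway in-memory stand-ins (FakeDb and FakeKafka are illustrative, not real client libraries) so the race is visible without any infrastructure:

```python
# Minimal sketch of the dual-write hazard; FakeDb/FakeKafka are
# illustrative in-memory stand-ins, not real client libraries.

class FakeDb:
    def __init__(self):
        self.orders = []

    def save(self, order):
        self.orders.append(order)

class FakeKafka:
    def __init__(self, fail=False):
        self.events = []
        self.fail = fail

    def publish(self, event):
        if self.fail:
            raise ConnectionError("broker unreachable")
        self.events.append(event)

def create_order(db, kafka, order):
    db.save(order)  # write 1: database save succeeds
    # write 2: publish may fail after the save has already happened
    kafka.publish({"type": "OrderCreated", "orderId": order["id"]})

db, kafka = FakeDb(), FakeKafka(fail=True)
try:
    create_order(db, kafka, {"id": "ord-123"})
except ConnectionError:
    pass

# The order exists but no event was ever emitted: downstream drifts.
print(len(db.orders), len(kafka.events))  # 1 0
```

Swap the order of the two writes and you get the mirror-image failure: an event for an order that was never committed.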

Event sourcing is the architectural answer: the event log is the source of truth, not the database table. But implementing event sourcing from scratch is complex. Debezium offers a shortcut via the transactional outbox pattern.

The Outbox Pattern with Debezium

Instead of publishing directly to Kafka, your service writes events to an outbox table inside the same database transaction as the business operation. Debezium captures changes to the outbox table and forwards them to Kafka.

-- Outbox table in your microservice's database
CREATE TABLE outbox_events (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR NOT NULL,  -- e.g., 'Order'
    aggregate_id   VARCHAR NOT NULL,  -- e.g., order ID
    event_type     VARCHAR NOT NULL,  -- e.g., 'OrderCreated'
    payload        JSONB   NOT NULL,
    created_at     TIMESTAMPTZ DEFAULT NOW()
);

The application code:

BEGIN;
  -- Business operation: create the order (aggregate root)
  INSERT INTO orders (id, customer_id, total, status)
  VALUES ('ord-123', 'cust-456', 149.99, 'pending');

  -- Event sourcing: record the event in the outbox (payload keys match
  -- the read-side column names, including the event timestamp)
  INSERT INTO outbox_events (aggregate_type, aggregate_id, event_type, payload)
  VALUES ('Order', 'ord-123', 'OrderCreated',
          '{"order_id":"ord-123","customer_id":"cust-456","total":149.99,"status":"pending","updated_at":"2024-01-15T10:00:00Z"}');
COMMIT;

Both writes succeed or both fail—atomically. Debezium then picks up the outbox row and publishes it to Kafka.
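
The same two inserts can be sketched in application code. This uses Python's sqlite3 purely as a stand-in for the service database; the point is the single transaction, not the driver:

```python
# Sketch of the transactional outbox write; sqlite3 stands in for the
# service database, table and column names follow the article.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, "
    "total REAL, status TEXT)")
conn.execute(
    "CREATE TABLE outbox_events (aggregate_type TEXT, aggregate_id TEXT, "
    "event_type TEXT, payload TEXT)")

def create_order_with_outbox(conn, order):
    with conn:  # one transaction: both inserts commit or both roll back
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)",
            (order["id"], order["customer_id"], order["total"], "pending"),
        )
        conn.execute(
            "INSERT INTO outbox_events VALUES (?, ?, ?, ?)",
            ("Order", order["id"], "OrderCreated", json.dumps(order)),
        )

create_order_with_outbox(
    conn, {"id": "ord-123", "customer_id": "cust-456", "total": 149.99})
print(conn.execute("SELECT count(*) FROM outbox_events").fetchone()[0])  # 1
```

If either insert raises, the `with conn:` block rolls back both, so the orders table and the outbox can never disagree.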

Step-by-Step Tutorial

Step 1: Configure Debezium with the Outbox Event Router SMT

Debezium ships a built-in outbox event router SMT (io.debezium.transforms.outbox.EventRouter) that routes events to dynamic Kafka topics based on aggregate_type:

{
  "name": "orders-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "orders_service",
    "database.server.name": "ordersvc",
    "table.include.list": "public.outbox_events",
    "plugin.name": "pgoutput",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.table.field.event.id": "id",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.field.event.type": "event_type",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "outbox.event.${routedByValue}"
  }
}

Events from the Order aggregate will be routed to outbox.event.Order.
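
The routing itself is just template substitution. An illustrative re-implementation (not Debezium code) of how route.topic.replacement expands:

```python
# Illustrative sketch of the event router's topic naming: the
# route.topic.replacement template substitutes ${routedByValue} with the
# value of the route.by.field column (aggregate_type here).
def route_topic(row, template="outbox.event.${routedByValue}",
                route_by="aggregate_type"):
    return template.replace("${routedByValue}", row[route_by])

print(route_topic({"aggregate_type": "Order"}))    # outbox.event.Order
print(route_topic({"aggregate_type": "Payment"}))  # outbox.event.Payment
```

One outbox table thus fans out to one topic per aggregate type, which maps naturally onto bounded contexts.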

Step 2: Connect RisingWave to Consume the Event Stream

-- Debezium → Kafka → RisingWave pipeline. The event router unwraps the
-- Debezium envelope and emits the outbox payload as plain JSON, so the
-- column names must match the payload keys.
CREATE SOURCE order_events (
    order_id    VARCHAR,
    customer_id VARCHAR,
    total       NUMERIC,
    status      VARCHAR,
    updated_at  TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'outbox.event.Order',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

Step 3: Build a CQRS Read Model with Materialized Views

The CQRS (Command Query Responsibility Segregation) pattern separates writes (commands) from reads (queries). RisingWave's materialized views are the read side:

-- Aggregate: current order state derived from event stream
-- Current order state: keep the latest event per order (TopN pattern);
-- grouping by status would instead leave one row per status transition
CREATE MATERIALIZED VIEW current_order_state AS
SELECT order_id, customer_id, total, status, updated_at AS last_updated
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS rn
    FROM order_events
) t
WHERE rn = 1;
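
What the read model maintains — the latest event per order — can be sketched in plain Python to make the semantics concrete (field names follow the source columns above):

```python
# Sketch: derive current order state by keeping, per order_id, the event
# with the greatest updated_at — what the materialized view maintains.
def current_state(events):
    latest = {}
    for e in events:
        cur = latest.get(e["order_id"])
        if cur is None or e["updated_at"] > cur["updated_at"]:
            latest[e["order_id"]] = e
    return latest

events = [
    {"order_id": "ord-123", "status": "pending", "updated_at": 1},
    {"order_id": "ord-123", "status": "paid",    "updated_at": 2},
]
print(current_state(events)["ord-123"]["status"])  # paid
```

The difference from batch SQL is that the view performs this fold incrementally as each event arrives, rather than rescanning the stream.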

-- Saga monitoring: detect orders stuck in 'pending' for > 10 minutes.
-- NOW() appears only in the WHERE clause (a temporal filter); RisingWave
-- does not allow NOW() in the SELECT list of a materialized view.
CREATE MATERIALIZED VIEW stalled_sagas AS
SELECT
    order_id,
    customer_id,
    total,
    last_updated
FROM current_order_state
WHERE status = 'pending'
  AND last_updated < NOW() - INTERVAL '10 minutes';

Step 4: Implement Eventual Consistency Monitoring

In microservices, eventual consistency means that different services will converge to the same state—but there may be a lag. Track it:

-- Cross-service consistency: compare order totals against payment amounts
-- (payment_events is a second Kafka source, defined like order_events)
CREATE MATERIALIZED VIEW consistency_check AS
SELECT
    oe.order_id,
    oe.total         AS order_total,
    pe.amount        AS payment_amount,
    CASE
      WHEN pe.order_id IS NULL THEN 'payment_missing'
      WHEN oe.total != pe.amount THEN 'amount_mismatch'
      ELSE 'consistent'
    END AS consistency_status
FROM current_order_state oe
LEFT JOIN payment_events pe ON oe.order_id = pe.order_id;
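
The CASE expression reduces to a small function. A sketch, with rows shaped like the joined columns above:

```python
# Sketch of the consistency classification; order/payment are dicts
# shaped like the joined rows, payment is None when the LEFT JOIN misses.
def consistency_status(order, payment):
    if payment is None:
        return "payment_missing"
    if order["total"] != payment["amount"]:
        return "amount_mismatch"
    return "consistent"

print(consistency_status({"total": 149.99}, None))               # payment_missing
print(consistency_status({"total": 149.99}, {"amount": 100.0}))  # amount_mismatch
print(consistency_status({"total": 149.99}, {"amount": 149.99})) # consistent
```

Because the view is continuously maintained, a 'payment_missing' row is expected briefly while the payment service catches up; it only signals a problem if it persists.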

Comparison Table

Pattern                 | Consistency                   | Complexity | Failure Mode
------------------------|-------------------------------|------------|--------------------------------
Dual write (no outbox)  | Eventual (unreliable)         | Low        | Silent data drift
Outbox + Debezium       | At-least-once (transactional) | Medium     | Connector lag, not data loss
Direct Kafka in app     | At-least-once                 | Medium     | Duplicate events on retry
Saga (Debezium-backed)  | Eventual (managed)            | High       | Compensating transaction needed
Event sourcing (pure)   | Strong (event log = truth)    | Very high  | Replay complexity

FAQ

Q: What is the aggregate root in the context of the outbox pattern? An aggregate root is the primary entity that owns a group of related data—an Order that contains OrderItems, for example. All changes to the aggregate are expressed as events attached to the root's ID. The event router keys Kafka messages by aggregate_id, so every event for a given aggregate lands in the same partition in order, and consumers can reconstruct that aggregate's full history.
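
Reconstruction by replay can be sketched in a few lines (event shapes here are illustrative, with a hypothetical seq ordering field):

```python
# Sketch: rebuild an aggregate's state by replaying its events in order.
# aggregate_id/seq/payload are illustrative field names, not a fixed schema.
def replay(events, aggregate_id):
    state = {}
    for e in sorted(events, key=lambda e: e["seq"]):
        if e["aggregate_id"] != aggregate_id:
            continue
        state.update(e["payload"])  # apply each event on top of the state
    return state

events = [
    {"aggregate_id": "ord-123", "seq": 1,
     "payload": {"status": "pending", "total": 149.99}},
    {"aggregate_id": "ord-123", "seq": 2,
     "payload": {"status": "paid"}},
]
print(replay(events, "ord-123"))  # {'status': 'paid', 'total': 149.99}
```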

Q: How do I clean up the outbox table without losing events? Debezium captures the insert event the moment it appears in the change log, so outbox rows only need to live long enough for delivery to be confirmed in Kafka (check connector offsets, and run the Connect producer with acks=all). After that you can safely delete old rows. A common pattern is to add a processed_at column, set it once delivery is confirmed, and purge rows older than 24 hours.
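
A sketch of that purge, again with sqlite3 standing in for the service database and a hypothetical processed_at column set once delivery is confirmed:

```python
# Sketch: purge confirmed outbox rows older than a retention window.
# processed_at is a hypothetical column (epoch seconds, NULL until the
# event's delivery to Kafka has been confirmed).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox_events (id TEXT, processed_at REAL)")
now = 1_000_000.0
day = 86_400.0
conn.executemany("INSERT INTO outbox_events VALUES (?, ?)", [
    ("old-processed", now - 2 * day),  # confirmed and past retention: purge
    ("new-processed", now - 60.0),     # confirmed but still in window: keep
    ("unprocessed",   None),           # never delete unconfirmed events
])

def purge(conn, now, retention=day):
    with conn:
        conn.execute(
            "DELETE FROM outbox_events "
            "WHERE processed_at IS NOT NULL AND processed_at < ?",
            (now - retention,),
        )

purge(conn, now)
rows = conn.execute("SELECT id FROM outbox_events ORDER BY id").fetchall()
print([r[0] for r in rows])  # ['new-processed', 'unprocessed']
```

The IS NOT NULL guard is the important part: rows whose delivery was never confirmed stay in the table no matter how old they are.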

Q: Does this pattern work with MySQL or SQL Server, not just PostgreSQL? Yes. The outbox table pattern works with any database that Debezium supports. The OutboxEventRouter SMT is connector-agnostic. The only requirement is that your application code writes to the outbox table within the same ACID transaction as the business operation.

Key Takeaways

  • The transactional outbox pattern eliminates dual-write inconsistency by making the database the authoritative event source.
  • Debezium's outbox event router SMT routes events to dynamic Kafka topics based on aggregate type, mapping cleanly to domain-driven design boundaries.
  • CQRS read models in RisingWave are continuously updated materialized views derived from the event stream—no polling required.
  • Stalled saga detection (events stuck in intermediate states) is straightforward with a streaming SQL filter over the read model.
  • Eventual consistency gaps between microservices can be quantified and alerted on using cross-source joins in RisingWave.
