Debezium Outbox Pattern: Reliable Event Streaming for Microservices

The Debezium outbox pattern solves the dual-write problem in microservices: how to update a database and publish a message atomically, without distributed transactions. By writing events to a dedicated outbox table within the same local transaction, then using Debezium CDC to stream those rows to Kafka, you get reliable at-least-once event delivery without two-phase commit.

The Dual-Write Problem

In event-driven microservices, a common operation involves two steps: persisting a state change to a database and publishing an event to a message broker. Doing these independently creates race conditions:

  • The database write succeeds but the message broker publish fails → the event is lost
  • The publish succeeds but the database write fails → a phantom event is sent
  • The service crashes between the two operations → partial state

Traditional solutions like XA distributed transactions are operationally expensive and poorly supported by modern databases and brokers. The outbox pattern solves this at the application layer.
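The failure window is easy to see in a small simulation (plain Python with in-memory stand-ins for the database and broker; all names here are illustrative, not a real client API):

```python
# Illustrative simulation of the dual-write race: the service crashes
# after the database write succeeds but before the broker publish runs.
database = {}  # stands in for the orders table
broker = []    # stands in for the Kafka topic

def confirm_order_dual_write(order_id, crash_before_publish=False):
    database[order_id] = "confirmed"           # step 1: DB write commits
    if crash_before_publish:
        raise RuntimeError("service crashed")  # step 2 never happens
    broker.append(("OrderConfirmed", order_id))

try:
    confirm_order_dual_write(42, crash_before_publish=True)
except RuntimeError:
    pass

# The order is confirmed in the database, but the event is lost forever:
assert database[42] == "confirmed"
assert broker == []
```

No retry logic in the application can fully close this window — only making the two writes one atomic operation does, which is exactly what the outbox pattern provides.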

How the Outbox Pattern Works

Instead of publishing directly to Kafka, the application writes event records to an outbox table within the same local database transaction as the business entity change:

BEGIN TRANSACTION;
  UPDATE orders SET status = 'confirmed' WHERE id = 42;
  INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload) 
    VALUES ('Order', '42', 'OrderConfirmed', '{"order_id":42,...}');
COMMIT;

The outbox table is a regular relational table. Debezium monitors it via CDC, picks up new rows, and publishes them to Kafka. The event is only published if the transaction commits. If the transaction rolls back, neither the order update nor the outbox row lands in the database.

The Debezium Outbox Event Router SMT

Debezium provides the EventRouter Single Message Transform (SMT) — a Kafka Connect transform that post-processes raw CDC events from the outbox table and routes each one to a topic derived from its aggregate type, instead of sending everything to a single outbox topic.

Without the SMT, all outbox events land on one topic (dbserver1.public.outbox). The SMT reads configured fields (aggregate type, aggregate ID, event type, payload) and routes each event to a per-aggregate-type topic (e.g., Order.events, Customer.events).
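The routing rule itself is just template substitution: the value of the route-by field replaces the `${routedByValue}` placeholder in the topic template. A minimal sketch of that substitution (the template string matches the connector configuration shown below; the function is illustrative, not Debezium code):

```python
def route_topic(aggregate_type: str,
                template: str = "${routedByValue}.events") -> str:
    """Mimic the EventRouter's topic derivation: the row's
    aggregate_type value replaces ${routedByValue} in the template."""
    return template.replace("${routedByValue}", aggregate_type)

assert route_topic("Order") == "Order.events"
assert route_topic("Customer") == "Customer.events"
```

Note that the value is substituted verbatim, so topic names keep whatever capitalization your aggregate_type column uses.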

Step-by-Step Tutorial

Step 1: Create the Outbox Table

CREATE TABLE outbox (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id VARCHAR(255) NOT NULL,
    event_type VARCHAR(255) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Create a publication for it:

CREATE PUBLICATION dbz_outbox_pub FOR TABLE public.outbox;

Step 2: Write to the Outbox Atomically

In your application code, write business state and the outbox event in the same transaction:

BEGIN;

-- Update the order
UPDATE orders 
SET status = 'shipped', shipped_at = NOW()
WHERE id = $1;

-- Write the outbox event
INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
VALUES (
    'Order',
    $1::TEXT,
    'OrderShipped',
    jsonb_build_object(
        'event_type', 'OrderShipped',  -- duplicated in the payload so JSON consumers can read it directly
        'order_id', $1,
        'shipped_at', NOW(),
        'tracking_number', $2
    )
);

COMMIT;
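The atomicity guarantee comes from the database, not from Debezium, so it can be demonstrated with any transactional store. A self-contained sketch using SQLite from the Python standard library (simplified schema; the real outbox table is the PostgreSQL one above): if the transaction fails before commit, neither the state change nor the event row survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (aggregate_id TEXT, event_type TEXT, payload TEXT);
    INSERT INTO orders VALUES (42, 'pending');
""")

try:
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 42")
        conn.execute(
            "INSERT INTO outbox VALUES ('42', 'OrderShipped', '{\"order_id\": 42}')"
        )
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# The rollback removed both writes: no half-updated state, no phantom event.
assert conn.execute("SELECT status FROM orders WHERE id = 42").fetchone()[0] == "pending"
assert conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0] == 0
```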

Step 3: Configure Debezium with the Outbox SMT

{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "production",
    "slot.name": "outbox_slot",
    "plugin.name": "pgoutput",
    "publication.name": "dbz_outbox_pub",
    "table.include.list": "public.outbox",
    "topic.prefix": "dbserver1",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.table.field.event.id": "id",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.type": "event_type",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.expand.json.payload": "true",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "${routedByValue}.events",
    "transforms.outbox.table.fields.additional.placement": "event_type:header:eventType"
  }
}

With this configuration, aggregate_type = 'Order' routes to Order.events, and aggregate_type = 'Customer' routes to Customer.events. The table.expand.json.payload setting expands the JSONB payload into a real JSON object in the message value rather than an escaped string.

Step 4: Connect RisingWave to Process Outbox Events

RisingWave is a PostgreSQL-compatible streaming database. Because the routed topic carries plain JSON payloads, the source needs an explicit schema matching the payload fields. Create a source from the routed Kafka topic:

CREATE SOURCE order_events (
    event_type VARCHAR,
    order_id INT,
    shipped_at TIMESTAMPTZ,
    tracking_number VARCHAR
)
INCLUDE timestamp AS created_at
WITH (
    connector = 'kafka',
    topic = 'Order.events',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

The INCLUDE clause exposes each message's Kafka timestamp as a created_at column for time-based analytics.

Step 5: Build Analytics on Outbox Events

Aggregate event counts by type for operational monitoring:

CREATE MATERIALIZED VIEW order_event_counts AS
SELECT
    event_type,
    DATE_TRUNC('minute', created_at) AS minute,
    COUNT(*) AS event_count
FROM order_events
GROUP BY 1, 2;
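What this view computes can be mirrored in a few lines of ordinary code — truncate each event's timestamp to the minute and count per (event_type, minute) pair (illustrative in-memory data standing in for the stream):

```python
from collections import Counter
from datetime import datetime

# A handful of events as (event_type, timestamp) pairs.
events = [
    ("OrderShipped",   datetime(2024, 5, 1, 12, 0, 15)),
    ("OrderShipped",   datetime(2024, 5, 1, 12, 0, 45)),
    ("OrderConfirmed", datetime(2024, 5, 1, 12, 1, 5)),
]

# DATE_TRUNC('minute', ts) equivalent: zero out seconds and microseconds.
counts = Counter(
    (event_type, ts.replace(second=0, microsecond=0))
    for event_type, ts in events
)

assert counts[("OrderShipped", datetime(2024, 5, 1, 12, 0))] == 2
assert counts[("OrderConfirmed", datetime(2024, 5, 1, 12, 1))] == 1
```

The difference is that RisingWave maintains this result incrementally as new events arrive, rather than recomputing it over a batch.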

Sink operational metrics to a monitoring Kafka topic:

CREATE SINK event_metrics_sink
FROM order_event_counts
WITH (
    connector = 'kafka',
    topic = 'order-event-metrics',
    properties.bootstrap.server = 'kafka:9092',
    primary_key = 'event_type,minute'
) FORMAT UPSERT ENCODE JSON;

Comparison Table

Approach                             | Atomicity                       | Complexity | Requires 2PC
Direct dual-write                    | None (race condition risk)      | Low        | No
XA distributed transactions          | Full                            | High       | Yes
Outbox pattern (Debezium)            | Full (via local DB transaction) | Medium     | No
Change feed (e.g., DynamoDB Streams) | Full                            | Low        | No

FAQ

Should I delete rows from the outbox table after they're processed? Yes — the table is only a conduit, so keep it small. A scheduled cleanup such as DELETE FROM outbox WHERE created_at < NOW() - INTERVAL '7 days' works well; some teams even delete each row in the same transaction that inserts it, since Debezium captures the insert from the WAL rather than from the table itself. Either way, cleanup has no effect on already-published events.

What is the Aggregate ID used for? The aggregate_id becomes the Kafka message key. This ensures all events for a given entity (e.g., a specific order) are routed to the same Kafka partition, preserving ordering within that entity's event stream.
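The ordering guarantee follows from how Kafka's default partitioner works: the message key is hashed, and a given key always maps to the same partition. A sketch of that property (a simple stable hash standing in for Kafka's actual murmur2 partitioner):

```python
import hashlib

def partition_for(key: str, num_partitions: int = 6) -> int:
    """Deterministic key -> partition mapping. Kafka uses murmur2;
    any stable hash illustrates the same property."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by aggregate_id '42' lands on the same partition,
# so consumers see that order's events in production order.
partitions = {partition_for("42") for _ in range(100)}
assert len(partitions) == 1
```

Events for different aggregates may interleave across partitions; only ordering within a single aggregate's key is preserved, which is usually exactly what consumers need.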

Can I use this pattern with MySQL? Yes. The outbox table and application-level write pattern are database-agnostic. The Debezium MySQL connector supports the same EventRouter SMT. Configure with connector.class = io.debezium.connector.mysql.MySqlConnector and the same SMT settings.

Key Takeaways

  • The outbox pattern eliminates dual-write risk by writing events transactionally alongside business data
  • Debezium's EventRouter SMT routes events to per-aggregate-type Kafka topics automatically
  • The aggregate ID becomes the Kafka message key, preserving per-entity event ordering
  • RisingWave can consume routed outbox events from Kafka for real-time operational analytics
  • This pattern requires no distributed transactions — atomicity is guaranteed by the local database transaction
