Debezium Outbox Pattern: When to Use It and When to Skip It

The outbox pattern solves a specific microservices problem: how do you atomically write to your database and publish a domain event without two-phase commit? Debezium captures the outbox table and routes events to Kafka, ensuring no event is lost. But for analytics and reporting use cases, this pattern adds unnecessary complexity. RisingWave can read from the same source database and materialize results directly — no outbox table, no Kafka, no event routing.

The Problem the Outbox Pattern Solves

In a microservices architecture, a service might need to update its own database and publish an event to a message bus in the same logical operation. Doing both independently risks a partial failure: the database write succeeds but the Kafka publish fails (or vice versa).

The outbox pattern eliminates this by writing only to the database:

  1. The service writes its primary record (e.g., orders) and an outbox record in a single transaction.
  2. Debezium captures the outbox table via CDC.
  3. Debezium publishes the event to Kafka.
  4. Downstream services consume from Kafka.

The database transaction is the guarantee. Debezium handles eventual delivery to Kafka. No distributed transaction required.

The Outbox Table Schema

CREATE TABLE outbox_events (
  id             UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  aggregate_type VARCHAR NOT NULL,  -- e.g., 'Order'
  aggregate_id   VARCHAR NOT NULL,  -- e.g., order ID
  event_type     VARCHAR NOT NULL,  -- e.g., 'OrderPlaced'
  payload        JSONB NOT NULL,
  created_at     TIMESTAMPTZ DEFAULT NOW()
);

The application service writes both records in one transaction:

// Inside a Spring @Transactional method
orderRepository.save(order);
outboxEventRepository.save(OutboxEvent.builder()
  .aggregateType("Order")
  .aggregateId(order.getId().toString())
  .eventType("OrderPlaced")
  .payload(mapper.writeValueAsString(orderEvent))
  .build());

If the transaction commits, both records are written. Debezium will capture the outbox row and publish it. If the transaction rolls back, neither record is written and no event is published.
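The same atomicity is visible in plain SQL. A minimal sketch, assuming the outbox_events schema above; the orders columns and values are illustrative:

```sql
BEGIN;

-- Primary business record
INSERT INTO orders (id, customer_id, amount, status)
VALUES (1001, 42, 99.50, 'placed');

-- Outbox record in the same transaction
INSERT INTO outbox_events (aggregate_type, aggregate_id, event_type, payload)
VALUES (
  'Order',
  '1001',
  'OrderPlaced',
  '{"orderId": 1001, "customerId": 42, "amount": 99.50}'::jsonb
);

COMMIT;  -- both rows become visible atomically; a ROLLBACK discards both
```

Because both inserts share one transaction, there is no window in which the business record exists without its event, or vice versa.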

Debezium Outbox Event Router Configuration

The Outbox Event Router is a Debezium Single Message Transform (SMT). It reads from the outbox table and routes each event to a dedicated Kafka topic based on the aggregate_type column.

{
  "name": "orders-service-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "orders_db",
    "topic.prefix": "orders-service",
    "plugin.name": "pgoutput",
    "table.include.list": "public.outbox_events",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.table.field.event.id": "id",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.expand.json.payload": "true",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "outbox.${routedByValue}.events",
    "tombstones.on.delete": "false"
  }
}

With this config, an event with aggregate_type = 'Order' routes to the Kafka topic outbox.Order.events. A Payment event routes to outbox.Payment.events. Different downstream services subscribe to only the topics they care about.
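Concretely, the routed Kafka record for an OrderPlaced event might look roughly like this (values are illustrative, and the exact envelope depends on your converter settings):

```
Topic:  outbox.Order.events
Key:    "1001"                                               <- from aggregate_id
Value:  {"orderId": 1001, "customerId": 42, "amount": 99.50} <- expanded payload
Header: id = <outbox row UUID>                               <- from the id column
```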

Scenario 1: Outbox Pattern for Microservices (Use Debezium)

Consider an e-commerce platform with these services:

  • Fulfillment Service: subscribes to outbox.Order.events to trigger warehouse picks
  • Notification Service: subscribes to outbox.Order.events to send customer emails
  • Loyalty Service: subscribes to outbox.Order.events to award points

Three independent consumers, one Kafka topic. Each service operates autonomously. If the Loyalty Service is down, it replays from its last committed offset when it comes back — without affecting Fulfillment or Notifications.

This is the outbox pattern at its best. Debezium is the right tool. The fan-out to multiple independent consumers is exactly what Kafka and Debezium are designed for.

PostgreSQL (orders + outbox_events)
  └── Debezium → Kafka: outbox.Order.events
        ├── Fulfillment Service
        ├── Notification Service
        └── Loyalty Service

Scenario 2: Outbox for Analytics (Skip the Outbox Pattern)

Now consider a different team. They want:

  • A real-time dashboard showing orders per region
  • A running total of revenue by product category
  • Hourly trends for conversion rates

The "consumer" here is not a microservice. It is a set of SQL queries. There is no fan-out to independent services. The outbox pattern adds an entire layer — the outbox table, Debezium, Kafka, a consumer — that serves no architectural purpose for this use case.

RisingWave reads from the same PostgreSQL database and materializes the query results directly:

-- Connect to the same PostgreSQL source
CREATE SOURCE orders_source
WITH (
  connector = 'postgres-cdc',
  hostname = 'postgres.internal',
  port = '5432',
  username = 'replication_user',
  password = 'secret',
  database.name = 'orders_db'
);

CREATE TABLE orders (
  id          BIGINT,
  customer_id BIGINT,
  product_id  INT,
  category    VARCHAR,
  amount      DECIMAL(10,2),
  region      VARCHAR,
  status      VARCHAR,
  created_at  TIMESTAMPTZ,
  PRIMARY KEY (id)
) FROM orders_source TABLE 'public.orders';

-- Real-time revenue by category
CREATE MATERIALIZED VIEW revenue_by_category AS
SELECT
  category,
  SUM(amount) AS total_revenue,
  COUNT(*) AS order_count
FROM orders
WHERE status != 'cancelled'
GROUP BY category;

-- Hourly conversion trends
CREATE MATERIALIZED VIEW hourly_conversions AS
SELECT
  DATE_TRUNC('hour', created_at) AS hour,
  region,
  COUNT(*) AS orders_placed
FROM orders
GROUP BY 1, 2;

No outbox table. No Kafka. No consumer code. The dashboard queries revenue_by_category directly using any PostgreSQL-compatible client.
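For example, the dashboard's backing query is ordinary SQL against the materialized view (the ORDER BY and LIMIT here are illustrative):

```sql
-- Results are kept fresh incrementally; nothing is recomputed at query time
SELECT category, total_revenue, order_count
FROM revenue_by_category
ORDER BY total_revenue DESC
LIMIT 10;
```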

Side-by-Side: Microservices vs Analytics Consumer

| Dimension | Outbox + Debezium + Kafka | RisingWave Direct CDC |
|---|---|---|
| Consumer type | Independent microservices | SQL queries / dashboards |
| Fan-out | Multiple independent consumers | Single query engine |
| Replay on failure | Kafka consumer groups | Materialized view recompute |
| Message delivery guarantee | At-least-once (Kafka) | Exactly-once (RisingWave state) |
| Infrastructure | PostgreSQL, Kafka, Debezium, services | PostgreSQL, RisingWave |
| Operational overhead | High | Low |
| Business logic location | Consumer code | SQL materialized views |
| Ideal for | Event-driven microservices | Real-time analytics, reporting |

When the Outbox Pattern Is Overkill

The outbox pattern is overkill when:

  • There is only one consumer. If a single analytics pipeline is the only reader, Kafka's consumer group model adds no value.
  • The consumer is a batch job. If your "real-time" requirement is a 5-minute refresh, query PostgreSQL directly with a WHERE updated_at > :last_run clause.
  • The consumer is SQL-based. Aggregations, joins, and window functions belong in SQL, not in Kafka consumer code.
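The batch-job case in particular needs no streaming machinery at all. A minimal sketch, assuming the orders table carries an updated_at column and the job records when it last ran:

```sql
-- :last_run is supplied by the scheduler (e.g., the previous run's start time)
SELECT id, category, amount, region, status
FROM orders
WHERE updated_at > :last_run
ORDER BY updated_at;
```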

The pattern shines when the consuming side is a set of independent, autonomous services that must evolve independently and may be offline at different times.

Handling the Delete Problem

Outbox rows are written once and never updated, so the table grows without bound unless it is pruned. Deleting old rows is safe: the Event Router ignores delete operations on the outbox table, and the connector configuration above sets tombstones.on.delete to false, so cleanup does not push tombstone records into the topics. The standard practice is a periodic TTL cleanup job:

-- Run periodically (cron or pg_cron)
DELETE FROM outbox_events
WHERE created_at < NOW() - INTERVAL '7 days';

RisingWave CDC handles actual deletes from the source table natively — it applies the delete to the materialized view's internal state. No outbox intermediary needed.

FAQ

Can I use RisingWave alongside Debezium for the outbox pattern? Yes. If you run Debezium for microservice fan-out and also want analytics, RisingWave can read directly from the primary orders table via CDC — not the outbox table. Both tools connect to PostgreSQL independently.

Does the outbox pattern guarantee exactly-once delivery? No. Debezium + Kafka provides at-least-once delivery. Consumers must be idempotent. The outbox pattern guarantees no events are lost, not that events are processed exactly once.
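A common way to make a consumer idempotent is to record processed event IDs and skip redeliveries. A minimal sketch in SQL, assuming a processed_events table owned by the consumer (table and column names are illustrative):

```sql
-- One row per event this consumer has already handled
CREATE TABLE processed_events (
  event_id     UUID PRIMARY KEY,
  processed_at TIMESTAMPTZ DEFAULT NOW()
);

-- On each Kafka message: try to claim the event id first. If the row
-- already exists, this is a redelivery and the handler should skip it.
INSERT INTO processed_events (event_id)
VALUES (:event_id)  -- the outbox id carried in the Kafka record's header
ON CONFLICT (event_id) DO NOTHING;
```

Run the claim and the handler's own writes in one transaction so a crash between them cannot mark an event processed without its side effects.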

What if the outbox table grows very large? Large outbox tables slow down the Debezium snapshot on restart and consume PostgreSQL storage. Implement a cleanup job (see the DELETE example above) and index created_at for efficient cleanup. Consider partitioning by date for high-volume systems.
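Both mitigations can be sketched against the outbox_events schema above (the index and table names are illustrative):

```sql
-- Index created_at so the TTL cleanup scans only old rows
CREATE INDEX idx_outbox_created_at ON outbox_events (created_at);

-- For high-volume systems: partition by date so cleanup becomes a cheap
-- DROP of whole partitions instead of bulk DELETEs. Note that PostgreSQL
-- requires the partition key to be part of the primary key.
CREATE TABLE outbox_events_partitioned (
  id             UUID DEFAULT gen_random_uuid(),
  aggregate_type VARCHAR NOT NULL,
  aggregate_id   VARCHAR NOT NULL,
  event_type     VARCHAR NOT NULL,
  payload        JSONB NOT NULL,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
```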

Can RisingWave replace the outbox pattern entirely for microservices? Not cleanly. RisingWave is a query engine, not a message bus. If downstream services need independent offset tracking and replay semantics, Kafka's consumer group model is the right abstraction. RisingWave replaces the analytics use case, not the microservices integration use case.

How does the Outbox Event Router handle schema evolution? The payload field is JSONB or a string, so schema changes in the payload do not break the router. Consumer applications must handle payload schema evolution themselves, often with a schema registry. This is one argument for RisingWave: schema changes in the source table propagate naturally to SQL materialized views without a separate registry.
