Debezium Standalone vs Debezium Embedded Engine: What's the Difference?

Most engineers know Debezium as "the CDC tool that writes to Kafka." Fewer know that the same Debezium connectors can run under more than one deployment model. This article compares two of them: Debezium Standalone (the Kafka Connect plugin most teams use) and the Debezium Embedded Engine (a Java library for in-process CDC). They share the same core log-reading code but serve very different architectural roles.


The Two Modes at a Glance

|                               | Debezium Standalone                  | Debezium Embedded Engine               |
|-------------------------------|--------------------------------------|----------------------------------------|
| Deployment                    | Kafka Connect plugin                 | Java library (JAR)                     |
| Kafka required?               | Yes                                  | No                                     |
| Output destination            | Kafka topics                         | In-process callback                    |
| Operational complexity        | High (Kafka + Connect cluster)       | Low (embedded in host app)             |
| Fan-out to multiple consumers | Yes (native Kafka pattern)           | No (single consumer)                   |
| Schema registry integration   | Yes (Confluent, AWS Glue)            | Manual                                 |
| Primary use case              | Enterprise event streaming pipelines | Embedded CDC in applications/databases |
| Examples in production        | Thousands of Kafka-based pipelines   | RisingWave, Airbyte                    |

Debezium Standalone: Kafka Connect Plugin

Debezium Standalone is a set of Kafka Connect source connectors. Deploying it takes three steps:

  1. Run a Kafka Connect cluster (workers that host connectors).
  2. Register a connector configuration via the Kafka Connect REST API.
  3. Let Debezium read the source database's log and write events to Kafka topics.

The connector configuration for PostgreSQL looks like this:

{
  "name": "postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db.internal",
    "database.port": "5432",
    "database.user": "replication_user",
    "database.password": "secret",
    "database.dbname": "shop",
    "topic.prefix": "shop_server",
    "table.include.list": "public.orders,public.customers",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_slot",
    "publication.name": "debezium_pub"
  }
}
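Step 2 above is a single HTTP call. As a sketch, the JDK's built-in java.net.http client (Java 11+) can POST a connector document like this one to the Kafka Connect REST API's /connectors endpoint. The worker address http://connect:8083 is an assumption (8083 is Kafka Connect's default REST port), and the JSON in main is shortened to two keys for brevity.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RegisterConnector {

    // Assumed Connect worker address; 8083 is the default REST port.
    static final String CONNECT_URL = "http://connect:8083";

    // Builds POST /connectors with the {"name": ..., "config": {...}} document as the body.
    static HttpRequest buildRequest(String baseUrl, String connectorJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Shortened for brevity; send the full configuration shown above in practice.
        String json = "{\"name\": \"postgres-connector\", \"config\": {"
                + "\"connector.class\": \"io.debezium.connector.postgresql.PostgresConnector\"}}";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(CONNECT_URL, json), HttpResponse.BodyHandlers.ofString());
        // Kafka Connect replies 201 Created when the connector is accepted.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Keeping request construction in its own method makes it easy to inspect or reuse, for example when a deployment script registers several connectors against the same worker.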

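Once running, Debezium writes change events to topics named <prefix>.<schema>.<table>, such as shop_server.public.orders. The prefix comes from topic.prefix (database.server.name in Debezium 1.x). Any number of downstream consumers can read from those topics independently.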

This architecture is powerful when you need to feed changes to multiple systems, such as a search index, a data warehouse, a cache invalidation service, and a stream processing job. Each one reads the same Kafka topic at its own pace through its own consumer group.
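The property that makes this work is that each consumer group keeps its own position in the topic. The toy model below is plain Java, not the Kafka client API, and the class and group names are hypothetical. It only illustrates why a slow consumer does not hold back a fast one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a single-partition topic: an append-only log plus one read
// position per consumer group. Not the Kafka client API.
class ToyTopic {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> groupOffsets = new HashMap<>();

    // The producer (here, Debezium) appends each change event exactly once.
    void append(String event) {
        log.add(event);
    }

    // A group reads up to maxEvents from its own offset, then advances only that offset.
    List<String> poll(String group, int maxEvents) {
        int from = groupOffsets.getOrDefault(group, 0);
        int to = Math.min(from + maxEvents, log.size());
        groupOffsets.put(group, to);
        return new ArrayList<>(log.subList(from, to));
    }

    public static void main(String[] args) {
        ToyTopic orders = new ToyTopic();
        orders.append("order 1 created");
        orders.append("order 1 paid");
        orders.append("order 2 created");

        System.out.println(orders.poll("search-indexer", 10));  // [order 1 created, order 1 paid, order 2 created]
        System.out.println(orders.poll("warehouse-loader", 1)); // [order 1 created]
        System.out.println(orders.poll("warehouse-loader", 1)); // [order 1 paid]
    }
}
```

Real Kafka persists both the log and the committed group offsets on the brokers, which is what lets consumers join late or replay from an earlier position.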


Debezium Embedded Engine: A Java Library

The Debezium Embedded Engine is not a separately deployed service. It is a Java library (the debezium-api and debezium-embedded artifacts, plus the connector for your database) that you include as a dependency in your application.

Instead of shipping events to Kafka topics, the Embedded Engine calls a handler function in your application's process with each change event.

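A minimal Java example (the engine name and offset file path are illustrative):

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

Properties props = new Properties();
props.setProperty("name", "shop-engine");
props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
props.setProperty("offset.storage.file.filename", "/tmp/debezium-offsets.dat");
// ...plus the database.*, topic.prefix, plugin.name and slot.name settings shown earlier

DebeziumEngine<ChangeEvent<String, String>> engine =
    DebeziumEngine.create(Json.class)
        .using(props)
        .notifying(record -> System.out.println(record.value())) // handle each change event here
        .build();

ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(engine); // the engine is a Runnable and occupies this thread while streaming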

The props object carries the same connector configuration as the Kafka Connect version, because it is the same connector code, plus a few engine-level settings such as name and offset.storage. Log reading, snapshotting, and schema-change handling are identical. What differs is where events go and where offsets are kept.
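Because the connector code is shared, a Kafka Connect config map can be carried over almost verbatim. The helper below is a hypothetical sketch. It copies the connector keys into a Properties object and adds the engine-level keys that Kafka Connect otherwise handles at the connector or worker level: name and a file-based offset store. The file path is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class EngineConfig {

    // Hypothetical helper: turn a Kafka Connect "config" map into the Properties
    // object passed to the embedded engine's using(...) call.
    static Properties toEngineProps(String name, Map<String, String> connectConfig, String offsetFile) {
        Properties props = new Properties();
        connectConfig.forEach(props::setProperty); // connector keys carry over unchanged
        props.setProperty("name", name);
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", offsetFile);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> connect = new LinkedHashMap<>();
        connect.put("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        connect.put("database.hostname", "db.internal");
        connect.put("database.dbname", "shop");
        connect.put("plugin.name", "pgoutput");

        Properties props = toEngineProps("postgres-connector", connect, "/tmp/offsets.dat");
        // Same connector class in both deployment modes
        System.out.println(props.getProperty("connector.class"));
    }
}
```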


How They Handle Offsets Differently

This is a subtle but important difference.

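Debezium Standalone stores its position (the source offset) in a Kafka topic that Kafka Connect manages automatically. In a distributed Connect cluster, this is the topic set by the worker's offset.storage.topic property, conventionally named connect-offsets.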

Debezium Embedded Engine stores its offset wherever the host application tells it to. The engine's offset.storage property names a class that implements Kafka Connect's OffsetBackingStore interface. Common choices:

  • File-based (org.apache.kafka.connect.storage.FileOffsetBackingStore): offsets live in a local file, which makes this unsuitable for distributed systems.
  • Custom: the host application provides its own implementation.
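A file-based store is conceptually small. It keeps the latest source position per key in a local file and reads it back on startup. The sketch below shows that idea using only the JDK. It is a hypothetical stand-in that does not implement Kafka Connect's OffsetBackingStore interface, and its class, key, and file names are illustrative. Because the file lives on one machine's disk, a replacement node elsewhere cannot see it, which is why this approach does not suit distributed deployments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical, simplified stand-in for a file-based offset store.
class FileOffsetStore {
    private final Path file;

    FileOffsetStore(Path file) {
        this.file = file;
    }

    // Write to a temp file, then rename it over the old one, so a crash
    // mid-write does not leave a half-written offset file behind.
    void commit(String key, String position) throws IOException {
        Properties offsets = load();
        offsets.setProperty(key, position);
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (Writer out = Files.newBufferedWriter(tmp)) {
            offsets.store(out, "CDC offsets");
        }
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    // Returns null when nothing has been committed for this key yet.
    String read(String key) throws IOException {
        return load().getProperty(key);
    }

    private Properties load() throws IOException {
        Properties offsets = new Properties();
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                offsets.load(in);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        FileOffsetStore store = new FileOffsetStore(Path.of("cdc-offsets.properties"));
        store.commit("shop", "0/16B3748"); // e.g. a PostgreSQL LSN
        System.out.println(store.read("shop"));
    }
}
```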

RisingWave implements a custom offset store backed by its checkpoint mechanism, which writes to object storage (S3/GCS). This means RisingWave's CDC position survives process restarts and node failures, with the same durability guarantees as the rest of RisingWave's state.
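The checkpoint-aligned approach can be modeled in a few lines. This is a hypothetical sketch of the general technique, not RisingWave's code. Offsets are staged in memory while events flow, written to durable storage only when a checkpoint completes, and recovered from the newest checkpoint after a restart. A TreeMap stands in for object storage, and the offsets are made-up PostgreSQL-style LSNs.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of an offset store tied to the host's checkpoints.
// The NavigableMap stands in for object storage (e.g. S3/GCS).
class CheckpointedOffsetStore {
    private final NavigableMap<Long, String> objectStore; // checkpoint id -> CDC offset
    private String pending;                               // newest offset not yet checkpointed

    CheckpointedOffsetStore(NavigableMap<Long, String> objectStore) {
        this.objectStore = objectStore;
    }

    // Called as change events are processed; the offset is held in memory only.
    void stage(String offset) {
        pending = offset;
    }

    // Called once a checkpoint, including the rest of the host's state, is durable.
    void onCheckpointComplete(long checkpointId) {
        if (pending != null) {
            objectStore.put(checkpointId, pending);
            pending = null;
        }
    }

    // Where the CDC reader resumes after a restart; null means no checkpoint yet.
    String recoveryOffset() {
        Map.Entry<Long, String> latest = objectStore.lastEntry();
        return latest == null ? null : latest.getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> s3 = new TreeMap<>();
        CheckpointedOffsetStore store = new CheckpointedOffsetStore(s3);
        store.stage("0/100");
        store.onCheckpointComplete(1);
        store.stage("0/200"); // the process crashes before checkpoint 2

        CheckpointedOffsetStore restarted = new CheckpointedOffsetStore(s3);
        System.out.println(restarted.recoveryOffset()); // 0/100: events after it are read again
    }
}
```

In a real system, the same checkpoint would also persist the derived state. A restart would then rewind both the offset and that state to the same point.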


Latency Characteristics

Both modes read from the same database log, so the fundamental latency floor is similar: typically sub-second from commit to event delivery in a healthy system.

The practical difference is pipeline depth.

Debezium Standalone has two hops: database → Kafka → consumer. Kafka adds buffering latency (typically 5–50 ms for producer acknowledgment). If the consumer reads in batches, end-to-end latency can be hundreds of milliseconds.

Debezium Embedded has one hop: database → host application. There is no intermediate broker. Events can be processed as fast as the host application can consume them.

For analytics and materialized views, this difference is meaningful. RisingWave, using the Embedded Engine, can reflect committed changes in materialized views in well under a second.


When to Use Debezium Standalone

Use Debezium Standalone when:

  • Multiple downstream consumers need independent access to the same change stream. Kafka's consumer group model is purpose-built for this.
  • Long-term event retention is a requirement. Kafka topics can retain events for days or weeks, enabling late-joining consumers and event replay.
  • You already operate Kafka. If Kafka is already in your infrastructure, adding a Debezium connector costs very little operationally.
  • Your destinations are served by Kafka Connect sink connectors, such as Elasticsearch, Redis, S3 (via the Kafka Connect S3 sink), or Snowflake.
  • Regulatory audit trails demand a durable, replayable event log that exists independently of any downstream system.

When to Use Debezium Embedded Engine

Use the Embedded Engine (or a tool that embeds it) when:

  • You want CDC without operating Kafka. The Embedded Engine has no broker dependency.
  • You need SQL queries on live CDC data. Tools like RisingWave combine the Embedded Engine with a streaming SQL engine, so you can write SELECT and JOIN directly against change streams.
  • Operational simplicity matters. In RisingWave, a CREATE SOURCE statement (plus the tables defined on it) takes the place of a Kafka cluster, Kafka Connect workers, a schema registry, and a separate consumer application.
  • Your CDC data has a single destination. If the only consumer of your CDC stream is one analytics database or one materialized view system, the fan-out capability of Kafka is unused overhead.

RisingWave as the Primary Production Example

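RisingWave is one of the most prominent production users of the Debezium Embedded Engine. A typical PostgreSQL CDC setup in RisingWave has a shared source for the upstream database, a table that ingests one upstream table, and a materialized view on top:

CREATE SOURCE shop_pg WITH (
    connector = 'postgres-cdc',
    hostname  = 'db.internal',
    port      = '5432',
    username  = 'replication_user',
    password  = 'secret',
    database.name = 'shop',
    slot.name = 'risingwave_slot',
    publication.name = 'risingwave_pub'
);

-- Column names follow public.customers; the types here are illustrative.
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    email         VARCHAR,
    status        VARCHAR,
    last_order_at TIMESTAMP
) FROM shop_pg TABLE 'public.customers';

CREATE MATERIALIZED VIEW active_customers AS
SELECT customer_id, email, last_order_at
FROM customers
WHERE status = 'active';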

Behind these statements, RisingWave runs Debezium's connector logic. The reliability, schema evolution handling, and snapshot consistency come from Debezium. The SQL interface, incremental view maintenance, and checkpointing to object storage come from RisingWave.

Airbyte also uses the Debezium Embedded Engine for CDC in its database source connectors (PostgreSQL, MySQL, MongoDB, and others). Embedding Debezium instead of deploying it alongside Kafka is becoming a common choice for teams that want a simpler CDC architecture.


The Bottom Line

Debezium Standalone and the Debezium Embedded Engine are not competing products. They are two deployment shapes of the same core technology.

Standalone is the right choice when your architecture revolves around Kafka and you need to fan out change events to multiple independent consumers.

The Embedded Engine is the right choice when you want Debezium's reliability without Kafka's operational burden — especially when the CDC stream has a single, well-defined destination like a streaming database or an ETL pipeline.


FAQ

Is the Debezium Embedded Engine officially supported by the Debezium project? Yes. The Embedded Engine is a first-class part of the Debezium project, documented and maintained alongside the Kafka Connect connectors. It is not an unofficial fork or hack.

Does the Embedded Engine support all the same database connectors as Standalone? Yes. The connector code is shared. PostgreSQL, MySQL, MongoDB, Oracle, SQL Server, and others are available to both deployment modes.

Does RisingWave expose Debezium's connector configuration options? Some options are exposed via RisingWave's WITH clause properties. Not all connector knobs are surfaced — RisingWave sets sensible defaults for most of them based on its own operational model.

What happens to offset state if RisingWave is upgraded? RisingWave's checkpoint format is designed for backward compatibility across upgrades. The CDC offset (LSN or binlog position) is part of the checkpoint state and survives rolling upgrades in most cases. Check release notes for any specific migration steps.

Can I use the Debezium Embedded Engine directly in my own application without RisingWave? Absolutely. Add debezium-api, debezium-embedded, and the connector artifact for your database (for example, debezium-connector-postgres) to your Maven or Gradle project. Then provide a connector configuration and an offset store, and implement the change event handler. The Debezium documentation includes a full guide for embedded deployments.
