Schema changes are the number-one operational pain point in CDC pipelines. When a developer runs ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2), your Debezium connector either adapts gracefully, pauses silently, or throws errors that break downstream consumers — depending entirely on how your pipeline is configured.
Why Schema Changes Break CDC Pipelines
Debezium captures row-level change events, not DDL events. Each event payload carries the schema at the moment of capture. When the upstream schema changes, the connector must reconcile the old schema with the new one — and every component downstream must agree on what the data looks like.
The three components that must stay in sync: Debezium's internal schema history, the Kafka schema registry (if used), and downstream consumers reading from Kafka topics.
What Debezium Does When a Column Is Added
When you run ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0 on a PostgreSQL source, here is what happens:
- PostgreSQL writes the DDL to the WAL.
- Debezium reads the DDL and updates its in-memory schema representation.
- The next DML event is emitted with the updated schema.
- If you use Confluent Schema Registry with FULL_TRANSITIVE compatibility, the new schema must be both backward- and forward-compatible with every registered version — which an added nullable column with a default typically satisfies.
- If you use BACKWARD or FULL compatibility and the new field has no default in the Avro schema, the registry may reject the schema update.
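To confirm the new field is actually flowing after these steps, reading the tail of the change topic is often quicker than checking logs. A minimal sketch with kcat, assuming the topic name shop.public.orders, the broker, and the registry URL used in the configuration examples later in this article:
# Consume the last five change events and look for the new "discount" field in the
# "after" section of each Debezium envelope; adjust broker, topic, and registry URL.
kcat -b kafka:9092 -t shop.public.orders -C -o -5 -e \
  -s value=avro -r http://schema-registry:8081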
The connector does not pause for a simple nullable ADD COLUMN. It pauses or fails when:
- The new schema is incompatible with the registry compatibility setting.
- The schema history topic (schema.history.internal.kafka.topic in Debezium 2.x, database.history.kafka.topic in 1.x) is unavailable or corrupted.
- A downstream consumer deserializes the new payload with a stale generated schema class.
The Error Messages You Will Actually See
Schema registry compatibility violation:
org.apache.kafka.common.errors.SerializationException:
Error registering Avro schema:
{"type":"record","name":"orders","fields":[...{"name":"discount","type":["null","double"]}]}
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException:
Schema being registered is incompatible with an earlier schema; error code: 409
Connector paused with schema history error:
ERROR WorkerSourceTask{id=orders-connector-0} Task threw an uncaught and
unrecoverable exception. Task is being killed and will not recover until
manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
io.debezium.DebeziumException: The db history topic or its content is invalid
Downstream consumer with mismatched schema:
org.apache.avro.AvroTypeException: Found orders, expecting orders,
missing required field discount
Recovery Steps for Debezium Schema Failures
Step 1: Check the connector status.
curl -s http://localhost:8083/connectors/orders-connector/status | jq .
Look for "state": "FAILED" in the tasks array, not just the connector-level status. Connectors frequently show RUNNING at the top level while one or more tasks have failed.
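To see just the task states and any failure trace without scrolling through the full status payload, a small jq filter helps (a sketch reusing the connector name above):
# Print the connector-level state plus each task's id, state, and stack trace.
curl -s http://localhost:8083/connectors/orders-connector/status \
  | jq '{connector: .connector.state, tasks: [.tasks[] | {id, state, trace}]}'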
Step 2: Identify the schema conflict.
# Check what schema versions exist in the registry
curl -s http://localhost:8081/subjects/orders-value/versions | jq .
curl -s http://localhost:8081/subjects/orders-value/versions/latest | jq .
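If you suspect the registry will reject the evolved schema, you can ask it directly before restarting anything. The compatibility-check endpoint is part of the Schema Registry REST API; the inline schema below is a trimmed, hypothetical stand-in, so post your full new value schema in practice:
# Test a candidate schema against the latest registered version without registering it.
curl -s -X POST http://localhost:8081/compatibility/subjects/orders-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"orders\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"discount\",\"type\":[\"null\",\"double\"],\"default\":null}]}"}' \
  | jq .
# A response of {"is_compatible": true} means the registry would accept it.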
Step 3: Update schema registry compatibility if appropriate.
curl -X PUT http://localhost:8081/config/orders-value \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"compatibility": "BACKWARD"}'
Step 4: Restart the connector task.
curl -X POST http://localhost:8083/connectors/orders-connector/tasks/0/restart
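If your Connect workers run Kafka 3.0 or newer, a single call can restart the connector together with only its failed tasks, which avoids enumerating task IDs (verify availability on your Connect version):
# Restart the connector and any failed tasks in one request (Connect 3.0+).
curl -X POST "http://localhost:8083/connectors/orders-connector/restart?includeTasks=true&onlyFailed=true"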
Step 5: If the schema history topic is corrupted, you may need to delete and recreate the connector, which triggers a new snapshot. Before doing this, ensure downstream consumers can handle duplicate events (they should be idempotent).
curl -X DELETE http://localhost:8083/connectors/orders-connector
# Recreate with original config
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @connector-config.json
Configuring Debezium to Tolerate Schema Changes
The column.exclude.list and column.include.list properties let you narrow what Debezium tracks, reducing the blast radius of schema changes:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres",
"database.dbname": "shop",
"table.include.list": "public.orders",
"column.exclude.list": "public.orders.internal_notes",
"topic.prefix": "shop"
}
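To apply a change like this to a connector that is already running, you can PUT the bare config map to the Connect REST API instead of deleting and recreating the connector (a sketch; orders-connector-config.json is a hypothetical file containing only the JSON object above, without the {"name": ..., "config": ...} wrapper used when creating a connector):
# Update the running connector in place; Connect restarts its tasks with the new config.
curl -X PUT http://localhost:8083/connectors/orders-connector/config \
  -H "Content-Type: application/json" \
  -d @orders-connector-config.json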
For Avro with a schema registry, set the compatibility mode upfront to BACKWARD — this allows adding optional fields (nullable unions with a default, typically null) without breaking existing consumers.
{
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter.auto.register.schemas": "true"
}
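Once the connector has registered the evolved schema, you can confirm the new field carries a default, which is what keeps it BACKWARD-compatible (a sketch reusing the orders-value subject from the recovery steps above):
# Print the "discount" field from the latest registered value schema; it should
# include "default": null when the column was added as nullable.
curl -s http://localhost:8081/subjects/orders-value/versions/latest \
  | jq '.schema | fromjson | .fields[] | select(.name == "discount")'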
How RisingWave Handles Schema Evolution
RisingWave's CDC connectors use the Debezium Embedded Engine internally — no Kafka required. But schema evolution is handled differently because RisingWave owns the full pipeline from source to materialized view.
When you add a column to a PostgreSQL table that RisingWave is ingesting, you propagate the change with ALTER TABLE:
-- On the PostgreSQL source
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;
-- In RisingWave, update the source definition
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2);
RisingWave resumes CDC ingestion with the new schema without requiring a full resnapshot. Materialized views that do not reference the new column are unaffected. Views that do reference it can be recreated or altered.
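Before recreating any views, you can check that RisingWave now sees the column. A sketch using psql against RisingWave's default endpoint (port 4566, database dev, user root; adjust for your deployment):
# Inspect the table schema as RisingWave sees it; "discount" should appear in the output.
psql -h localhost -p 4566 -d dev -U root -c "DESCRIBE orders;"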
This works because RisingWave tracks table schemas in its own internal catalog and coordinates schema versions with its checkpointing mechanism — there is no external schema registry to synchronize.
Comparison: Schema Change Handling
| Scenario | Debezium + Kafka | RisingWave |
| --- | --- | --- |
| Add nullable column | Usually automatic with BACKWARD compatibility | ALTER TABLE on source, no resnapshot |
| Add NOT NULL column without default | Requires connector restart + schema update | Requires table backfill, then ALTER TABLE |
| Rename column | Breaking change, requires consumer migration | Requires view recreation |
| Drop column | Breaks consumers referencing the column | Views referencing the column must be dropped first |
| Schema registry unavailable | Connector fails | Not applicable |
Preventing Schema Change Incidents
The most reliable prevention is contract testing between your application team and your data team. Before any ALTER TABLE runs in production, validate it against the pipeline:
-- Test compatibility in staging before production
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'orders'
ORDER BY ordinal_position;
For Debezium pipelines, maintain a schema change runbook: update schema registry compatibility → run migration → verify connector status → confirm downstream consumers. For RisingWave, the runbook is simpler: coordinate the ALTER TABLE on both source and RisingWave in the same change window.
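A condensed version of the Debezium runbook as a script, a sketch only, reusing the endpoints and connector name from the recovery steps above and with the migration step as a placeholder for your own tooling:
#!/usr/bin/env bash
set -euo pipefail

# 1. Make sure the subject accepts added optional fields before the migration runs.
curl -s -X PUT http://localhost:8081/config/orders-value \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD"}'

# 2. Run the migration (placeholder; plug in your migration tooling here).
psql -h postgres -d shop -c "ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;"

# 3. Verify the connector and all of its tasks are still running.
curl -s http://localhost:8083/connectors/orders-connector/status \
  | jq -e '.connector.state == "RUNNING" and all(.tasks[]; .state == "RUNNING")'

# 4. Confirm the new schema version landed in the registry.
curl -s http://localhost:8081/subjects/orders-value/versions | jq .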
FAQ
Does Debezium capture DDL events themselves? It depends on the connector. Connectors that use a schema history topic (such as MySQL, SQL Server, and Oracle) record DDL there for their own recovery, and the MySQL connector can additionally emit schema change events to a dedicated schema change topic when include.schema.changes is enabled. The PostgreSQL connector does not capture DDL statements at all; it picks up the new table structure from the logical replication stream, so DDL never appears as row-level change events.
What is the schema history topic and do I always need it?
The schema history topic (schema.history.internal.kafka.topic since Debezium 2.0, previously database.history.kafka.topic) is required for connectors such as MySQL, SQL Server, and Oracle. It stores the full DDL history so the connector can reconstruct the table schema at any position in the transaction log. The PostgreSQL connector does not use it, because the logical replication stream carries the table metadata needed to decode each change. For connectors that depend on it, a missing or corrupted history topic means the connector cannot restart after a schema change.
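If a connector that relies on this topic fails with the "db history topic or its content is invalid" error shown earlier, confirm the topic is still readable before deleting anything (a sketch; replace the topic name with your schema.history.internal.kafka.topic value):
# Read a few records from the schema history topic; each record is a JSON document
# describing a DDL statement. An empty or unreadable topic explains the connector error.
kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic schema-history.orders --from-beginning --max-messages 5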
Can I use JSON instead of Avro to avoid schema registry issues? Yes. JSON converters do not require a schema registry. The tradeoffs are larger message payloads, no schema enforcement, and downstream consumers that must handle schema drift on their own. For production pipelines at scale, Avro with a managed registry is usually preferable.
What happens if I add a NOT NULL column without a default to an already-streaming table?
On PostgreSQL the DDL itself fails if the table already contains rows and no DEFAULT is supplied, so in practice you must add a default or backfill first. Even then, change events captured before the DDL do not contain the column, which can cause constraint violations when you load those events into strongly typed targets. Add NOT NULL columns with a DEFAULT, or backfill and only then enforce the constraint (see the sketch below).
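A minimal sketch of that safer rollout on the PostgreSQL source, using the same orders table and hostnames as the rest of this article:
# Add the column with a default (a metadata-only change on PostgreSQL 11+), backfill
# any remaining NULLs, then enforce the constraint.
psql -h postgres -d shop -c "ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;"
psql -h postgres -d shop -c "UPDATE orders SET discount = 0 WHERE discount IS NULL;"
psql -h postgres -d shop -c "ALTER TABLE orders ALTER COLUMN discount SET NOT NULL;"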
How does RisingWave handle schema changes for sources connected to Kafka topics (not direct CDC)?
For Kafka sources with schema registry, RisingWave can pull updated schemas automatically when schema.registry is configured. For direct CDC sources, schema changes propagate through ALTER TABLE as described above.

