Schema changes are the number-one operational pain point in CDC pipelines. When a developer runs ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2), your Debezium connector either adapts gracefully, pauses silently, or throws errors that break downstream consumers — depending entirely on how your pipeline is configured.
Why Schema Changes Break CDC Pipelines
Debezium captures row-level change events, not DDL events. Each event payload carries the schema at the moment of capture. When the upstream schema changes, the connector must reconcile the old schema with the new one — and every component downstream must agree on what the data looks like.
The three components that must stay in sync: Debezium's internal schema history, the Kafka schema registry (if used), and downstream consumers reading from Kafka topics.
What Debezium Does When a Column Is Added
When you run ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0 on a PostgreSQL source, here is what happens:
- PostgreSQL writes the DDL to the WAL.
- Debezium reads the DDL and updates its in-memory schema representation.
- The next DML event is emitted with the updated schema.
- If you use Confluent Schema Registry with FULL_TRANSITIVE compatibility, the new schema must be both backward- and forward-compatible with every registered version — which an added nullable column with a default typically satisfies.
- If you use BACKWARD or FULL compatibility and the new field has no default in the Avro schema, the registry may reject the schema update.
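To confirm the new field is actually flowing after these steps, reading the tail of the change topic is often quicker than checking logs. A minimal sketch with kcat, assuming the topic name shop.public.orders, the broker, and the registry URL used in the configuration examples later in this article:
# Consume the last five change events and look for the new "discount" field in the
# "after" section of each Debezium envelope; adjust broker, topic, and registry URL.
kcat -b kafka:9092 -t shop.public.orders -C -o -5 -e \
  -s value=avro -r http://schema-registry:8081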
The connector does not pause for a simple nullable ADD COLUMN. It pauses or fails when:
- The new schema is incompatible with the registry compatibility setting.
- The schema history topic (schema.history.internal.kafka.topic in Debezium 2.x, database.history.kafka.topic in 1.x) is unavailable or corrupted.
- A downstream consumer deserializes the new payload with a stale generated schema class.
The Error Messages You Will Actually See
Schema registry compatibility violation:
org.apache.kafka.common.errors.SerializationException:
Error registering Avro schema:
{"type":"record","name":"orders","fields":[...{"name":"discount","type":["null","double"]}]}
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException:
Schema being registered is incompatible with an earlier schema; error code: 409
Connector paused with schema history error:
ERROR WorkerSourceTask{id=orders-connector-0} Task threw an uncaught and
unrecoverable exception. Task is being killed and will not recover until
manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
io.debezium.DebeziumException: The db history topic or its content is invalid
Downstream consumer with mismatched schema:
org.apache.avro.AvroTypeException: Found orders, expecting orders,
missing required field discount
Recovery Steps for Debezium Schema Failures
Step 1: Check the connector status.
curl -s http://localhost:8083/connectors/orders-connector/status | jq .
Look for "state": "FAILED" in the tasks array, not just the connector-level status. Connectors frequently show RUNNING at the top level while one or more tasks have failed.
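To see just the task states and any failure trace without scrolling through the full status payload, a small jq filter helps (a sketch reusing the connector name above):
# Print the connector-level state plus each task's id, state, and stack trace.
curl -s http://localhost:8083/connectors/orders-connector/status \
  | jq '{connector: .connector.state, tasks: [.tasks[] | {id, state, trace}]}'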
Step 2: Identify the schema conflict.
# Check what schema versions exist in the registry
curl -s http://localhost:8081/subjects/orders-value/versions | jq .
curl -s http://localhost:8081/subjects/orders-value/versions/latest | jq .
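If you suspect the registry will reject the evolved schema, you can ask it directly before restarting anything. The compatibility-check endpoint is part of the Schema Registry REST API; the inline schema below is a trimmed, hypothetical stand-in, so post your full new value schema in practice:
# Test a candidate schema against the latest registered version without registering it.
curl -s -X POST http://localhost:8081/compatibility/subjects/orders-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"orders\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"discount\",\"type\":[\"null\",\"double\"],\"default\":null}]}"}' \
  | jq .
# A response of {"is_compatible": true} means the registry would accept it.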
Step 3: Update schema registry compatibility if appropriate.
curl -X PUT http://localhost:8081/config/orders-value \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"compatibility": "BACKWARD"}'
Step 4: Restart the connector task.
curl -X POST http://localhost:8083/connectors/orders-connector/tasks/0/restart
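If your Connect workers run Kafka 3.0 or newer, a single call can restart the connector together with only its failed tasks, which avoids enumerating task IDs (verify availability on your Connect version):
# Restart the connector and any failed tasks in one request (Connect 3.0+).
curl -X POST "http://localhost:8083/connectors/orders-connector/restart?includeTasks=true&onlyFailed=true"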
Step 5: If the schema history topic is corrupted, you may need to delete and recreate the connector, which triggers a new snapshot. Before doing this, ensure downstream consumers can handle duplicate events (they should be idempotent).
curl -X DELETE http://localhost:8083/connectors/orders-connector
# Recreate with original config
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @connector-config.json
Configuring Debezium to Tolerate Schema Changes
The column.exclude.list and column.include.list properties let you narrow what Debezium tracks, reducing the blast radius of schema changes:
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres",
"database.dbname": "shop",
"table.include.list": "public.orders",
"column.exclude.list": "public.orders.internal_notes",
"topic.prefix": "shop"
}
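To apply a change like this to a connector that is already running, you can PUT the bare config map to the Connect REST API instead of deleting and recreating the connector (a sketch; orders-connector-config.json is a hypothetical file containing only the JSON object above, without the {"name": ..., "config": ...} wrapper used when creating a connector):
# Update the running connector in place; Connect restarts its tasks with the new config.
curl -X PUT http://localhost:8083/connectors/orders-connector/config \
  -H "Content-Type: application/json" \
  -d @orders-connector-config.json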
For Avro with a schema registry, set the compatibility mode upfront to BACKWARD — this allows adding optional fields (nullable unions with a default, typically null) without breaking existing consumers.
{
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter.auto.register.schemas": "true"
}
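Once the connector has registered the evolved schema, you can confirm the new field carries a default, which is what keeps it BACKWARD-compatible (a sketch reusing the orders-value subject from the recovery steps above):
# Print the "discount" field from the latest registered value schema; it should
# include "default": null when the column was added as nullable.
curl -s http://localhost:8081/subjects/orders-value/versions/latest \
  | jq '.schema | fromjson | .fields[] | select(.name == "discount")'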
How RisingWave Handles Schema Evolution
RisingWave's CDC connectors use the Debezium Embedded Engine internally — no Kafka required. But schema evolution is handled differently because RisingWave owns the full pipeline from source to materialized view.
When you add a column to a PostgreSQL table that RisingWave is ingesting, you propagate the change with ALTER TABLE:
-- On the PostgreSQL source
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;
-- In RisingWave, update the source definition
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2);
RisingWave resumes CDC ingestion with the new schema without requiring a full resnapshot. Materialized views that do not reference the new column are unaffected. Views that do reference it can be recreated or altered.
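Before recreating any views, you can check that RisingWave now sees the column. A sketch using psql against RisingWave's default endpoint (port 4566, database dev, user root; adjust for your deployment):
# Inspect the table schema as RisingWave sees it; "discount" should appear in the output.
psql -h localhost -p 4566 -d dev -U root -c "DESCRIBE orders;"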
This works because RisingWave tracks table schemas in its own internal catalog and coordinates schema versions with its checkpointing mechanism — there is no external schema registry to synchronize.
Comparison: Schema Change Handling
| Scenario | Debezium + Kafka | RisingWave |
| --- | --- | --- |
| Add nullable column | Usually automatic with BACKWARD compatibility | ALTER TABLE on source, no resnapshot |
| Add NOT NULL column without default | Requires connector restart + schema update | Requires table backfill, then ALTER TABLE |
| Rename column | Breaking change, requires consumer migration | Requires view recreation |
| Drop column | Breaks consumers referencing the column | Views referencing the column must be dropped first |
| Schema registry unavailable | Connector fails | Not applicable |
Preventing Schema Change Incidents
The most reliable prevention is contract testing between your application team and your data team. Before any ALTER TABLE runs in production, validate it against the pipeline:
-- Test compatibility in staging before production
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'orders'
ORDER BY ordinal_position;
For Debezium pipelines, maintain a schema change runbook: update schema registry compatibility → run migration → verify connector status → confirm downstream consumers. For RisingWave, the runbook is simpler: coordinate the ALTER TABLE on both source and RisingWave in the same change window.
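A condensed version of the Debezium runbook as a script, a sketch only, reusing the endpoints and connector name from the recovery steps above and with the migration step as a placeholder for your own tooling:
#!/usr/bin/env bash
set -euo pipefail

# 1. Make sure the subject accepts added optional fields before the migration runs.
curl -s -X PUT http://localhost:8081/config/orders-value \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD"}'

# 2. Run the migration (placeholder; plug in your migration tooling here).
psql -h postgres -d shop -c "ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;"

# 3. Verify the connector and all of its tasks are still running.
curl -s http://localhost:8083/connectors/orders-connector/status \
  | jq -e '.connector.state == "RUNNING" and all(.tasks[]; .state == "RUNNING")'

# 4. Confirm the new schema version landed in the registry.
curl -s http://localhost:8081/subjects/orders-value/versions | jq .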
FAQ
Does Debezium capture DDL events themselves? It depends on the connector. Connectors that use a schema history topic (such as MySQL, SQL Server, and Oracle) record DDL there for their own recovery, and the MySQL connector can additionally emit schema change events to a dedicated schema change topic when include.schema.changes is enabled. The PostgreSQL connector does not capture DDL statements at all; it picks up the new table structure from the logical replication stream, so DDL never appears as row-level change events.
What is the schema history topic and do I always need it?
The schema history topic (schema.history.internal.kafka.topic since Debezium 2.0, previously database.history.kafka.topic) is required for connectors such as MySQL, SQL Server, and Oracle. It stores the full DDL history so the connector can reconstruct the table schema at any position in the transaction log. The PostgreSQL connector does not use it, because the logical replication stream carries the table metadata needed to decode each change. For connectors that depend on it, a missing or corrupted history topic means the connector cannot restart after a schema change.
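If a connector that relies on this topic fails with the "db history topic or its content is invalid" error shown earlier, confirm the topic is still readable before deleting anything (a sketch; replace the topic name with your schema.history.internal.kafka.topic value):
# Read a few records from the schema history topic; each record is a JSON document
# describing a DDL statement. An empty or unreadable topic explains the connector error.
kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic schema-history.orders --from-beginning --max-messages 5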
Can I use JSON instead of Avro to avoid schema registry issues? Yes. JSON converters do not require a schema registry. The tradeoffs are larger message payloads, no schema enforcement, and downstream consumers that must handle schema drift on their own. For production pipelines at scale, Avro with a managed registry is usually preferable.
What happens if I add a NOT NULL column without a default to an already-streaming table?
On PostgreSQL the DDL itself fails if the table already contains rows and no DEFAULT is supplied, so in practice you must add a default or backfill first. Even then, change events captured before the DDL do not contain the column, which can cause constraint violations when you load those events into strongly typed targets. Add NOT NULL columns with a DEFAULT, or backfill and only then enforce the constraint (see the sketch below).
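A minimal sketch of that safer rollout on the PostgreSQL source, using the same orders table and hostnames as the rest of this article:
# Add the column with a default (a metadata-only change on PostgreSQL 11+), backfill
# any remaining NULLs, then enforce the constraint.
psql -h postgres -d shop -c "ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2) DEFAULT 0;"
psql -h postgres -d shop -c "UPDATE orders SET discount = 0 WHERE discount IS NULL;"
psql -h postgres -d shop -c "ALTER TABLE orders ALTER COLUMN discount SET NOT NULL;"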
How does RisingWave handle schema changes for sources connected to Kafka topics (not direct CDC)?
For Kafka sources with schema registry, RisingWave can pull updated schemas automatically when schema.registry is configured. For direct CDC sources, schema changes propagate through ALTER TABLE as described above.

