Schema Evolution in Streaming Pipelines: Best Practices

Schema evolution in streaming pipelines handles changes to data structure — new columns, renamed fields, type changes — without pipeline downtime. This is one of the hardest operational challenges in streaming: a schema change in the source database can break downstream consumers if not handled correctly.

Schema Change Types

| Change | Risk Level | Handling |
| --- | --- | --- |
| Add column | Low | Downstream ignores unknown columns |
| Drop column | Medium | Downstream must handle missing data |
| Rename column | High | Breaks name-based consumers |
| Change type | High | Requires type compatibility |
| Add NOT NULL | Medium | Old records need defaults |
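The low- and medium-risk changes above (added or dropped columns) can be absorbed by reading every record through a defaulting layer. A minimal Python sketch, assuming JSON-decoded records arrive as dicts and using a hypothetical `EXPECTED_FIELDS` schema:

```python
# Hypothetical expected schema: field name -> default value.
EXPECTED_FIELDS = {"event_id": None, "user_id": None, "ts": None}

def normalize(record: dict) -> dict:
    """Project a record onto the expected schema:
    unknown columns are ignored, missing columns get defaults."""
    return {field: record.get(field, default)
            for field, default in EXPECTED_FIELDS.items()}

# Added column: the extra "user_agent" key is silently dropped.
new_style = normalize({"event_id": 1, "user_id": 7, "ts": 0, "user_agent": "curl"})

# Dropped column: the missing "user_id" falls back to its default.
old_style = normalize({"event_id": 2, "ts": 5})
```

Renames and type changes cannot be papered over this way, which is why the table rates them high risk.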

Best Practices

  1. Use a schema registry (Confluent Schema Registry, AWS Glue) to track and enforce compatibility
  2. Prefer backward-compatible changes (add columns with defaults, avoid renames)
  3. Version your schemas — never modify in place
  4. Test schema changes in staging before production
  5. Use Iceberg for destinations — Iceberg's schema evolution handles changes gracefully
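Practices 1 and 2 boil down to one rule: a change is backward compatible if consumers on the new schema can still read old records, which for added fields means every new field needs a default. A toy compatibility check, with a hypothetical `field name -> default` schema representation (real registries check full Avro/Protobuf resolution rules):

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """old_fields/new_fields map field name -> default (None = no default).
    Backward compatible: every field added in the new schema carries a
    default, so new-schema readers can fill it in for old records."""
    added = set(new_fields) - set(old_fields)
    return all(new_fields[f] is not None for f in added)

old = {"event_id": 0}
ok  = is_backward_compatible(old, {"event_id": 0, "user_agent": ""})    # default present
bad = is_backward_compatible(old, {"event_id": 0, "user_agent": None})  # no default
```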

Schema Evolution with Iceberg

```sql
-- Iceberg handles schema changes without data rewrites
ALTER TABLE events ADD COLUMN user_agent VARCHAR;
-- Old data returns NULL for user_agent; new data includes it
-- No pipeline restart required
```

Frequently Asked Questions

How do I handle schema changes in Kafka topics?

Use a schema registry (Confluent, AWS Glue) with compatibility enforcement. Set compatibility mode to BACKWARD (consumers can read old and new schemas) or FULL (both directions).
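Confluent Schema Registry exposes compatibility settings over REST: a `PUT` to `/config/{subject}` sets the subject's mode. A small Python sketch that builds the request without sending it (the registry URL and subject name are placeholders; dispatching via an HTTP client is left to the caller):

```python
import json

VALID_MODES = {"BACKWARD", "FORWARD", "FULL", "NONE",
               "BACKWARD_TRANSITIVE", "FORWARD_TRANSITIVE", "FULL_TRANSITIVE"}

def compatibility_request(registry_url: str, subject: str, mode: str):
    """Build the URL and JSON body for Schema Registry's
    PUT /config/{subject} endpoint, which sets the subject's
    compatibility mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown compatibility mode: {mode}")
    url = f"{registry_url}/config/{subject}"
    body = json.dumps({"compatibility": mode})
    return url, body

# Placeholder registry address and subject name.
url, body = compatibility_request("http://localhost:8081", "events-value", "BACKWARD")
```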

Does RisingWave handle schema evolution?

RisingWave's CDC sources track source schema changes. For Iceberg sinks, schema evolution is handled by the Iceberg table format. For Kafka sources, use schema registry compatibility checks.
