CDC to Apache Iceberg: Upsert and Delete Semantics Explained

CDC to Apache Iceberg: Upsert and Delete Semantics Explained

CDC pipelines writing to Apache Iceberg must handle INSERT, UPDATE, and DELETE events correctly. Iceberg supports upserts and deletes through a combination of primary key tracking, delete files, and merge strategies -- but the complexity lives in the writer, not the table format itself.

Why CDC to Iceberg Is Harder Than Append-Only Ingestion

Appending rows to an Iceberg table is straightforward. Every Kafka message becomes a new row. No conflict resolution is needed.

CDC is different. A CDC stream carries three event types: INSERT, UPDATE, and DELETE. The writer must translate each event into the correct table mutation. For UPDATE, it needs to replace the existing row. For DELETE, it needs to remove a row that may be scattered across dozens of data files written over weeks.

Iceberg does not enforce primary key uniqueness at the storage layer. That responsibility falls entirely on the writer. If the writer appends rows without deduplication, you end up with multiple versions of the same record -- a silent data quality problem that is hard to detect and painful to fix.

Challenge 1: Iceberg's Upsert Model Requires Writers to Deduplicate

Iceberg tables can declare primary keys in their metadata, but the format does not reject duplicate rows. It is a hint for writers, not a constraint enforced by the storage engine.

When a writer receives an UPDATE event, it needs to:

  1. Identify which existing row matches the primary key
  2. Mark that row as deleted
  3. Write the new row

Step 2 is where the two main delete strategies diverge.

Challenge 2: Two Types of Delete Files, Two Different Trade-Offs

Iceberg supports two delete file formats.

Positional deletes record the exact file path and row index of each deleted row. They are fast to apply at read time because the reader knows exactly which rows to skip. The downside is that the writer must know which file contains the row being deleted. For streaming pipelines processing millions of events per second, that lookup is expensive.

Equality deletes record the primary key values of deleted rows. The writer does not need to scan existing files -- it just writes "delete any row where order_id = 12345." At read time, the engine joins every data file against the equality delete files to filter out matching rows. This is simpler to write from a streaming context but creates read amplification that compounds as equality delete files accumulate.

Most streaming CDC pipelines default to equality deletes because they require no read-before-write. The trade-off is that read performance degrades until compaction runs.

Challenge 3: MOR vs COW -- Choosing the Right Merge Strategy

The merge strategy determines when the physical reconciliation of deletes and new data happens.

Merge-on-Read (MOR)

With MOR, the writer appends new data files and separate delete files. The data is not physically merged until a reader queries the table or a compaction job runs.

Pros:

  • Write latency is low. The writer only appends; it never rewrites existing files.
  • Well-suited for high-frequency CDC streams.

Cons:

  • Read queries must merge data files and delete files on every scan. With a backlog of equality deletes, a query touching millions of rows can be significantly slower than a clean table.
  • Compaction must run regularly to keep read performance acceptable.

Copy-on-Write (COW)

With COW, every UPDATE or DELETE rewrites the entire data file containing the affected row. The resulting table always has clean, fully merged files.

Pros:

  • Read performance is optimal. No delete files to merge at query time.
  • Simpler for downstream BI tools and query engines that do not understand Iceberg delete semantics well.

Cons:

  • Write amplification is severe for high-churn tables. If one row in a 1 GB file changes, the entire 1 GB file is rewritten.
  • Not practical for streaming CDC at scale.

The practical default for CDC pipelines is MOR with scheduled compaction. COW is only viable when update frequency is low and read performance is the overriding concern.

MORCOW
Write amplificationLow (append-only)High (rewrites entire files)
Read performanceDegrades without compactionAlways optimal
Best for CDCHigh-frequency upsert streamsLow-churn, read-heavy tables
Compaction requiredYes, regularlyNo
Equality delete riskAccumulates until compactionNot applicable

Apache Flink is the most common engine used to write CDC events to Iceberg. The standard approach uses the Flink Iceberg connector with Debezium or Canal as the CDC source.

When Flink receives a CDC event:

  • For INSERT: writes a new row to a data file
  • For UPDATE: writes the new row to a data file and writes an equality delete record for the old primary key
  • For DELETE: writes an equality delete record for the primary key

This means every non-INSERT event generates a delete file entry. Over time, equality delete files accumulate. Flink does not compact these files automatically -- you need a separate compaction job running on a schedule.

The configuration surface is also significant. You need to tune write.upsert.enabled, write.distribution-mode, checkpoint intervals, and compaction trigger thresholds. Getting these wrong leads to either data loss (missing deletes) or severe read degradation (equality delete accumulation).

Managing the compaction job adds operational overhead. A misconfigured or failing compaction job quietly degrades query performance for every downstream consumer of the table.

How RisingWave Handles CDC to Iceberg

RisingWave takes a different architectural approach. Instead of passing raw CDC events to the Iceberg writer, RisingWave materializes the current state of each table into a materialized view, then sinks that state to Iceberg.

The flow looks like this:

  1. A CDC source connector connects to MySQL, Postgres, or MongoDB and ingests the binlog stream.
  2. A materialized view processes the change events and maintains the current, correct state of each row.
  3. An Iceberg sink reads from the materialized view and writes rows with upsert semantics. Deletes are handled at the materialized view layer before they reach Iceberg.

The materialized view absorbs all the complexity of CDC deduplication. By the time data is written to Iceberg, it reflects the current state of the source table. The Iceberg sink does not need to deal with out-of-order events or primary key conflicts.

RisingWave is a PostgreSQL-compatible streaming database built in Rust. It is open source under Apache 2.0. Its native CDC connectors support transactional consistency across tables, which matters for pipelines that join multiple CDC sources before writing to Iceberg.

Practical SQL Example: RisingWave CDC to Iceberg Pipeline

The following example shows a full pipeline from a MySQL CDC source to an Iceberg sink.

Step 1: Create the CDC source

CREATE SOURCE mysql_orders
WITH (
  connector = 'mysql-cdc',
  hostname = 'mysql-host',
  port = '3306',
  username = 'replicator',
  password = 'secret',
  database.name = 'commerce',
  table.name = 'orders',
  server.id = '5001'
);

Step 2: Create a materialized view that tracks current order state

CREATE MATERIALIZED VIEW current_orders AS
SELECT
  order_id,
  customer_id,
  status,
  total_amount,
  updated_at
FROM mysql_orders
WHERE _rw_op != 'DELETE' OR order_id IS NULL;

Step 3: Create the Iceberg sink with primary key

CREATE SINK iceberg_orders_sink
FROM current_orders
WITH (
  connector = 'iceberg',
  type = 'upsert',
  catalog.name = 'glue',
  database.name = 'analytics',
  table.name = 'orders',
  primary_key = 'order_id',
  catalog.type = 'glue',
  warehouse.path = 's3://my-bucket/iceberg/'
);

The primary_key field tells RisingWave which column to use for upsert matching. Deletes that arrive in the CDC stream are applied at the materialized view layer, so Iceberg receives clean rows representing the current state.

For pipelines that need to propagate hard deletes to Iceberg, use type = 'upsert' with an additional filter or a separate delete materialized view.

Flink + IcebergRisingWave
CDC source supportDebezium/Canal connectorsNative MySQL, Postgres, MongoDB
Delete handlingEquality delete filesMaterialized view absorbs deletes
Compaction requirementSeparate compaction job requiredManaged internally
Configuration complexityHigh (upsert mode, distribution mode, compaction)Low (primary_key in SINK)
Multi-table joins before sinkRequires complex DAG setupNative SQL joins on MV
LanguageJava/SQLPostgreSQL-compatible SQL
ConsistencyAt-least-once or exactly-once (tunable)Exactly-once end-to-end

Challenge 5: Pitfalls That Cause Silent Data Problems

Missing primary keys. If you write CDC data to Iceberg without declaring a primary key, deletes and updates cannot target specific rows. The writer must scan the entire table to find matching rows, which is impractically slow at scale. The result is either dropped deletes (data becomes stale) or full table rewrites on every update.

Equality delete accumulation. Without regular compaction, equality delete files pile up. A table with 1 million rows and 100,000 pending equality deletes will apply those deletes on every read. Query time grows linearly with the delete file count. This is the most common performance problem in production Iceberg CDC pipelines.

Out-of-order events. CDC events can arrive out of order if the source has replication lag or partitioning. If an UPDATE arrives before its corresponding INSERT, the equality delete for the update has nothing to match against. The row gets inserted after the delete was already applied, leaving a ghost row in the table.

Schema evolution mismatches. When a source table gains a new column, the Iceberg schema must be updated before the CDC pipeline resumes. Flink pipelines require manual schema migration steps. RisingWave propagates schema changes through the materialized view automatically in many cases.

FAQ

Does Apache Iceberg enforce primary keys at the storage level?

No. Iceberg allows you to define primary key columns in the table schema as metadata, but it does not reject duplicate rows. Uniqueness enforcement is the writer's responsibility. If two writers insert the same primary key concurrently without coordination, both rows will exist in the table until deduplication runs.

When should I use equality deletes vs positional deletes for CDC?

Use equality deletes for streaming CDC pipelines. They require no read-before-write, which makes them practical for high-throughput streams. Use positional deletes in batch compaction jobs where you can afford to read existing data files and rewrite them with precise row-level deletions. Mixing both in the same pipeline is possible and common in mature deployments.

How often should I run Iceberg compaction on a CDC table?

The answer depends on your update rate and read SLA. A rule of thumb: compact when the ratio of equality delete records to total rows exceeds 10-20%. For tables with thousands of updates per second, that can mean running compaction every few minutes. Tools like Apache Spark, Trino, and Flink each have compaction procedures for Iceberg; schedule them as regular jobs separate from your ingestion pipeline.

Can RisingWave handle schema changes in the CDC source without pipeline restarts?

RisingWave's native CDC connectors detect schema changes in the source binlog and propagate them to downstream materialized views where the schema is compatible. Column additions are generally handled automatically. Column removals or type changes require manual intervention to update the materialized view definition.

What is the minimum viable setup for a production CDC-to-Iceberg pipeline?

At minimum, you need: a CDC connector with exactly-once delivery guarantees, primary keys defined on the Iceberg table, and a scheduled compaction job. Without compaction, any MOR pipeline will degrade in read performance over time. Monitoring the equality delete file count alongside query latency is a reliable signal for when compaction is overdue.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.