When Debezium first connects to a database, it must decide how much existing data to capture before it starts streaming live changes. The wrong snapshot configuration can lock production tables, produce duplicate data, or miss rows entirely. Understanding the snapshot.mode options and snapshot.isolation.mode settings is essential before you deploy to production.
The Two Phases of a Debezium Connector
Every Debezium connector operates in two distinct phases:
- Snapshot phase: A one-time, point-in-time copy of existing rows, using a consistent read (or a series of reads depending on isolation mode).
- Streaming phase: Continuous capture of changes from the transaction log (WAL, redo log, CDC change tables), starting from the offset established at the end of the snapshot.
The transition between phases is critical. If there is a gap between the snapshot's high-water mark and the streaming start position, you lose changes. Debezium handles this internally, but your configuration choices determine whether the transition is clean.
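As a concrete illustration (field names vary by connector and version; this is a sketch of a PostgreSQL-style offset, not an exact schema), the stored offset that bridges the two phases records the log position and whether the snapshot is still in progress:

```json
{
  "lsn": 23876544,
  "txId": 5678,
  "snapshot": true,
  "last_snapshot_record": false
}
```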
Snapshot Modes Explained
initial (default)
Performs a full snapshot of all tables the connector is configured to capture, then switches to streaming. Use this when:
- The downstream system is empty and needs to be bootstrapped.
- You want a complete, current copy of every row before tracking changes.
Risk: On large tables, the snapshot can take hours and hold read locks (depending on snapshot.isolation.mode).
initial_only
Performs the snapshot and then stops—does not switch to streaming mode. Useful for one-time data migrations.
schema_only
Captures the schema of all tables but no data rows, then immediately starts streaming. Use this when:
- The downstream system already has the data (loaded via another mechanism).
- You only care about changes going forward.
Risk: Any rows that existed before the connector started are not in the downstream system unless they're subsequently updated or you load them separately.
never
Skips the snapshot entirely and starts streaming from the current log position. Use only when you have already performed a snapshot via another method (e.g., a pg_dump restore) and you know the exact log offset at which to start.
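On PostgreSQL, that means capturing the log position yourself before taking the external copy. A minimal sketch (the slot name is arbitrary):

```sql
-- Create the replication slot first so WAL is retained from this point on.
SELECT pg_create_logical_replication_slot('debezium', 'pgoutput');

-- Record the current WAL position; streaming must begin at or before it.
SELECT pg_current_wal_lsn();

-- Now take the external copy, e.g. with pg_dump, and start the connector
-- with snapshot.mode = never against the slot created above.
```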
always
Performs a fresh snapshot every time the connector starts. Rarely appropriate for production—it creates massive data duplication and load on the source database. Only useful for small reference tables that should be fully reloaded on each restart.
when_needed
Performs a snapshot only when the connector detects that it cannot resume from its stored offset (e.g., the WAL has been purged). A sensible default for production deployments where you want automatic recovery from log retention gaps.
Snapshot Isolation Modes
The snapshot.isolation.mode setting controls the database-level isolation used during the snapshot read. Note that the set of supported values differs by connector, so check your connector's documentation before relying on any particular mode.
| Mode | Database Support | Behavior | Risk |
| --- | --- | --- | --- |
| read_committed | All | Multiple reads, no consistent snapshot | Possible duplicate/missing rows on large tables |
| repeatable_read | MySQL, PostgreSQL | Consistent snapshot, row-level locks | Lock contention on high-write tables |
| serializable | All | Strongest consistency, table-level locks | High lock contention, not recommended for production |
| read_uncommitted | MySQL only | Fastest, no locks | Dirty reads possible |
| snapshot | SQL Server only | Uses database snapshot feature | Requires snapshot isolation enabled on the database |
| exclusive | Oracle | Table-level exclusive lock | Blocks all writes during snapshot |
For most production deployments, read_committed combined with a single-read strategy (small tables) or chunked reads (large tables, with snapshot.fetch.size tuning) is the right choice.
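Conceptually, the isolation mode determines what transaction the connector wraps its snapshot reads in. A rough PostgreSQL-flavored sketch of what repeatable_read amounts to (the connector issues equivalent statements internally; you never run these yourself):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Every read inside this transaction sees the same consistent version of the data.
SELECT * FROM public.orders;
SELECT * FROM public.customers;
COMMIT;
```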
Step-by-Step Tutorial
Step 1: Choose the Right Snapshot Mode for Your Scenario
```json
{
  "name": "postgres-orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "shop",
    "database.server.name": "pgserver1",
    "table.include.list": "public.orders,public.customers",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial",
    "snapshot.isolation.mode": "read_committed",
    "snapshot.fetch.size": "10240",
    "snapshot.max.threads": "1"
  }
}
```
For large tables (>100M rows), consider setting snapshot.fetch.size higher (32768 or more) to reduce round trips, and snapshot.max.threads > 1 to parallelize the snapshot across tables.
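As a sketch, such a large-table override might look like this in the connector config (the values are starting points to benchmark, not universal recommendations):

```json
{
  "snapshot.fetch.size": "32768",
  "snapshot.max.threads": "4"
}
```

Note that snapshot.max.threads parallelizes across tables, not within a single table, so it only helps when several large tables are being captured.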
Step 2: Connect RisingWave to Stream CDC Events
```sql
-- For a Debezium → Kafka → RisingWave pipeline:
CREATE SOURCE orders_cdc (
  id BIGINT,
  customer_id BIGINT,
  total NUMERIC,
  status VARCHAR,
  updated_at TIMESTAMPTZ,
  _op VARCHAR -- Debezium op field: c/u/d/r
) WITH (
  connector = 'kafka',
  topic = 'pgserver1.public.orders',
  properties.bootstrap.server = 'kafka:9092',
  scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
```
scan.startup.mode = 'earliest' ensures RisingWave reads from the start of the topic, including snapshot events. Snapshot rows are emitted as op: r (read) events in the Debezium envelope.
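For reference, a trimmed snapshot event for the orders table looks roughly like this (illustrative values; the real envelope also carries source metadata and, depending on converter settings, an inline schema):

```json
{
  "before": null,
  "after": {
    "id": 1001,
    "customer_id": 42,
    "total": 99.95,
    "status": "shipped",
    "updated_at": "2024-01-15T10:00:00Z"
  },
  "op": "r",
  "ts_ms": 1705312800000
}
```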
Step 3: Monitor Snapshot Progress
During a snapshot, Debezium exposes progress metrics via JMX under the snapshot context (for example, the MBean debezium.postgres:type=connector-metrics,context=snapshot,server=pgserver1), including RemainingTableCount and TotalTableCount:
```sql
-- If you're ingesting Debezium metrics into RisingWave:
CREATE MATERIALIZED VIEW snapshot_progress AS
SELECT
  server_name,
  remaining_table_count,
  total_table_count,
  ROUND(100.0 * (total_table_count - remaining_table_count)
        / NULLIF(total_table_count, 0), 1) AS pct_complete
FROM debezium_snapshot_metrics;
```
Step 4: Validate Snapshot Completeness
After the snapshot completes, verify row counts match between source and destination:
```sql
-- RisingWave: count snapshot rows (op = 'r')
SELECT COUNT(*) AS snapshot_rows
FROM orders_cdc
WHERE _op = 'r';

-- Compare against the source PostgreSQL count
-- (run this on PostgreSQL directly):
-- SELECT COUNT(*) FROM orders;
```
Discrepancies suggest a snapshot interruption or isolation issue. In that case, reset the connector offsets and retrigger the snapshot.
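On Kafka Connect 3.6 or later, resetting offsets can be done through the REST API (earlier versions require writing a tombstone record to the offsets topic instead); the connector name below comes from the example configuration, and the host and port assume a default Connect setup:

```shell
# Stop the connector; offsets can only be deleted while it is stopped.
curl -X PUT http://localhost:8083/connectors/postgres-orders-connector/stop

# Delete the stored offsets so no resume position remains.
curl -X DELETE http://localhost:8083/connectors/postgres-orders-connector/offsets

# Resume; with snapshot.mode = initial this triggers a fresh snapshot.
curl -X PUT http://localhost:8083/connectors/postgres-orders-connector/resume
```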
Comparison Table
| Snapshot Mode | Use Case | Data Completeness | Load on Source | Recovery Safe |
| --- | --- | --- | --- | --- |
| initial | First-time bootstrap | Full history + changes | Medium (one-time) | Yes |
| schema_only | Forward-only tracking | Changes only | Low | Yes |
| never | Manual data load + known offset | Depends on prep | None | Only if prep is correct |
| always | Small reference table reloads | Full on each restart | High | No (duplication) |
| when_needed | Auto-recovery from log gaps | Full on gap detection | Medium | Yes |
| initial_only | One-time data migration | Full, no changes after | Medium | N/A |
FAQ
Q: What happens if a snapshot is interrupted midway?
Debezium does not perform a partial snapshot. If interrupted, it will restart the snapshot from the beginning on next connector start (for initial mode). There is no partial-snapshot resume in most connectors, though the PostgreSQL connector has an incremental snapshot feature that allows resumable, chunk-based snapshots.
Q: Can I trigger a re-snapshot of specific tables without restarting the full connector?
Yes, with the incremental snapshot feature (available for PostgreSQL, MySQL, and several other connectors). Send a signal record to the Debezium signaling table:

```sql
INSERT INTO debezium_signal (id, type, data)
VALUES ('unique-id', 'execute-snapshot', '{"data-collections": ["public.orders"]}');
```
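The signaling table must exist on the source database, be writable by you, and be referenced by the connector's signal.data.collection property. Its expected shape (shown here as PostgreSQL DDL) is:

```sql
-- Signaling table polled by the connector; reference it via
-- the signal.data.collection connector property.
CREATE TABLE debezium_signal (
  id   VARCHAR(42) PRIMARY KEY, -- arbitrary unique id for each signal
  type VARCHAR(32) NOT NULL,    -- signal type, e.g. 'execute-snapshot'
  data VARCHAR(2048)            -- JSON parameters for the signal
);
```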
Q: Does schema_only snapshot protect against schema evolution issues?
Yes—even in schema_only mode, Debezium captures the schema history, which is needed to correctly interpret future change events whose structure might have changed over time. The schema history Kafka topic is always populated regardless of snapshot mode.
Key Takeaways
- Use initial for bootstrapping new downstream systems; use schema_only when the data already exists downstream.
- read_committed isolation is usually sufficient for production snapshots and avoids lock contention.
- Monitor RemainingTableCount via JMX during long snapshots to estimate completion time.
- Snapshot rows arrive as op: r events in the Debezium envelope; RisingWave's FORMAT DEBEZIUM treats them as inserts.
- For very large tables, prefer the incremental snapshot feature over a full initial snapshot to minimize source database impact.

