Debezium vs Kafka Connect JDBC Source: Which CDC Approach Is Right?

Debezium reads the database transaction log (WAL, binlog). The Kafka Connect JDBC Source Connector polls the database using a timestamp or incrementing column. These are fundamentally different architectures — and choosing the wrong one causes data loss, missed deletes, or unnecessary operational complexity.

The Core Technical Difference

JDBC Source works by periodically running a SQL query like:

SELECT * FROM orders WHERE updated_at > :last_polled_timestamp

It is a polling mechanism. The connector remembers the last timestamp it saw and advances it on each poll.
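
In pseudocode terms, the connector's loop amounts to a watermark query. A minimal Python sketch (the table is modeled as a list of dicts and `updated_at` as an integer clock; all names here are hypothetical, not the connector's internals):

```python
# Hypothetical sketch of what a timestamp-based poller does:
# remember a watermark, fetch rows newer than it, advance the watermark.
def poll(rows, last_seen):
    """Return rows changed since last_seen plus the new watermark."""
    changed = [r for r in rows if r["updated_at"] > last_seen]
    new_watermark = max((r["updated_at"] for r in changed), default=last_seen)
    return changed, new_watermark

table = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 25},
]

batch, watermark = poll(table, last_seen=0)          # first poll: both rows
batch2, watermark = poll(table, last_seen=watermark)  # second poll: nothing new
```

Note that a row deleted between polls simply never matches the predicate, which is why this mechanism cannot see DELETEs.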

Debezium works by connecting to the database's replication protocol and reading the transaction log as a stream. It captures every row change as an event, in commit order, without polling.

This distinction drives everything else.

Comparison Table

| Property | JDBC Source | Debezium |
| --- | --- | --- |
| Mechanism | Polling (timestamp/ID-based) | Log-based (WAL/binlog) |
| Captures DELETE | No | Yes |
| Captures UPDATE without updated_at | No | Yes |
| Event ordering guaranteed | No | Yes (per partition) |
| Latency | Seconds to minutes | Sub-second |
| DB load | Periodic query scans | Minimal (log read) |
| Requires DB configuration | No | Yes (replication enabled) |
| Works on read replicas | Yes | Limited |
| Schema change detection | No | Yes |

When JDBC Source Is "Good Enough"

JDBC Source is a reasonable choice in specific scenarios. Engineers often over-engineer by defaulting to Debezium when simpler polling would work.

Use JDBC Source when:

  • The table only has INSERTs (append-only). Examples: event logs, audit tables, IoT sensor readings.
  • You never need to track DELETEs. If a record is deleted in the source, you simply don't care.
  • Latency of 30-60 seconds is acceptable for your use case.
  • The database does not support logical replication (older MySQL without binlog, some managed cloud databases with replication disabled).
  • You want a simple setup with no database-side configuration changes required.

A typical JDBC Source configuration for an append-only table:

{
  "name": "orders-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/prod",
    "connection.user": "reader",
    "connection.password": "secret",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "created_at",
    "incrementing.column.name": "id",
    "table.whitelist": "public.sensor_readings",
    "poll.interval.ms": "10000",
    "topic.prefix": "jdbc."
  }
}

This works well. No WAL configuration. No replication slots. Minimal permissions.

When You Need Debezium

Debezium is required when any of the following are true:

You need to capture DELETEs. JDBC Source simply cannot detect that a row was removed: a deleted row no longer exists, so no timestamp query can find it.

You need strict ordering. JDBC Source polls independently and can deliver events out of order if rows are updated while a poll is in flight. Debezium delivers events in WAL commit order.

You need low latency. Polling every 10 seconds means up to 10 seconds of lag before a change is even seen. Debezium typically delivers changes within 100-500ms.

Your rows don't have a reliable updated_at column. Many legacy schemas lack this. Some schemas have it but don't update it consistently. Debezium doesn't rely on it.

You need to capture all intermediate states. If a row is updated 5 times between polls, JDBC Source only sees the final state. Debezium captures all 5 updates.
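
The last two points can be demonstrated in a few lines. The sketch below (hypothetical names, not connector code) applies an insert, five updates, and a delete to a keyed store, recording every change in a log. A snapshot poll taken afterward sees none of it:

```python
# Contrast a change log (log-based CDC) with snapshot polling.
log = []    # log-based view: every committed change, in order
store = {}  # current table state, which is all a poll can observe

def apply(op, key, value=None):
    log.append((op, key, value))
    if op == "DELETE":
        store.pop(key, None)
    else:
        store[key] = value

apply("INSERT", 1, "v1")
for v in ("v2", "v3", "v4", "v5", "v6"):
    apply("UPDATE", 1, v)
apply("DELETE", 1)

# A poll running now sees only the final state: the row is gone, with no
# record of the five intermediate versions or of the delete itself.
snapshot = dict(store)
print(len(log), snapshot)  # → 7 {}
```

Seven events in the log; an empty snapshot. Polling between the insert and the delete would have surfaced exactly one of the six row versions, chosen by timing.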

A minimal Debezium PostgreSQL connector configuration:

{
  "name": "orders-debezium-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "prod",
    "database.server.name": "prod",
    "plugin.name": "pgoutput",
    "table.include.list": "public.orders",
    "snapshot.mode": "initial"
  }
}

Note the difference in required DB setup: the debezium user needs REPLICATION privilege, and the database needs wal_level = logical.

-- PostgreSQL setup for Debezium
ALTER SYSTEM SET wal_level = logical;  -- takes effect after a server restart
CREATE USER debezium WITH REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;

The "Hidden Cost" of JDBC Source

JDBC Source polling generates table scans. For a table with 50M rows and an index on updated_at, each poll is an index scan — cheap. But as the table grows or the index degrades, poll queries get slower and start competing with application queries.

At high poll frequency (every 1-5 seconds), you can generate meaningful read load on the source database. This is the opposite of what log-based CDC does: reading WAL adds almost no load to the source because the data is already being written to disk anyway.
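
A back-of-envelope calculation makes the tradeoff concrete. Assuming a 10-second poll interval and a 40ms index-scan query (both illustrative numbers, not measurements):

```python
# Rough polling math under assumed numbers: load on the source DB and
# the latency floor that polling imposes.
poll_interval_s = 10
avg_query_ms = 40  # assumed index-scan time on updated_at

polls_per_day = 24 * 3600 // poll_interval_s
db_time_per_day_s = polls_per_day * avg_query_ms / 1000
worst_case_lag_s = poll_interval_s + avg_query_ms / 1000

print(polls_per_day, db_time_per_day_s, worst_case_lag_s)
```

That is thousands of queries per day spent on the source database, with a worst-case freshness no better than the poll interval. As the query slows (table growth, index bloat), both numbers get worse together.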

RisingWave: When You Want Neither

Both JDBC Source and Debezium require Kafka as the message bus. You need a Kafka cluster, Kafka Connect workers, topic management, consumer groups, schema registry, and offset management. For teams that want CDC without this infrastructure, RisingWave offers an alternative.

RisingWave uses the Debezium Embedded Engine directly — no Kafka required — and exposes CDC as a SQL source.

-- Direct CDC ingestion without Kafka
CREATE SOURCE orders_source WITH (
  connector = 'postgres-cdc',
  hostname = 'db',
  port = '5432',
  username = 'rwuser',
  password = 'secret',
  database.name = 'prod',
  schema.name = 'public',
  table.name = 'orders'
);

-- Downstream materialized view
CREATE MATERIALIZED VIEW recent_orders AS
SELECT
  id,
  customer_id,
  total_amount,
  status,
  updated_at
FROM orders_source
WHERE status != 'cancelled';

This gives you log-based CDC (captures DELETEs, strict ordering, sub-second latency) without deploying Kafka, Zookeeper, Kafka Connect workers, or managing consumer group offsets.

The tradeoff: if you already have Kafka as a company-wide message bus, adding Debezium + JDBC Source is cheap because the infrastructure already exists. RisingWave is the better choice when you're building a new pipeline and don't want to justify a Kafka deployment for a single CDC use case.

Decision Framework

Start with these questions:

  1. Do you need DELETEs? If yes, eliminate JDBC Source.
  2. Do you need sub-second latency? If yes, eliminate JDBC Source.
  3. Do you already run Kafka? If yes, Debezium + Kafka Connect is the natural fit.
  4. Are you building a new pipeline with no Kafka? Consider RisingWave.
  5. Is the source an append-only table with updated_at? JDBC Source may be sufficient.
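
One way to read the checklist is as a small decision function. This is a sketch only; real decisions also weigh operational factors the flags do not capture:

```python
# Encode the five questions above as boolean flags (illustrative, not exhaustive).
def choose_cdc(needs_deletes, needs_subsecond, has_kafka, append_only_with_ts):
    """Return a suggested tool per the decision framework."""
    if needs_deletes or needs_subsecond:
        # Polling is eliminated; pick log-based CDC by infrastructure fit.
        return "Debezium" if has_kafka else "RisingWave"
    if append_only_with_ts:
        return "JDBC Source"  # simple polling is sufficient here
    # No reliable watermark column: log-based CDC again.
    return "Debezium" if has_kafka else "RisingWave"
```

For example, an append-only sensor table with a timestamp column and relaxed latency maps to JDBC Source; a transactional orders table with no Kafka in place maps to RisingWave.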

FAQ

Can I mix JDBC Source and Debezium for different tables? Yes. A common pattern is using Debezium for transactional tables (orders, customers) and JDBC Source for append-only reference tables (product catalog, config). They can both write to Kafka topics and downstream consumers don't need to know the difference.

Does JDBC Source work with PostgreSQL logical replication? No. JDBC Source is entirely independent of logical replication. It connects as a normal JDBC client and runs SELECT queries. Logical replication is only used by log-based CDC tools like Debezium.

What if my table has no updated_at and no unique incrementing ID? You are in a difficult position with JDBC Source. It cannot work reliably without one of these. Debezium (or RisingWave's CDC source) is required.

Is Debezium harder to operate than JDBC Source? Yes, meaningfully. Debezium requires wal_level = logical on the source, a replication slot, monitoring of slot lag, and careful handling of schema evolution. JDBC Source requires only a read-only database user.

Can RisingWave replace JDBC Source for append-only tables too? Yes. RisingWave has both a postgres-cdc connector (log-based) and a JDBC-style batch connector. For append-only tables where you want simplicity, you can use a batch load approach or use the CDC source with a filter that ignores DELETE events.
