Debezium SQL Server Connector: Real-Time MSSQL CDC

Debezium SQL Server Connector: Real-Time MSSQL CDC

The Debezium SQL Server connector captures row-level changes from Microsoft SQL Server using its native Change Data Capture (CDC) feature and publishes them as events to Kafka. Downstream consumers like RisingWave can then process those events as a continuous, queryable stream in real time.

Why This Matters

SQL Server powers a large share of enterprise workloads—ERP systems, financial platforms, healthcare records—where the source database is off-limits for direct analytical queries. Replicating changes to a streaming layer lets you build dashboards, trigger alerts, and sync microservices without adding load to your production MSSQL instance.

Traditional ETL runs on a schedule. You get yesterday's data at best. CDC gives you sub-second propagation of inserts, updates, and deletes, which is the difference between reacting to a fraud signal in real time versus discovering it in a morning report.

How Debezium SQL Server CDC Works

SQL Server's built-in CDC feature writes change records to change tables in the same database. Each CDC-enabled table gets a corresponding change table (e.g., cdc.dbo_orders_CT) that records every insert, update, and delete as a row. Debezium polls these change tables using Log Sequence Numbers (LSN) to track its position in the stream.

Key concepts:

  • Capture instances: Each CDC-enabled table has one or more capture instances that define which columns to track.
  • LSN offsets: Debezium stores the last processed LSN so it can resume from an exact position after a restart.
  • Change tables: Automatically maintained by SQL Server Agent; Debezium reads from them rather than from the transaction log directly.

Step-by-Step Tutorial

Step 1: Enable CDC on SQL Server

First, enable CDC at the database level, then on individual tables:

-- Enable CDC on the database
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC on the orders table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'orders',
    @role_name     = NULL,
    @supports_net_changes = 1;

-- Verify capture instance
SELECT * FROM cdc.change_tables;

SQL Server Agent must be running for CDC to work—it drives the log reader job that populates the change tables.

Step 2: Deploy the Debezium SQL Server Connector

Register the connector with Kafka Connect via the REST API:

{
  "name": "sqlserver-orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "mssql-host",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "MyDatabase",
    "database.server.name": "dbserver1",
    "table.include.list": "dbo.orders",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.MyDatabase",
    "snapshot.mode": "initial",
    "snapshot.isolation.mode": "read_committed"
  }
}

Post this to Kafka Connect: curl -X POST http://kafka-connect:8083/connectors -H "Content-Type: application/json" -d @connector.json

The connector will perform an initial snapshot of existing rows, then switch to streaming mode, tailing the CDC change tables.

Step 3: Connect RisingWave to Kafka

Once Debezium is publishing events to Kafka, create a source in RisingWave using the Debezium format:

-- For Debezium → Kafka → RisingWave pipeline:
CREATE SOURCE orders_cdc (
    id          BIGINT,
    customer_id BIGINT,
    total       NUMERIC,
    status      VARCHAR,
    updated_at  TIMESTAMPTZ,
    _op         VARCHAR  -- debezium op field: c/u/d/r
) WITH (
    connector = 'kafka',
    topic = 'dbserver1.dbo.orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;

FORMAT DEBEZIUM ENCODE JSON tells RisingWave to parse the before/after envelope that Debezium produces, automatically handling upserts and deletes.

Step 4: Build Real-Time Materialized Views

Now query the stream as if it were a live table:

-- Real-time order revenue by status
CREATE MATERIALIZED VIEW order_revenue_by_status AS
SELECT
    status,
    COUNT(*)        AS order_count,
    SUM(total)      AS total_revenue,
    AVG(total)      AS avg_order_value
FROM orders_cdc
GROUP BY status;

-- Alert: large orders placed in last 5 minutes
CREATE MATERIALIZED VIEW large_recent_orders AS
SELECT id, customer_id, total, updated_at
FROM orders_cdc
WHERE total > 10000
  AND updated_at > NOW() - INTERVAL '5 minutes';

RisingWave incrementally updates these views as each CDC event arrives—no full scans needed.

Comparison Table

FeatureDebezium SQL ServerSSIS / Manual ETLSQL Server Replication
LatencySub-secondMinutes to hoursSeconds (but complex)
Deletes capturedYesTypically noYes
Open sourceYesNoNo (built-in)
Schema evolutionAutomaticManualManual
Downstream flexibilityAny Kafka consumerLimitedSQL Server only
Requires CDC enabledYesNoNo

FAQ

Q: Does the Debezium user need special permissions? Yes. The database user needs SELECT on all CDC change tables (cdc.*), plus VIEW DATABASE STATE to read LSN positions. For snapshot mode, it also needs SELECT on the source tables.

Q: What happens if SQL Server Agent stops? CDC change tables stop being populated. Debezium will stall at its last committed LSN and resume cleanly once SQL Agent restarts—provided the log retention window hasn't been exceeded. Monitor seconds_behind_source in Debezium's JMX metrics.

Q: Can I capture multiple SQL Server databases in one connector? Not in the standard connector—each connector instance targets a single database. Deploy separate connector configurations per database and use Kafka topic routing to segregate the streams.

Key Takeaways

  • SQL Server CDC must be enabled at both the database and table level before Debezium can capture changes.
  • The connector uses LSN offsets for exactly-once-style resumption after failures.
  • snapshot.isolation.mode = read_committed prevents long-running snapshot transactions from blocking production workloads.
  • RisingWave's FORMAT DEBEZIUM ENCODE JSON source type natively handles the Debezium envelope, including deletes.
  • For simpler setups without Kafka, RisingWave also offers a direct connector = 'sqlserver-cdc' option that connects to SQL Server without an intermediate broker.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.