The Debezium SQL Server connector captures row-level changes from Microsoft SQL Server using its native Change Data Capture (CDC) feature and publishes them as events to Kafka. Downstream consumers like RisingWave can then process those events as a continuous, queryable stream in real time.
Why This Matters
SQL Server powers a large share of enterprise workloads—ERP systems, financial platforms, healthcare records—where the source database is off-limits for direct analytical queries. Replicating changes to a streaming layer lets you build dashboards, trigger alerts, and sync microservices without adding load to your production MSSQL instance.
Traditional ETL runs on a schedule. You get yesterday's data at best. CDC gives you sub-second propagation of inserts, updates, and deletes, which is the difference between reacting to a fraud signal in real time versus discovering it in a morning report.
How Debezium SQL Server CDC Works
SQL Server's built-in CDC feature writes change records to change tables in the same database. Each CDC-enabled table gets a corresponding change table (e.g., cdc.dbo_orders_CT) that records every insert, update, and delete as a row. Debezium polls these change tables, using Log Sequence Numbers (LSNs) to track its position in the stream.
Key concepts:
- Capture instances: Each CDC-enabled table has one or more capture instances that define which columns to track.
- LSN offsets: Debezium stores the last processed LSN so it can resume from an exact position after a restart.
- Change tables: Populated by the CDC capture job, which SQL Server Agent runs to scan the transaction log; Debezium reads from these tables rather than from the log directly.
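To make the LSN bookkeeping concrete, SQL Server exposes helper functions that return the valid LSN window for a capture instance (this sketch assumes a capture instance named dbo_orders, as created later in the tutorial):

```sql
-- Valid LSN window for a capture instance (assumes an instance named 'dbo_orders')
SELECT sys.fn_cdc_get_min_lsn('dbo_orders') AS min_lsn,  -- oldest change still retained
       sys.fn_cdc_get_max_lsn()             AS max_lsn;  -- newest change captured so far
```

Debezium's stored offset always falls inside this window during normal operation; if it falls behind min_lsn, changes have been cleaned up before they were read.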
Step-by-Step Tutorial
Step 1: Enable CDC on SQL Server
First, enable CDC at the database level, then on individual tables:
-- Enable CDC on the database
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;
-- Enable CDC on the orders table
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'orders',
@role_name = NULL,
@supports_net_changes = 1;
-- Verify capture instance
SELECT * FROM cdc.change_tables;
SQL Server Agent must be running for CDC to work—it drives the log reader job that populates the change tables.
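You can also read the change table directly through the table-valued function SQL Server generates per capture instance, which is essentially what Debezium does on each poll (a sketch assuming the dbo_orders capture instance created above):

```sql
-- Read all changes for dbo.orders within the currently valid LSN window
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_orders');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT __$operation,   -- 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
       __$start_lsn,
       *
FROM cdc.fn_cdc_get_all_changes_dbo_orders(@from_lsn, @to_lsn, N'all');
```

If this query returns rows after you modify dbo.orders, CDC is working end to end on the database side.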
Step 2: Deploy the Debezium SQL Server Connector
Register the connector with Kafka Connect via the REST API:
{
"name": "sqlserver-orders-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"database.hostname": "mssql-host",
"database.port": "1433",
"database.user": "debezium",
"database.password": "secret",
"database.dbname": "MyDatabase",
"database.server.name": "dbserver1",
"table.include.list": "dbo.orders",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.MyDatabase",
"snapshot.mode": "initial",
"snapshot.isolation.mode": "read_committed"
}
}
Post this to Kafka Connect: curl -X POST http://kafka-connect:8083/connectors -H "Content-Type: application/json" -d @connector.json
The connector will perform an initial snapshot of existing rows, then switch to streaming mode, tailing the CDC change tables.
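To confirm the connector came up healthy, query the Kafka Connect REST API (this assumes the same kafka-connect:8083 endpoint used above):

```shell
# Check connector and task state (assumes Kafka Connect at kafka-connect:8083)
curl -s http://kafka-connect:8083/connectors/sqlserver-orders-connector/status
# A healthy connector reports "state": "RUNNING" for both the connector and its tasks
```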
Step 3: Connect RisingWave to Kafka
Once Debezium is publishing events to Kafka, create a source in RisingWave using the Debezium format:
-- For Debezium → Kafka → RisingWave pipeline:
CREATE SOURCE orders_cdc (
id BIGINT,
customer_id BIGINT,
total NUMERIC,
status VARCHAR,
updated_at TIMESTAMPTZ,
_op VARCHAR -- debezium op field: c/u/d/r
) WITH (
connector = 'kafka',
topic = 'dbserver1.dbo.orders',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
FORMAT DEBEZIUM ENCODE JSON tells RisingWave to parse the before/after envelope that Debezium produces, automatically handling upserts and deletes.
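For reference, a Debezium insert event for the orders table carries an envelope shaped roughly like this (field values here are illustrative, not from a real capture):

```json
{
  "before": null,
  "after": {
    "id": 1001,
    "customer_id": 42,
    "total": 259.90,
    "status": "NEW",
    "updated_at": "2024-01-15T10:00:00Z"
  },
  "op": "c",
  "ts_ms": 1705312800000
}
```

The op field is "c" for insert, "u" for update, "d" for delete, and "r" for snapshot reads; on a delete, after is null and before holds the removed row, which is how RisingWave knows to retract it.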
Step 4: Build Real-Time Materialized Views
Now query the stream as if it were a live table:
-- Real-time order revenue by status
CREATE MATERIALIZED VIEW order_revenue_by_status AS
SELECT
status,
COUNT(*) AS order_count,
SUM(total) AS total_revenue,
AVG(total) AS avg_order_value
FROM orders_cdc
GROUP BY status;
-- Alert: large orders placed in last 5 minutes
CREATE MATERIALIZED VIEW large_recent_orders AS
SELECT id, customer_id, total, updated_at
FROM orders_cdc
WHERE total > 10000
AND updated_at > NOW() - INTERVAL '5 minutes';
RisingWave incrementally updates these views as each CDC event arrives—no full scans needed.
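Once the views exist, they can be queried with ordinary ad-hoc SELECTs, and the results reflect the latest CDC events at read time:

```sql
-- Ad-hoc query over the incrementally maintained view
SELECT status, order_count, total_revenue
FROM order_revenue_by_status
ORDER BY total_revenue DESC;
```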
Comparison Table
| Feature | Debezium SQL Server | SSIS / Manual ETL | SQL Server Replication |
|---|---|---|---|
| Latency | Sub-second | Minutes to hours | Seconds (but complex) |
| Deletes captured | Yes | Typically no | Yes |
| Open source | Yes | No | No (built-in) |
| Schema evolution | Automatic | Manual | Manual |
| Downstream flexibility | Any Kafka consumer | Limited | SQL Server only |
| Requires CDC enabled | Yes | No | No |
FAQ
Q: Does the Debezium user need special permissions?
Yes. The database user needs SELECT on all CDC change tables (cdc.*), plus VIEW DATABASE STATE to read LSN positions. For snapshot mode, it also needs SELECT on the source tables.
Q: What happens if SQL Server Agent stops?
CDC change tables stop being populated. Debezium will stall at its last committed LSN and resume cleanly once SQL Server Agent restarts—provided the CDC retention window hasn't been exceeded in the meantime. Monitor the MilliSecondsBehindSource value in Debezium's JMX streaming metrics to catch a growing lag early.
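To check how much headroom the retention window gives you, the CDC job configuration lives in msdb; retention is expressed in minutes (the default is 4320, i.e. three days):

```sql
-- Inspect CDC capture/cleanup job settings, including the retention window
SELECT job_type, retention, pollinginterval
FROM msdb.dbo.cdc_jobs;
```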
Q: Can I capture multiple SQL Server databases in one connector?
Not in the standard connector—each connector instance targets a single database. Deploy separate connector configurations per database and use Kafka topic routing to segregate the streams.
Key Takeaways
- SQL Server CDC must be enabled at both the database and table level before Debezium can capture changes.
- The connector uses LSN offsets to resume from its exact position after a failure; delivery is at-least-once, so downstream consumers should tolerate occasional duplicate events.
- snapshot.isolation.mode = read_committed prevents long-running snapshot transactions from blocking production workloads.
- RisingWave's FORMAT DEBEZIUM ENCODE JSON source type natively handles the Debezium envelope, including deletes.
- For simpler setups without Kafka, RisingWave also offers a direct connector = 'sqlserver-cdc' option that connects to SQL Server without an intermediate broker.
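A hedged sketch of that direct option follows. The exact property names (hostname, database.name, table.name, and so on) are assumptions modeled on RisingWave's other CDC connectors, so verify them against the RisingWave documentation for your version:

```sql
-- Direct SQL Server CDC into RisingWave, with no Kafka broker in between
-- (property names are assumptions; check RisingWave docs for your version)
CREATE TABLE orders_direct (
    id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    total NUMERIC,
    status VARCHAR,
    updated_at TIMESTAMPTZ
) WITH (
    connector = 'sqlserver-cdc',
    hostname = 'mssql-host',
    port = '1433',
    username = 'debezium',
    password = 'secret',
    database.name = 'MyDatabase',
    table.name = 'dbo.orders'
);
```

CDC still has to be enabled on SQL Server exactly as in Step 1; only the Kafka Connect and broker layers drop out.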

