SQL Server CDC: Debezium vs RisingWave

SQL Server CDC: Debezium vs RisingWave

For SQL Server CDC, both Debezium and RisingWave use SQL Server's native Change Data Capture feature. The difference is what happens next: Debezium routes changes through Kafka, while RisingWave ingests them directly into materialized views. If your goal is stream processing and analytics, RisingWave eliminates the Kafka layer entirely.

Enabling CDC on SQL Server

Before either tool can work, SQL Server needs CDC enabled at both the database and table level. This is a one-time DBA task.

-- Enable CDC on the database
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC on a specific table
EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name   = N'orders',
  @role_name     = NULL,
  @supports_net_changes = 1;

SQL Server CDC works by reading the transaction log. The SQL Server Agent must be running — it drives the CDC capture and cleanup jobs. Verify with:

SELECT is_cdc_enabled FROM sys.databases WHERE name = 'MyDatabase';

Debezium SQL Server Connector Setup

Debezium's SQL Server connector runs inside Kafka Connect. You deploy a connector JSON config and Kafka Connect handles the rest.

Required infrastructure: SQL Server, Kafka, Kafka Connect (plus ZooKeeper or KRaft).

{
  "name": "sqlserver-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver.internal",
    "database.port": "1433",
    "database.user": "debezium_user",
    "database.password": "secret",
    "database.names": "MyDatabase",
    "table.include.list": "dbo.orders,dbo.customers",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-history.sqlserver",
    "topic.prefix": "sqlserver",
    "snapshot.mode": "initial"
  }
}

The database.history.kafka.topic is required — Debezium stores schema history in Kafka to handle DDL changes across restarts. This means Kafka is not optional; it is a hard dependency.

Once running, changes land in Kafka topics like sqlserver.MyDatabase.dbo.orders. Downstream consumers subscribe to those topics.

RisingWave SQL Server CDC Source

RisingWave embeds the Debezium engine and connects directly to SQL Server. No Kafka required.

CREATE SOURCE sqlserver_cdc
WITH (
  connector = 'sqlserver-cdc',
  hostname = 'sqlserver.internal',
  port = '1433',
  username = 'debezium_user',
  password = 'secret',
  database.name = 'MyDatabase',
  schema.name = 'dbo'
);

CREATE TABLE orders (
  order_id    INT,
  customer_id INT,
  amount      DECIMAL(10,2),
  status      VARCHAR,
  created_at  TIMESTAMP,
  PRIMARY KEY (order_id)
) FROM sqlserver_cdc TABLE 'dbo.orders';

From there, you write SQL against the CDC table directly:

CREATE MATERIALIZED VIEW revenue_by_customer AS
SELECT
  customer_id,
  SUM(amount) AS total_revenue,
  COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY customer_id;

Results update in real time as orders change in SQL Server.

Setup Complexity Comparison

FactorDebezium + KafkaRisingWave
Infrastructure componentsSQL Server, Kafka, Kafka Connect, ZooKeeper/KRaftSQL Server, RisingWave
SQL Server CDC enablementRequiredRequired
Schema history storageKafka topicManaged internally
Connector config formatJSON over REST APISQL DDL
Processing languageKafka Streams / ksqlDB / custom consumerStandard SQL
Snapshot supportYesYes
DDL change handlingSchema history topicAutomatic

Latency Characteristics

Both tools read from SQL Server's CDC tables, which are populated by the SQL Server Agent capture job. This introduces a small lag — typically under one second — between a committed transaction and its availability in CDC tables.

Debezium polls these CDC tables and publishes to Kafka. A consumer reading from Kafka adds another hop. End-to-end latency is typically 1–3 seconds depending on Kafka consumer poll intervals and batch settings.

RisingWave reads CDC tables directly and updates materialized views. No intermediate queue. Latency is comparable to Debezium's capture time, often under one second for query results on materialized views.

Migration Scenario: Moving from Debezium to RisingWave

Consider a team running this pipeline:

SQL Server → Debezium → Kafka → Spark Streaming → Data Warehouse

The Spark jobs compute running revenue totals, customer activity summaries, and inventory levels. Every query result gets written back to S3 and then loaded into Redshift.

The Kafka cluster costs $800/month managed. Spark jobs run on EMR at $400/month. Total: $1,200/month for a pipeline that answers three business questions.

Migration approach:

Step 1: Deploy RisingWave (self-hosted on EKS, or RisingWave Cloud).

Step 2: Create CDC sources pointing at the same SQL Server tables.

CREATE TABLE inventory (
  product_id  INT,
  quantity    INT,
  warehouse   VARCHAR,
  updated_at  TIMESTAMP,
  PRIMARY KEY (product_id)
) FROM sqlserver_cdc TABLE 'dbo.inventory';

Step 3: Recreate Spark logic as materialized views.

CREATE MATERIALIZED VIEW inventory_summary AS
SELECT
  warehouse,
  SUM(quantity) AS total_units,
  COUNT(*) AS sku_count,
  MAX(updated_at) AS last_updated
FROM inventory
GROUP BY warehouse;

Step 4: Validate query results match Spark output over a 24-hour parallel run.

Step 5: Decommission Spark jobs and Kafka cluster.

The Debezium + Kafka infrastructure can be removed because RisingWave takes over both the ingestion and processing roles.

When Debezium Still Wins

RisingWave is the right choice when SQL Server is the source and analytics or reporting is the destination. But keep Debezium when:

  • Multiple independent teams need to subscribe to the same SQL Server change stream (fan-out).
  • Downstream consumers are microservices, not SQL queries.
  • You already operate Kafka for other workloads and the marginal cost of adding a SQL Server connector is low.

Debezium's Kafka-backed architecture makes it the better hub when many consumers need the same data in different formats.

FAQ

Does SQL Server CDC require Enterprise edition? No. SQL Server CDC is available in Standard, Developer, and Enterprise editions. It is not available in Express edition.

Does RisingWave handle SQL Server failover (Always On)? RisingWave connects to the SQL Server instance directly. For Always On availability groups, point the connection at the listener address. If a failover occurs, RisingWave will reconnect and resume from the last LSN.

What permissions does the CDC user need? At minimum: db_datareader on the CDC schema, VIEW DATABASE STATE, and SELECT on cdc.lsn_time_mapping. The Debezium documentation has the full permission list, which applies to RisingWave as well.

Can I capture DDL changes like column additions? Debezium tracks schema history in Kafka and handles column additions automatically. RisingWave handles schema evolution for additive changes (new nullable columns) but requires manual intervention for breaking DDL changes.

What happens if the CDC capture job falls behind? Both tools read from SQL Server's CDC tables. If the capture job lags, both tools lag together. Monitor the capture job latency with sys.dm_cdc_log_scan_sessions.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.