MySQL CDC: Debezium vs RisingWave Native Connector
MySQL CDC works by reading the binary log (binlog), which records every committed transaction. Both Debezium and RisingWave's built-in MySQL connector read the same binlog — with one key difference: RisingWave embeds the Debezium Embedded Engine, so it handles binlog parsing, snapshot consistency, and schema evolution using the same proven code as Debezium, without requiring Kafka or Kafka Connect.
MySQL Binlog Basics
MySQL's binary log is the change data source for CDC. Before connecting any CDC tool, the source MySQL instance needs binlog enabled with row-level format:
# my.cnf or my.ini
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 7  # MySQL 5.7; on MySQL 8.0 use binlog_expire_logs_seconds = 604800 instead
binlog_format = ROW ensures that every changed row is recorded in full, rather than as the SQL statement that produced the change. binlog_row_image = FULL ensures that both the before and after images of updated rows are captured — required for reliable CDC.
After changing these settings, restart MySQL.
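Once the server is back up, the settings can be confirmed from any client session; the expected values follow directly from the my.cnf fragment above:

```sql
-- Confirm binlog settings after the restart
SHOW VARIABLES LIKE 'log_bin';          -- expect: ON
SHOW VARIABLES LIKE 'binlog_format';    -- expect: ROW
SHOW VARIABLES LIKE 'binlog_row_image'; -- expect: FULL
```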
Grant the CDC user the necessary privileges:
CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'secret';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cdc_user'@'%';
FLUSH PRIVILEGES;
These prerequisites are identical whether you use Debezium Standalone or RisingWave's CDC connector.
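As a final sanity check, you can verify the grants and note the server's current binlog position while connected as the CDC user (in MySQL 8.4, SHOW MASTER STATUS was renamed SHOW BINARY LOG STATUS):

```sql
-- Verify the privileges actually granted to the CDC account
SHOW GRANTS FOR 'cdc_user'@'%';

-- Current binlog file and offset; CDC snapshots hand off to streaming from here
SHOW MASTER STATUS;
```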
Debezium MySQL Connector: Setup and Configuration
Debezium's MySQL connector runs as a Kafka Connect plugin. After deploying Kafka and Kafka Connect, register the connector:
{
"name": "mysql-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "mysql.internal",
"database.port": "3306",
"database.user": "cdc_user",
"database.password": "secret",
"database.server.id": "184054",
"topic.prefix": "shop_mysql",
"database.include.list": "shop",
"table.include.list": "shop.orders,shop.customers,shop.products",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-changes.shop",
"include.schema.changes": "true",
"snapshot.mode": "initial"
}
}
Note the schema.history.internal settings. MySQL's binlog does not include full DDL context with every event, so Debezium maintains a schema history in a separate Kafka topic. This is unique to the MySQL connector and does not apply to the PostgreSQL connector.
Once running, change events appear in topics named shop_mysql.shop.orders, shop_mysql.shop.customers, etc.
RisingWave MySQL CDC Connector: Setup and Configuration
RisingWave's MySQL connector uses the Debezium Embedded Engine internally. The binlog reading, snapshot logic, and schema evolution handling are the same Debezium code — the difference is the output goes into RisingWave's SQL engine rather than Kafka.
CREATE SOURCE mysql_shop WITH (
connector = 'mysql-cdc',
hostname = 'mysql.internal',
port = '3306',
username = 'cdc_user',
password = 'secret',
database.name = 'shop',
server.id = '5401'
);
The server.id must be unique across all MySQL replication clients connecting to this instance — the same requirement that applies to Debezium Standalone.
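Before picking a value, one way to see which replication clients are already attached to the instance — each of which holds a server id — is to inspect the binlog dump threads:

```sql
-- Clients currently streaming the binlog; each registered its own server id
SELECT id, user, host, command, time
FROM information_schema.processlist
WHERE command LIKE 'Binlog Dump%';  -- matches 'Binlog Dump' and 'Binlog Dump GTID'
```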
Create tables from the source:
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
customer_id BIGINT,
product_id BIGINT,
quantity INT,
total_amt DECIMAL(10, 2),
status VARCHAR,
created_at DATETIME
) FROM mysql_shop TABLE 'shop.orders';
CREATE TABLE customers (
id BIGINT PRIMARY KEY,
email VARCHAR,
region VARCHAR,
tier VARCHAR
) FROM mysql_shop TABLE 'shop.customers';
Then write SQL on top:
CREATE MATERIALIZED VIEW orders_per_customer AS
SELECT
c.id AS customer_id,
c.email,
c.tier,
COUNT(o.id) AS order_count,
SUM(o.total_amt) AS lifetime_value,
MAX(o.created_at) AS last_order_at
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.id, c.email, c.tier;
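The view is maintained incrementally as binlog events arrive, so serving it is an ordinary SELECT; the 1000 threshold below is just an illustrative cutoff:

```sql
-- Top customers by lifetime value, read straight from the materialized view
SELECT customer_id, email, tier, lifetime_value, last_order_at
FROM orders_per_customer
WHERE lifetime_value > 1000
ORDER BY lifetime_value DESC
LIMIT 10;
```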
AWS RDS MySQL
RDS MySQL is one of the most common production environments for MySQL CDC. Both Debezium and RisingWave support it with minor configuration adjustments.
Enabling Binlog on RDS MySQL
Binlog is controlled via a DB parameter group on RDS. Create or modify a parameter group:
binlog_format = ROW
binlog_row_image = full
log_bin = ON (enabled automatically by RDS when automated backups are on, i.e. backup retention period > 0)
Enable binlog retention via the RDS procedure:
-- Run on the RDS instance
CALL mysql.rds_set_configuration('binlog retention hours', 24);
Without this, RDS purges binlog files aggressively, which can cause CDC connectors to lose their position if they fall behind.
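You can confirm the retention setting took effect with the companion RDS procedure:

```sql
-- Lists 'binlog retention hours' among other RDS-managed settings
CALL mysql.rds_show_configuration;
```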
RisingWave CREATE SOURCE for RDS MySQL
CREATE SOURCE rds_mysql_shop WITH (
connector = 'mysql-cdc',
hostname = 'shop-db.abc123xyz.us-east-1.rds.amazonaws.com',
port = '3306',
username = 'cdc_user',
password = 'secret',
database.name = 'shop',
server.id = '5401',
ssl.mode = 'required'
);
Setting ssl.mode = 'required' is recommended for RDS connections; RDS instances support TLS by default.
Debezium Connector for RDS MySQL
{
"name": "rds-mysql-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "shop-db.abc123xyz.us-east-1.rds.amazonaws.com",
"database.port": "3306",
"database.user": "cdc_user",
"database.password": "secret",
"database.server.id": "184054",
"topic.prefix": "rds_shop",
"database.include.list": "shop",
"database.ssl.mode": "required",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-changes.rds-shop"
}
}
Latency Comparison
Both tools read from the MySQL binlog. The theoretical latency floor is similar — the time between a transaction committing in MySQL and the CDC tool reading the binlog event, which is typically under 100 ms in a healthy system.
The practical end-to-end latency differs based on pipeline depth:
| Stage | Debezium + Kafka | RisingWave (built-in CDC) |
| --- | --- | --- |
| MySQL binlog → connector | ~10–50 ms | ~10–50 ms |
| Connector → Kafka (producer ack) | ~5–50 ms | N/A |
| Kafka → consumer (poll interval) | ~0–100 ms | N/A |
| Consumer → result available | Depends on consumer | ~10–50 ms (incremental MV) |
| Typical end-to-end | 50–300 ms | 20–100 ms |
The RisingWave path removes the broker round-trips. For real-time dashboards and low-latency analytics, this difference is noticeable.
Operational Complexity Comparison
| Task | Debezium + Kafka | RisingWave |
| --- | --- | --- |
| Initial setup | Deploy Kafka, Connect, register connector | CREATE SOURCE SQL |
| Monitor connector health | Kafka Connect REST API, JMX metrics | RisingWave system catalog |
| Handle connector failure | Restart task via Kafka Connect API | Automatic restart by RisingWave |
| Schema changes in source DB | Schema history topic updated automatically | RisingWave handles via Debezium Embedded Engine |
| Scale consumption | Add Kafka consumer group members | Scale RisingWave compute nodes |
| Configuration changes | Update connector config via REST, restart | Modify CREATE SOURCE and reload |
Which to Choose for MySQL CDC
Choose Debezium Standalone when:
- Multiple downstream systems consume MySQL changes (data warehouse, search index, microservices).
- You already operate Kafka and the marginal cost of adding a connector is low.
- Long-term event retention and replay are required.
- Downstream consumers are non-SQL or non-RisingWave systems.
Choose RisingWave when:
- Your destination is analytics, materialized views, or real-time SQL queries.
- You want to join MySQL CDC with other data streams using SQL.
- Kafka is not in your infrastructure and you don't want to add it.
- Operational simplicity is a priority — CREATE SOURCE replaces an entire Kafka + Connect + consumer stack.
FAQ
Does RisingWave's MySQL connector support all MySQL versions?
RisingWave's MySQL CDC connector, via the Debezium Embedded Engine, supports MySQL 5.7 and MySQL 8.0+. It also supports MariaDB and Percona Server, which share MySQL's binlog format.
Does the schema history work the same way in RisingWave as in Debezium?
Debezium Standalone stores schema history in a Kafka topic. RisingWave stores schema history as part of its internal catalog, backed by object storage (S3/GCS). The purpose is the same: track DDL changes so that binlog events can be correctly decoded against the schema at the time the event was written.
What happens if a MySQL DDL change (ALTER TABLE) occurs on the source?
RisingWave processes DDL change events from the Debezium Embedded Engine. For supported DDL changes (adding nullable columns, changing column names), the schema is updated automatically. For breaking changes (dropping columns that RisingWave uses), manual intervention may be required.
Can I use RisingWave CDC with AWS Aurora MySQL?
Yes. Aurora MySQL is binlog-compatible. Enable binlog on the Aurora cluster parameter group (binlog_format = ROW) — a change that requires rebooting the writer instance to take effect — and use the same CREATE SOURCE syntax as RDS MySQL. Use the cluster's writer endpoint as the hostname, since the binlog is produced only on the writer.
How do I monitor CDC lag in RisingWave?
RisingWave exposes source metrics through its system catalog. Query rw_source_backfill_info and related system tables for backfill status. For ongoing lag, monitor the difference between the current MySQL binlog position and RisingWave's last committed offset. RisingWave also exports Prometheus metrics that can be visualized in Grafana.
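Putting that together, a minimal lag check compares the two sides (the RisingWave system table name is as noted above; its exact columns vary by version):

```sql
-- On the MySQL writer: current binlog file and position
SHOW MASTER STATUS;

-- On RisingWave: snapshot/backfill progress for CDC-backed tables
SELECT * FROM rw_source_backfill_info;
```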

