Debezium's MySQL connector reads from MySQL's binary log (binlog) to capture every insert, update, and delete in real time. Paired with Kafka and RisingWave — a PostgreSQL-compatible streaming database — you can build low-latency data pipelines that react to MySQL changes the moment they are committed.
Why CDC for MySQL?
MySQL is one of the most widely deployed relational databases in the world, powering everything from e-commerce platforms to SaaS applications. However, MySQL is optimized for OLTP, not analytics. Running analytical queries directly against a production MySQL instance risks degrading application performance.
The MySQL binlog offers a non-intrusive way to observe changes: it records every committed transaction for replication purposes, and Debezium reads it just as a replica would, without issuing additional queries against the application's tables.
Use cases that benefit from MySQL CDC:
- Order processing: Stream new orders to fulfillment and analytics systems simultaneously
- Inventory management: Propagate stock-level changes to downstream services in real time
- User activity tracking: Capture user record changes for personalization pipelines
- Multi-database synchronization: Keep MySQL and secondary stores (Redis, Elasticsearch, PostgreSQL) consistent
How the Debezium MySQL Connector Works
MySQL's binlog records every committed change to the database. Debezium connects to MySQL as a replica, requesting a binlog stream starting from a specific position (log file + offset).
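You can inspect the server's current binlog position yourself; the file name and offset shown below are illustrative, and on MySQL 8.4+ the statement is SHOW BINARY LOG STATUS instead:

```sql
-- Show the binlog file and offset a new replica (or Debezium) would start from
SHOW MASTER STATUS;
-- File: mysql-bin.000003, Position: 4711  (illustrative values)
```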
The connector performs two phases:
- Snapshot: Reads the current state of all monitored tables and emits op=r (read) events for each row
- Streaming: Tails the binlog continuously, emitting op=c (create), op=u (update), and op=d (delete) events as they occur
Each Debezium MySQL event contains before (for updates and deletes), after (for inserts and updates), op, and a source block with ts_ms, file (binlog filename), pos (binlog position), gtid (if GTIDs are enabled), and server_id.
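Trimmed to the fields above, an update event looks roughly like this (column and field values are illustrative):

```json
{
  "before": { "id": 42, "status": "pending" },
  "after":  { "id": 42, "status": "shipped" },
  "op": "u",
  "source": {
    "ts_ms": 1700000000123,
    "file": "mysql-bin.000003",
    "pos": 4711,
    "gtid": null,
    "server_id": 1
  }
}
```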
Step-by-Step Tutorial
Step 1: Prepare MySQL
Enable binary logging in my.cnf:
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 7  # deprecated in MySQL 8.0+; use binlog_expire_logs_seconds = 604800 instead
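After restarting MySQL, it is worth confirming that the settings took effect before deploying the connector:

```sql
-- Verify binlog configuration
SHOW VARIABLES LIKE 'log_bin';           -- expect ON
SHOW VARIABLES LIKE 'binlog_format';     -- expect ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- expect FULL
```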
Create a Debezium user with the required privileges:
CREATE USER 'debezium'@'%' IDENTIFIED BY 'secret';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;
Step 2: Deploy the Debezium MySQL Connector
Register the connector with Kafka Connect. Save the following configuration as mysql-connector.json:
{
"name": "mysql-orders-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "mysql.internal",
"database.port": "3306",
"database.user": "debezium",
"database.password": "secret",
"database.server.id": "184054",
"topic.prefix": "mysql1",
"database.include.list": "shop",
"table.include.list": "shop.orders,shop.products",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-changes.mysql1",
"include.schema.changes": "true"
}
}
curl -X POST http://kafka-connect:8083/connectors \
-H "Content-Type: application/json" \
-d @mysql-connector.json
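Once registered, the Kafka Connect REST API reports the state of the connector and its task; both should show RUNNING before you move on (host and port match the registration command above):

```shell
# Check connector and task state
curl -s http://kafka-connect:8083/connectors/mysql-orders-connector/status
```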
Step 3: Connect RisingWave to the Kafka Topic
RisingWave is a PostgreSQL-compatible streaming database that understands the Debezium event format natively:
CREATE TABLE orders_mysql (
order_id BIGINT PRIMARY KEY,
product_id BIGINT,
customer_id BIGINT,
quantity INT,
unit_price NUMERIC,
status VARCHAR,
order_time TIMESTAMP
)
WITH (
connector = 'kafka',
topic = 'mysql1.shop.orders',
properties.bootstrap.server = 'kafka:9092',
scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
The FORMAT DEBEZIUM ENCODE JSON clause tells RisingWave to apply Debezium's upsert/delete semantics, so orders_mysql behaves like a live table reflecting the current state of shop.orders.
Step 4: Build a Real-Time Pipeline with Materialized Views
Compute a running leaderboard of top products by revenue:
CREATE MATERIALIZED VIEW product_revenue AS
SELECT
product_id,
SUM(quantity * unit_price) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_buyers,
MAX(order_time) AS last_sale_at
FROM orders_mysql
WHERE status != 'cancelled'
GROUP BY product_id;
Join with a product dimension source for enriched output:
CREATE MATERIALIZED VIEW enriched_product_revenue AS
SELECT
pr.product_id,
p.name AS product_name,
p.category,
pr.total_revenue,
pr.unique_buyers
FROM product_revenue pr
JOIN products_mysql p ON pr.product_id = p.id;
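The join assumes a products_mysql relation ingested the same way as orders_mysql, from the shop.products topic captured by the connector's table.include.list; a minimal sketch, with illustrative column names:

```sql
CREATE TABLE products_mysql (
    id BIGINT PRIMARY KEY,
    name VARCHAR,
    category VARCHAR
)
WITH (
    connector = 'kafka',
    topic = 'mysql1.shop.products',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT DEBEZIUM ENCODE JSON;
```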
Step 5: Sink Results Downstream
Write the enriched view to a downstream Kafka topic or database:
CREATE SINK revenue_sink
FROM enriched_product_revenue
WITH (
connector = 'kafka',
topic = 'product-revenue-live',
properties.bootstrap.server = 'kafka:9092',
primary_key = 'product_id'
) FORMAT UPSERT ENCODE JSON;
Comparison Table
| Feature | Debezium MySQL CDC | Direct Polling |
| --- | --- | --- |
| Captures deletes | Yes (op=d) | No (requires soft deletes) |
| Latency | Milliseconds | Seconds to minutes |
| Impact on MySQL | Minimal (reads binlog) | Query load on tables |
| Schema change tracking | Yes (schema history topic) | Manual |
| Initial snapshot | Automatic | Manual |
| GTID support | Yes | N/A |
FAQ
Do I need to use GTIDs with Debezium MySQL?
GTIDs (Global Transaction Identifiers) are recommended but not required. With GTIDs enabled, Debezium can more reliably handle MySQL failovers, as it can resume from the correct position on a new primary without manual intervention.
What binlog_row_image setting is required?
You must set binlog_row_image = FULL. This ensures that MySQL writes the complete before-image and after-image of each changed row to the binlog. With MINIMAL or NOBLOB, Debezium cannot reconstruct the full before state.
How does RisingWave's native MySQL CDC differ from Debezium?
RisingWave includes a built-in connector = 'mysql-cdc' that connects directly to MySQL without needing Kafka or Debezium. It's simpler to set up and suitable when RisingWave is the only consumer of the change stream. Debezium + Kafka is better when multiple systems need to consume the same change events.
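A minimal sketch of the native approach, with illustrative connection details and columns:

```sql
-- Connect RisingWave directly to MySQL, no Kafka in between
CREATE SOURCE mysql_cdc WITH (
    connector = 'mysql-cdc',
    hostname = 'mysql.internal',
    port = '3306',
    username = 'debezium',
    password = 'secret',
    database.name = 'shop'
);

-- Materialize one upstream table from the shared CDC source
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    status VARCHAR
) FROM mysql_cdc TABLE 'shop.orders';
```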
Key Takeaways
- Debezium reads the MySQL binlog as a replica, capturing all committed changes with no application-level impact
- Set binlog_format = ROW and binlog_row_image = FULL before deploying the connector
- RisingWave ingests Debezium MySQL events using FORMAT DEBEZIUM ENCODE JSON on a Kafka source
- Materialized views update incrementally with each CDC event, enabling real-time aggregations
- For simpler pipelines, RisingWave's native mysql-cdc connector eliminates the Kafka dependency

