Change Data Capture With MySQL and RisingWave Cloud

Change Data Capture With MySQL and RisingWave Cloud

Change data capture (CDC) is a technique used in a database to track changes made in the database over time. Rather than replicating entire tables and databases, CDC captures the changes made, such as inserts, updates, and deletes, usually by monitoring the transaction log. In MySQL, the transaction log is known as the binlog. To learn more about CDC, you can check out What is CDC? Definition, Benefits, and Use Cases.

RisingWave simplifies the ingestion and processing of real-time events, including Change Data Capture (CDC) events from MySQL. It offers a native MySQL connector, eliminating the need for additional platforms like Kafka and Debezium connectors. This streamlined approach makes the entire pipeline easier to manage.

RisingWave processes data incrementally as soon as it arrives, ensuring materialized views are reliable and consistent with the source data. This model improves efficiency and reduces latency, making RisingWave a suitable platform for processing real-time CDC events from MySQL.

In this blog, we'll walk through setting up MySQL and RisingWave to create a streaming pipeline where RisingWave ingests and processes CDC events from MySQL in real-time.

Set up the pipeline to ingest CDC data

Now we will outline the configurations necessary to ingest CDC data from MySQL into RisingWave. For demonstration purposes, we will refer to the self-hosted version of MySQL but AWS RDS MySQL and Amazon Aurora MySQL are also supported. Their configurations will be slightly different.

MySQL configurations

To get started, there are a few configurations we need to make in MySQL.

Step 1: Create a user

First, create a MySQL user with limited privileges. We will connect to MySQL from RisingWave with this user. If multiple users are connecting to the MySQL database, this limits their control over the MySQL database.

We recommend granting the following privileges.

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
ON db_name
TO 'username'@'%';

Step 2: Enable binlog

To ensure that RisingWave can capture the database changes, the binlog must be enabled. Check if the log-bin system variable is set to ON in the server configuration file my.cnf.

Use RisingWave Cloud to ingest data

Now that the MySQL server is configured properly, let us see how to create a source in RisingWave to capture these CDC events.

Step 1: Deploy RisingWave Cloud

See the Quick start guide to learn how to quickly and easily get started with RisingWave Cloud. Follow the steps to create an account, create a cluster, and connect to a cluster.

Step 2: Create a data source

Once we have connected to a RisingWave cluster, we can connect to the MySQL database by using the CREATE SOURCE command in the Query console.

For instance, the following query establishes a connection with the mydb MySQL database and provides us access to all tables within the database.

CREATE SOURCE mysql_mydb WITH (
  connector = 'mysql-cdc',
  hostname = '127.0.0.1',
  port = '3306',
  username = 'root',
  password = '123456',
  database.name = 'mydb',
  server.id = 5888
);

With this source created, we can create separate tables in RisingWave that correspond to the MySQL tables within mybd. If there is a table named customers under the mydb database, we can use the following SQL query to read CDC data from the table. Note that when creating a table, a primary key must be defined and correspond to the primary key in the MySQL table.

CREATE TABLE rw_customers (
    id int,
    customer_name varchar,
    gender varchar,
    PRIMARY KEY(id)
) FROM mysql_mydb TABLE 'mydb.customers';

We can repeat this process for each table we want to read data from. However, if we want to ingest data from a different MySQL database, we need to use the CREATE SOURCE query again, specifying the database credentials.

For more details on how to use the CREATE SOURCE and CREATE TABLE commands in RisingWave to ingest CDC data from MySQL, see Ingest CDC data from MySQL.

Image

Once these connections are established, we can use RisingWave for real-time analytics. If there is data from another source, we can easily create a materialized view in RisingWave that joins and transforms the tables to gain additional insights. As new data is ingested, RisingWave will ensure that the materialized views remain updated with the freshest data.

Moreover, you have the option to stream the transformed data back to your operational database, such as MySQL, to cater to real-time data requests. This approach shifts the responsibility of real-time data transformation from your operational database to RisingWave, ensuring the responsiveness of your operational database remains intact.

>

>

>

>

In this blog, we demonstrated how RisingWave simplifies ingesting CDC data from MySQL for real-time analytics. RisingWave offers a wide range of features for transforming and processing data to gather valuable insights. With its diverse array of connectors, you can ingest and process data from multiple platforms before sending it to downstream systems. > >

>

If you're interested in learning more about RisingWave's features, check out the official documentation site. You can also sign up for RisingWave Cloud for free to test it out easily. If you need assistance or want to connect with other stream processing enthusiasts, consider joining our active Slack community. > >

>

The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.