Highlights of RisingWave v1.8: The Open-Source Streaming Database

We at RisingWave are happy to announce the release of v1.8 this month! This release is again filled with new features, updates to existing features, and bug fixes. Additional improvements are made to Python UDFs, Rust UDFs, and connectors, providing users with more flexibility and control over their database. A breaking change regarding decoupled sinks is also included, so read on to learn how to avoid compatibility issues.

If you are interested in the full list of v1.8 updates, see the release note.

Breaking change: decoupling sinks

Starting from v1.8, RisingWave uses a more lightweight method for state cleaning, which affects sinks with decoupling enabled. Sink decoupling was first introduced in v1.6, but compatibility for existing decoupled sinks will not be maintained going forward. If you have decoupled sinks, we recommend that you either avoid upgrading directly to v1.8, or delete all sinks with decoupling enabled before upgrading.

To check whether you have any sinks with decoupling enabled, first upgrade to v1.7, which introduces the internal table rw_sink_decouple for checking the decouple status of all sinks. There are no compatibility issues when upgrading from v1.6 to v1.7. Then run the following query.

SELECT * FROM rw_sink_decouple
WHERE is_decouple AND watermark_vnode_count < 256;

The query will return any sinks that may run into compatibility issues if you upgrade. If the query returns empty results, it is safe to upgrade to v1.8.
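If the query does return rows, those sinks should be dropped before the upgrade and recreated afterwards. A minimal sketch, assuming a sink named my_decoupled_sink (a hypothetical name):

DROP SINK my_decoupled_sink;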

Embedded UDFs

In v1.7, we introduced support for UDFs in additional languages. With this release, we continue to improve UDF functionality. You can now create embedded UDFs in Python and Rust, which means these UDFs are defined, compiled, and run inside RisingWave, so there is no need to set up an external UDF server. The trade-off is that embedded UDFs cannot use external libraries or access the file system.

Embedded Python UDFs

When creating an embedded Python UDF, use the CREATE FUNCTION command. Here is an example.

CREATE FUNCTION gcd(a int, b int) RETURNS int LANGUAGE python AS $$
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a
$$;

The function definition is written using Python syntax. You also have the option to create table functions and to have your function return struct types.

Creating an embedded UDF means that you are limited to pure computation logic, but it is useful if you need to reuse complex computations. You do, however, have access to the json, decimal, re, math, and datetime modules. After the function is created, you can call it like any other built-in function.

Some Python built-in functions are not allowed inside the UDF body. For the full list, see the official documentation linked below.
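For example, once the gcd function above is created, it can be called directly in a query:

SELECT gcd(25, 15);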

Embedded Rust UDFs

As with embedded Python UDFs, use the CREATE FUNCTION command to define an embedded Rust UDF. Here is an example.

CREATE FUNCTION gcd(int, int) RETURNS int LANGUAGE rust AS $$
    fn gcd(mut x: i32, mut y: i32) -> i32 {
        while y != 0 {
            (x, y) = (y, x % y);
        }
        return x;
    }
$$;

The function body is defined using Rust syntax. Just like embedded Python UDFs, you can create table functions and have your function return struct types. While arbitrary external crates are not supported, you can still use the chrono, rust_decimal, and serde_json crates. Once your function is defined, you can use it like any built-in function.
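For instance, assuming a table t with integer columns a and b (hypothetical names), the function can be applied to every row:

SELECT a, b, gcd(a, b) AS divisor
FROM t;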

For details on the syntax, see the documentation linked below.

For more details, see:

Refresh schema

Previously, to refresh the schema of a source defined with a Schema Registry, you needed to redefine its schema registry using the ALTER SOURCE command, which was quite verbose. Furthermore, refreshing the schema of a table was not supported.

In this release, we introduce the REFRESH SCHEMA syntax, making it much easier to update the schema of a table or source in RisingWave. Note that the data FORMAT and ENCODE options cannot be changed when updating the schema. The syntax for refreshing the schema of a source is as follows.

ALTER SOURCE s REFRESH SCHEMA;

Now the schema of the source s will be updated to reflect any changes made to the upstream schema.

Similarly, the syntax to refresh the schema of a table created with an external connector is as follows.

ALTER TABLE t REFRESH SCHEMA;

If refreshing the schema would drop columns that are still referenced by downstream objects, such as a materialized view, the command will fail.
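As a sketch of this restriction, assuming a table t with a column c that is referenced by a materialized view (hypothetical names):

CREATE MATERIALIZED VIEW mv AS SELECT c FROM t;
-- If the upstream schema no longer contains column c, the refresh is rejected
-- because mv still depends on it.
ALTER TABLE t REFRESH SCHEMA;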

For more details, see:

RANGE in window functions

When employing a window function in your SQL queries, the RANGE clause can be used to specify the range of rows relative to the current row that are included in the window frame. It operates on a range of values based on the ordering of rows. The RANGE can be specified as follows.

RANGE BETWEEN frame_start AND frame_end

Here, frame_start and frame_end describe which rows to perform the calculations on.

frame_start can be UNBOUNDED PRECEDING, CURRENT ROW, or an offset PRECEDING, where the offset is a value relative to the current row's ordering column, such as a number or a time interval. For instance, the following clause includes all rows from 1 day before up to the current row.

RANGE BETWEEN 1 DAY PRECEDING AND CURRENT ROW

frame_end can be UNBOUNDED FOLLOWING, CURRENT ROW, or an offset FOLLOWING. The following clause includes the current row and all rows after it.

RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
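Putting this together, here is a sketch of a sliding window aggregation, assuming a table sensor_readings with columns device_id, reading_time, and temperature (hypothetical names):

SELECT
    device_id,
    reading_time,
    avg(temperature) OVER (
        PARTITION BY device_id
        ORDER BY reading_time
        RANGE BETWEEN 1 DAY PRECEDING AND CURRENT ROW
    ) AS avg_temp_last_day
FROM sensor_readings;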

For more details, see:

Ruby client support

If you have a Ruby application, you can now interact with RisingWave from it using a third-party PostgreSQL driver. Our official documentation goes over this process using the ruby-pg driver. The following Ruby code establishes a connection to the RisingWave database, assuming default database credentials.

require 'pg'

conn = PG.connect(host: '127.0.0.1', port: 4566, dbname: 'dev', user: 'root')

Once connected, you can run SQL queries just like you would in RisingWave, or employ any of the built-in features that ruby-pg has to offer.

The ruby-pg driver offers solid performance and broad compatibility with PostgreSQL features and database operations, making it suitable if you are building Ruby applications with demanding performance requirements.

For more details, see:

New source connectors

In addition to the numerous improvements made to existing source and sink connectors, we introduce two new source connectors with this release, providing you with more flexibility on how you want to build your stream processing pipeline.

Iceberg source

You can now batch-read data from an Iceberg source with the new Iceberg source connector. As with all other source connectors in RisingWave, you can start ingesting data by using a CREATE SOURCE command. Note that the source table in Iceberg must be a copy-on-write (COW) table, meaning it cannot contain delete files.

CREATE SOURCE iceberg_source (
    id bigint,
    user_name varchar
) WITH (
    connector = 'iceberg',
    catalog.type = 'storage',
    warehouse.path = 's3a://hummock001/',
    s3.endpoint = 'http://127.0.0.1:9301',
    s3.access.key = 'admin',
    s3.secret.key = 'admin',
    s3.region = 'us-east-1',
    database.name='db_name',
    table.name='table_name'
);

The supported catalog types are storage, jdbc, hive, and rest.

Furthermore, specifying the columns when creating an Iceberg source is optional; if they are omitted, all columns are automatically derived from the Iceberg table.
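Once the source is created, you can batch-read from it with an ordinary query. A minimal sketch using the source defined above:

SELECT id, user_name FROM iceberg_source;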

MongoDB CDC

Previously, ingesting CDC data from MongoDB into RisingWave required setting up a pipeline with a Debezium connector, which tracks MongoDB database changes and records them in a Kafka topic, and then connecting RisingWave to that topic with the Kafka connector. The new MongoDB CDC connector simplifies the process by allowing RisingWave to connect to MongoDB directly. All you need is a CREATE TABLE command to establish the connection.

CREATE TABLE mongocdc(
    _id varchar PRIMARY KEY,
    payload jsonb
) WITH (
    connector = 'mongodb_cdc',
    mongodb.url = 'mongodb://localhost:27017/?replicaSet=rs0',
    collection.name = 'dbname.*, foo.*'
);

In the collection.name parameter, you can choose to ingest data from collections across multiple databases, or from specific collections only.
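Because each change event lands in a single jsonb payload column, individual fields can be pulled out with JSON operators. A minimal sketch, assuming the documents contain name and age fields (hypothetical field names):

SELECT
    _id,
    payload ->> 'name' AS name,
    (payload ->> 'age')::int AS age
FROM mongocdc;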

For more details, see:

SQL meta store

While etcd is still supported for backward compatibility, we have also introduced PostgreSQL, MySQL, and SQLite as new options for metadata storage, currently in technical preview.

PostgreSQL offers greater robustness when it comes to large volumes of metadata. For example, when creating over 1,000 materialized views and sinks, etcd is prone to being overwhelmed and can run into out-of-memory errors, whereas PostgreSQL is more resilient.

In a production environment, we recommend deploying an instance with 2 CPU cores and 4 GB memory at minimum for PostgreSQL, along with active replication for high availability.

For more details, see:

These are just some of the new features included with the release of RisingWave v1.8. To see the entire list of updates, which includes updates made to data formats and system catalogs, please refer to the detailed release notes.

Look out for next month’s edition to see what new, exciting features will be added. Check out the RisingWave GitHub repository to stay up to date on the newest features and planned releases.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and LinkedIn, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.
