Highlights of RisingWave v2.0: The Open-Source Streaming SQL Database

From v1.10 to v2.0, we have launched new SQL commands and clauses, allowing you to securely store external database credentials and perform time travel queries. New sink connectors were added and significant improvements were made to existing ones. RisingWave now also supports ingesting and delivering data in Parquet format.

If you are interested in the full list of v2.0 updates, see our comprehensive release notes.

v2.0 feature highlights

With each new version, the team at RisingWave continues to deliver new and improved features that enhance the usability and flexibility of RisingWave. Let us go over some of the highlight updates rolled out in v2.0.

Enhancements to subscriptions and cursors

v2.0 made breaking changes to subscriptions and cursors, altering their default behavior and output.

DECLARE cursor_name SUBSCRIPTION CURSOR FOR subscription_name will no longer include historical data. Instead, it will only return incremental data starting from the time of declaration. To also include historical data, add SINCE begin() at the end of the DECLARE statement.
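
For example, a minimal sketch, assuming a subscription named my_sub already exists:

-- Returns only incremental changes from the time of declaration
DECLARE cur SUBSCRIPTION CURSOR FOR my_sub;

-- Returns historical data as well as incremental changes
DECLARE cur_full SUBSCRIPTION CURSOR FOR my_sub SINCE begin();

-- Fetch the next change row from the cursor
FETCH NEXT FROM cur;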

Next, the operation values in the output of fetching from a cursor have been updated from int to varchar for improved readability. Instead of seeing 1, 2, 3, or 4, the output will now show INSERT, UPDATE_INSERT, DELETE, or UPDATE_DELETE, respectively.

Finally, the SQL commands SHOW CURSORS and SHOW SUBSCRIPTION CURSORS are now available. They return the cursors and subscription cursors, respectively, that were created in the current session.

RisingWave Python SDK

In addition to using SQLAlchemy, Ibis, psycopg2, and other PostgreSQL-compatible Python tools to connect to RisingWave, we're excited to announce the public preview of our new Python SDK. Install it by simply running pip install risingwave-py. This SDK is purpose-built to let developers interact with RisingWave easily, streamlining the deployment of event-driven applications. Start unlocking its potential in your projects.

Time travel

Time travel queries are available as a Premium Edition feature.

Time travel queries allow you to access historical data at specific points in time. This can be crucial for tracking data changes, auditing, and compliance. They also simplify error recovery, as you can quickly restore data after accidental modifications.

In RisingWave, you can easily access historical data by adding the FOR SYSTEM_TIME AS OF clause after any SELECT query. Note that the system parameter time_travel_retention_ms must be set to a non-zero value to enable time travel queries; it also determines how long historical data is retained before being deleted.
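
For example, a sketch enabling time travel with one day of retention (the value is in milliseconds):

ALTER SYSTEM SET time_travel_retention_ms = 86400000; -- 24 hours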

For instance, the following SQL query retrieves data from one day ago.

SELECT * FROM table_name FOR SYSTEM_TIME AS OF NOW() - '1' DAY;

You can also retrieve data as of a specific point in time by supplying a datetime string or a Unix timestamp.
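
For example, a sketch using a hypothetical datetime string:

SELECT * FROM table_name FOR SYSTEM_TIME AS OF '2024-09-01 12:00:00+00:00';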

Secret management

Secret management is available as a Premium Edition feature.

When connecting to an external database (such as MySQL or PostgreSQL) as either a source or sink, the database credentials must be explicitly specified in the WITH options. In a large organization with multiple team members, this may pose a security risk.

To resolve this issue, admins can use the CREATE SECRET command to create secrets that securely store the database credentials, and allow others to reference those secrets when creating a connection to an external source or sink. By doing so, fewer people have direct access to the credentials.
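
For example, a minimal sketch, using a hypothetical secret named mysql_pwd:

CREATE SECRET mysql_pwd WITH (backend = 'meta') AS 'your_password';

CREATE SOURCE mysql_source WITH (
    connector = 'mysql-cdc',
    hostname = 'localhost',
    port = '3306',
    username = 'username',
    password = secret mysql_pwd, -- reference the secret instead of plaintext
    database.name = 'db_name'
);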

CORRESPONDING keyword for set operations

The CORRESPONDING keyword can now be used alongside the set operations UNION, INTERSECT, and EXCEPT. With CORRESPONDING, columns with matching names are automatically paired from each table. You can also specify which columns to match on by using CORRESPONDING BY (col1_name[, col2_name, ...]).

For instance, say we have the following two tables.

CREATE TABLE employees (
    employees_id int,
    first_name varchar,
    last_name varchar,
    last_update timestamp
);

CREATE TABLE customers (
    customer_id integer,
    store_id int,
    first_name varchar,
    last_name varchar,
    email varchar,
    last_update timestamp
);

If we union the tables without using CORRESPONDING, the columns selected from each table must match. The following query will return first_name and last_name from employees and customers.

SELECT first_name, last_name
FROM employees
UNION ALL
SELECT first_name, last_name
FROM customers;

Now if CORRESPONDING is used, the columns from each table do not need to match. The following query will return first_name, last_name, and last_update from employees and customers.

SELECT *
FROM employees
UNION ALL CORRESPONDING
SELECT *
FROM customers;
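
To match on specific columns only, a sketch using CORRESPONDING BY, which here returns just first_name and last_name:

SELECT *
FROM employees
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customers;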

This new keyword makes it easier to perform set operations by saving you the hassle of finding matching columns from each table.

Compatibility with Parquet files

You can now ingest data from Parquet files stored in AWS S3 and Google Cloud Storage, and sink data in Parquet format to these systems.

Parquet files can be batch-read from AWS S3 by using the file_scan function. Alternatively, when creating a table or source on top of AWS S3 or Google Cloud Storage, you can specify the encoding format as parquet and the match pattern as *.parquet.

The syntax for the file_scan function is as follows. The schema is inferred from the first Parquet file, so all Parquet files in the directory should share the same schema to avoid issues.

SELECT
  col1,
  col2,
  col3
FROM file_scan(
    file_format,
    storage_type,
    s3_region,
    s3_access_key,
    s3_secret_key,
    file_location);
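
For example, a sketch with a hypothetical bucket and placeholder credentials:

SELECT *
FROM file_scan(
    'parquet',
    's3',
    'us-east-1',
    'your_access_key',
    'your_secret_key',
    's3://example-bucket/data/'
);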

Delivering data in Parquet format from RisingWave into file systems is also available. To do so, the match_pattern and file_type parameters can now be specified as *.parquet and parquet, respectively, and the ENCODE option can be specified as PARQUET.
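
For example, a minimal sketch of a Parquet-encoded S3 sink, with a hypothetical bucket and placeholder credentials:

CREATE SINK parquet_sink FROM mv
WITH (
    connector = 's3',
    s3.region_name = 'us-east-1',
    s3.bucket_name = 'example-bucket',
    s3.path = 'sink/path/',
    s3.credentials.access = 'your_access_key',
    s3.credentials.secret = 'your_secret_key',
    type = 'append-only'
) FORMAT PLAIN ENCODE PARQUET;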

This added support for Parquet files builds on RisingWave’s capability as a unified streaming and batch-processing system.

Automatic table schema mapping and changes for MySQL

Automatic schema mapping and changes for MySQL sources are available as a Premium Edition feature.

On top of automatic schema mapping when ingesting CDC data from a MySQL table, RisingWave can now automatically replicate schema changes. Whenever modifications are made to the source MySQL table’s schema, those changes are instantly reflected in your RisingWave table as well. Previously, schema changes had to be applied manually by using the ALTER TABLE command to drop or add columns. Now, this process can be automated by setting the parameter auto.schema.change to true when creating a source.

CREATE SOURCE mysql_source WITH (
    connector = 'mysql-cdc',
    hostname = 'localhost',
    port = '3306',
    username = 'username',
    password = 'password',
    database.name = 'db_name',
    server.id = '5701',
    auto.schema.change = 'true' -- enable automatic schema change replication
);

New sinks

RisingWave continues to expand its ecosystem by adding new connectors to downstream systems. You can directly sink data to MongoDB and Azure Blob. If you are interested in a particular connector, see our Integrations page. You can vote to show your interest in a specific connector, or ask to be notified when it becomes available.

MongoDB

MongoDB is a popular, open-source NoSQL database built for storing and managing large amounts of unstructured or semi-structured data. You can now directly sink data from RisingWave to MongoDB using the CREATE SINK command. Both append-only and upsert sinks are available: an append-only sink should not define a primary key, whereas an upsert sink requires one.

CREATE SINK mongodb_sink FROM mv
WITH (
    connector = 'mongodb',
    type = 'upsert',
    mongodb.url = 'mongodb://mongodb:27017/?replicaSet=rs0',
    collection.name = 'collection1',
    primary_key = 'id'
);

Azure Blob

Azure Blob Storage is a cloud-based object storage solution designed to store and manage large amounts of unstructured data. You can now sink data from RisingWave to Azure Blob by using the CREATE SINK command.

CREATE SINK azure_sink FROM mv
WITH (
    connector = 'azblob',
    azblob.container_name = 'container_name',
    azblob.path = 'sink/file/location',
    azblob.credentials.account_name = 'account_name',
    azblob.credentials.account_key = 'account_key',
    azblob.endpoint_url = 'url',
    type = 'append-only',
    force_append_only = 'true'
);

Updates since v1.0

Now that we have introduced the new features included in v2.0, let us review some of the notable upgrades and revamps since v1.0.

Expansion of external connectors

We recognize how important it is for a distributed SQL database to have an expansive, flexible ecosystem. The ability to easily connect to any upstream and downstream system is a must-have for anyone looking to add RisingWave to their tech stack.

Since v1.0, we have expanded our list of available upstream connectors, especially for platforms that are Apache Kafka compatible. For instance, you can now ingest data from AutoMQ, Confluent Cloud, WarpStream, and more.

Even more impressive is our extensive set of downstream connectors. RisingWave has grown from only six downstream connectors to 25 connectors. Some of the new connectors since v1.0 include ClickHouse, Delta Lake, Elasticsearch, Redis, and Snowflake.

Additionally, continuous improvements are made to existing connectors, ensuring that the user experience and performance get better with each iteration. The built-in source connectors for PostgreSQL and MySQL have gone through major revamps, making CDC data ingestion more convenient and customizable. Substantial efforts were also made to ensure that the Kafka source and sink connectors, which are among our most popular, are optimized for usability and performance.

At RisingWave, we want to continue to minimize the hassle of integrating RisingWave into your existing data stack, so new integrations will continuously be added.

Advanced SQL features

RisingWave allows you to manage and query data entirely with SQL, so we implemented advanced SQL commands and functions.

New SQL commands and clauses have been implemented and upgraded throughout the year. Support for the INCLUDE clause was introduced, allowing you to ingest the metadata included in each payload. To easily retrieve data from tables and materialized views, you can create subscriptions and cursors. The ALTER commands provide you with thorough control over all database objects, making it easy to manipulate their characteristics.
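
For instance, a sketch of the INCLUDE clause on a hypothetical Kafka source, exposing each message's timestamp as a column:

CREATE TABLE kafka_events (v1 int)
INCLUDE timestamp AS event_ts
WITH (
    connector = 'kafka',
    topic = 'events',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;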

No matter what type of data you need to manipulate, RisingWave is sure to have a set of functions available. We introduced advanced functions that handle jsonb type and array type data. For complex string search patterns, you can refer to our collection of regular expression functions.
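
For example, a couple of sketches using the PostgreSQL-style jsonb operators and a regular expression function:

SELECT '{"user": {"name": "alice"}}'::jsonb -> 'user' ->> 'name'; -- returns 'alice'
SELECT regexp_match('foobarbaz', 'b(..)'); -- returns {ar}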

UDF support has expanded to include more languages, as well as embedded UDFs that you define directly within RisingWave. Moreover, embedded UDFs can be used to create aggregate functions as well.
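
For example, a sketch of an embedded Python UDF defined entirely within RisingWave:

CREATE FUNCTION gcd(int, int) RETURNS int LANGUAGE python AS $$
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a
$$;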

Of course, as a stream processing platform, ample tools are available for handling real-time data streams. Watermarks, time windowing, and temporal filtering are core features that have been implemented and continuously improved upon since early on.
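
For example, a sketch of a tumbling-window aggregation over a hypothetical table t with a timestamp column ts:

SELECT window_start, count(*)
FROM tumble(t, ts, INTERVAL '1 MINUTE')
GROUP BY window_start;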

RisingWave has built an extensive set of tools for handling and manipulating data, ensuring anyone can easily adopt RisingWave.

Scalability and database management

To guarantee a smooth stream processing experience, you can easily configure how to deploy RisingWave and how it will utilize computing resources to fit your circumstances.

Adaptive parallelism is the default in RisingWave: the system automatically adjusts the parallelism of jobs whenever new nodes or CPUs are added to the cluster. You can choose to disable this, but it provides a smoother scaling experience.
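
For example, a sketch pinning a materialized view to a fixed parallelism, then returning it to adaptive scaling:

ALTER MATERIALIZED VIEW mv SET PARALLELISM = 4;
ALTER MATERIALIZED VIEW mv SET PARALLELISM = ADAPTIVE;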

For managing metadata, you can choose which SQL backend (SQLite, PostgreSQL, or MySQL) to use as the meta store backend. The shift from etcd to a SQL backend provides better stability, performance, and observability.

A set of runtime parameters is available for you to configure as you see fit. These allow you to change the behavior of sinks and sources created, the type of join performed, the streaming and batch parallelism, and more.
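
For example, a sketch adjusting two session-level runtime parameters:

SET streaming_parallelism = 4; -- parallelism for newly created streaming jobs
SET batch_parallelism = 8;     -- parallelism for batch queries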

RisingWave can be configured to perfectly suit your needs.

Going forward in v2.0

Unified SQL stream and batch processing

From v2.0 onward, RisingWave will expand its capabilities beyond a stream processing platform toward a unified batch and stream processing approach. Handling real-time data will always be the main focus of RisingWave, but we will also extend support to more traditional data platforms, such as data lakes and data warehouses, allowing users to perform ad-hoc queries on these sources.

Notes

Premium Edition

If you are interested in purchasing a license for RisingWave Premium or would like to learn more, please contact our sales at sales@risingwave-labs.com.

These are some of the highlight features included in v2.0. To see the entire list of updates, which includes new SQL commands and changes to existing connectors, please refer to the detailed release notes.

Look out for next month’s edition to see what new, exciting updates will be made. Check out the RisingWave GitHub repository to stay up to date on the newest features and planned releases.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and LinkedIn, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.
