The RisingWave team continues to deliver impressive features and updates with this new v1.4 release. This release continues to introduce new upstream and downstream connectors while improving upon existing ones. New functions and operators have also been added to allow for more complex data transformations. Follow along as we highlight some of the more notable features.

If you are interested in the full list of v1.4 updates, see the release note.


A new version of the S3 source connector


This release of RisingWave introduces a new version of the existing AWS S3 source connector. This improved version of the S3 source connector aims to resolve the scalability issues that the previous connector faced. The method of creating an S3 source with the improved connector is almost exactly the same as before. If you are interested in testing the integration between AWS S3 and RisingWave, we recommend using the upgraded version of the S3 source connector for a stable, hassle-free experience.

For more details:


New sink connectors


The RisingWave team continues to improve RisingWave’s ability to connect to more systems, allowing RisingWave to be easily incorporated with any existing data stack. Check out our Integrations page for a list of supported systems. You can also vote to signal support for a specific integration to prioritize its development. With this release, RisingWave can sink data to Doris, Google BigQuery, and Redis. BigQuery, Doris, and Redis can all be used as a data storage tool. Using a simple SQL command, you can easily create a sink in RisingWave to output data to all these systems.

For more details:


Support for COMMENT ON command


We now offer support for the COMMENT ON command. With this command, you can add comments to tables and columns, providing you with the ability to add documentation to the database objects. For current and future collaborators, the extra comments may allow them to quickly understand the purpose and usage of the tables and columns.

The following SQL query adds a comment to the table t1.

COMMENT ON TABLE t1 IS 'this is table 1'; 

To see the comment made, you can select from the rw_description RisingWave catalog.

SELECT * FROM rw_description; 
------- 
objoid | classoid | objsubid | description 
-------+----------+----------+------------------- 
  1001 |       41 |          | this is table 1 

You can also use the SHOW COLUMNS or DESCRIBE commands for the same purpose.

For more details:


Support for subqueries in UPDATE and DELETE commands


You can now include a subquery when using the UPDATE or DELETE command. This allows for more complex data manipulations and easier data clean-up as you are able to perform updates and deletions based on specific conditions.

UPDATE table_name   SET col_1 = col_1 * 2.2 
  WHERE col_1 NOT IN (SELECT col_2 FROM drop_vals); 

For more details:

  • See the documentation for theUPDATE clause.
  • See the documentation for the DELETE clause.


A multitude of new JSON functions and operators


These new JSON functions and operators introduced in RisingWave handle JSONB types as inputs or outputs. In RisingWave, JSONB is a data type that enables you to create a column that stores JSON data. If being able to JSONB data types improves your data cleaning and transformation experience, these new features may come in handy.

This release introduces numerous JSON operators, allowing you to easily manipulate JSON data in RisingWave. Here are some examples of the new JSON operators:

  • The @> operator, which checks if the left JSONB object contains the values that are in the right JSONB object.
'[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb → t
  • The - operator, which deletes a key-value pair, all matching keys, array elements, or an element at the specified index from a JSON object.
'{"a": "b", "c": "d"}'::jsonb - 'a' → {"c": "d"}
  • The #- operator, which deletes the element at the specified path. The path element can be field keys or array indexes.
'["a", {"b":1}]'::jsonb #- '{1,b}' → ["a", {}]

The full list of operators includes: @><@??| , ?&, #>,  #>>, - and #-.

Four new JSON functions have also been added and they include jsonb_extract_path, jsonb_object, jsonb_pretty, and jsonb_strip_nulls. We will briefly introduce the jsonb_object and jsonb_pretty functions.

  • jsonb_object turns the input array of text elements and returns a JSONB object, where adjacent values in the string are read as key-value pairs.
jsonb_object(array['a', null]) → {"a": null}
  • jsonb_pretty outputs the input JSONB object as a nicely formatted text.
SELECT jsonb_pretty('[{"f1":1,"f2":null}, 2]');
------RESULT
[
    {
        "f1": 1,
        "f2": null
    },
    2
]

For more details:

Conclusion

These are just some of the new features included with the release of RisingWave v1.4. To get the entire list of updates, which includes new window functions, additional upgrades to existing connectors, and more, please refer to the detailed release notes.

Look out for next month’s edition to see what new, exciting features will be added. Check out the RisingWave GitHub repository to stay up to date on the newest features and planned releases.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and Linkedin, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.

One and a half year ago, in April 2022, we open sourced RisingWave, the distributed SQL streaming database. A quarter ago, in July 2023, we released the first official version of RisingWave, RisingWave 1.0, a battle-tested system that can be used in production. More recently, RisingWave 1.3 has been released

As an open-source streaming database released under Apache 2.0 license, the development team behind RisingWave actively collect feedback from users and strives to democratize stream processing: to make it simple, affordable, and accessible.

As a system that has been deployed in production in dozens of enterprises and fast-growing startups, how will RisingWave evolve? We plan to make it transparent and periodically update our roadmap. Here’s what you can anticipate in the future release of RisingWave.

Note that the roadmap is not final, and we will frequently update our roadmap to reflect and the item priority to better serve users.


Short-term goals (within the next 3 months)

  • Adaptive Scaling
    Implement adaptive scaling to automatically adjust materialized view parallelism based on the number of CPU cores in the cluster.
  • Improvements on the Existing External Sinks
    Optimize performance and improve stability of supported external sinks like Doris, Clickhouse, and Elasticsearch. We’ll also expand supported encoding formats for Kafka sink, including Protobuf, Avro, and the support for Schema Registry.
  • Iceberg Sink V2
    We recently introduced a native integration with Iceberg, which is no longer based on the official Java library. It’s fully rewritten by Rust for performance and stability. We plan to stabilize it in the next few months.
  • Enhanced Observability
    Expand system tables and add metrics for stateful operators to provide greater visibility into system health and performance.
  • Improved Open-source Web UI
    Enhance RisingWave's open-source web UI with additional system information and monitoring capabilities.
  • Sink into table
    Users may want to dynamically union the results of multiple views into a single table. For example, a view may correspond to an department in a company while there can be new departments once in a while. With this feature, users can seamlessly merging data from new views as they are added.
  • CDC Connection Sharing
    RisingWave currently creates one CDC connection per table. Each connection will individually consume the replication logs, which consists of transactions not only to the source table, but also other tables in the same database. Therefore, multiple connections will lead to the duplicate consumption and a heavy load on the upstream database. Shared CDC connections can thus reduce the load and improve the stability of CDC.
  • Recoverable CREATE MATERIALIZED VIEW
    Persist materialized view progress to allow recovering from failures without losing work already completed.
  • CDC Transaction Atomicity
    CDC transactions in RisingWave currently applies by events, which may contain only partial content in a transaction. With the new feature, RisingWave will buffer all CDC events within a transaction until it can be fully applied atomically.
  • Parallel CDC Snapshot Loading
    Introduce parallelism during CDC snapshot loading to improve user experience for large upstream tables.


Mid-term goals (within the next 6 months)

  • SSL/TLS Secured Connection
    Implement SSL/TLS encryption for client/server communications to enhance security.
  • Alter Materialized View
    Add ability to modify existing materialized views.
  • Session Window
    Introduce session window functionality for advanced streaming analytics.
  • MemTable Spill
    A refresh to a small table could suddenly cause 1k times amplification on write throughput. Such a case typically happens when there is a 10+ way join. A way to mitigate this is to use the local disk as a buffer for the flooded writes, thus avoiding OOM.
  • Dedicated Computes for Materialized View Creation
    Some users complained that RisingWave’s materialized view creation is too slow, as it requires a resource-intensive ad-hoc computation. On the other hand, since the streaming (incremental computations) is long-running, it requires less resources at the same time. As a result, it’s possible to allocate dedicated resources for MV creation separately when needed, and deallocate them once finished.
  • More External Sinks
    Redshift Sink and Snowflake Sink are in the plan.
  • Recursive CTE
    Enable recursive common table expressions (CTE) to traverse hierarchical data like the organizational tree in a company.
  • Shared Meta Plane
    Enable RisingWave clusters to share the meta plane, including Etcd (or Postgres in the future), to better utilize compute resources across clusters.


Long-term goals

  • Optimize analytical query performance on third-party systems like Presto and Trino
  • GraphQL API
    To allow retrieving results from RisingWave directly through the browser.
  • Serverless Compaction
    Automatically scale Compactor instances in and out to match workload demands in a serverless model.

CONCLUSION

RisingWave is an open-source streaming database aiming at democratizing stream processing: to make stream processing ease-of-use and cost-efficient. Its development direction is highly influenced by user requests. We would love to hear from the community and update our agenda accordingly. If you have any questions or comments regarding RisingWave’s roadmap, please don’t hesitate to let us know by commenting here. Your voice will help shape the future of real-time stream processing!

We are thrilled to announce a strong partnership between RisingWave Cloud and Confluent Cloud, a leading figure for processing data in motion. RisingWave officially joins the Connect with Confluent program as a new verified partner. This marks a significant milestone in our journey to democratize stream processing.

Close
Featured RisingWave Labs is now an official Confluent Partner

As a company committed to staying at the forefront of innovation and ensuring that our clients have access to the best stream processing platform in the world, our decision to join the Confluent Partner Program was a natural choice.


What is RisingWave Cloud


RisingWave Cloud is a fully managed hosted service built to democratize stream processing. RisingWave Cloud offers a managed service for customers to create/drop/scale their RisingWave clusters, and build their stream processing applications easily with a few lines of SQL queries. Powered by RisingWave, RisingWave Cloud provides you with more flexibility, reliability, and scalability for your cloud-native streaming applications.


What is Confluent Cloud


Confluent Cloud is a fully managed streaming data service that harnesses the potential of Apache Kafka. Their services enable organizations to harness the power of real-time data, making it easier to build data-driven applications, gain valuable insights, and deliver exceptional customer experiences. To get start with Confluent Cloud, start your free trial here.


Key Benefits of Using Confluent Cloud with RisingWave Cloud


By partnering with Confluent Cloud, we can now leverage their cutting-edge technology and expertise to provide our customers with a complete set of solutions for capturing data in motion. Apache Kafka lies at the center of gathering and re-routing streaming data, however, it is still challenging to transform the streaming data on the fly to deliver real-time analytical insights, or to combine multiple streams with historical data for streaming ETL.

RisingWave provides a perfect match with Apache Kafka to deliver a collective solution for data in motion, from gathering data to real-time analytics. Whether you are looking to optimize your data pipelines, build event-driven applications, or enhance your data integration capabilities, our partnership with Confluent opens up a world of possibilities.

The combination of Confluent Cloud and RisingWave Cloud brings the following key advantages:

  • Managed Services: Both Confluent Cloud and RisingWave Cloud offer managed services with robust high availability and failure recovery solutions. On our side, RisingWave Cloud provides a very user-friendly experience so that customers can provision and manage their RisingWave clusters easily and quickly. This offloads the operational burden from your IT team, allowing them to focus on more strategic tasks.
  • Easy integration: With built-in Confluent Cloud source connectors, users can easily connect to an existing topic on a Confluent Cloud cluster from RisingWave Cloud. RisingWave Cloud will automatically deduce the schema from the schema registry service, and consume the data to deliver real-time insights.
  • Scalability and Cost-Efficiency: RisingWave Cloud can easily scale up/down/in/out your RisingWave clusters to accommodate changing workloads and storage needs. With our innovative compute-storage decoupled architecture, RisingWave Cloud achieves extreme cost-efficiency compared with other vendors.
  • Security: RisingWave Cloud guarantees a secure connection with Confluent Cloud only after being authenticated by API keys, which are managed by Confluent Cloud. RisingWave Cloud invests heavily in security measures under SOC2 compliance. No more worry about data leakage.
  • Solid and speedy technical support: Backed by RisingWave creators, our experts provide all necessary enterprise-grade technical support in any time zone.

Get started

To learn how to use RisingWave Cloud to ingest data from Confluent Cloud? Be sure to check out our recent video tutorial. You can also check out our documentation for detailed explanations.

Want to learn more about RisingWave? Don’t forget to join our community, follow our Twitter and Linkedin, and subscribe to our newsletter. Stay tuned!

This new release of RisingWave (v1.3) is packed with new and valuable features for sinks, sources, and SQL functions. Notable enhancements include advanced support for regular expressions, capabilities for delivering data to more downstream systems, and the addition of the WITH ORDINALITY clause.

For the full list of updates, see the release note.


Support for new array functions


With this release, multiple new array functions were added. These new functions include array_min, array_max, array_sort, array_sum. Array functions accept array types as inputs. These differ from aggregate functions, which perform calculations on a set of values. For more details on the new functions, including examples, look for the functions listed above on the Array functions page.


Advanced support for regular expression functions


We have introduced enhanced support for advanced features in our existing regular expression functions. These functions, including regexp_count, regexp_match, regexp_matches, and regexp_replace, now offer additional capabilities such as backreference, positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind.


Support for WITH ORDINALITY clause


The WITH ORDINALITY clause is now also supported. When using set returning functions, this clause can be used at the end of the FROM clause in a SELECT statement. It will add a new column to the query results, which numbers the rows returned by the function, starting from 1. For more details, see WITH ORDINALITY.

SELECT * FROM
unnest(array[0,1,2])
WITH ORDINALITY;


Syntax update for creating Kafka, Kinesis, and Pulsar sinks


Previously, when creating a sink, the type of sink is specified under the WITH options with the type parameter. As an example, the following SQL query creates an append-only Kafka sink.

CREATE SINK s1 FROM mv WITH (
  connector = 'kafka',
  properties.bootstrap.server = 'localhost:9092',
  topic = 'test_topic',
  type = 'append-only');

Now the type of sink and data format can be specified with FORMAT...ENCODE... to be consistent with CREATE SOURCE. Note that the append-only type has been updated to PLAIN. For now, only the JSON format is supported. The following SQL query creates an append-only Kafka sink. Note that PLAIN is specified instead of append-only.

CREATE SINK snk FROM mv with (
  connector = 'kafka',
  properties.bootstrap.server = 'localhost:9092',
  topic = 'test_topic')
FORMAT PLAIN ENCODE JSON;

This syntax change only applied to Kafka, Kinesis, and Pulsar sinks. For more details, see the Sink data to Kafka page or the Sink data to AWS Kinesis page.


A group of new sink connectors


This release introduces support for multiple new sink connectors, allowing you to sink data transformed in RisingWave into more systems. RisingWave strives to provide users the ability to deliver data to popular downstream systems. Check out our Integrations page for a list of supported systems. You can also vote to signal support for a specific integration to prioritize its development.


Cassandra and ScyllaDB sink


Cassandra and ScyllaDB are open-source NoSQL database management systems capable of handling large workloads. Since ScyllaDB is built as a replacement for Cassandra, the RisingWave sink connector for both are the same. See the Cassandra sink guide for more details.


Doris sink


Apache Doris is an open-source distributed analytical database designed for ad-hoc analysis of large volumes of data. See how simple the process is to deliver data from RisingWave to Doris with this guide.


Elasticsearch sink


Elasticsearch is an open-source distributed search and analytics engine designed to process and handle large values of data in real time. Follow our guide to see how to create an Elasticsearch sink in RisingWave.


NATS messaging system


NATS is an open-source messaging system, which follows a pub-sub and request-response messaging pattern, ideal for real-time applications. See the documentation for details on how to sink data to NATS.


Pulsar sink


Apache Pulsar is an open-source distributed messaging system, designed for developing real-time, event-driven applications. With RisingWave, you can now ingest data from and sink data to Pulsar. See our new guide on how to create a sink to deliver data to Pulsar.

Conclusion

These are just some of the new features included with the release of RisingWave v1.3. To get the entire list of updates, which includes new regular expression functions, please refer to the detailed release notes.

Look out for next month’s edition to see what new, exciting features will be added. Check out the RisingWave GitHub repository to stay up to date on the newest features and planned releases.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and Linkedin, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.

We are thrilled to announce the release of RisingWave v1.2. Here are the key highlights in this release.


Emit-on-window-close syntax change


The syntax for the emit-on-window-close policy has changed. If your application uses the old syntax, please see the updated documentation and update your code accordingly to avoid any issues.

For more details about Emit-on-window-close, see our documentation.


Streaming job management


With the introduction of the CANCEL JOBS and SHOW JOBS SQL queries, you can now cancel or see the progress of streaming jobs. In RisingWave, a streaming job creates a materialized view. This allows you to save time by canceling streaming jobs that are no longer necessary.

For more details about streaming job management, see our documentation.


Lateral subqueries


This release supports lateral subqueries, which provide you with more methods to perform complex calculations and efficient data transformations. The keyword LATERAL can be included before a subquery’s SELECT statement, enabling you to reference columns from preceding subqueries.

For more details about lateral subqueries, see our documentation.


Sink data to ClickHouse


We now support the integration of the popular database management system ClickHouse. With a straightforward CREATE SINK query, you can easily sink data from RisingWave to ClickHouse.

For more details about sink data to ClickHouse, see our documentation.


New version of Iceberg sink connector


BREAKING CHANGE: The Iceberg sink connector has been updated. If your application included sinking to Iceberg, please review the syntax and parameters on the Sink data to Iceberg guide to ensure everything continues to run smoothly.

For more details about new version of Iceberg sink connector, see our documentation.


Transactions within a CDC table

When you use a native RisingWave sink connector, such as MySQL, PostgreSQL, or Citus CDC, you can enable transactions within a CDC table.

For more details about how to enable this feature, see our documentation.

LIKE expressions


In this release, we support [NOT] LIKE and [NOT] ILIKE pattern-matching expressions, as well as the respective [!]~~ and [!]~~* operators. LIKE expressions can be used in SHOW SQL queries, giving you the option to filter the output for relevant results.

For more details about LIKE expressions, see our documentation.

Conclusion

These are just some of the new features included with the release of RisingWave v1.2. To get the entire list of updates, which includes new regular expression functions, please refer to the detailed release notes.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and Linkedin, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.

RisingWave Cloud: next-gen stream processing


Powered by RisingWave, RisingWave Cloud offers a managed service that provides you more flexibility, reliability and scalability for your cloud-native streaming applications.

In particular, RisingWave Cloud holds the following key advantages:

Close
Featured Get started on stream processing application in 5 minutes
  • Managed Services: RisingWave Cloud offer managed RisingWave services that offer robust high availability and failure recovery solutions. RisingWave cloud will automatically handle routine maintenance tasks such as backups, patch management, and security updates. This offloads the operational burden from your IT team, allowing them to focus on more strategic tasks.
  • Scalability and Cost-Efficiency: RisingWave Cloud can easily scale up/down/in/out your RisingWave clusters to accommodate changing workloads and storage needs. With our innovative compute-storage decoupled architecture, RisingWave Cloud achieves extreme cost-efficiency comparing with other vendors.
  • Security: RisingWave Cloud invest heavily in security measures under SOC2 compliance. No more worry about data leakage.
  • Rapid Deployment: RisingWave Cloud provides very user-friendly experience so that customers can provision and manage their RisingWave clusters easily and quickly. The package of builtin monitoring tools, web query editors and source/sink integrations putting together delivers an one-stop development solution.
  • Solid and speedy technical support: Backed by RisingWave creators, our experts provide all necessary enterprise-grade technical supports in any time zone.


RisingWave Cloud on AWS Marketplace


We are thrilled to announce that RisingWave Cloud has landed on AWS Marketplace! The AWS Marketplace integration offers a streamlined and simplified process of discovering, procuring, and deploying RisingWave on the Amazon Web Services (AWS) cloud platform.

Close
Featured RisingWave Cloud has landed on AWS Marketplace

With RisingWave Cloud on AWS Marketplace, you get seamless integration with the industry-leading reliable infrastructure of AWS and its ecosystem. It simplifies the entire software lifecycle, from selection to deployment to ongoing management, making it a valuable resource for organizations looking to leverage the full potential of AWS services and third-party software offerings. Get started now

We are thrilled to announce the new release of RisingWave: the Rust-written, open-source streaming database. RisingWave v1.1 is packed with many exciting new features. Here are the key highlights.


Watermarks


Real-time data ingestion is inherently unordered. To properly handle out-of-order events, we need watermarks.

Accordingly we introduce watermarks in this release. In RisingWave, you can define watermarks in sources and append-only tables. Once you define them, you can use them in time-window aggregations and other stateful operations.

Append-only tables is also a new feature that was introduced in this version to facilitate the definition of watermarks.

For details about watermarks, see Watermarks.


Emit-on-window-close policy


For windowed calculations, we offer the flexibility to emit results either upon update or when the window closes.

Emitting results upon window close is particularly advantageous in certain situations.For example, when writing to append-only downstream systems like Kafka or S3, or when dealing with calculations that benefit from triggering only when the window closes, such as percentile calculations.

Our emit-on-window-close policy, introduced in v1.1, caters to these scenarios.

For details about this policy, see Emit on window close.


Read-only transactions


We recognize the significance of transactions in ensuring data processing consistency and reliability.

In this release, so we have incorporated support for read-only transactions. All reads within a transaction are executed against the consistent snapshot, guaranteeing data consistency as they originate from the same point-in-time.

For details about Read-only transactions, see Transactions.


New window functions


To enhance your data analysis capabilities, we have introduced several new window functions in RisingWave v1.1. These include lead(), lag(), first_value(), and last_value().

For details about new window functions, see Window functions.

Conclusion

These features are just a glimpse of the many exciting additions in RisingWave v1.1. For a comprehensive overview, please refer to the detailed release notes.

As we approach the 2,000th community member, we anticipate even more remarkable progress and success in our mission to revolutionize stream processing for the benefit of all.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and Linkedin, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.

RisingWave, the distributed SQL streaming database, recently celebrated a significant milestone with the achievement of 1000 members in our Slack community. This event marks a momentous occasion for RisingWave, a project we have diligently worked on for the past three years.

From its initial release on GitHub under the Apache 2.0 license, RisingWave has garnered considerable attention, attracting numerous companies that have successfully deployed it in production to support real-time applications across diverse industries, such as manufacturing, e-commerce, fintech, SaaS, and more.

Many have inquired about our rationale behind open sourcing RisingWave under the Apache 2.0 license and questioned whether we plan to change the license in the future. We want to address these queries with utmost transparency.

Here are the top three reasons why we adhere to Apache 2.0 License:

  1. Promoting Stream Processing Technology: Despite being studied for over two decades, stream processing technology is still in its early stages in the market. Many people still need to learn about stream processing and its benefits. By open sourcing RisingWave, we aim to advocate and promote this technology, making it accessible to a broader audience rather than catering solely to a select group of professionals.
  2. Transparency and Vendor Neutrality: Transparency is of utmost importance to us. Customers deserve a complete understanding of how the product they use operates. Open sourcing RisingWave eliminates the black box approach and prevents vendor lock-in, granting users greater control and flexibility.
  3. Community Feedback and Improvement: Collecting user feedback is crucial in building an exceptional product. Open source facilitates wider adoption, encouraging the community to provide timely feedback, which helps us enhance and refine RisingWave.


From a commercialization standpoint, we have compelling reasons to open source RisingWave under the Apache 2.0 license. Contrary to concerns, this move does not harm our business but drives us to deliver an even better product to our customers. It is strategic:

  1. Alignment with Cloud Services Trend: In the era of cloud services, SaaS vendors provide not just technology but also valuable services and support. Companies prefer to focus on developing their core products rather than investing resources in maintaining backend systems. By open sourcing RisingWave, we align with this trend and prioritize our customers' needs.
  2. Anticipating Industry Adoption: We are also confident that stream processing technology will witness widespread adoption across various companies, much like the popular MySQL and PostgreSQL. Even if we decided not to open source RisingWave, others might create similar solutions and make them open source, reflecting the inevitability of this technology's proliferation.
  3. Democratizing Stream Processing: Since day one, our objective has been to democratize stream processing, making it simple, affordable, and accessible. Through open sourcing RisingWave and serving the community, we are highly motivated to continue improving the product, extending the benefits of stream processing to an even larger audience.

Conclusion

As we approach the 2,000th community member, we anticipate even more remarkable progress and success in our mission to revolutionize stream processing for the benefit of all.

Sign up for our monthly newsletter if you’d like to keep up to date on all the happenings with RisingWave. Follow us on Twitter and Linkedin, and join our Slack community to talk to our engineers and hundreds of streaming enthusiasts worldwide.

On July 12, 2023, RisingWave, the open-source distributed SQL streaming database, announced the official release of the RisingWave 1.0 version.

RisingWave is a distributed SQL database specifically designed for stream processing. It is wire-compatible with PostgreSQL and independent of complicated JVM-based components. This design choice enables users to learn, develop, and manage stream processing systems with minimal barriers.

Developed in Rust, RisingWave achieves remarkable cost efficiency through its storage-compute separation architecture, layered storage, and highly specialized optimizations for its SQL interface. Using the Nexmark benchmark, a popular stream processing benchmark, RisingWave demonstrates a performance improvement of 10–30% in stateless computations compared to Apache Flink. Furthermore, it demonstrates an astounding improvement of up to 660 times in stateful computations compared to Apache Flink.

After being open-sourced on GitHub on April 8, 2022, under Apache 2.0 license, RisingWave has garnered widespread attention from developers worldwide. In just over 15 months, RisingWave has accumulated nearly 5K stars, 400 forks, and over 130 open-source contributors on GitHub. Moreover, RisingWave has built a vibrant community with almost 1,000 members in its Slack workspace. The RisingWave team has successfully deployed the database in production environments for dozens of companies, and the number of globally tracked long-running Kubernetes clusters is nearing one hundred.

Before releasing the first official version, RisingWave underwent extensive testing and demonstrated stable operation in multiple production environments for over 6 months. Building upon this stability, all future versions of RisingWave will ensure backward compatibility and provide users with convenient version upgrade services. Below, you will find a list of the core features of RisingWave 1.0.


Core Features of RisingWave

PostgreSQL wire-compatible


Use RisingWave just like you would use PostgreSQL. RisingWave not only supports the PostgreSQL SQL dialect but also provides seamless integration with a wide range of interfaces, tools, and systems that are compatible with PostgreSQL.

Comprehensive feature set

RisingWave offers an extensive array of advanced stream processing features, including window functions, watermarks, User-Defined Functions (UDFs), and more. It empowers users with powerful capabilities for processing streaming data.

Exceptional stream processing performance

In complex stream processing scenarios like multiway stream joins, RisingWave delivers outstanding performance. It outperforms previous-generation stream processing systems like Apache Flink, achieving 2 to hundreds of times higher performance.

Efficient dynamic scaling

With its decoupled compute-storage architecture, RisingWave enables independent and infinite scaling of its compute and storage layers. As the streaming workload fluctuates, users can easily add or remove resources, optimizing performance while minimizing costs.

Fast fault recovery and high availability


RisingWave features automatic and instant fault recovery capabilities. Additionally, users can configure multiple replicas to ensure uninterrupted operation on confronting data center power outages, crashes, and other issues. This ensures the high availability of the service.

Open ecosystem

RisingWave seamlessly integrates with various mainstream systems, including messaging systems, OLTP databases, analytical databases, data lakes, data warehouses, microservices, and more. It provides flexibility and interoperability within the larger data ecosystem.

Observability


RisingWave provides a comprehensive set of observability tools that enable users to monitor and understand the operational status of the system easily. It offers insights into computation progress, internal state size, and other relevant metrics.

Interpretability

RisingWave includes built-in storage capabilities that can maintain stream processing results. Users can query results in real-time with consistency guarantees, facilitating the verification of program correctness and enabling efficient debugging.

Conclusion


RisingWave is a cloud-native stream processing database that has emerged in the era of cloud computing. Apache Flink is a widely adopted distributed stream processing framework in the big data field. Despite the evolving landscape, performance remains a crucial aspect. 

Developed in Rust and specifically tailored for cloud environments, RisingWave claims to provide a 10X improvement in performance compared to Flink. To achieve optimal performance, RisingWave underwent extensive performance testing for over six months. Recently, the RisingWave team shared a private preview of the performance report with more than 50 community members, including seasoned engineers, Flink contributors, and Flink PMC members.

This week has been phenomenal. We are pleased to release a preview version of the performance report publicly. This will allow everyone to gain insights into the performance comparison between RisingWave and Flink.


Before we begin


Although Apache Flink and RisingWave are both open-source stream processing systems, their design philosophies differ significantly, making a direct comparison between the two somewhat unfair. However, we can still outline the similarities and differences between the two projects to provide a reference for comparison. In this discussion, we will primarily focus on comparing Flink SQL and RisingWave.

Flink is an open-source project that originated and developed during the big data era. It is a unified stream and batch processing platform designed for general-purpose use. While Flink is well-known for its efficient stream processing capabilities, it has expanded to encompass other domains as well. Flink is evolving into a unified system that covers features such as batch processing, machine learning (Flink ML), data lakes (Paimon, also known as Flink Table Store), and stateful functions (StateFun). Flink provides a Java API similar to MapReduce, and it offers higher-level Python and SQL interfaces built on top of the Java API for ease of use.

RisingWave is a cloud-native streaming database designed to improve the performance and efficiency of cloud-based stream processing. RisingWave is wire-compatible with PostgreSQL, meaning users can connect to it using any PostgreSQL-compatible tools such as psql and JDBC. 

Unlike Flink, RisingWave does not provide a Java API. Instead, it offers Python User-Defined Functions (UDFs) to enhance the expressiveness of SQL queries. This approach provides users with a more flexible and intuitive way to work with the data and enables them to use the programming language they are most familiar with.

To better understand the similarities and differences between Flink and RisingWave, our focus will be on comparing their respective SQL capabilities.


Benchmarks and Environment


To compare Flink and RisingWave, we used Flink version 1.16 and RisingWave version 0.19.

In the realm of stream processing, the Nexmark benchmark is widely recognized. The Flink community uses the Nexmark benchmark with source code available. The RisingWave team implemented Nexmark using Kafka as the streaming data source. Check out the implementation here

However, as the Nexmark benchmark does not cover certain commonly used SQL operators, we included an additional set of 5 query statements.

To simulate real-world scenarios, we generated continuous data using Kafka and connected it to both Flink and RisingWave. Our primary focus was on throughput rather than latency. It is worth noting that both Flink and RisingWave were configured to guarantee exactly-once semantics.


Hardware Configuration

For our tests, we utilized AWS EC2 c5.2xLarge instances, each of which features eight vCPUs and 16GB of memory.

RisingWave consists of four components:

  1. Frontend Node
  2. Meta Node
  3. Compute Node
  4. Compactor Node

The Compute Node and Compactor Node share the same machine, while the Frontend Node and Meta Node have minimal impact on performance and share another machine.

In contrast, Flink consists of two components:

  1. Job Manager
  2. Task Manager

The Task Manager occupies one machine (compaction threads in RocksDB correspond to the Compactor Node), while the Job Manager occupies another machine based on their corresponding relationship.


Test Dataset Configuration

To prepare for the test, we allocated a separate machine for Kafka and generated a 2 billion data points dataset using the data generation tool. Once the data generation was completed, we initiated the system (either RisingWave or Flink) to execute the queries. 

After each query finished running, we stopped the system, deleted all checkpoint data on S3, restarted the system, and proceeded to the next query. Each query was constrained to a maximum execution time of 60 minutes, and if it exceeded this time limit, it was promptly terminated. 

The data results presented below are based on the individual runtime of each query.


Performance Test Results

The performance result is shown in the table below.

Close
Featured Performance comparison between RisingWave and Flink. 100% represents the same performance for RisingWave and Flink. Greater than 100% indicates that RisingWave is faster, while less than 100% indicates that Flink is faster.

Performance comparison between RisingWave and Flink. 100% represents the same performance for RisingWave and Flink. Greater than 100% indicates that RisingWave is faster, while less than 100% indicates that Flink is faster.

Please note that in this table, we compared two versions of Flink: Flink and Flink (better storage). We used EBS storage for RocksDB state, and since internal state management can be a performance bottleneck in stream processing, we increased the performance parameters to 12000 IOPS and 500 MB/s on top of using the default EBS gp3 to enhance Flink's performance.

As shown in the table, RisingWave achieved performance advantages in 22 out of the 27 queries listed. Among them, 12 queries demonstrated performance improvements of at least 50% compared to Flink (i.e., values above 150% in the chart). ten queries surpassed Flink's performance by a factor of two (i.e., values above 200% in the chart). 

Notably, q102 achieved a performance improvement of over 520 times, and q104 achieved a performance improvement of 660 times compared to Flink.


Detailed Explanation

Why are some query results missing?

Please note that we did not include the results for queries q6, q11, q13, and q14 in the comparison table. Here is the reason for each:

  • q6: This query uses a window function, which is not currently supported in the current version of RisingWave. However, it will be supported soon (by the end of this month).
  • q11: This query uses session windows, which we consider to be a lower priority, and therefore, it is not yet implemented.
  • q13: This query requires generating a side input, which we omitted for the purpose of this comparison.
  • q14: This query requires UDF (User-Defined Function) support. Although RisingWave supports UDF, it requires deploying a separate UDF service. Since our focus was primarily on testing the performance of RisingWave and Flink, we omitted this query.


Why does the performance of stateless computations appear to be similar?


In queries q0-q3 and q21-q22, RisingWave's performance improvement over Flink is not significant. This may seem counterintuitive since RisingWave, being a project developed in Rust, is theoretically expected to achieve several times the performance of Flink, developed in Java. 

However, the main reason why we did not observe such performance improvement in these tests is that the network I/O introduced by the information transfer between Flink/RisingWave and Kafka became the primary bottleneck. In other words, a significant amount of time was spent on network communication rather than CPU computation. 

Although we can indeed construct complex stateless computations (such as parsing nested JSON) to showcase the performance advantages of Rust projects, Nexmark does not cover such operations, and to ensure fairness, we excluded such tests.


How to explain the two queries (q5 and q16) where RisingWave is significantly slower than Flink?


For Flink's testing, we used the same source definition as the official Nexmark source code. The only difference between Flink and RisingWave is that Flink defines a watermark on the Nexmark data source, which provides additional optimization opportunities for Flink. 

For instance, a window aggregation function can determine when a time window can be closed and output the final result once and only once when the watermark arrives instead of continuously outputting the latest intermediate result after each row update. We refer to the former semantics as Emit On Window Close (EOWC). However, we noticed that Flink doesn't support window aggregation functions with non-EOWC semantics, so we couldn't force Flink to use non-EOWC semantics by removing the watermark definition. 

In the production environments of RisingWave users, they have expressed a need for both semantics.

RisingWave will soon support EOWC, but it was not yet supported in the tests. In Flink's q5, this semantics is utilized, while RisingWave still outputs a large number of real-time intermediate results. This difference causes the downstream join operator in RisingWave to be triggered extensively, resulting in lower performance compared to Flink. We will present the results with EOWC support in the next round of testing.

Regarding q16, we are still investigating. At present, RisingWave does not provide the optimization of Split Distinct Aggregation, as described here. Once we support this optimization, we will provide performance data with the same execution plan in the next version of the performance report.


In which types of queries is RisingWave significantly better than Flink?


RisingWave achieved significant performance improvements, sometimes even doubling or increasing by hundreds of times, in queries that have complex internal states and require a large amount of space. Generally, the more complex a query is, the more complex the internal state it needs to maintain and the larger the state space it occupies. 

For instance, in our tests, queries like q4, q7, q9, q18, and q20 had internal states approaching or exceeding 20GB, while q102 and q105 had internal states close to 10 GB. In such cases, accessing and manipulating the internal state often becomes the bottleneck of the computation process, requiring optimizations such as caching and indexing by the stream processing engine. 

From the test results, it is evident that RisingWave is better equipped to handle complex stream processing tasks compared to Flink.


Why can RisingWave achieve such high performance?


RisingWave's performance is closely linked to its design and implementation. In summary, there are three key factors that contribute to RisingWave's performance advantage over Flink:

  1. RisingWave is developed from scratch in Rust and relies on very few third-party JVM components. This gives it a significant advantage over Flink, which is developed in Java, in terms of language choice. However, this advantage primarily exists at the computation layer, as Flink's underlying RocksDB storage is developed in C++.
  2. As a big data project, Flink has a Java API layer similar to MapReduce, and Flink SQL is essentially a wrapper over the Flink Java API. Computer science theory suggests that the more abstraction layers we have, the worse the performance. In contrast, RisingWave doesn't have an intermediate layer, allowing for highly customized optimizations directly targeting SQL queries, thereby achieving significant performance improvements.
  3. Flink directly uses RocksDB to store intermediate computation states, and RocksDB is unaware of the computation itself. In contrast, RisingWave has its own storage implementation that is aware of the computation, and it uses remote storage (such as S3, HDFS, etc.) to greatly reduce storage costs, thus achieving improved computational costeffectiveness.

In addition to these three factors, there are other aspects that could contribute to RisingWave's performance advantage over Flink. For instance, Flink's current direction is to become a unified platform, and the introduction of features like batch processing, machine learning, and data lakes can easily increase system complexity, resulting in unnecessary performance overhead.


Unmeasured Aspects


Complex Stateful Computations


As mentioned earlier, it is theoretically expected that a system developed in Rust would outperform a system built on JVM languages, especially when handling complex stateful computations. Given that many modern applications require complex operations such as JSON parsing and string processing within stream processing systems, we plan to add tests for such queries in the future.

Multi-Stream Joins


Multi-stream joins are a common scenario in stream processing. When users have multiple data sources, such as multiple MySQL instances or Kafka topics, they often turn to stream processing systems to perform joins for analysis. Handling multi-stream joins often involves dealing with large internal states. 

Based on the results of the Nexmark benchmark test, we have already preliminarily verified RisingWave's excellent performance in scenarios with large internal states, and it is evident that multi-stream joins are an area where RisingWave excels. 

We have conducted experiments on multi-stream joins, and based on initial results, RisingWave can easily handle joins of ten or more data streams, while Flink often encounters state management issues and crashes. We will gradually release the experimental results of multi-stream joins to the public.

UDFs, Watermarks, and Advanced Features


The effectiveness of a stream processing system is not limited to handling basic operators such as aggregation, join, and windows. In practice, users often require advanced features, such as User-Defined Functions (UDFs) or watermarks, to extend the expressive power and ensure the system's correctness. However, due to the limitations of the Nexmark benchmark, we did not test these features. We plan to cover these features in future tests and evaluate their impact on RisingWave's performance.


How to optimize performance in Flink?


Optimizing performance in Flink is not only a technical task but also an experiential one. The RisingWave team has accumulated nearly a year of experience in optimizing Flink's performance. In summary, there are three aspects to consider when optimizing performance in Flink:

  • Deployment optimization: Flink supports Kubernetes deployment, but deploying Flink via Kubernetes does not necessarily result in optimal performance due to its heavy dependencies on the JVM ecosystem. Specific deployment considerations, such as how to deploy ZooKeeper nodes, need to be taken into account to achieve optimal performance.
  • SQL optimization: Flink uses Calcite for SQL parsing and planning, but it lacks awareness of the data, which limits its ability to optimize SQL effectively. Therefore, in some queries, it may be necessary to rewrite the SQL to achieve higher performance manually.
  • RocksDB optimization: Flink uses RocksDB for internal state management. However, since RocksDB is not aware of the computation and acts only as a storage layer, users often need to tune RocksDB parameters for optimal performance manually. It's worth noting that RocksDB has hundreds of tunable parameters, so achieving optimal RocksDB performance requires a deep understanding of its internal structure.


How to degrade performance in RisingWave?


The most likely way to degrade performance in RisingWave is by reducing the memory of the compute nodes and introducing irregular access patterns that result in a high cache miss rate. The underlying principle is that RisingWave uses remote storage to maintain the internal state and caches the most frequently accessed state on compute nodes. When the compute node capacity is limited, and access patterns are irregular, it can disrupt the caching strategy (RisingWave uses an LRU-based algorithm) and cause frequent access to remote storage. Accessing remote storage is often costly, and frequent access inevitably leads to a decrease in overall performance.

Conclusion

This article presents a performance comparison between Flink and RisingWave based on the Nexmark benchmark test. While a single benchmark cannot cover all aspects of a stream processing system, it provides a general understanding of the performance differences and underlying reasons between Flink and RisingWave in common scenarios. Apart from stream processing capabilities, Flink offers other features worth exploring, including batch processing, machine learning, and StateFun. In contrast, RisingWave will focus more on optimizing the performance and efficiency of stream processing.

Performance is not the only criterion for evaluating the superiority or inferiority of a system, but we strive to achieve optimal performance while balancing other important factors such as scalability, fault tolerance, and ease of use.

sign up successfully_

Welcome to RisingWave community

Get ready to embark on an exciting journey of growth and inspiration. Stay tuned for updates, exclusive content, and opportunities to connect with like-minded individuals.

message sent successfully_

Thank you for reaching out to us

We appreciate your interest in RisingWave and will respond to your inquiry as soon as possible. In the meantime, feel free to explore our website for more information about our services and offerings.

subscribe successfully_

Welcome to RisingWave community

Get ready to embark on an exciting journey of growth and inspiration. Stay tuned for updates, exclusive content, and opportunities to connect with like-minded individuals.