RisingWave Community Retrospective and a Look Forward
RisingWave is a cloud-native SQL streaming database that started in early 2021. Since its official open-source release on GitHub under the Apache 2.0 license on April 8, 2022, RisingWave has grown into a popular project with over 100 contributors and 3,700 stars.
We are greatly indebted to our community members. A product can become great with the collective wisdom of a thriving and open community more quickly than with the experience of a small group of experts. The community can shape the future of RisingWave by designing the project roadmap, deploying the distributed streaming database in their cloud service, and much more. We look forward to increased community engagement in 2023.
2022 has been a great year for our company, and we shared the updates in our previous blog. Now, let's take a step back and reflect on our open-source community. What did we achieve? And what more is in store?
RisingWave Labs Objectives
We launched RisingWave, a SQL streaming database, with the sole mission of democratizing stream processing by making it simple, affordable, and accessible. Since the publication of the research paper "Monitoring Streams – A New Class of Data Management Applications" at VLDB 2002, stream processing has been explored by both academia and industry for two decades. Industrial-grade projects such as Apache Storm, Apache Flink, Apache Samza, and Apache Spark Streaming have since emerged, further validating the significant value that stream processing can bring to modern companies. However, existing stream processing systems still have several major drawbacks, including steep learning curves, high development and operation costs, and high performance costs. RisingWave was designed to address these issues with the following key features:
- full wire-compatibility with PostgreSQL enabling users to perform complex analysis over data streams using simple PostgreSQL-style SQL
- independence from outdated JVM ecosystem components (such as Zookeeper), and native Kubernetes support to simplify system maintenance
- compute-storage separation architecture to greatly reduce performance costs and achieve infinite elastic scaling in the cloud
As a database system, RisingWave supports data persistence and allows users to perform high-concurrency queries on stored data, enabling users to use one single RisingWave instance to replace the traditional "stream processing framework + database" combination (such as Flink + Cassandra) for efficient support of real-time applications.
RisingWave's goal is not to replace existing computation frameworks like Apache Flink but to make stream processing more accessible and efficient for a broader audience. It is similar to how Snowflake (a SQL database) and Apache Spark (a big data computation framework) have mutually benefited each other in the batch processing domain.
RisingWave Labs Achievements in 2022
1. Enhancing stability and performance testing
Stability and performance are the foundation of any hard-core system. No matter how powerful the system is, instability and low performance make it difficult for the system to gain wide adoption from user groups. RisingWave Labs has considered stability and performance testing top priorities since day one. In the past year, RisingWave Labs has continuously collected test cases from various products and added them to its integration tests. At the same time, RisingWave Labs also added a deterministic testing framework to make it more convenient to locate and eliminate bugs. RisingWave Labs completed the first version of internal performance testing. Using performance test benchmarks such as Nexmark and TPC-H, we concluded that RisingWave can achieve 50% or more performance improvement over similar products in both stateless and stateful computing.
2. Implementing commonly used functionalities of PostgreSQL
RisingWave speaks PostgreSQL to significantly lower the bar for users to adopt stream processing technology. This means that as long as users can use PostgreSQL, they can use RisingWave for stream processing. PostgreSQL is a mature database system that has been battle-tested for more than 30 years, and the set of functions it supports is extremely large. It would take several years to fully implement the functions of PostgreSQL. However, RisingWave has already implemented the most commonly used PostgreSQL operators and expressions, and it supports complex queries well. RisingWave Labs continually adjusts the priority of function implementation based on community feedback, aiming to build a more complete system grounded in real-world use cases.
3. Integrating mainstream systems
RisingWave is fully wire-compatible with PostgreSQL. In other words, RisingWave can be integrated with any system that supports PostgreSQL. But due to the complexity of system implementation, RisingWave still has to conduct a lot of testing on system integration to ensure security and reliability. As a stream processing system, RisingWave also supports data ingestion and delivery. Users can easily import and export data to and from RisingWave through the CREATE SOURCE and CREATE SINK syntax.
Currently, the upstream and downstream systems supported by RisingWave include Kafka, Redpanda, Pulsar, Kinesis, SQL CDC, PostgreSQL CDC, AWS S3, Iceberg, Hudi, and more. The supported BI tools include Grafana, Metabase, Superset, and more. Users can visit risingwave.dev to learn more about system integration and provide valuable feedback to us.
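To give a rough idea of what source and sink statements look like, here is a minimal Python sketch that renders them as SQL strings. Everything named here is an assumption for illustration: the helper functions, the source and sink names, the columns, the topics, and the broker address are all hypothetical. The data format clause that a real source needs is left out on purpose because it differs between RisingWave versions, so please check the documentation for the exact statement.

```python
def quote(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def with_clause(options):
    """Render connector options as a WITH (...) clause, keeping dict order."""
    pairs = ", ".join(f"{key} = {quote(val)}" for key, val in options.items())
    return f"WITH ({pairs})"


def create_source(name, columns, options):
    """Render a CREATE SOURCE statement (format clause intentionally omitted)."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return f"CREATE SOURCE {name} ({cols}) {with_clause(options)};"


def create_sink(name, from_relation, options):
    """Render a CREATE SINK statement that delivers an existing relation."""
    return f"CREATE SINK {name} FROM {from_relation} {with_clause(options)};"


# Hypothetical Kafka settings; replace them with values for your own cluster.
kafka = {
    "connector": "kafka",
    "topic": "user_clicks",
    "properties.bootstrap.server": "broker:9092",
}

source_sql = create_source("clicks", [("user_id", "INT"), ("url", "VARCHAR")], kafka)
sink_sql = create_sink("clicks_out", "clicks_per_user", {**kafka, "topic": "clicks_per_user"})

print(source_sql)
print(sink_sql)
```

Because RisingWave is wire-compatible with PostgreSQL, statements like these could then be sent through any PostgreSQL client.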
4. Supporting high concurrency access to persistent data
RisingWave is a database that supports stream processing. This means that RisingWave users can not only use SQL to perform stream processing in the same way as operating a database but also store data directly in RisingWave. RisingWave supports persistent data storage and allows users to perform high-concurrency access to the stored data. This functionality enables RisingWave to directly replace the combination of Flink+Cassandra/Redis/DynamoDB and other systems. Therefore, users can use a single RisingWave instance to support complex stream processing applications, significantly reducing costs and increasing efficiency.
5. Open sourcing the RisingWave Operator
As a cloud-native database, RisingWave made support for Kubernetes deployment a high priority. Shortly after open sourcing the RisingWave project in April 2022, we open sourced the RisingWave Operator project on GitHub. This allows users to deploy RisingWave through Kubernetes quickly and efficiently.
RisingWave Labs Directions in 2023
1. Further enhancing system stability
Improving stability has always been one of RisingWave's core development directions. In 2023, RisingWave Labs will step up its stability work and actively work with community users to identify potential stability issues.
2. Releasing transparent performance reports
Performance is one of the main goals of many projects. However, RisingWave Labs does not want to pursue performance just for the sake of performance, nor does it want to turn "benchmarking" into "benchmarketing." The RisingWave Labs developers fully understand that the factors that affect performance are diverse and challenging to enumerate. In order to better inform community users about RisingWave's performance, RisingWave Labs will release an open performance test report in early 2023, allowing anyone to quickly understand the performance strengths and weaknesses of RisingWave compared to other stream systems and easily reproduce the results.
3. Improving the performance of join, window, and other operators
In stream processing, join, window, and other stateful operators are complex to implement. Many users have asked for higher performance from RisingWave's join, window, and other functions. In 2023, RisingWave Labs will improve these functions, broaden the range of supported join and window operators, and conduct stress tests on these operators to ensure the stability and performance of RisingWave in complex scenarios.
4. Enhancing streaming-specific semantics
Unlike traditional batch processing systems, stream processing systems have some stream-specific semantics, such as watermark, exactly-once, and others. As a stream processing system, RisingWave already supports exactly-once processing. At the same time, RisingWave partially supports watermark semantics. In 2023, RisingWave Labs will fully support watermark and other commonly used stream processing functionalities and provide documentation and tutorials to help users better understand and adopt stream processing technology.
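For readers new to watermarks, the following Python sketch shows the general idea on a toy example. It is not RisingWave's implementation: the function name, the integer timestamps, and the "maximum event time minus allowed lateness" watermark rule are assumptions chosen for illustration. A tumbling window is emitted once the watermark passes its end, and an event that arrives for an already closed window is dropped.

```python
from collections import defaultdict


def tumbling_counts(events, window_size, allowed_lateness):
    """Count events per tumbling window, driven by a simple watermark.

    events: event timestamps (ints) in arrival order, possibly out of order.
    Returns (emitted, dropped): emitted is a list of (window_start, count)
    in emission order; dropped lists timestamps that arrived too late.
    """
    open_windows = defaultdict(int)  # window_start -> running count
    emitted = []
    dropped = []
    watermark = float("-inf")  # no event time observed yet

    for ts in events:
        window_start = (ts // window_size) * window_size
        # The window for this event has already been closed: the event is late.
        if window_start + window_size <= watermark:
            dropped.append(ts)
            continue

        open_windows[window_start] += 1
        # The watermark trails the largest event time seen so far.
        watermark = max(watermark, ts - allowed_lateness)

        # Emit every open window whose end is at or below the watermark.
        closable = sorted(w for w in open_windows if w + window_size <= watermark)
        for start in closable:
            emitted.append((start, open_windows.pop(start)))

    return emitted, dropped


emitted, dropped = tumbling_counts([1, 4, 12, 3, 25, 8], window_size=10, allowed_lateness=5)
print(emitted)
print(dropped)
```

With these inputs, the out-of-order event at time 3 is still counted because the watermark has only reached 7 when it arrives. The event at time 25 pushes the watermark to 20, which emits the windows starting at 0 and 10 with counts 3 and 1, and the event at time 8 is then dropped as late. The window starting at 20 stays open because no later event advances the watermark past 30.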
5. Providing a more user-friendly interface
To operate complex data systems, a user-friendly UI is essential. In 2023, RisingWave Labs will launch a new version of the RisingWave UI, allowing users to use RisingWave more easily.
The rapid development of the RisingWave project is only possible with our community’s consistent support and encouragement. By the end of 2022, RisingWave had been deployed in financial, internet, and entertainment scenarios. We expect to release usage reports for RisingWave in the second quarter of 2023. We will also host in-person events in multiple regions, including the US, Singapore, Europe, and others. In addition, we welcome individual and corporate users to join the RisingWave community and provide valuable feedback to RisingWave. Thank you all, and happy new year!