RisingWave and Apache Flink SQL: A Comparison
A comparison from the user's point of view across three crucial factors: ease of use, cost-efficiency, and correctness.
RisingWave is a distributed SQL streaming database designed to reduce the complexity and cost of building real-time applications. The database is open-source and can be accessed publicly on GitHub under the Apache 2.0 License. RisingWave Cloud provides all the functionality of RisingWave in a managed cloud deployment. It delivers easy stream processing in the cloud while eliminating the challenges of deploying and maintaining your environment.
In the field of stream processing, Apache Flink is one of the most popular open-source frameworks. Flink provides a set of low-level APIs that allow users to write complex stream processing programs in high-level programming languages such as Java, Scala, and Python. Flink later introduced a SQL interface to lower the bar for application development.
Recently, several companies, including AWS and Confluent, have been investing in managed Flink SQL services to eliminate the pain of deploying and maintaining open-source Flink. Given that both RisingWave and Flink SQL provide SQL-based interfaces for stream processing applications such as fraud detection, real-time dashboarding, and alerting, it's natural to wonder how these technologies differ.
This article aims to provide a comprehensive explanation from the users' point of view, specifically focusing on three crucial factors: ease of use, cost-efficiency, and correctness.
RisingWave is easier to use than Flink SQL, thanks to its PostgreSQL compatibility and database-oriented architecture. Flink SQL, by contrast, is a stream processing engine rather than a database.
- RisingWave is compatible with PostgreSQL, so users can use RisingWave the same way they would use PostgreSQL. This compatibility means not only that users are already familiar with RisingWave's SQL syntax, but also that they can use standard database DDL (such as create/drop roles/users/schemas/tables/views) and DML (such as update, insert, delete) to manage data. Flink uses big-data-style SQL with many custom syntax rules, requiring users to refer to the Flink documentation to learn it. Furthermore, Flink is not a SQL database and does not support database DDL and DML; hence, users cannot manipulate internal data flexibly.
- RisingWave connects to popular data sources such as Apache Kafka and Apache Pulsar and, because it is wire-compatible with PostgreSQL, to any system or tool that can connect to PostgreSQL, such as Grafana and DBeaver. This lets it take full advantage of the convenience of the PostgreSQL ecosystem. Users can even use the PostgreSQL editor plugin in Visual Studio Code to compose SQL code, significantly improving their development efficiency. Although Flink integrates with various big data systems, it cannot integrate with database management tools, making it less convenient than PostgreSQL-compatible systems like RisingWave.
- RisingWave is a database where all input and output data are presented as tables or materialized views. Users can easily query both input and output data in RisingWave, enabling them to access, edit, verify, and debug their streaming queries anytime. In contrast, Flink does not offer a straightforward means of accessing internal data, making it challenging for users to inspect intermediate results and verify query correctness.
- As a database, RisingWave can store results and lets users run ad hoc queries on them without importing them into downstream systems. Flink is a stream processing engine that cannot store data. It can only export data to downstream systems, and users must query the computation results there.
RisingWave's efficient architecture and implementation make it a cost-efficient solution for stream processing, with a total cost 2-15x lower than Flink SQL on an industry-standard benchmark (performance report coming soon in Q2 2023). Additionally, its data storage capabilities offer users added flexibility and cost savings in building their streaming applications.
- RisingWave adopts a decoupled compute-storage architecture like many modern cloud databases. The computing and the cloud storage layers of RisingWave can be configured differently and scaled independently, leading to improved cost efficiency and performance. This architecture also allows each component to be optimized separately, reducing resource waste and avoiding task overload. Flink uses the conventional coupled compute-storage architecture, which achieves high parallelism and scalability but results in high execution costs.
- RisingWave was architected from the ground up with a clear focus on database technologies. It actively harnesses modern database techniques to boost the efficiency of its optimizers, executors, and storage elements. Flink builds its SQL layer on top of its MapReduce-style computation framework, missing the opportunity for various database-specific optimizations.
- RisingWave is implemented in Rust, a programming language known for its safety and performance. Rust offers developers low-level control over system resources. Its ability to eliminate runtime overhead, optimize code at compile time, and manage memory efficiently gives RisingWave a significant performance advantage. Flink's Java-based implementation makes it prone to runtime overhead and garbage collection pauses, which can hinder the performance of real-time stream processing.
- The internal storage system of RisingWave stores tables directly within the database, streamlining access and enhancing performance for real-time applications. This reduces the frequency of external data access, leading to faster query execution and retrieval. Flink SQL lacks an internal storage system, so it may need to access external storage frequently, adding latency and cost to query processing.
- RisingWave supports layered materialized views, which optimize query performance by reusing computing resources and minimizing redundant computations. These precomputed query results are stored for quick retrieval without re-executing the query. Conversely, Flink SQL treats each query independently, with no resource sharing between queries. While this allows for concurrent query execution, it may result in recomputing intermediate results for different queries, leading to potential inefficiencies.
- Integrated storage in RisingWave enhances stream processing by allowing users to access and query data stored within its infrastructure directly, eliminating time-consuming data transfer steps. This capability streamlines data processing workflows and enables real-time analytics within RisingWave. In contrast, Flink's lack of internal storage means that users must deliver the results of Flink's processing into downstream systems and query the data within those external systems, introducing additional complexity and potential delays to the data processing workflow.
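The idea behind the layered materialized views mentioned in the list above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not RisingWave's implementation: the class names `SumByKey`, `OverLimit`, and `Largest`, the limit of 100, and the sample events are all hypothetical. The point it shows is that a base view is updated once per event, and every view layered on top reuses that result instead of re-aggregating the raw stream.

```python
from collections import defaultdict


class SumByKey:
    """Base view (hypothetical): running sum of amounts per key."""

    def __init__(self):
        self.state = defaultdict(int)
        self.downstream = []  # views layered on top of this one
        self.updates = 0      # how many incremental updates were applied

    def apply(self, key, delta):
        self.state[key] += delta
        self.updates += 1
        # Push the changed row to every view built on this one.
        for view in self.downstream:
            view.on_change(key, self.state[key])


class OverLimit:
    """Layered view (hypothetical): keys whose total exceeds a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.keys = set()

    def on_change(self, key, total):
        if total > self.limit:
            self.keys.add(key)
        else:
            self.keys.discard(key)


class Largest:
    """Layered view (hypothetical): key with the largest total."""

    def __init__(self):
        self.totals = {}

    def on_change(self, key, total):
        self.totals[key] = total

    def result(self):
        return max(self.totals, key=self.totals.get) if self.totals else None


totals = SumByKey()
over_limit = OverLimit(limit=100)
largest = Largest()
totals.downstream += [over_limit, largest]

# Sample events, made up for illustration.
for key, amount in [("alice", 40), ("bob", 70), ("alice", 80), ("bob", -20)]:
    totals.apply(key, amount)

print(dict(totals.state), over_limit.keys, largest.result(), totals.updates)
```

Both downstream views stay up to date although the aggregation ran only once per event. If each view were an independent query, the per-key sum would be computed twice.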
RisingWave goes beyond Apache Flink in terms of ensuring the correctness of results delivered to downstream systems. While both platforms guarantee consistency and completeness in stream processing, RisingWave offers additional benefits, such as consistent snapshots for all data access, which ensures that users always see accurate and unambiguous results.
The definition of correctness in stream processing includes consistency and completeness, which refer to:
- Consistency. Every data event is processed once and only once, even if a system failure occurs. This is also known as exactly-once semantics.
- Completeness. Even if a data stream arrives out of order, the results will ultimately be in order.
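A common way to obtain ordered results from out-of-order input is a watermark: a moving timestamp below which no further events are expected. The Python sketch below is a minimal illustration of that idea, not the mechanism of either system. The function name `reorder_with_watermark`, the fixed `max_delay` bound, and the sample events are assumptions made for the example.

```python
import heapq


def reorder_with_watermark(events, max_delay):
    """Return (ordered, late) for events given in arrival order.

    events: iterable of (timestamp, value) pairs.
    max_delay: assumed upper bound on how late an event may arrive.
    """
    buffer = []                    # min-heap of (timestamp, seq, value)
    ordered, late = [], []
    watermark = float("-inf")      # no event at or below this is expected
    for seq, (ts, value) in enumerate(events):
        if ts <= watermark:
            # Arrived after the watermark passed its timestamp: too late.
            late.append((ts, value))
            continue
        heapq.heappush(buffer, (ts, seq, value))
        # The watermark trails the largest timestamp seen by max_delay.
        watermark = max(watermark, ts - max_delay)
        # Everything at or below the watermark can be emitted in order.
        while buffer and buffer[0][0] <= watermark:
            t, _, v = heapq.heappop(buffer)
            ordered.append((t, v))
    # End of input: flush whatever is still buffered, in timestamp order.
    while buffer:
        t, _, v = heapq.heappop(buffer)
        ordered.append((t, v))
    return ordered, late


arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d"), (8, "h"), (3, "x")]
ordered, late = reorder_with_watermark(arrivals, max_delay=2)
print(ordered)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (8, 'h')]
print(late)     # [(3, 'x')]
```

The trade-off sits in `max_delay`: a larger value tolerates more disorder but holds results back longer, while a smaller value emits sooner and classifies more events as late.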
Both Apache Flink and RisingWave achieve these goals by employing a consistent checkpoint algorithm and using watermarks to detect out-of-order data. However, RisingWave is both a stream processing platform and a streaming database, which means it can also guarantee consistent snapshots for all data access. This additional feature ensures that users always see accurate and unambiguous results.
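To make the roles of checkpoints and snapshot reads concrete, here is a minimal Python sketch. It is a simplified, single-process illustration and does not reproduce the checkpoint algorithm of Flink or RisingWave: the `CountingJob` class, its method names, and the sample log are hypothetical. It assumes a replayable source and saves the operator state together with the source offset, so that recovery neither loses nor double-counts events, while readers only ever see committed state.

```python
class CountingJob:
    """Toy stateful job (hypothetical): counts events per key from a log."""

    def __init__(self, log):
        self.log = log              # replayable source: a list of keys
        self.offset = 0             # next log position to read
        self.counts = {}            # in-flight, uncommitted state
        self.checkpoint = (0, {})   # last committed (offset, state) pair

    def step(self):
        key = self.log[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def commit(self):
        # State and source offset are saved together as one unit.
        self.checkpoint = (self.offset, dict(self.counts))

    def recover(self):
        # Drop in-flight work and resume from the last checkpoint.
        self.offset, saved = self.checkpoint
        self.counts = dict(saved)

    def read_snapshot(self):
        # Readers see only committed state, never half-applied updates.
        return dict(self.checkpoint[1])


log = ["a", "b", "a", "c", "a"]
job = CountingJob(log)

job.step()
job.step()
job.commit()                        # checkpoint after the first two events

job.step()
job.step()                          # two more events, not yet committed
snapshot_during = job.read_snapshot()

job.recover()                       # simulated failure: replay from offset 2
while job.offset < len(log):
    job.step()
job.commit()
final = job.read_snapshot()

print(snapshot_during)  # {'a': 1, 'b': 1}
print(final)            # {'a': 3, 'b': 1, 'c': 1}
```

Although the third and fourth events are processed twice in wall-clock terms, each contributes to the committed counts exactly once, because the uncommitted work is discarded on recovery.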
In summary, RisingWave and Apache Flink SQL present notable solutions for SQL-based stream processing, with RisingWave standing out for its ability to simplify development and reduce costs compared to Apache Flink SQL. As users evaluate their options, the ease of use, cost-efficiency, and correctness of RisingWave make it a compelling choice for those seeking innovation and efficiency in their real-time data processing endeavors.