RisingWave Roadmap Q4 2023

One and a half years ago, in April 2022, we open-sourced RisingWave, the distributed SQL streaming database. A quarter ago, in July 2023, we released the first official version, RisingWave 1.0, a battle-tested system ready for production use. More recently, we released RisingWave 1.3.

RisingWave is an open-source streaming database released under the Apache 2.0 license. The development team behind it actively collects feedback from users and strives to democratize stream processing: to make it simple, affordable, and accessible.

RisingWave has been deployed in production at dozens of enterprises and fast-growing startups. How will it evolve from here? We plan to keep our roadmap transparent and update it periodically. Here’s what you can expect in future releases of RisingWave.

Note that the roadmap is not final. We will update it frequently to reflect changes in item priority and to better serve users.

Short-term goals (within the next 3 months)

Adaptive Scaling: Implement adaptive scaling to automatically adjust materialized view parallelism based on the number of CPU cores in the cluster.
Improvements to Existing External Sinks: Optimize the performance and improve the stability of supported external sinks such as Doris, ClickHouse, and Elasticsearch. We’ll also expand the encoding formats supported by the Kafka sink to include Protobuf and Avro, and add support for Schema Registry.
Iceberg Sink V2: We recently introduced a native integration with Iceberg that no longer relies on the official Java library. It has been fully rewritten in Rust for performance and stability. We plan to stabilize it over the next few months.
Enhanced Observability: Expand system tables and add metrics for stateful operators to provide greater visibility into system health and performance.
Improved Open-source Web UI: Enhance RisingWave's open-source web UI with additional system information and monitoring capabilities.
Sink into Table: Users may want to dynamically union the results of multiple views into a single table. For example, each view may correspond to a department in a company, and new departments may be added from time to time. With this feature, users can seamlessly merge data from new views into the table as they are added.
CDC Connection Sharing: RisingWave currently creates one CDC connection per table. Each connection consumes the replication log independently, and that log contains transactions not only on the source table but also on other tables in the same database. Multiple connections therefore consume the same log repeatedly and place a heavy load on the upstream database. Sharing CDC connections reduces this load and improves the stability of CDC.
Recoverable CREATE MATERIALIZED VIEW: Persist the progress of materialized view creation so that it can recover from failures without losing work already completed.
CDC Transaction Atomicity: RisingWave currently applies CDC changes event by event, so the applied state may reflect only part of an upstream transaction. With this feature, RisingWave will buffer all CDC events within a transaction until the whole transaction can be applied atomically.
Parallel CDC Snapshot Loading: Introduce parallelism during CDC snapshot loading to speed up the initial load of large upstream tables and improve the user experience.
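The buffering idea behind CDC Transaction Atomicity can be illustrated with a minimal Python sketch. This is not RisingWave's implementation, which is written in Rust. The `CdcEvent` shape, the operation names, and the `TransactionalApplier` class are hypothetical. The sketch also assumes that the row events of each transaction arrive between a `BEGIN` marker and a `COMMIT` or `ROLLBACK` marker.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CdcEvent:
    # Hypothetical event shape: transaction id, operation, and an optional row.
    txn_id: int
    op: str  # "BEGIN", "UPSERT", "DELETE", "COMMIT", or "ROLLBACK"
    key: Optional[str] = None
    value: Optional[int] = None


class TransactionalApplier:
    """Buffers CDC row events per transaction and applies them together on COMMIT."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}  # state visible to readers
        self._pending: Dict[int, List[CdcEvent]] = {}  # txn_id -> buffered row events

    def on_event(self, ev: CdcEvent) -> None:
        if ev.op == "BEGIN":
            self._pending[ev.txn_id] = []
        elif ev.op in ("UPSERT", "DELETE"):
            # Row changes are held back instead of being applied immediately.
            self._pending[ev.txn_id].append(ev)
        elif ev.op == "COMMIT":
            self._apply(self._pending.pop(ev.txn_id))
        elif ev.op == "ROLLBACK":
            # Aborted transactions are discarded without touching the table.
            self._pending.pop(ev.txn_id, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")

    def _apply(self, events: List[CdcEvent]) -> None:
        # Build the new state on a copy and swap it in with one assignment,
        # so readers see either none or all of the transaction's changes.
        new_table = dict(self.table)
        for ev in events:
            if ev.op == "UPSERT":
                new_table[ev.key] = ev.value
            else:
                new_table.pop(ev.key, None)
        self.table = new_table
```

Consider a transfer that debits `alice` and credits `bob` within one transaction. Under this sketch, `table` stays unchanged while the two `UPSERT` events are buffered. Both updates become visible together only when the `COMMIT` event arrives. A production system would also need to bound or spill the buffer for very large transactions.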

Mid-term goals (within the next 6 months)

SSL/TLS Secured Connection: Implement SSL/TLS encryption for client/server communications to enhance security.
Alter Materialized View: Add the ability to modify existing materialized views.
Session Window: Introduce session window functionality for advanced streaming analytics.
MemTable Spill: An update to a small table can suddenly amplify write throughput by a factor of 1,000. This typically happens in a 10-way or larger join. One way to mitigate this is to use the local disk as a buffer for the flood of writes, which avoids out-of-memory (OOM) errors.
Dedicated Computes for Materialized View Creation: Some users have reported that creating a materialized view in RisingWave is too slow, because the initial creation requires a resource-intensive, ad-hoc computation. In contrast, the ongoing streaming (incremental) computation is long-running but needs fewer resources at any given time. This makes it possible to allocate dedicated resources for MV creation when needed and release them once creation finishes.
More External Sinks: Redshift and Snowflake sinks are planned.
Recursive CTE: Enable recursive common table expressions (CTEs) to traverse hierarchical data, such as a company's organizational tree.
Shared Meta Plane: Enable RisingWave clusters to share the meta plane, including etcd (or Postgres in the future), to make better use of compute resources across clusters.
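Session windows, one of the mid-term goals above, group the events for each key into sessions separated by periods of inactivity. The Python function below is a minimal batch sketch of those semantics. It does not reflect RisingWave's planned SQL syntax or implementation. The function name and the rule that a new session starts when the time since the key's previous event strictly exceeds `gap` are assumptions. A streaming engine would also need watermarks to decide when a session is complete and can be emitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def session_windows(
    events: List[Tuple[str, int]], gap: int
) -> Dict[str, List[Tuple[int, int, int]]]:
    """Group (key, timestamp) events into per-key sessions.

    A new session starts when the time since the key's previous event
    exceeds `gap`. Returns key -> list of (session_start, session_end, event_count).
    """
    by_key: Dict[str, List[int]] = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions: Dict[str, List[Tuple[int, int, int]]] = {}
    for key, stamps in by_key.items():
        stamps.sort()  # batch sketch: order each key's events by time
        result: List[Tuple[int, int, int]] = []
        start = prev = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - prev > gap:
                # Inactivity longer than `gap`: close the current session.
                result.append((start, prev, count))
                start, count = ts, 0
            count += 1
            prev = ts
        result.append((start, prev, count))  # close the last open session
        sessions[key] = result
    return sessions
```

For example, with `gap=5`, events for `alice` at times 1, 3, 4, 20, and 22 form two sessions, `(1, 4, 3)` and `(20, 22, 2)`. A single event for `bob` at time 2 forms the session `(2, 2, 1)`.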

Long-term goals

Optimize analytical query performance on third-party systems like Presto and Trino
GraphQL API: Allow users to retrieve results from RisingWave directly through the browser.
Serverless Compaction: Automatically scale Compactor instances in and out to match workload demands in a serverless model.

RisingWave is an open-source streaming database that aims to democratize stream processing: to make it easy to use and cost-efficient. Its development direction is strongly influenced by user requests. We would love to hear from the community and update our agenda accordingly. If you have any questions or comments about RisingWave's roadmap, please don't hesitate to let us know by commenting here. Your voice will help shape the future of real-time stream processing!