RisingWave: Open-Source Streaming Database

TL;DR.

curl https://risingwave.com/sh | sh

In early 2021, I founded RisingWave with the goal of driving mainstream adoption of stream processing technology. Over the past three years, I have been tirelessly evangelizing RisingWave, the SQL streaming database, to the market, hoping it can gain a strong foothold in the stream processing arena. Through persistent efforts, RisingWave has now been embraced by hundreds of companies across multiple domains, including world-leading enterprises in finance, manufacturing, security, aerospace, and other industries.

Providing exceptional services to large enterprises is always exciting. However, I've continually pondered one question: Why is it always big companies? Why not small and medium-sized businesses (SMBs) or individuals? Is it viable to design stream processing technology centered around developers? In other words, can individual developers or SMBs truly harness the powerful capabilities of stream processing?

The Democratization Trend in the Data Systems Market

"Making stream processing technology accessible to all" is not an idea that came out of nowhere. If we examine today's data systems market, we see database services moving steadily downmarket toward the developer segment. As with many advanced technologies, databases are typically adopted first by leading tech firms and then, over time, by the general public, especially individual developers and SMBs.

This democratization process requires two crucial prerequisites:

From the user's perspective, there must be sufficient demand from developers and SMBs;

From the product's perspective, sufficiently user-friendly and affordable products must exist in the market.

In the OLTP database realm, democratization has already materialized: from personal websites to large-scale services, whenever users need data storage, they can opt for an OLTP database. Mature databases like PostgreSQL and MySQL have long been deployed in production by countless companies. Meanwhile, cloud offerings like Supabase and Neon have brought PostgreSQL to the cloud, providing a suite of tools that allow developers to build applications for just a few tens of dollars per month.

For OLAP databases, technological democratization is also rapidly progressing. When users need to perform data analytics and reporting, operational databases like PostgreSQL become insufficient for the task. Data warehousing systems like Snowflake and Redshift are costly and complex for developers. However, modern OLAP databases like ClickHouse can deliver efficient data analytics even in single-node environments. The startup Tinybird has built a serverless analytical platform based on ClickHouse, allowing everyone to perform data analysis at a very low cost.

In both the OLTP and OLAP arenas, then, we have witnessed the same pattern. Stream processing is different. It has been under development for over twenty years, yet we have not seen any stream processing product truly penetrate the developer market and become widely adopted by individual developers and SMBs. Popular stream processing products such as Spark Streaming and Flink are used mostly at technology companies with sizable engineering teams. For individuals or small teams, deploying and operating such big data components is a high barrier. When a small team is working overtime to build an initial product prototype, it hardly has time to figure out how to use a stream processing framework. All it wants is a plug-and-play tool.

Reflecting on this, we can see that enterprise-grade stream processing has become increasingly saturated, forming a fiercely competitive red ocean market. In contrast, stream processing systems tailored for developers still seem to be an underexplored blue ocean. This leads us to ask: in 2024, is the time ripe for the democratization of stream processing?

The survey showed that 35.7% of respondents feel there are no easy-to-use stream processing systems: https://twitter.com/YingjunWu/status/1766702743157342380.

Democratizing Stream Processing: From the User's Perspective

In the current technology landscape, the applications of stream processing are becoming increasingly widespread, but the key question is: Do individual developers and SMBs truly need stream processing technology? Fundamentally, the applicable scenarios for stream processing must simultaneously meet two core conditions:

Data must be continuously ingested into the system in the form of streams;

Users need to analyze data streams to extract real-time insights.

Data streams come from a wide range of sources, including but not limited to:

User behavior logs on websites (page visits, clicks, etc.)

IoT device data

Social media data

Financial transaction data

E-commerce orders, payment data

In these domains, if developers want to build new products that require timely analysis of data, stream processing technology may be needed. For example, monitoring the volatility range of a stock from trading data, or detecting anomalies in data transmitted by electronic devices, are excellent use cases for stream processing.

But what about real-time data synchronization and real-time ETL? Can developers also benefit from those? Although these scenarios have some appeal for developers, the barrier may still be relatively high. Real-time data sync or ETL typically involves at least two systems working together, so after introducing a stream processing system, developers would need to maintain three systems at once. For individual developers or small startup teams, that cost is quite high, so this may not be an ideal application scenario.

Overall, I believe that from the user's perspective, the precondition for the democratization of stream processing holds true.

Democratizing Stream Processing: From the Product's Perspective

Next, let's analyze it from the product's perspective. Individual developers and small development teams alike want to focus their efforts on rapidly developing and iterating on their products, rather than researching underlying data system architectures. Data systems are tools, and tools are meant to provide better and faster ways of solving problems.

When developing a data system tailored for developers, I believe it needs to have the following characteristics:

Can be deployed on a single node, without dependencies on Docker or Kubernetes, and embedded deployment would be even better;

Can provide an all-in-one solution that covers a range of capabilities, even if not every one is best in class, because users always want to simplify architectures rather than stack multiple systems;

Simple and easy to use, with an extremely low barrier to entry;

Integrated with various other developer tools.

If we look for a stream processing system based on these standards, it's clear that our options in the market are extremely limited. Fortunately, RisingWave happens to be one of the very few stream processing systems that can meet all these criteria.

Stream Processing vs Batch Processing

After all this discussion, we still cannot avoid an oft-asked question: Why do we need stream processing? Isn't batch processing good enough? Quite a few batch processing systems on the market (especially OLAP databases) already support real-time data ingestion, so why not just use a batch system directly? I believe stream processing systems have unique advantages in at least three respects:

For some applications, there are strong requirements for low-latency results. For scenarios like financial transactions and fraud detection, users often need system response times in the range of seconds or even milliseconds. For such applications, stream processing may be the only solution. The materialized view functionality available in OLAP databases can address some requirements, but when facing complex queries with large states, it may fall short.

For some applications, the benefits of incremental computation are significantly greater than those of recomputation. Stream processing uses an incremental computation model: when new data arrives, it updates existing results instead of recomputing them from scratch, which substantially improves computational efficiency.

For some applications, the stream processing way of thinking is more intuitive. For scenarios like IoT and finance, where the order of computation matters, stream processing matches how people naturally reason about the problem, while batch computation can even seem "counter-intuitive." For example, if we want to continuously monitor the average price of a stock over the past 10 minutes, a continuously updated streaming query is the more natural way to express it.

When a user's use case meets one or more of these three conditions, I believe the user may be more inclined to use a stream processing system.
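To make the incremental computation point concrete, here is a minimal Python sketch of the 10-minute average price example. It only illustrates the idea and is not how RisingWave is implemented. The names `SlidingAverage` and `recompute_average` are hypothetical, and the sketch assumes trades arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingAverage:
    """Average price over the last `window_seconds`, maintained incrementally.

    Hypothetical illustration; assumes timestamps are non-decreasing.
    """
    window_seconds: float = 600.0  # 10 minutes
    _events: deque = field(default_factory=deque)  # (timestamp, price), oldest first
    _total: float = 0.0  # running sum of prices currently in the window

    def add(self, timestamp: float, price: float) -> float:
        """Ingest one trade and return the updated window average."""
        self._events.append((timestamp, price))
        self._total += price
        # Evict trades that fell out of the window (timestamp - window, timestamp].
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_price = self._events.popleft()
            self._total -= old_price
        return self._total / len(self._events)


def recompute_average(trades, now, window_seconds=600.0):
    """Batch-style baseline: rescan every stored trade on each query."""
    recent = [p for t, p in trades if now - window_seconds < t <= now]
    return sum(recent) / len(recent)
```

Each call to `add` only touches the new trade and the trades that just expired, whereas `recompute_average` rescans the full history every time the result is needed. Both produce the same answer; the difference is how much work each update costs as the history grows.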

The Design Philosophy of RisingWave Standalone Edition

The future is always full of unknowns. But rather than waiting for answers, why not seek them ourselves? In RisingWave 1.7, the latest version, we have launched the RisingWave Standalone edition tailored for developers. We hope this edition allows RisingWave to reach a wide range of developers, letting them enjoy the value of stream processing on their local machines.

The design philosophy behind RisingWave Standalone can be summed up in one word: simplicity.

Simple Installation and Deployment

One of the most notable features of RisingWave Standalone is its extremely simple installation and deployment. Developers can install RisingWave on their local computers (Mac or Ubuntu) with a single command:

curl https://risingwave.com/sh | sh

Users don't need Kubernetes or Docker containers either, so this is a true bare-metal installation. For users concerned about program size, we also provide compilation options that compress the program down to around 140 MB. If this size is still unsatisfactory, feel free to contact us to discuss even smaller packaging options.

All-In-One

Unlike traditional stream processing systems like Flink or Spark Streaming, RisingWave comes with built-in storage capabilities. This means users no longer need to find a so-called "downstream database" to store stream computation results. While supporting stream processing, RisingWave also supports ad-hoc querying of data stored internally. For users, RisingWave essentially implements the full stack from computation to storage to serving.

Extremely Low Barrier to Entry

RisingWave is compatible with PostgreSQL syntax. Users can write SQL statements directly to perform stream processing, without learning Java or Scala APIs and without understanding internal system details like checkpoints or savepoints.

Rich System Integrations

RisingWave supports integration with dozens of commonly used systems and management tools. For databases frequently used by developers, such as MySQL, PostgreSQL, and MongoDB, RisingWave can connect with just one statement, eliminating the need for intermediate message queues or other components. Thanks to the PostgreSQL ecosystem, RisingWave also integrates seamlessly with visualization, management, and modeling tools such as Grafana, Superset, DBeaver, and dbt, greatly enhancing the user experience.

Afterword

The RisingWave Standalone edition carries our vision for the democratization of stream processing – it is our exploration into the future development of stream processing technology. Exploring the unknown naturally entails risks and challenges. We sincerely hope readers can provide more support, and join hands with us to explore the boundaries of the future!