Frequently Asked Questions - RisingWave: Open-Source Streaming Database
RisingWave is licensed under the Apache License 2.0. As a true open-source license, Apache License 2.0 allows developers to copy, modify, and update the software's source code. Developers can also distribute any copies or modifications they make.
RisingWave is well-suited for real-time applications with latency requirements ranging from sub-second to minutes. Applications like IoT and network monitoring require sub-second latency, while latency requirements for applications like ad recommendations, stock dashboarding, and food delivery can range from hundreds of milliseconds to several minutes. RisingWave consistently delivers results at low latency and can be a good fit for these applications.
Some applications are not latency sensitive and can tolerate delays of tens of minutes. Some representative applications include hotel reservations and inventory tracking. In these cases, users may consider using RisingWave or traditional OLAP databases such as Apache Druid, Apache Pinot, and ClickHouse. They should decide based on other factors, such as cost efficiency, flexibility, and tech stack complexity.
Streaming ingestion (ETL). Streaming ingestion provides a continuous flow of data from one set of systems to another. Developers can use a streaming database to clean streaming data, join multiple streams, and move the joined results into downstream systems in real time. In real-world scenarios, data ingested into a streaming database typically comes from OLTP databases, message queues, or storage systems. After processing, the results are most likely to be written back into these systems or inserted into data warehouses or data lakes.
Streaming analytics. Streaming analytics focuses on performing complex computations and delivering fresh results on-the-fly. Data typically comes from OLTP databases, message queues, and storage systems in the streaming analytics scenario. Results are usually ingested into a serving system to support user-triggered requests. A streaming database can also serve queries on its own. Users can connect a streaming database directly with a BI tool to visualize results.
Yes. We’ve conducted extensive stress tests and performance evaluations on both the open-source RisingWave and RisingWave Cloud. Both have been successfully deployed in numerous production environments across many companies and have proven their reliability and efficiency. We recommend using either RisingWave Cloud or the latest version of open-source RisingWave for optimal results.
RisingWave Cloud provides all the functionality of RisingWave in a managed cloud deployment. It delivers easy stream processing in the cloud while eliminating the challenges of deploying and maintaining your environment. With RisingWave Cloud, users can connect seamlessly to various upstream and downstream systems on the cloud.
RisingWave Cloud users can decide whether to store data or not. As a database system, RisingWave maintains results in its own storage so that users can access data efficiently. Alternatively, users may sink data from RisingWave to an external stream for storage or additional processing.
Yes. RisingWave persists its stored data. Users can insert data into RisingWave by issuing DML statements like INSERT or directly using connectors.
We recommend using connectors from message brokers or direct CDC from upstream systems for production. Incoming data is stored in RisingWave’s storage along with the checkpoint. To use a connector, users only need to add some extra properties when creating the table. Connector offsets are persisted in checkpoints to achieve data durability and exactly-once delivery.
In the case of a system failure, RisingWave resumes consuming data from the checkpointed offset, ensuring that no data is missed or lost. However, users are responsible for ensuring that data stored in message brokers is persisted.
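The recovery behavior described above can be illustrated with a minimal sketch. It is not RisingWave's implementation: the `CountingJob` class, the `checkpoint_every` parameter, and the in-memory `log` that stands in for a durable message broker are all assumptions made for illustration. The point it shows is that when state and source offset are saved together, replaying from the checkpointed offset neither loses nor double-counts events.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """State and source offset captured together as one snapshot."""
    offset: int = 0
    counts: dict = field(default_factory=dict)


class CountingJob:
    """Hypothetical job that counts events per key from a replayable log."""

    def __init__(self, log, checkpoint_every=3):
        self.log = log                      # stands in for a durable message broker
        self.checkpoint_every = checkpoint_every
        self.durable = Checkpoint()         # last checkpoint that survived a failure
        self.offset = 0
        self.counts = {}

    def run(self, crash_at=None):
        """Consume from the current offset; optionally fail before `crash_at`."""
        while self.offset < len(self.log):
            if crash_at is not None and self.offset == crash_at:
                raise RuntimeError("simulated failure")
            key = self.log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                # Persist the state and the offset in the same checkpoint.
                self.durable = Checkpoint(self.offset, dict(self.counts))

    def recover(self):
        """Discard in-memory progress and restart from the last checkpoint."""
        self.offset = self.durable.offset
        self.counts = dict(self.durable.counts)


log = ["a", "b", "a", "c", "a", "b", "c"]
job = CountingJob(log)
try:
    job.run(crash_at=5)      # fails after processing offsets 0 through 4
except RuntimeError:
    job.recover()            # rolls back to the checkpoint taken at offset 3
job.run()                    # replays offsets 3 and 4, then continues
result = job.counts
```

After recovery, `result` matches what a failure-free run would produce, because the events processed after the last checkpoint are replayed against the checkpointed state rather than the lost in-memory state.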
DML statements, including INSERT, UPDATE, and DELETE, are useful for manually modifying data. However, the changes may be lost if the system fails before they are persisted. Users can run the FLUSH command to explicitly persist pending changes. In future releases, we plan to provide a write-ahead log (WAL) to ensure zero data loss and better data management.
This blog describes some of the ways that RisingWave stores and uses data.
Yes. RisingWave is fully wire-compatible with PostgreSQL. In other words, RisingWave can be integrated with any system that supports PostgreSQL.
No. As a streaming database system, RisingWave is not meant to support transaction processing.
Yes. RisingWave ensures exactly-once semantics, meaning that every single data event will be processed once and only once, even if a system failure occurs.
RisingWave is a stream processing system. It guarantees consistency and completeness. Specifically:
- RisingWave ensures exactly-once semantics, meaning that every single data event will be processed once and only once, even if a system failure occurs.
- RisingWave supports out-of-order processing. Users can have RisingWave process data events in a predefined order, even if the events arrive out of order.
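One common way to handle out-of-order events is to buffer them and release them in event-time order once a watermark has passed. The sketch below shows that general technique only; the `ReorderBuffer` class and its `allowed_lateness` parameter are hypothetical names, and the source does not say that RisingWave works this way internally.

```python
import heapq


class ReorderBuffer:
    """Buffers events and releases them in event-time order behind a watermark."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_seen = None    # largest event time observed so far
        self.heap = []          # min-heap ordered by event time
        self.seq = 0            # tie-breaker so payloads are never compared

    def watermark(self):
        """Events at or before this time are considered complete."""
        if self.max_seen is None:
            return None
        return self.max_seen - self.allowed_lateness

    def push(self, event_time, payload):
        """Add one event; return the events that are now safe to emit, in order."""
        watermark = self.watermark()
        if watermark is not None and event_time < watermark:
            return []           # too late: this sketch simply drops the event
        heapq.heappush(self.heap, (event_time, self.seq, payload))
        self.seq += 1
        if self.max_seen is None or event_time > self.max_seen:
            self.max_seen = event_time
        ready = []
        while self.heap and self.heap[0][0] <= self.watermark():
            ready_time, _, ready_payload = heapq.heappop(self.heap)
            ready.append((ready_time, ready_payload))
        return ready

    def flush(self):
        """Emit everything still buffered, in event-time order (end of stream)."""
        ready = []
        while self.heap:
            ready_time, _, ready_payload = heapq.heappop(self.heap)
            ready.append((ready_time, ready_payload))
        return ready


buffer = ReorderBuffer(allowed_lateness=5)
arrivals = [(10, "a"), (7, "b"), (12, "c"), (9, "d"), (20, "e"), (3, "f")]
emitted = []
for event_time, payload in arrivals:
    emitted.extend(buffer.push(event_time, payload))
emitted.extend(buffer.flush())
```

In this run the events come out sorted by event time, and the event with time 3 is dropped because it arrives after the watermark has already advanced to 15. A larger `allowed_lateness` tolerates more disorder at the cost of holding results back longer.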
RisingWave is also a database system. But it is not an OLTP database and cannot be used to process database transactions. RisingWave guarantees that all reads will see a consistent snapshot of the database.
Yes. RisingWave supports strong consistency.
RisingWave is a distributed SQL streaming database, while Apache Flink is a distributed stream processing engine.
Compared to Apache Flink, RisingWave stands out for its ease of use and cost efficiency.
This blog details the differences between RisingWave and Flink.
RisingWave is a distributed and persistent streaming database; Materialize is a single-node, in-memory streaming database.
RisingWave does not support transaction semantics. So, if you are looking for a system to support transactional workloads, there may be better choices than RisingWave.
RisingWave is not a messaging system. RisingWave can ingest data from messaging systems. RisingWave, as a streaming database, focuses on processing data streams. In contrast, messaging systems like Apache Kafka, Apache Pulsar, and Redpanda focus on storing data streams.
RisingWave is a streaming database. It focuses on the result freshness and uses an incremental computation model to optimize latency. In RisingWave, users can define a query in advance, and RisingWave can update the query results incrementally whenever data comes in. Users can determine whether to store the input and output inside RisingWave or to directly deliver to downstream systems. RisingWave can answer ad-hoc queries, but it is not optimized for supporting concurrent user-initiated analytical queries that require long-range scans.
OLAP databases, such as Apache Druid, Apache Pinot, and ClickHouse, are optimized for efficiently answering user-initiated analytical queries. OLAP databases typically implement columnar stores and a vectorized execution engine to accelerate complex query processing over large amounts of data. OLAP databases are best suited for use cases where interactive queries are essential. However, OLAP databases are not optimized for incremental computation, meaning that they can hardly guarantee result freshness. They also lack key features such as windowing functions and out-of-order processing, and hence they are unsuitable for supporting stream processing applications.
There are several differences between a streaming database and a real-time analytical database. A streaming database processes data before storing it; data drives the computation, and the focus is on delivering fresh results. A real-time analytical database stores data before processing it; users drive the computation, and the focus is on enabling real-time user interaction.
| | Streaming databases | OLAP databases |
---|---|---|
Examples | RisingWave, ksqlDB, Materialize | Apache Druid, Apache Pinot, ClickHouse |
Optimized for | Continuous analytics | Interactive analytics |
Computation model | Incremental computation, event-driven | Full computation, user-triggered |
Sample applications | Continuously monitor “the top 30 longest trips in the last 2 hours”; continuously monitor “the top 10 hottest zip codes for passengers” | Quickly answer “How many users used the Uber app yesterday?”; quickly answer “What is the average number of miles Uber drivers drive every day?” |
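
The two computation models in the table can be contrasted with a small sketch based on the "hottest zip codes" example. The function and class names are hypothetical, the time window from the sample application is left out for brevity, and neither half is meant to represent how any of the listed systems is implemented.

```python
from collections import Counter


def top_zip_codes_full(all_pickups, n):
    """User-triggered model: rescan every stored pickup each time a user asks."""
    return Counter(all_pickups).most_common(n)


class TopZipCodesIncremental:
    """Event-driven model: update the counts as each pickup arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_pickup(self, zip_code):
        self.counts[zip_code] += 1      # constant work per incoming event

    def top(self, n):
        return self.counts.most_common(n)


pickups = ["94103", "94105", "94103", "10001", "94103", "94105"]

# Incremental: the result is kept up to date while events stream in.
monitor = TopZipCodesIncremental()
for zip_code in pickups:
    monitor.on_pickup(zip_code)
incremental_result = monitor.top(2)

# Full computation: the same answer, computed by scanning all stored data.
full_result = top_zip_codes_full(pickups, 2)
```

Both approaches return the same answer. The difference is when the work happens: the incremental version spreads it across arriving events so the latest result is always ready, while the full version does all of it at query time.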
Both streaming databases and OLAP databases can support real-time analytics, but stream processing systems emphasize the real-time nature of computational results, while real-time analytical systems emphasize the real-time nature of user interaction. By design, OLAP databases cannot support many stream-processing applications. Here are some examples:
- Streaming ETL. Users may want to continuously join multiple data streams from different sources (e.g., messaging queues, OLTP databases, file systems) and deliver results to downstream systems such as data warehouses and data lakes.
- Continuous monitoring. Users may want to monitor query results continuously.
- Out-of-order processing. Due to network or system issues, data can arrive out of order in many scenarios. However, users may require results to be computed in a predetermined order.
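As an illustration of the streaming ETL case, the sketch below joins two streams continuously with a symmetric hash join, a standard technique in which each side is indexed by key as it arrives. The `StreamingJoin` class, the order and payment records, and the `sink` list that stands in for a downstream system are all assumptions made for this example.

```python
from collections import defaultdict


class StreamingJoin:
    """Symmetric hash join: each side is indexed by key as it arrives."""

    def __init__(self):
        self.left = defaultdict(list)
        self.right = defaultdict(list)

    def on_left(self, key, row):
        """Store a left row and return the joined rows it produces."""
        self.left[key].append(row)
        return [(key, row, other) for other in self.right.get(key, [])]

    def on_right(self, key, row):
        """Store a right row and return the joined rows it produces."""
        self.right[key].append(row)
        return [(key, other, row) for other in self.left.get(key, [])]


join = StreamingJoin()
sink = []   # stands in for a downstream data warehouse or data lake

sink.extend(join.on_left("order-1", {"item": "book"}))    # no match yet
sink.extend(join.on_right("order-1", {"paid": 12.5}))     # emits a joined row
sink.extend(join.on_right("order-2", {"paid": 3.0}))      # no match yet
sink.extend(join.on_left("order-2", {"item": "pen"}))     # emits a joined row
```

Joined rows are emitted as soon as both sides of a key have arrived, regardless of which side arrives first. This sketch keeps all rows in memory indefinitely; a real system would need to bound or persist that state.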
RisingWave Labs was founded in January 2021. The team includes a group of experienced database researchers and practitioners previously employed in pioneering companies such as AWS, Google, Microsoft, Snowflake, LinkedIn, Uber, etc.
Yingjun Wu founded RisingWave Labs, a series-A startup building RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his Ph.D. from the National University of Singapore and was a visiting Ph.D. student at Carnegie Mellon University.
RisingWave Labs is based in San Francisco and has employees across the globe.
US Domestic: 95 3rd St, 2nd Floor, San Francisco, CA 94103
International: 36 Robinson Road, Singapore 068882, Singapore
RisingWave Labs is a well-funded series-A startup. It has raised over 40 million USD from top-tier VC funds. Learn more from TechCrunch.
RisingWave is a great choice for developing real-time stream processing applications. It can continuously generate fresh results over streaming data. In most use scenarios, event data is generated from some activity, and some action should be taken immediately. The following are some examples of real-time stream processing applications:
- Fraud and anomaly detection in real-time
- Edge analytics for the Internet of Things (IoT)
- Personalization, marketing, and advertising in real-time
These are some, but not all, of the possible applications of stream processing.
With RisingWave, you define the data you need as materialized views. As new data comes in, RisingWave performs only incremental computation, because the results for previous events have already been calculated, which significantly reduces processing time. We optimized the storage processing logic for complex computations and high-concurrency scenarios to further lower latency. Queries are processed and results are delivered with low latency even under these demanding conditions.
Results of materialized views are stored in RisingWave. You can issue a query to find out the latest result of a materialized view.
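The idea of an incrementally maintained view that can be queried for its latest result can be sketched as follows. This is a simplified model, not RisingWave's storage or execution engine: the `AvgFareView` class, the city and fare fields, and the `"insert"`/`"delete"` change format are hypothetical. It keeps a running sum and count per key so that each change costs constant work and a read never rescans the inputs.

```python
class AvgFareView:
    """Hypothetical view: average fare per city, maintained incrementally."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def apply(self, op, city, fare):
        """Apply one change to the view; `op` is "insert" or "delete"."""
        if op not in ("insert", "delete"):
            raise ValueError(f"unknown operation: {op}")
        sign = 1 if op == "insert" else -1
        self._sum[city] = self._sum.get(city, 0.0) + sign * fare
        self._count[city] = self._count.get(city, 0) + sign
        if self._count[city] == 0:
            # No rows left for this key, so it leaves the result.
            del self._sum[city], self._count[city]

    def query(self):
        """Read the latest result without rescanning the inputs."""
        return {city: self._sum[city] / self._count[city] for city in self._sum}


view = AvgFareView()
view.apply("insert", "SF", 10.0)
view.apply("insert", "SF", 20.0)
view.apply("insert", "NYC", 8.0)
before = view.query()

view.apply("delete", "SF", 10.0)
view.apply("delete", "NYC", 8.0)
after = view.query()
```

Storing the sum and count instead of the average itself is what makes deletions cheap to apply: the average can always be recomputed from the two running totals.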
- Collect and transform data from streams
- Create materialized views for the data that needs to be incrementally aggregated
- Query data in RisingWave, including persisted data and data you add or import to RisingWave
- Output data to external streams for storage or additional processing
Building real-time applications leveraging streaming data should not incur operational overhead and become a barrier to entry. RisingWave aims to provide an easy on-ramp for SQL users to begin their stream processing journey.
As a technology, RisingWave is agnostic to industries where it can be used. Any use cases that require low latency can benefit. Based on our experience, we have seen adoption in various industries, including but not limited to Information Technology, Financial Services, Health & Life Sciences, Manufacturing & Industrials, and many others.
With the upcoming beta version, customers gain the following benefits:
- Tracking real-time event data from multiple sources in various formats
- Constructing a stream processing pipeline easily by defining materialized views through a PostgreSQL-compatible interface
- Allowing users to issue ad-hoc queries over both tables and materialized views
True cloud-native deployment with built-in separation of compute and storage
- Cost savings for ad-hoc scheduled workloads via separation of compute and storage
- Allowing flexible deployment options to start/pause/stop services to run as needed
- Connecting to existing sources on the cloud
Developer Productivity
- A high-level declarative programmatic interface in SQL gives users flexibility and choice on how to interact with the system
- Ease of deployment with an on-premise deployment tool, Docker image, and Kubernetes operator in preview
- Integration with observability tools
Simplify the data stack for small and medium businesses
- Combine stream processing and query serving in a single system
For the CXO
- Cost-effective and easy to deploy and manage
For Data Engineering and Data Ops
- The ease of deployment and management alongside other data platforms
- It is a new solution based on modern infrastructure
For App Developers, Architects, and Data Scientists
- Cloud-native architecture with new implementations from the ground up
- Eliminates the need to learn Java or create system-specific APIs by hiding the complexity of stateful stream processing under its straightforward, elegant, and Postgres-compatible SQL interface
- Provides an easy on-ramp for SQL users
For DBAs and SREs
- Control costs with the ability to shut down instances based on workload usage patterns
- Enterprise-ready security with pre-defined RBAC authorization for sensitive data