- Rich and Extensive Connectivity: RisingWave offers native connectors and adapters for various data sources, including databases, message queues, data lakes, APIs, and IoT devices. Handling both real-time streaming data and bulk batch loads is essential for timely data processing. Our purpose-built streaming connectors have built-in intelligence to detect back pressure, enabling efficient, decentralized ingestion from numerous sources. Beyond ingesting the most recent data, the connectors also support reprocessing older data sets on demand.
- Unified Data Model in SQL: A unified approach requires a common data model and a standard language to reduce context switching between workloads and enhance developer productivity. A shared data model also simplifies managing the inherent complexity of diverse data characteristics. RisingWave adopts a standard relational model, enabling the creation of complex data pipelines using standard SQL. This allows SQL query writers to treat their data tables as building blocks, facilitating sophisticated use cases and supporting asynchronously developed pipelines. Even the most complex data pipelines can be constructed using cascading materialized views.
- Composable Data Pipelines: Modern data applications often require multi-stage pipelines with the flexibility to easily inject business logic into event data. First-generation stream processing systems fell short in this area, especially for the typical data engineer, limiting their usefulness and hindering the transition from batch-oriented to dynamic, real-time systems. RisingWave addresses this challenge by making data pipelines composable, allowing tables and views generated by one query to be used directly as inputs for downstream queries. This composability means pipelines can be adapted to new requirements without extensive rewrites, making it easier to integrate new solutions.
- Built-in Serving Layer: In modern applications, real-time insights are often accessed by thousands of users through data-driven apps, such as ride-sharing platforms or financial trading desks. To handle a high volume of fast reads, insights must be delivered from a high-speed serving layer, typically an in-memory data store or an operational database. Adding such a separate store introduces extra latency and complexity because of the additional data hops. RisingWave addresses this by unifying these design patterns, eliminating the need for a separate serving store. With its memory-first architecture, data is immediately available as soon as the streaming job completes. RisingWave also supports disaggregated compute with dedicated serving nodes for ad hoc queries, ensuring efficient data access.
- Continuous Processing of Live Data: Live data has a short shelf life. RisingWave utilizes the familiar database concept of materialized views. Traditionally used to accelerate queries by caching results, materialized views in RisingWave are continuously updated to ensure consistently fresh results. Incremental updates are triggered automatically as new data arrives, which removes the tradeoff between speed and freshness in data insights. Additionally, RisingWave manages the entire lifecycle of data, including retention, archival, and purging, to maintain performance and effectively manage storage costs.
- Temporality of Data: The age of data is a crucial aspect that differs for batch and stream processing workloads. In streaming systems, this refers to the capability to process live data continuously within time windows. For batch processing, the focus is on historical data, often supported by features like ‘time travel’ and ASOF joins. RisingWave offers a comprehensive feature set to address both scenarios, ensuring robust handling of data temporality across various workloads. It supports various time windowing strategies, such as tumbling, sliding, and session windows, along with watermarks and temporal filters to handle unbounded data streams effectively.
- Interoperability with Other Systems: A system designed to support a unified data processing model should prioritize common standards over custom solutions. This means embracing widely adopted standards for file and storage formats. Interoperability is a core design principle of RisingWave. As Iceberg and Delta increasingly become the de facto standards for data lakehouse table formats, RisingWave provides robust read and write support for both. In addition to bridging batch and streaming paradigms, RisingWave unifies development and usage patterns across Python, SQL, Java, and JavaScript through a common UDF framework. This enables the easy embedding of custom business logic within data pipelines, facilitating more sophisticated data processing, advanced analytics, and ML inferencing.
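
To make the idea of continuously updated, cascading materialized views more concrete, here is a minimal conceptual sketch in Python. It is a toy model and not RisingWave's implementation or API: the class names `SumByKeyView` and `CountAboveThresholdView`, the sample events, and the threshold are all hypothetical. It only illustrates that each incoming event adjusts a stored result in place and that the change can be forwarded to a downstream view.

```python
from collections import defaultdict


class SumByKeyView:
    """Toy materialized view (hypothetical): keeps SUM(amount) per key."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.downstream = []  # views that consume this view's changes

    def subscribe(self, view):
        self.downstream.append(view)

    def apply(self, key, delta):
        # Incremental update: adjust one stored row instead of rescanning input.
        old = self.totals[key]
        new = old + delta
        self.totals[key] = new
        # Cascade the change so dependent views stay fresh as well.
        for view in self.downstream:
            view.on_change(key, old, new)


class CountAboveThresholdView:
    """Toy downstream view (hypothetical): counts keys whose total exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def on_change(self, key, old, new):
        was_above = old > self.threshold
        is_above = new > self.threshold
        # Only the difference is applied; no full recomputation is needed.
        self.count += int(is_above) - int(was_above)


def run_demo():
    totals = SumByKeyView()
    big_spenders = CountAboveThresholdView(threshold=100.0)
    totals.subscribe(big_spenders)
    events = [("alice", 60.0), ("bob", 30.0), ("alice", 50.0), ("bob", -10.0)]
    for key, amount in events:
        totals.apply(key, amount)
    return dict(totals.totals), big_spenders.count


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the downstream view never reads the raw events. It reacts only to changes in the upstream view, which is the composability property the bullets above describe.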
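
The tumbling windows and watermarks mentioned under temporality can be sketched in the same spirit. The following is an illustrative model only, assuming integer event times and a watermark defined as the largest event time seen minus an allowed lateness. The class `TumblingWindowCounter` and its parameters are hypothetical and do not reflect how RisingWave implements windowing.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Toy tumbling-window event counter with a simple watermark (hypothetical)."""

    def __init__(self, size, allowed_lateness=0):
        self.size = size
        self.allowed_lateness = allowed_lateness
        self.open_windows = defaultdict(int)  # window start -> event count
        self.max_event_time = None
        self.closed = []  # (window_start, count) pairs, in closing order
        self.dropped = 0  # events that arrived after their window closed

    def watermark(self):
        # Assumption: watermark trails the largest event time by the allowed lateness.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def add(self, event_time):
        start = event_time - event_time % self.size
        wm = self.watermark()
        if wm is not None and start + self.size <= wm:
            # The window containing this event has already been finalized.
            self.dropped += 1
            return False
        self.open_windows[start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        self._close_ready_windows()
        return True

    def _close_ready_windows(self):
        wm = self.watermark()
        # sorted() returns a new list, so popping inside the loop is safe.
        for start in sorted(self.open_windows):
            if start + self.size <= wm:
                self.closed.append((start, self.open_windows.pop(start)))


def run_window_demo():
    counter = TumblingWindowCounter(size=10, allowed_lateness=5)
    for t in [1, 4, 12, 3, 17, 2, 26]:
        counter.add(t)
    return counter.closed, dict(counter.open_windows), counter.dropped


if __name__ == "__main__":
    print(run_window_demo())
```

The out-of-order event at time 3 is still counted because the watermark has not yet passed the end of its window, while the event at time 2 arrives after that window is finalized and is dropped. This is the kind of bound on unbounded streams that watermarks provide.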