RisingWave vs ksqlDB
Compare ksqlDB and RisingWave by architecture, deployment, scalability, sources and sinks, ecosystem, and developer tooling.
Architecture
Architecture | ksqlDB | RisingWave |
---|---|---|
Data processing model | Stream processing using a SQL-like query language. | Stream processing using a PostgreSQL-compatible SQL. |
Separation of compute and storage | Yes. ksqlDB uses Kafka as its storage layer, and compute is performed in ksqlDB. | Yes. In RisingWave, compute nodes, compactor nodes (optimized state store), and object storage can be configured and scaled separately. |
State backend | ksqlDB relies on Kafka Streams state stores, which use RocksDB as the local storage engine and are backed up to Kafka changelog topics. | Native data store that leverages tiered storage for optimized performance. |
Data consistency and processing guarantees | Provides at-least-once and exactly-once processing semantics via Kafka Streams. | Supports exactly-once semantics, out-of-order processing, and snapshot reads. |
Fault tolerance and recovery | Fault tolerance through Kafka Streams, which uses Kafka for state storage and recovery. Checkpointing is not supported. In the case of a failure, retrieving a large state from Kafka can be expensive. | Advanced fault tolerance mechanisms, including checkpointing and consistent snapshots for fast and consistent recovery. |
High availability | Supported | Supported |
ksqlDB is a stream processing SQL engine developed and maintained by Confluent, designed specifically for Apache Kafka. It is built on top of Kafka Streams, a client library for creating stream processing applications on Kafka topics. ksqlDB operates under the licensing terms of the Confluent Community License Agreement.
RisingWave is an open-source distributed SQL streaming database for processing and managing streaming data, released under the Apache License 2.0. It exposes a PostgreSQL-compatible interface for managing and querying streaming data, and it is designed for complex computations on unbounded data streams.
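The fault tolerance row above contrasts recovery by reading state back from Kafka with recovery from a checkpoint. The following is a toy Python sketch of that difference, not a description of how either system is implemented: the function names and the in-memory changelog are hypothetical, and real deployments have optimizations (such as compacted changelog topics and reusable local state) that this model ignores. It only shows why the amount of data replayed after a failure differs between the two approaches.

```python
def recover_by_replay(changelog):
    """Rebuild state by replaying every changelog record from offset 0."""
    state = {}
    for key, value in changelog:
        state[key] = value
    return state, len(changelog)


def take_checkpoint(changelog):
    """Persist a snapshot of the state plus the offset it corresponds to."""
    state, _ = recover_by_replay(changelog)
    return dict(state), len(changelog)


def recover_from_checkpoint(snapshot, offset, changelog):
    """Load the snapshot, then replay only the records written after it."""
    state = dict(snapshot)
    tail = changelog[offset:]
    for key, value in tail:
        state[key] = value
    return state, len(tail)


# Hypothetical workload: 10,000 updates spread over 100 keys.
changelog = [(f"user-{i % 100}", i) for i in range(10_000)]

# A checkpoint was taken after the first 9,900 updates.
snapshot, offset = take_checkpoint(changelog[:9_900])

state_replay, replayed_full = recover_by_replay(changelog)
state_ckpt, replayed_tail = recover_from_checkpoint(snapshot, offset, changelog)

print("records replayed without checkpoint:", replayed_full)
print("records replayed with checkpoint:", replayed_tail)
print("recovered states match:", state_replay == state_ckpt)
```

Both paths end in the same state; the checkpointed path simply has less to replay, which is the property the table refers to when it describes large-state recovery from Kafka as potentially expensive.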
Deployment and scalability
Deployment | ksqlDB | RisingWave |
---|---|---|
Deployment dependency | 1. A Java runtime environment. 2. Access to a Kafka cluster for reading and writing data in real time. 3. Optional: to ingest streaming data from non-Kafka sources, a separate Kafka Connect cluster. | No dependency |
Deployment options | - Self-hosted on-premises or in the cloud. - SaaS provided by Confluent. Because ksqlDB depends on Kafka, it must be deployed alongside a Kafka cluster, so its deployment mode depends on how the Kafka cluster is deployed. | - Self-hosted on-premises or on cloud infrastructure. - SaaS: RisingWave Cloud is a managed service provided by RisingWave. |
Vertical scaling | Manually add more capacity per server. | Manually allocate more resources to a node. Resources can be configured per node. Automatic scaling is a work in progress. |
Horizontal scaling | Manually add more servers to the cluster, with partitions used as the basis for scaling the workload. | Manually add more nodes to the cluster. Automatic scaling is a work in progress. |
ksqlDB is tightly integrated with Kafka and is built on top of it. To deploy ksqlDB, you will require a Java runtime environment and access to a running Kafka cluster. If you intend to use Kafka Connect for ingesting and delivering event streams, you will also need a Kafka Connect cluster. Scaling in ksqlDB can be done both vertically and horizontally.
RisingWave has no deployment dependencies and can be deployed without any additional components. Scaling is flexible: resources can be configured for each node individually. Like ksqlDB, RisingWave supports both vertical and horizontal scaling. Automatic scaling is under development.
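The horizontal scaling row notes that ksqlDB uses partitions as the basis for scaling the workload. A practical consequence of partition-based scaling in general is that adding servers beyond the number of partitions adds no parallelism. The sketch below illustrates this with a simple round-robin assignment; the function and server names are hypothetical, and the round-robin policy is a simplification rather than Kafka's actual assignment strategy.

```python
def assign_partitions(num_partitions, servers):
    """Assign partition ids to servers in round-robin order."""
    assignment = {server: [] for server in servers}
    for partition in range(num_partitions):
        owner = servers[partition % len(servers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions spread over three servers: every server has work.
three_servers = assign_partitions(6, ["s1", "s2", "s3"])

# Six partitions spread over eight servers: two servers stay idle.
eight_servers = assign_partitions(6, [f"s{i}" for i in range(1, 9)])
idle = [server for server, parts in eight_servers.items() if not parts]

print(three_servers)
print("idle servers:", idle)
```

Under this model, scaling out further requires increasing the partition count of the input topics first.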
Data sources and sinks
Sources and sinks | ksqlDB | RisingWave |
---|---|---|
Supported sources | ksqlDB primarily integrates with Apache Kafka as both a data source and sink. It accepts a wide range of sources via Kafka Connect. | - Apache Kafka - Confluent Cloud - Amazon MSK - Redpanda - Apache Pulsar - DataStax Astra Streaming - Kinesis Data Streams - NATS / NATS JetStream - PostgreSQL CDC - Citus CDC - MySQL CDC - MongoDB - SQL Server - TiDB |
Supported source formats | - JSON & JSON_SR - Avro - Protobuf - Kafka - DELIMITED | - JSON - Avro - Protobuf - CSV |
Data operations for sources and sinks | - Append-only | - Append-only - Upsert |
Supported sinks | ksqlDB primarily integrates with Apache Kafka as both a data source and sink. It can sink data into downstream systems via Kafka Connect. | - Apache Doris - Apache Kafka - Apache Iceberg - Apache Pulsar - Kinesis - Cassandra - ClickHouse - CockroachDB - Delta Lake - Elasticsearch - BigQuery - MySQL - NATS - PostgreSQL - Redis - StarRocks - TiDB |
ksqlDB ingests data from and sinks data to Apache Kafka. It can also integrate with external systems using Kafka Connect. It supports common serialization formats like JSON, Avro, and Protobuf.
In contrast, RisingWave integrates with a wider variety of data sources and sinks beyond Kafka. These include databases, messaging systems, and data lakes. It allows both appending new records and upserting existing ones, offering more flexible data operations than ksqlDB.
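To make the two data operations in the table concrete, here is a minimal Python sketch of what append-only and upsert mean for a sink. It models the sinks as an in-memory list and dictionary; the helper names and the `id` key field are hypothetical and do not correspond to any ksqlDB or RisingWave API.

```python
def apply_append_only(sink, record):
    """Append-only: every incoming record becomes a new row."""
    sink.append(record)


def apply_upsert(sink, record, key="id"):
    """Upsert: insert the record, or overwrite the row with the same key."""
    sink[record[key]] = record


events = [
    {"id": 1, "status": "created"},
    {"id": 2, "status": "created"},
    {"id": 1, "status": "shipped"},  # update to an existing key
]

append_sink = []
upsert_sink = {}
for event in events:
    apply_append_only(append_sink, event)
    apply_upsert(upsert_sink, event)

print("append-only rows:", len(append_sink))
print("upsert rows:", len(upsert_sink))
print("latest status for id 1:", upsert_sink[1]["status"])
```

With append-only delivery the downstream system sees the full history and must deduplicate by key itself; with upsert delivery it holds only the latest row per key.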
Ecosystem and tooling
Ecosystem and tooling | ksqlDB | RisingWave |
---|---|---|
Ecosystem | Tight integration with the Kafka ecosystem. | RisingWave is wire-compatible with PostgreSQL, and therefore integrates seamlessly with the Postgres ecosystem. |
Developer tooling | - Official client in Java, and community-contributed clients in .NET, Go, and Python. - The ksqlDB CLI, a command-line interface for interacting with ksqlDB Server. - REST API for interacting with ksqlDB programmatically. | - Client libraries in Python, Java, Node.js, and Go. Third-party PostgreSQL drivers can be used for other languages. - JDBC driver - dbt for streaming transformations - Bytebase for schema management - GraphQL for interacting with RisingWave programmatically |
Visualization | Superset | - Superset - Metabase - Beekeeper Studio - DBeaver - Grafana - Looker - Supabase - Redash |
ksqlDB is a component of the Confluent and Apache Kafka ecosystem, allowing users to leverage various Kafka Connect connectors using SQL syntax. ksqlDB is owned solely by Confluent and is made available under the Confluent Community License. Some concerns have been raised about the future of ksqlDB, as Confluent has indicated a potential shift towards positioning Flink as its preferred stream processing engine.
RisingWave is compatible with PostgreSQL, allowing it to integrate with numerous data engineering, visualization, and analytics systems that already support PostgreSQL integration. Furthermore, RisingWave is actively developing more integrations with different data systems, with the goal of benefiting an even larger user base.
Conclusion
Both ksqlDB and RisingWave are user-friendly and powerful stream processing systems, each with its own strengths and areas of focus. If your technical stack is already tightly integrated with the Java and Kafka ecosystem, and your workload primarily consists of joins, filtering, and small windowed aggregations, ksqlDB would be a great choice for you.
On the other hand, if you are starting to build real-time pipelines and prefer not to be locked into any specific ecosystem, or if cost-efficiency is a significant consideration for you, RisingWave would be a better fit. The choice between the two systems depends on your specific use case, requirements, and familiarity with the respective technologies.