Overview
When building event-driven applications, developers face two key challenges: 1) handling high data volumes with low latency, and 2) ingesting and transforming streaming data in real time.
Combining RisingWave's stream processing capabilities with ScyllaDB's high-performance NoSQL database provides an effective foundation for building event-driven applications and data pipelines.
What is RisingWave?
RisingWave is a PostgreSQL-compatible database specifically designed for stream processing. It excels at ingesting real-time data streams, performing diverse transformations, and enabling instant querying of results.
What is ScyllaDB?
ScyllaDB is a high-performance, distributed NoSQL database designed to handle large amounts of data and provide low-latency access to applications. It is compatible with the Apache Cassandra data model and protocol, which means it can serve as a drop-in replacement for Cassandra in many cases. It also offers an Amazon DynamoDB-compatible API.
ScyllaDB's low-latency and high-throughput capabilities make it suitable for serving real-time applications that require fast data access, such as online gaming, real-time analytics, or Internet of Things (IoT) applications.
The synergy
RisingWave excels at handling streaming data, including ingestion, joins, and transformations. Complementing this, ScyllaDB serves large amounts of data to real-time applications with extremely low latency.
Together, these two systems form a robust foundation for building event-driven applications or pipelines. RisingWave can process event data as soon as it occurs, while its ScyllaDB sink connector exports the processed data to ScyllaDB in real time. This integration ensures that the data is readily available for serving queries from real-time applications or pipelines.
Integrating RisingWave with ScyllaDB
We will use an example to demonstrate how you can build event-driven applications using RisingWave and ScyllaDB. Let's consider a scenario of personalized recommendations in e-commerce. By joining the clickstream stream and the product catalog stream, we analyze users’ preferences in real-time and provide personalized recommendations.
A sample clickstream stream looks like this:
{
"user_id": "john_doe",
"item_id": "12345",
"timestamp": "2023-03-08T15:30:00Z"
}
A sample product catalog stream looks like this:
{
"item_id": "12345",
"category": "electronics",
"price": 100,
"timestamp": "2023-03-08T10:00:00Z"
}
Step 1: Ingest real-time data from Kafka into RisingWave
We assume that the data for these two streams is published to two separate Kafka topics.
Let’s now create two sources in RisingWave to ingest these two streams:
-- Create a source for the clickstream stream
CREATE SOURCE clickstream (
user_id VARCHAR,
item_id VARCHAR,
timestamp TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'clickstream',
properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;
-- Create a source for the product catalog stream
CREATE SOURCE product_catalog (
item_id VARCHAR,
category VARCHAR,
price NUMERIC,
timestamp TIMESTAMPTZ
)
WITH (
connector = 'kafka',
topic = 'product_catalog',
properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;
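Before moving on, it can be useful to confirm that events are actually flowing in. One way is to define a materialized view over a source and query it; RisingWave keeps the view up to date as events arrive. This is a sketch: the view name click_counts is our own, not part of the scenario above.

```sql
-- Hypothetical sanity check: continuously count clicks per item.
CREATE MATERIALIZED VIEW click_counts AS
SELECT item_id, COUNT(*) AS clicks
FROM clickstream
GROUP BY item_id;

-- The view is incrementally maintained, so a plain query returns fresh results.
SELECT * FROM click_counts ORDER BY clicks DESC LIMIT 10;
```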
Step 2: Create a table in ScyllaDB
Since we want to join the streams and export the result to ScyllaDB in real time, we need a table in ScyllaDB to hold the joined data.
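If you don't already have a keyspace for this data, create one first. The following CQL is a sketch: the keyspace name ecommerce and the replication settings are placeholders you would adapt to your own cluster topology.

```sql
-- Hypothetical keyspace; adjust the replication strategy and factor
-- to match your ScyllaDB cluster.
CREATE KEYSPACE IF NOT EXISTS ecommerce
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};

USE ecommerce;
```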
CREATE TABLE joined_stream (
user_id TEXT,
item_id TEXT,
timestamp TIMESTAMP,
category TEXT,
price DECIMAL,
PRIMARY KEY (user_id, item_id, timestamp)
);
Step 3: Perform stream joins and export to ScyllaDB
In RisingWave, you export data by creating a sink to the downstream system. You can also include data transformation logic in a CREATE SINK statement. When a sink is created, a continuous job will start. Creating a data processing pipeline can be as easy as two SQL statements: CREATE SOURCE and CREATE SINK.
CREATE SINK joined_stream AS
SELECT c.user_id, c.item_id, c.timestamp, p.category, p.price
FROM clickstream c
JOIN product_catalog p ON c.item_id = p.item_id
WITH (
connector = 'cassandra',
type = 'append-only',
cassandra.url = '<node1>,<node2>,<node3>',
cassandra.keyspace = '<keyspace>',
cassandra.table = 'joined_stream'
);
For detailed syntax and parameter information, please refer to Sink data from RisingWave to Cassandra or ScyllaDB.
Now ScyllaDB can serve queries from applications or downstream systems.
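For example, an application could fetch a user's recent enriched clicks with a CQL query like the one below. The user id comes from the sample event earlier; the LIMIT and column choices are illustrative.

```sql
-- Hypothetical serving query: enriched click events for one user,
-- read directly from the table populated by the RisingWave sink.
SELECT item_id, category, price, timestamp
FROM joined_stream
WHERE user_id = 'john_doe'
LIMIT 10;
```

Because user_id is the partition key of joined_stream, this lookup touches a single partition and stays fast even at high volumes.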
Resources
- For a runnable integration of RisingWave and ScyllaDB, check out this demo.
- Cost-Efficient Stream Processing with RisingWave and ScyllaDB
- ScyllaDB documentation
- RisingWave use cases
- RisingWave documentation
- For a complete list of integrations that RisingWave supports, check out this page.
With only three steps, we've established a seamless continuous pipeline that automatically performs stream joins and ingests the joined data into ScyllaDB. This allows real-time applications to query the data with low latency, thanks to ScyllaDB's high-performance capabilities. What makes this integration truly remarkable is the simplicity of setting up the entire workflow. We used a straightforward use case as an example. With RisingWave, you can effortlessly filter, join, and transform streaming data, while expressing complex transformational logic with ease. We encourage you to explore further and reach out if you have any questions or need support.
The value of event-driven applications and data pipelines is growing, and having the ability to easily set up the technology stack is a major advantage. The integration of RisingWave and ScyllaDB simplifies this process, allowing you to concentrate on delivering value through real-time data processing and analysis.