Running multiple distinct streaming workloads or managing data for different teams within a single RisingWave cluster offers significant efficiency. However, as your data landscape grows, challenges arise. Historically, a failure in one workload could disrupt others due to insufficient isolation. Furthermore, while logically separating data into different databases (e.g., sales_db
, marketing_db
, dev_db
, prod_replica_db
) promotes organization and permission management, it often created barriers when you needed to query across these boundaries for unified insights. Workarounds like data duplication or complex pipelines added cost, complexity, and the risk of stale data.
RisingWave v2.3 addresses these shortcomings comprehensively by introducing a powerful suite of features for managing multi-workload environments. This includes Database Isolation for better baseline resilience, fine-grained control via Resource Groups, and the crucial flexibility to query across boundaries with Cross-Database Queries. Together, these features allow you to organize data logically, isolate resources effectively, and access data seamlessly where it lives.
Enhanced Database Isolation: A More Resilient Baseline
Starting with v2.3, RisingWave incorporates inherent improvements to isolate databases from each other by default:
Independent Checkpoints: Checkpointing, the process of persisting consistent state, now operates more independently per database. Assuming no materialized views explicitly create cross-database dependencies (more on that later), a checkpoint issue triggered by a job in
database_A
is far less likely to impact the checkpoint process fordatabase_B
.Localized Job Error Recovery: If a streaming job fails within a specific database (due to internal errors, UDF issues, etc.), the recovery process is now primarily contained within that database. Unrelated jobs in other databases can continue operating without unnecessary disruption.
These baseline improvements mean that RisingWave v2.3 provides a significantly more stable and resilient environment for running multiple workloads, reducing the cascading impact of localized failures. Databases provide logical separation, and this baseline enhancement strengthens operational isolation. However, by default, they might still share the same underlying pool of compute nodes.
Introducing Resource Groups: Fine-Grained Control and Resilience
For users needing even greater control over fault isolation and resource allocation, RisingWave v2.3 introduces resource groups, a premium feature.
What are Resource Groups?
A Resource Group is essentially a named pool consisting of specific compute nodes (CNs) within your RisingWave cluster. You can define multiple distinct groups, potentially leveraging different types of underlying hardware (e.g., high-CPU, high-memory).
How do They Work?
You can assign a database to a specific resource group upon creation (CREATE DATABASE ... WITH RESOURCE_GROUP = ...
). Once assigned, all streaming jobs belonging to that database will be constrained to run only on the compute nodes within that designated group.
This mechanism unlocks two primary benefits:
Granular Fault Isolation (Limiting the "Blast Radius"): By assigning critical, independent databases to resource groups utilizing distinct sets of CNs, you achieve robust infrastructure fault isolation. If a CN within
resource_group_A
crashes, it will only impact the databases assigned toresource_group_A
. Databases inresource_group_B
(on different CNs) remain unaffected by that hardware failure.Workload Matching & Performance Optimization: Tailor resource allocation to workload needs. Assign CPU-heavy jobs to a
high_cpu_pool
resource group and memory-intensive jobs with large state to ahigh_memory_pool
group, ensuring workloads run on optimal hardware and improving performance and efficiency.
Using and Monitoring Resource Groups
You assign a database to a resource group using the WITH RESOURCE_GROUP
clause during creation:
CREATE DATABASE database_name WITH resource_group = resource_group_name;
You can monitor how resources are distributed and utilized across different groups using built-in system catalogs. rw_resource_groups provides a summary view:
SELECT * FROM rw_resource_groups;
-- Example Output
name | streaming_workers | parallelism | databases | streaming_jobs
---------------+-------------------+-------------+-----------+----------------
rg_fraud | 2 | 8 | 1 | 1
rg_risk | 2 | 12 | 1 | 2
rg_analytics | 1 | 4 | 1 | 1
default | 1 | 4 | 1 | 0
And rw_streaming_jobs
shows which resource group each active streaming job belongs to:
SELECT job_id, name, database_id, resource_group FROM rw_streaming_jobs;
-- Example Output
job_id | name | database_id | resource_group
--------+-----------------------+-------------+---------------
6 | fraud_detector | 1001 | rg_fraud
7 | risk_calc_1 | 1002 | rg_risk
8 | risk_calc_2 | 1002 | rg_risk
9 | mv_fraud_risk_summary | 1003 | rg_analytics
Resource groups provide strong boundaries for compute resources and failure domains. But how do you interact with data across these well-defined separations when needed?
Bridging Boundaries: Cross-Database Queries
While isolating databases (logically) and their compute resources (via resource groups) enhances stability, organization, and performance, the need to access data across these boundaries is common. You might want to join sales data from sales_db
with product usage data from product_analytics_db
or build a unified dashboard.
RisingWave v2.3 introduces Cross-Database Queries, allowing you to directly query tables and materialized views across different databases within the same instance, eliminating the need for data duplication or complex intermediary pipelines.
How it Works: Simple, Intuitive SQL
Accessing objects (like tables or materialized views) in another database is straightforward using the standard three-part naming convention: database_name.schema_name.object_name
.
Let's say you have table1
in database db1
and table2
in database db2
:
-- In db1:
CREATE DATABASE db1; \\\\c db1
CREATE TABLE table1 (id INT PRIMARY KEY, value_d1 VARCHAR);
INSERT INTO table1 VALUES (1, 'data_from_db1'), (2, 'more_db1_data');
-- In db2:
CREATE DATABASE db2; \\\\c db2
CREATE TABLE table2 (id INT PRIMARY KEY, value_d2 VARCHAR);
INSERT INTO table2 VALUES (1, 'data_from_db2'), (3, 'more_db2_data');
Now, while connected to db2, you can directly query table1 in db1:
-- Still connected to db2
SELECT * FROM db1.public.table1 WHERE id = 1;
-- id | value_d1
-- ----+---------------
-- 1 | data_from_db1
-- (1 row)
-- You can even join them directly in an ad-hoc query
SELECT t1.id, t1.value_d1, t2.value_d2
FROM db1.public.table1 t1
JOIN public.table2 t2 ON t1.id = t2.id; -- Assuming table2 is in the current db's public schema
-- id | value_d1 | value_d2
-- ----+---------------+---------------
-- 1 | data_from_db1 | data_from_db2
-- (1 row)
Interaction with Resource Groups
Cross-database queries respect the isolation boundaries established by resource groups for compute execution. When a query is initiated from a connection to database_A
(assigned to resource_group_A
), even if it accesses data from database_B (in resource_group_B
), the compute resources for executing that specific ad-hoc query are managed within the context of the initiating database's environment (resource_group_A
). This maintains resource governance while providing data access flexibility.
Unlocking Cross-Database Materialized Views with Subscriptions
The real power emerges when you create materialized views (MVs) that combine data from different databases, providing continuously updated, unified views. For example, creating an MV in db2
that joins data from db1
and db2
.
However, MVs need to react to ongoing changes in their source tables. For an MV in one database (e.g., db2
) to react to changes in a table from another database (e.g., db1
), it needs access to that table's change history (changelog). This is where CREATE SUBSCRIPTION
becomes essential. It makes the change stream durably available across database boundaries.
Before creating an MV that depends on a table in another database, you must create a subscription for that source table:
-- First, ensure db1.table1's changes are accessible across databases
-- Connect to db1
\\c db1;
-- Note: The source table must have a primary key defined.
CREATE SUBSCRIPTION sub_table1 FROM public.table1 WITH (retention = '1d'); -- Adjust retention as needed
-- Now, switch back to db2
\\c db2;
-- Create a materialized view in db2 joining db1.table1 and local db2.table2
CREATE MATERIALIZED VIEW mv_cross_join AS
SELECT t1.id, t1.value_d1, t2.value_d2
FROM db1.public.table1 t1 -- Source requires subscription `sub_table1`
JOIN public.table2 t2 -- Local table in db2
ON t1.id = t2.id;
-- Query the MV
SELECT * FROM mv_cross_join;
-- id | value_d1 | value_d2
-- ----+---------------+---------------
-- 1 | data_from_db1 | data_from_db2
-- (1 row)
-- Now, if data changes in db1.public.table1 (or db2.public.table2),
-- mv_cross_join will update automatically!
Note: This cross-database MV will create a dependency between the databases regarding checkpoints and job execution, potentially running actors in the MV's resource group that process data originating from another group's source.
Illustrative Scenarios: Putting Isolation and Access into Practice
Let's see how these features work together.
Scenario 1: Preventing Cross-Team Disruption (Baseline Isolation)
Situation: Marketing (marketing_db) and Sales (sales_ops_db) share a cluster. Marketing's experimental UDF fails.
Benefit (v2.3 Baseline): The failure is contained within marketing_db. sales_ops_db continues running smoothly thanks to enhanced baseline isolation.
Scenario 2: High Availability, Performance Tuning & Unified Reporting (Resource Groups + Cross-DB Queries)
Situation: A financial firm runs CPU-intensive fraud detection (
fraud_db
) and memory-intensive risk calculation (risk_db
). Max availability for fraud is paramount. They also need a real-time dashboard joining fraud alerts with risk exposures, potentially managed in a separateanalytics_db
.Solution (v2.3 Premium):
Define hardware pools: high-CPU nodes, high-memory nodes.
Create resource groups:
rg_fraud
(high-CPU),rg_risk
(high-memory), mayberg_analytics
(general purpose).Assign databases:
CREATE DATABASE fraud_db WITH RESOURCE_GROUP = rg_fraud;
,CREATE DATABASE risk_db WITH RESOURCE_GROUP = rg_risk;
,CREATE DATABASE analytics_db WITH RESOURCE_GROUP = rg_analytics;
Fault Isolation: A node failure in
rg_risk
only impactsrisk_db
.fraud_db
is unaffected.Performance: Each critical workload runs on optimal hardware.
Unified Reporting:
- Create subscriptions in
fraud_db
andrisk_db
for the necessary tables (e.g., alerts, daily_exposure).
- Create subscriptions in
-- In fraud_db:
CREATE SUBSCRIPTION sub_alerts FROM public.alerts ...;
-- In risk_db:
CREATE SUBSCRIPTION sub_exposure FROM public.daily_exposure ...;
- In analytics_db, create a cross-database materialized view:
-- In analytics_db:
CREATE MATERIALIZED VIEW mv_fraud_risk_summary AS
SELECT f.alert_id, f.customer_id, f.alert_timestamp, r.exposure_level
FROM fraud_db.public.alerts f
JOIN risk_db.public.daily_exposure r ON f.customer_id = r.customer_id
WHERE f.alert_timestamp > now() - interval '1 day'; -- Example filter
- This
mv_fraud_risk_summary
provides a continuously updated, unified view for the dashboard, drawing data across isolated databases without disrupting their core streaming operations directly (though the MV itself runs withinrg_analytics
). Ad-hoc cross-database queries could also be run directly fromanalytics_db
.
These scenarios demonstrate the layered approach: baseline isolation for stability, resource groups for strong boundaries and performance, and cross-database queries/MVs for flexible, unified data access.
When Do These Features Make the Most Impact?
This combination of enhanced isolation, resource groups, and cross-database querying provides significant advantages in several common situations:
For Robustness and Separation: If you need strong guarantees around isolation and resilience, these features are key. This applies particularly to multi-tenant deployments where separating tenants into different databases and resource groups prevents interference and limits the blast radius of failures. Similarly, critical workloads benefit immensely from dedicated resource groups, ensuring their stability isn't compromised by less critical jobs or hardware issues elsewhere in the cluster. You can also maintain clear separation between development, staging, and production replica environments within the same cluster infrastructure.
For Performance and Resource Management: When performance tuning and efficient resource utilization are priorities, resource groups allow you to match workloads to optimal hardware (e.g., CPU-bound vs. memory-bound jobs) and provide better resource governance by controlling and monitoring consumption per database or application.
For Organization and Simplified Data Integration: If you practice domain-driven design or simply organize data logically across multiple databases (e.g., orders_db, customers_db), cross-database queries eliminate the major headache of accessing that data for unified views. You can avoid costly data duplication and complex intermediary pipelines. Instead, build centralized analytics or data marts using cross-database materialized views that directly query the source data, ensuring consistency and simplifying your overall architecture.
Availability
Database isolation, resource groups, and cross-database queries (including cross-database materialized views via subscriptions) are Premium Edition features available in RisingWave v2.3 and later versions. They are automatically enabled on RisingWave Cloud. For self-hosted deployments, you can try out any Premium Edition feature without license for clusters with a compute capacity up to 4 cores.
Conclusion
RisingWave v2.3 delivers a significant leap forward in managing complex, multi-workload streaming environments. Enhanced database isolation provides a more resilient foundation. Resource groups offer powerful, fine-grained control over compute resources, fault domains, and performance tuning. Layered on top, cross-database queries break down internal silos, providing the essential flexibility to access and unify data across these isolated boundaries via simple SQL or powerful materialized views (enabled by CREATE SUBSCRIPTION
).
Together, these features empower you to build robust, efficient, scalable, and well-organized streaming applications on a shared RisingWave cluster, balancing the need for isolation with the necessity of unified data access.
Upgrade to RisingWave v2.3 to benefit from these enhancements. Explore the documentation to learn more:
Workload isolation and interaction: https://docs.risingwave.com/operate/workload-isolation-interaction
Create Subscription: https://docs.risingwave.com/sql/commands/sql-create-subscription/
Try RisingWave Today
Ready to dive in? Choose the option that works best for you:
Start a free trial of RisingWave Cloud: Our fully managed cloud service.
Test the open-source version locally: Get started quickly on your own machine.
Want to stay connected? Follow us on Twitter and LinkedIn for the latest updates, and join our Slack community to chat with our engineers and hundreds of fellow streaming enthusiasts.