For those of you building with RisingWave, you know that CREATE MATERIALIZED VIEW
is the command that powers your real-time data transformations and continuous queries. The initial backfill process, where RisingWave ingests historical data to populate your view, is critical for providing complete and correct results from the very beginning.
However, as many of you have experienced, creating materialized views with complex multi-way joins can sometimes lead to performance hotspots during this backfill phase. Today, we're releasing an enhancement to give you more control and solve a common performance issue: backfill order control.
The Problem: Inefficient Backfills for Joins
Consider a frequent pattern: creating a materialized view that joins a large fact table (like events) with one or more smaller dimension tables (like users or products).
Without explicit guidance, RisingWave might begin backfilling the large fact table first. This can lead to significant inefficiency:
Join Amplification & Wasted Work: The system processes millions of rows from the fact table, attempting to join them against dimension tables that are still empty. This results in a storm of failed lookups and intermediate results that need to be retracted later.
Update Churn: Once the dimension tables are finally loaded, a wave of updates must be processed to correct the view, putting unnecessary load on the system and delaying how quickly the materialized view becomes available and stable.
The Solution: Explicit Control with backfill_order
To solve this, we've introduced the backfill_order
option in the WITH
clause of your CREATE MATERIALIZED VIEW
statement. This new option allows you to explicitly define the dependency between your tables, ensuring a more logical and efficient backfill sequence.
The syntax is straightforward. By specifying dim_table -> fact_table
, you instruct RisingWave to complete the backfill for dim_table before it begins processing fact_table.
How to Use It: A Practical Example
Let's take a materialized view that joins a fact table with two dimension tables, dim and dim2. To ensure the joins are efficient during the initial backfill, you can now write your query like this:
- Assume these tables are already created and have data
CREATE TABLE fact(k INT, d1 INT, d2 INT, v INT);
CREATE TABLE dim(k INT);
CREATE TABLE dim2(k INT);
-- Create the materialized view with a defined backfill order
CREATE MATERIALIZED VIEW m1
WITH (backfill_order = FIXED(dim -> fact, dim2 -> fact))
AS
SELECT fact.v
FROM fact
JOIN dim ON fact.d1 = dim.k
JOIN dim2 ON fact.d2 = dim2.k;
With this configuration, RisingWave will now:
First, backfill the
dim
anddim2
tables.Only after they are complete, it will backfill the fact table.
This simple change ensures that when the fact stream is processed, the dimension data is already available for successful joins. The result is a dramatic reduction in wasted work, leading to faster, more predictable creation times for your materialized views and lower resource consumption.
Monitoring and Verification
As always, we provide tools for observability. You can inspect the resulting fragment dependency graph with DESCRIBE FRAGMENTS and monitor the real-time progress using the rw_catalog.rw_fragment_backfill_progress
view to see your new configuration in action.
Availability
Backfill order control is available now in the version 2.5 of RisingWave. We encourage you to use it on your complex join workloads and see the performance benefits. For more detailed information, please see the official documentation on Backfill behavior and controls.
Get Started with RisingWave
Try RisingWave Today:
Download the open-sourced version of RisingWave to deploy on your own infrastructure.
Get started quickly with RisingWave Cloud for a fully managed experience.
Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.
Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.
If you’d like to see a personalized demo or discuss how this could work for your use case, please contact our sales team.