Fine-Tuning Performance: Introducing Backfill Order Control for Materialized Views

Fine-Tuning Performance: Introducing Backfill Order Control for Materialized Views

For those of you building with RisingWave, you know that CREATE MATERIALIZED VIEW is the command that powers your real-time data transformations and continuous queries. The initial backfill process, where RisingWave ingests historical data to populate your view, is critical for providing complete and correct results from the very beginning.

However, as many of you have experienced, creating materialized views with complex multi-way joins can sometimes lead to performance hotspots during this backfill phase. Today, we're releasing an enhancement to give you more control and solve a common performance issue: backfill order control.

The Problem: Inefficient Backfills for Joins

Consider a frequent pattern: creating a materialized view that joins a large fact table (like events) with one or more smaller dimension tables (like users or products).

Without explicit guidance, RisingWave might begin backfilling the large fact table first. This can lead to significant inefficiency:

  • Join Amplification & Wasted Work: The system processes millions of rows from the fact table, attempting to join them against dimension tables that are still empty. This results in a storm of failed lookups and intermediate results that need to be retracted later.

  • Update Churn: Once the dimension tables are finally loaded, a wave of updates must be processed to correct the view, putting unnecessary load on the system and delaying how quickly the materialized view becomes available and stable.

The Solution: Explicit Control with backfill_order

To solve this, we've introduced the backfill_order option in the WITH clause of your CREATE MATERIALIZED VIEW statement. This new option allows you to explicitly define the dependency between your tables, ensuring a more logical and efficient backfill sequence.

The syntax is straightforward. By specifying dim_table -> fact_table, you instruct RisingWave to complete the backfill for dim_table before it begins processing fact_table.

How to Use It: A Practical Example

Let's take a materialized view that joins a fact table with two dimension tables, dim and dim2. To ensure the joins are efficient during the initial backfill, you can now write your query like this:

- Assume these tables are already created and have data
CREATE TABLE fact(k INT, d1 INT, d2 INT, v INT);
CREATE TABLE dim(k INT);
CREATE TABLE dim2(k INT);
-- Create the materialized view with a defined backfill order
CREATE MATERIALIZED VIEW m1
WITH (backfill_order = FIXED(dim -> fact, dim2 -> fact))
AS
SELECT fact.v
FROM fact
JOIN dim ON fact.d1 = dim.k
JOIN dim2 ON fact.d2 = dim2.k;

With this configuration, RisingWave will now:

  1. First, backfill the dim and dim2 tables.

  2. Only after they are complete, it will backfill the fact table.

This simple change ensures that when the fact stream is processed, the dimension data is already available for successful joins. The result is a dramatic reduction in wasted work, leading to faster, more predictable creation times for your materialized views and lower resource consumption.

Monitoring and Verification

As always, we provide tools for observability. You can inspect the resulting fragment dependency graph with DESCRIBE FRAGMENTS and monitor the real-time progress using the rw_catalog.rw_fragment_backfill_progress view to see your new configuration in action.

Availability

Backfill order control is available now in the version 2.5 of RisingWave. We encourage you to use it on your complex join workloads and see the performance benefits. For more detailed information, please see the official documentation on Backfill behavior and controls.


Get Started with RisingWave

  • Try RisingWave Today:

  • Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.

  • Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.

If you’d like to see a personalized demo or discuss how this could work for your use case, please contact our sales team.

The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.