Join our Streaming Lakehouse Tour!
Register Now.->

Materialized View

A Materialized View in RisingWave is a database object that stores the precomputed results of a SQL query. Unlike traditional database views, which are virtual and re-evaluate the query each time they are accessed, or traditional materialized views that require manual or scheduled refreshes, RisingWave's Materialized Views are incrementally and automatically updated in real-time as new data arrives in the underlying source streams or tables.

This makes Materialized Views a cornerstone of RisingWave, enabling low-latency querying of complex analytical results derived from high-velocity streaming data.

Core Concept

You define a Materialized View using a CREATE MATERIALIZED VIEW AS SELECT ... statement, much like in a standard SQL database. However, in RisingWave, this statement does not just store the query definition; it also:

  1. Initial Computation: Computes the initial result set of the query based on the existing data in the source streams/tables.
  2. Streaming Dataflow Creation: Translates the SQL query into an internal Dataflow Graph of stateful streaming operators.
  3. Continuous Incremental Maintenance: This dataflow continuously processes incoming changes (inserts, updates, deletes) from the source streams/tables. As changes occur, the dataflow incrementally updates the stored results of the Materialized View with very low latency.
  4. Persistent Storage: The results of the Materialized View are stored durably within RisingWave's State Store (Hummock), making them readily available for querying.

How it Differs from Traditional Views/MVs

  • Traditional Views (Virtual Views):
    • Store only the query definition.
    • Query is re-executed every time the view is accessed.
    • Can be slow for complex queries over large datasets.
  • Traditional Materialized Views (in RDBMS):
    • Store the precomputed result of a query.
    • Require manual REFRESH commands or scheduled refresh jobs to update.
    • Data can become stale between refreshes.
    • Full refreshes can be resource-intensive and slow.
  • RisingWave Materialized Views:
    • Store precomputed results.
    • Automatically and incrementally updated in real-time.
    • Provide consistently fresh results with low latency.
    • Updates are efficient due to Incremental Computation.

Key Advantages in RisingWave

  1. Low Query Latency: Since the results are precomputed and stored, querying a Materialized View is extremely fast, typically involving simple lookups rather than complex computations on raw data. This is ideal for interactive dashboards, real-time analytics, and serving layers.
  2. Data Freshness: Results are kept continuously up-to-date as source data changes, providing access to the latest insights.
  3. Simplified Data Pipelines: Complex stream processing logic, including aggregations, joins (stream-stream, stream-table), windowing, and transformations, can be expressed declaratively using SQL. RisingWave handles the underlying complexity of creating and maintaining the incremental dataflow.
  4. Efficiency: Incremental computation ensures that only the changes in the source data are processed to update the view, significantly reducing computational overhead compared to full recomputations.
  5. Consistency: RisingWave ensures that the state of Materialized Views is consistent with the processed input streams, often providing exactly-once semantics for state updates.
  6. Composability: Materialized Views can be defined on top of other Materialized Views or sources, allowing for the creation of complex, layered data processing pipelines.

Use Cases

  • Real-time Dashboards: Powering dashboards that require up-to-the-millisecond data on key metrics.
  • Live Monitoring and Alerting: Continuously monitoring conditions and triggering alerts based on materialized results.
  • Feature Engineering for ML: Generating real-time features for machine learning models.
  • Serving Layer for Applications: Providing a fast and fresh data source for user-facing applications.
  • Complex Event Processing: Identifying patterns and correlations across multiple data streams.
  • Data Denormalization and Pre-aggregation: Simplifying queries and improving performance by pre-joining and pre-aggregating data.

Example in RisingWave

-- Define a source stream of orders
CREATE SOURCE orders_stream (
    order_id INT,
    product_id INT,
    order_value DECIMAL,
    order_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka_broker:9092',
    format = 'json'
) ROW FORMAT JSON;

-- Create a Materialized View to calculate total sales per product per minute
CREATE MATERIALIZED VIEW sales_per_minute AS
SELECT
    product_id,
    SUM(order_value) AS total_sales,
    TUMBLE_START(order_time, INTERVAL '1 minute') AS window_start
FROM orders_stream
GROUP BY
    product_id,
    TUMBLE(order_time, INTERVAL '1 minute');

-- Query the Materialized View (results are always fresh and query is fast)
SELECT * FROM sales_per_minute WHERE product_id = 123 ORDER BY window_start DESC;

In this example, sales_per_minute will be continuously updated by RisingWave as new orders arrive in the orders_stream. Queries to sales_per_minute will be fast and reflect the latest data.

Considerations

  • State Storage: Materialized Views store their results, so they consume storage in RisingWave's state store (Hummock). The amount of storage depends on the cardinality and complexity of the view.
  • Update Cost: While incremental updates are efficient, very complex MVs or MVs over extremely high-velocity streams will still incur some computational cost for updates.
  • Schema Changes: Managing schema changes in source data and their impact on dependent Materialized Views requires careful consideration and understanding of RisingWave's schema evolution capabilities.

Materialized Views are a fundamental building block in RisingWave, transforming how developers build and operate real-time data applications by combining the power of SQL with the efficiency of incremental stream processing.

Related Glossary Terms

The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.