Materialized View
A Materialized View in RisingWave is a database object that stores the precomputed results of a SQL query. Unlike traditional database views, which are virtual and re-evaluate the query each time they are accessed, or traditional materialized views that require manual or scheduled refreshes, RisingWave's Materialized Views are incrementally and automatically updated in real-time as new data arrives in the underlying source streams or tables.
This makes Materialized Views a cornerstone of RisingWave, enabling low-latency querying of complex analytical results derived from high-velocity streaming data.
Core Concept
You define a Materialized View using a CREATE MATERIALIZED VIEW AS SELECT ... statement, much like in a standard SQL database. However, in RisingWave, this statement does not just store the query definition; it also:
- Initial Computation: Computes the initial result set of the query based on the existing data in the source streams/tables.
- Streaming Dataflow Creation: Translates the SQL query into an internal Dataflow Graph of stateful streaming operators.
- Continuous Incremental Maintenance: This dataflow continuously processes incoming changes (inserts, updates, deletes) from the source streams/tables. As changes occur, the dataflow incrementally updates the stored results of the Materialized View with very low latency.
- Persistent Storage: The results of the Materialized View are stored durably within RisingWave's State Store (Hummock), making them readily available for querying.
How it Differs from Traditional Views/MVs
- Traditional Views (Virtual Views):
- Store only the query definition.
- Query is re-executed every time the view is accessed.
- Can be slow for complex queries over large datasets.
- Traditional Materialized Views (in RDBMS):
- Store the precomputed result of a query.
- Require manual REFRESH commands or scheduled refresh jobs to update.
- Data can become stale between refreshes.
- Full refreshes can be resource-intensive and slow.
- RisingWave Materialized Views:
- Store precomputed results.
- Automatically and incrementally updated in real-time.
- Provide consistently fresh results with low latency.
- Updates are efficient due to Incremental Computation.
Key Advantages in RisingWave
- Low Query Latency: Since the results are precomputed and stored, querying a Materialized View is extremely fast, typically involving simple lookups rather than complex computations on raw data. This is ideal for interactive dashboards, real-time analytics, and serving layers.
- Data Freshness: Results are kept continuously up-to-date as source data changes, providing access to the latest insights.
- Simplified Data Pipelines: Complex stream processing logic, including aggregations, joins (stream-stream, stream-table), windowing, and transformations, can be expressed declaratively using SQL. RisingWave handles the underlying complexity of creating and maintaining the incremental dataflow.
- Efficiency: Incremental computation ensures that only the changes in the source data are processed to update the view, significantly reducing computational overhead compared to full recomputations.
- Consistency: RisingWave ensures that the state of Materialized Views is consistent with the processed input streams, often providing exactly-once semantics for state updates.
- Composability: Materialized Views can be defined on top of other Materialized Views or sources, allowing for the creation of complex, layered data processing pipelines.
Use Cases
- Real-time Dashboards: Powering dashboards that require up-to-the-millisecond data on key metrics.
- Live Monitoring and Alerting: Continuously monitoring conditions and triggering alerts based on materialized results.
- Feature Engineering for ML: Generating real-time features for machine learning models.
- Serving Layer for Applications: Providing a fast and fresh data source for user-facing applications.
- Complex Event Processing: Identifying patterns and correlations across multiple data streams.
- Data Denormalization and Pre-aggregation: Simplifying queries and improving performance by pre-joining and pre-aggregating data.
Example in RisingWave
CREATE SOURCE orders_stream (
order_id INT,
product_id INT,
order_value DECIMAL,
order_time TIMESTAMP
) WITH (
connector = 'kafka',
topic = 'orders',
properties.bootstrap.server = 'kafka_broker:9092',
format = 'json'
) ROW FORMAT JSON;
CREATE MATERIALIZED VIEW sales_per_minute AS
SELECT
product_id,
SUM(order_value) AS total_sales,
TUMBLE_START(order_time, INTERVAL '1 minute') AS window_start
FROM orders_stream
GROUP BY
product_id,
TUMBLE(order_time, INTERVAL '1 minute');
SELECT * FROM sales_per_minute WHERE product_id = 123 ORDER BY window_start DESC;
In this example, sales_per_minute will be continuously updated by RisingWave as new orders arrive in the orders_stream. Queries to sales_per_minute will be fast and reflect the latest data.
Considerations
- State Storage: Materialized Views store their results, so they consume storage in RisingWave's state store (Hummock). The amount of storage depends on the cardinality and complexity of the view.
- Update Cost: While incremental updates are efficient, very complex MVs or MVs over extremely high-velocity streams will still incur some computational cost for updates.
- Schema Changes: Managing schema changes in source data and their impact on dependent Materialized Views requires careful consideration and understanding of RisingWave's schema evolution capabilities.
Materialized Views are a fundamental building block in RisingWave, transforming how developers build and operate real-time data applications by combining the power of SQL with the efficiency of incremental stream processing.
Related Glossary Terms