Understanding Materialized Views
This article provides an overview of materialized views, including their definition, freshness of results, benefits, best practices, and their implementation in mainstream databases.
Given that materialized views are based on the concept of views, let's first explore what a view is.
What is a view?
A view is essentially a short name for a long query.
Let’s look at the concept from a fictional story. Data analyst Dan was studying which products sold the most in the company catalog. He wrote a query to find the top 5 sellers based on total quantity ordered.
SELECT p.product_name, SUM(o.quantity) AS total_quantity
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.product_name
ORDER BY total_quantity DESC
LIMIT 5;
The next day, Dan ran the query again out of curiosity. And the following day as well. He realized this was becoming a regular task. Dan decided to save the query text in his notes app, so he could easily run it in the future.
Meanwhile, Dan's coworker Debra had a similar assignment to identify best sellers. She wrote a query matching what Dan had done.
One morning, Dan and Debra discussed their work over coffee. They discovered they were both repeatedly running the same query for the same data.
That's when the database administrator Dave joined the conversation. He suggested for queries used often, it's helpful to create a view.
CREATE VIEW best_sellers_5 AS
SELECT p.product_name, SUM(o.quantity) AS total_quantity
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.product_name
ORDER BY total_quantity DESC
LIMIT 5;
Dave explained that a view allows permanent storage of the query logic in the database. Now anyone needing the top products could simply access the view like this:
SELECT * from best_sellers_5;
This simple SELECT query will kick off the complex query behind the scenes, and results will be fetched.
Let’s see what would happen when we query this view.
In essence, this is similar to running the query from scratch. You just don’t need to remember the query logic.
Now Dan understands what a view means. He and Debra are now leveraging the view best_sellers_5 to get the 5 top sellers. As time goes by, the company has more and more products and the quantity of sales also goes up, Dan and Debra find that it now takes quite a while (let’s say, 5 minutes) to get the results. They discussed this question with Dave. Dave said, why don’t you create a materialized view?
They asked Dave to explain what a materialized view is, and what benefits a materialized view can bring.
What does “materialized” in materialized views mean?
Dave explained, when you materialize a view (that is, when you create a materialized view), you create a concrete table to store its results.
Dave use the previous view best_sellers_5 as an example. “If you create a materialized view, with the same name, for the 5 best sellers, then what you actually get is a table with pre-computed results.”
“When you query this materialized view, the database DOES NOT run the query, but rather, it fetches the results that are currently stored in the table. Therefore, you get the results pretty quickly.”
Here is what would happen when you query the best_sellers_5 materialized view.
Dave ran the query and it took 5 seconds to return results. Both Dan and Debra were impressed.
But something came to Debra’s mind and she asked, if the database is just fetching pre-computed results from the table, are they still fresh?
How fresh are the results in a materialized view?
Dave explained, it depends on when the materialized view is refreshed.
The data in materialized views needs to be refreshed when data in the underlying tables changes, and the frequency of refreshing determines the degree of data freshness.
If both the products table and the orders table have changed yesterday but the materialized view is not refreshed, then the results are from the last time when the materialized view is refreshed (for example, yesterday).
If the base tables are updated every day but the materialized view is refreshed once every two days, then the results in your materialized view are not always fresh.
Depending on what database you are using, you can refresh a materialized view manually or set a refreshing schedule.
Dan and Debra started to have more questions.
Dave, a patient person, said these questions are that easy to answer, and he’ll summarize the answers for them.
And the sections below are what Dave summarized.
When to use materialized views
When not to use materialize views
Overview of materialized views in mainstream databases
Materialized views can be implemented in various popular databases, each with its own specific implementation and features. Here's an overview of the implementation of materialized views in some of the most popular databases:
Database | Materialized View Supported | Refresh Modes | Additional Notes |
---|---|---|---|
Oracle | Yes | Manual or automatic | Oracle (Release 23) offers several types of materialized views for different use purposes, which can be used by query rewrite. With manual and automatic refresh, both the update time and scope can be set with abundant options. |
MySQL | No | N/A | MySQL natively does not support materialized views, but alternative solutions can be used to achieve similar functionality. |
Microsoft SQL Server | Yes | Automatic | The “indexed views” in Microsoft SQL Server (2022) are a concept similar to materialized views. They are created by creating a unique clustered index on the view, and automatically refreshed once the data changes. |
PostgreSQL | Yes, in 9.3+ version | Manual | In PostgreSQL (Version 16.1), the view is populated once created unless WITH NO DATA is used. A refresh locks the entire table unless CONCURRENTLY is used. |
MongoDB | Yes, in 4.4+ version | Manual | MongoDB (Version 7.0) provides a similar concept called on-demand materialized views, which return the results from an aggregation pipeline. They use a $merge or $out stage to update the saved data. |
Snowflake | Yes | Automatic | Materialized views are one of the three most common mechanisms Snowflake uses to view, materialize, and transform data. They are automatically updated when their base table changes. |
Google BigQuery | Yes | Manual or automatic | Materialized views in Google BigQuery have a key feature called smart tuning, by which queries are automatically rewritten to use materialized views. Both manual and automatic refresh are supported. |
Amazon Redshift | Yes | Manual or automatic | Amazon Redshift’s materialized views support many features, like automatic query rewriting and materialized views on materialized views. Its refresh is incremental refresh with some limitations, both manually and automatically. |
Materialized views in RisingWave
As a streaming database, RisingWave has leveraged materialized views in a unique way to power always-on analytics and data transformations for latency-sensitive applications such as alerting, monitoring and trading.
Materialized views on tables and streams
Traditional materialized views in most databases can only be created on tables. It means streaming data needs to go through the ETL process and be stored in the database first. In RisingWave, materialized views can be created based on both tables and streams (via source objects). If you don’t need to store the raw stream data, you can just pick the fields you need, perform the transformations, and only store the results.
Automatically refreshed and incrementally computed
Materialized views in RisingWave are not refreshed at a preset interval or manually. They are automatically refreshed and incrementally computed whenever a new event is received. As soon as a materialized view is created, the RisingWave engine is on the lookout for new (and relevant) events. As the computation is performed only on the newly arrived data, the computing overhead is minimal. Furthermore, RisingWave provides snapshot read consistency, which means that when running a query against materialized views or tables in the same database, the result is guaranteed to be consistent between those tables and materialized views.
Materialized views on top of materialized views
In data analytics, it is often the case that one metric is derived from another metric. In such situations, it is unnecessary to recalculate the derived metric from scratch. RisingWave offers the capability to create materialized views based on existing materialized views.
In stream processing scenarios, Kafka is commonly used as an intermediary to connect different processing logic. The unique feature of MV-on-MV is that it eliminates the need for complex inter-system pipelines, allowing you to cascade your transformational logic seamlessly.
Conclusion
In this article, we explored the concepts of views and materialized views in databases, highlighting their implementation in mainstream systems. We also examined how RisingWave offers real-time materialized views, showcasing the unique advantages this feature provides to users.
If you're interested in how materialized views in RisingWave work, we encourage you to give it a try. Processing event streaming data has never been easier. For any inquiries about RisingWave, feel free to connect with us through our Slack community. To stay updated on our latest developments, consider subscribing to our monthly newsletter or following us on Twitter and LinkedIn.