Understanding Materialized Views

This article provides an overview of materialized views, including their definition, freshness of results, benefits, best practices, and their implementation in mainstream databases.

Given that materialized views are based on the concept of views, let's first explore what a view is.

What is a view?

A view is essentially a short name for a long query.

Let’s look at the concept from a fictional story. Data analyst Dan was studying which products sold the most in the company catalog. He wrote a query to find the top 5 sellers based on total quantity ordered.


SELECT p.product_name, SUM(o.quantity) AS total_quantity
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.product_name
ORDER BY total_quantity DESC
LIMIT 5;

The next day, Dan ran the query again out of curiosity. And the following day as well. He realized this was becoming a regular task. Dan decided to save the query text in his notes app, so he could easily run it in the future.

Meanwhile, Dan's coworker Debra had a similar assignment to identify best sellers. She wrote a query matching what Dan had done.

One morning, Dan and Debra discussed their work over coffee. They discovered they were both repeatedly running the same query for the same data.

That's when the database administrator Dave joined the conversation. He suggested for queries used often, it's helpful to create a view.


CREATE VIEW best_sellers_5 AS
SELECT p.product_name, SUM(o.quantity) AS total_quantity
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.product_name
ORDER BY total_quantity DESC
LIMIT 5;

Dave explained that a view allows permanent storage of the query logic in the database. Now anyone needing the top products could simply access the view like this:


SELECT * from best_sellers_5;

This simple SELECT query will kick off the complex query behind the scenes, and results will be fetched.

Let’s see what would happen when we query this view.

In essence, this is similar to running the query from scratch. You just don’t need to remember the query logic.

Now Dan understands what a view means. He and Debra are now leveraging the view best_sellers_5 to get the 5 top sellers. As time goes by, the company has more and more products and the quantity of sales also goes up, Dan and Debra find that it now takes quite a while (let’s say, 5 minutes) to get the results. They discussed this question with Dave. Dave said, why don’t you create a materialized view?

They asked Dave to explain what a materialized view is, and what benefits a materialized view can bring.

What does “materialized” in materialized views mean?

Dave explained, when you materialize a view (that is, when you create a materialized view), you create a concrete table to store its results.

Dave use the previous view best_sellers_5 as an example. “If you create a materialized view, with the same name, for the 5 best sellers, then what you actually get is a table with pre-computed results.”

“When you query this materialized view, the database DOES NOT run the query, but rather, it fetches the results that are currently stored in the table. Therefore, you get the results pretty quickly.”

Here is what would happen when you query the best_sellers_5 materialized view.

Dave ran the query and it took 5 seconds to return results. Both Dan and Debra were impressed.

But something came to Debra’s mind and she asked, if the database is just fetching pre-computed results from the table, are they still fresh?

How fresh are the results in a materialized view?

Dave explained, it depends on when the materialized view is refreshed.

The data in materialized views needs to be refreshed when data in the underlying tables changes, and the frequency of refreshing determines the degree of data freshness.

If both the products table and the orders table have changed yesterday but the materialized view is not refreshed, then the results are from the last time when the materialized view is refreshed (for example, yesterday).

If the base tables are updated every day but the materialized view is refreshed once every two days, then the results in your materialized view are not always fresh.

Depending on what database you are using, you can refresh a materialized view manually or set a refreshing schedule.

Dan and Debra started to have more questions.

Other than being fast in returning results, what other benefits using materialized views can bring?
Are materialized views always helpful?
And what we should pay attention to when using materialized views?

Dave, a patient person, said these questions are that easy to answer, and he’ll summarize the answers for them.

And the sections below are what Dave summarized.

When to use materialized views

The same complex query needs to be repeatedly recalculated over a large amount of data.
Low end-to-end latency is required but access to up-to-the-moment data is not a critical requirement (for traditional databases but not streaming databases).
Storage space is not a major concern.

When not to use materialize views

Queries are exploratory, ad hoc, or infrequent. In these scenarios, the costs outweigh the benefits.
Storage space is limited.
Data in frequently updated - mv needs to be refreshed frequently, which can lead to additional overhead and costs. Note that this is not always true. If incremental computation is used and an economical storage option is selected, the additional overhead and costs can be significantly reduced. See the implementation of materialized views in RisingWave as an example.

Overview of materialized views in mainstream databases

Materialized views can be implemented in various popular databases, each with its own specific implementation and features. Here's an overview of the implementation of materialized views in some of the most popular databases:

DatabaseMaterialized View SupportedRefresh ModesAdditional Notes
OracleYesManual or automaticOracle (Release 23) offers several types of materialized views for different use purposes, which can be used by query rewrite. With manual and automatic refresh, both the update time and scope can be set with abundant options.
MySQLNoN/AMySQL natively does not support materialized views, but alternative solutions can be used to achieve similar functionality.
Microsoft SQL ServerYesAutomaticThe “indexed views” in Microsoft SQL Server (2022) are a concept similar to materialized views. They are created by creating a unique clustered index on the view, and automatically refreshed once the data changes.
PostgreSQLYes, in 9.3+ versionManualIn PostgreSQL (Version 16.1), the view is populated once created unless WITH NO DATA is used. A refresh locks the entire table unless CONCURRENTLY is used.
MongoDBYes, in 4.4+ versionManualMongoDB (Version 7.0) provides a similar concept called on-demand materialized views, which return the results from an aggregation pipeline. They use a $merge or $out stage to update the saved data.
SnowflakeYesAutomaticMaterialized views are one of the three most common mechanisms Snowflake uses to view, materialize, and transform data. They are automatically updated when their base table changes.
Google BigQueryYesManual or automaticMaterialized views in Google BigQuery have a key feature called smart tuning, by which queries are automatically rewritten to use materialized views. Both manual and automatic refresh are supported.
Amazon RedshiftYesManual or automaticAmazon Redshift’s materialized views support many features, like automatic query rewriting and materialized views on materialized views. Its refresh is incremental refresh with some limitations, both manually and automatically.

Materialized views in RisingWave

As a streaming database, RisingWave has leveraged materialized views in a unique way to power always-on analytics and data transformations for latency-sensitive applications such as alerting, monitoring and trading.

Materialized views on tables and streams

Traditional materialized views in most databases can only be created on tables. It means streaming data needs to go through the ETL process and be stored in the database first. In RisingWave, materialized views can be created based on both tables and streams (via source objects). If you don’t need to store the raw stream data, you can just pick the fields you need, perform the transformations, and only store the results.

Automatically refreshed and incrementally computed

Materialized views in RisingWave are not refreshed at a preset interval or manually. They are automatically refreshed and incrementally computed whenever a new event is received. As soon as a materialized view is created, the RisingWave engine is on the lookout for new (and relevant) events. As the computation is performed only on the newly arrived data, the computing overhead is minimal. Furthermore, RisingWave provides snapshot read consistency, which means that when running a query against materialized views or tables in the same database, the result is guaranteed to be consistent between those tables and materialized views.

Materialized views on top of materialized views

In data analytics, it is often the case that one metric is derived from another metric. In such situations, it is unnecessary to recalculate the derived metric from scratch. RisingWave offers the capability to create materialized views based on existing materialized views.

In stream processing scenarios, Kafka is commonly used as an intermediary to connect different processing logic. The unique feature of MV-on-MV is that it eliminates the need for complex inter-system pipelines, allowing you to cascade your transformational logic seamlessly.

Conclusion

In this article, we explored the concepts of views and materialized views in databases, highlighting their implementation in mainstream systems. We also examined how RisingWave offers real-time materialized views, showcasing the unique advantages this feature provides to users.

If you're interested in how materialized views in RisingWave work, we encourage you to give it a try. Processing event streaming data has never been easier. For any inquiries about RisingWave, feel free to connect with us through our Slack community. To stay updated on our latest developments, consider subscribing to our monthly newsletter or following us on Twitter and LinkedIn.

The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.