In the last blog, we provided an overview of materialized views, including their definition, result freshness, benefits, best practices, and implementation in major databases. Starting today, we will launch a new article series to introduce the specific implementations and features of materialized views in various popular databases, beginning with Amazon Redshift.
What is Redshift?
Amazon Redshift is a data warehouse solution provided by AWS, designed to store and analyze large volumes of analytical data. It uses SQL to analyze both structured and semi-structured data across data warehouses, operational databases, and data lakes. According to AWS, Redshift is fast and scalable, offering a tenfold performance increase over other data warehouses. This is achieved through the use of machine learning and parallel query processing against columnar storage on high-performance disks.
Redshift offers the following benefits:
- Superior price-performance at all scales: It provides a fully managed, AI-powered, Massively Parallel Processing (MPP) data warehouse designed for performance, scalability, and availability.
- Data unification without ETL: Redshift enables easy access to data across various platforms, including data lakes, databases, data warehouses, and streaming data, using a no-code/low-code approach to integrated analytics.
- Value maximization through comprehensive analytics & ML: Redshift supports SQL queries and open-source analytics, powers dashboards and visualizations, and enables near real-time analytics and AI/ML applications with the analytics engines and languages of your choice.
- Accelerated innovation through secure data collaboration: Redshift allows data sharing and collaboration within and across organizations, AWS regions, and even with third-party data sets, without the need for manual data movement or copying. It also ensures detailed governance, security, and compliance.
Amazon Redshift has numerous practical applications. It improves financial and demand forecasts with real-time querying of large data volumes. It supports low latency analytics applications like fraud detection and IoT. Redshift enhances business intelligence by aiding report and dashboard creation with BI tools like Amazon QuickSight, Tableau, and PowerBI. It also facilitates machine learning in SQL for advanced analytics on large datasets. Moreover, Redshift allows data monetization through application construction across databases, data warehouses, and data lakes, promoting secure sharing and potential new revenue.
Views & Materialized Views
What is a view?
A view is a simpler name for a complex query. For example, our data analyst Dan often queried the top 5 best-selling products. To simplify this repetitive task, database administrator Dave suggested creating a view.
CREATE VIEW best_sellers_5 AS
SELECT p.product_name, SUM(o.quantity) AS total_quantity
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.product_name
ORDER BY total_quantity DESC
LIMIT 5;
A view stores the query logic in the database. Now, to access the top products, anyone can simply use:
SELECT * from best_sellers_5;
This initiates the complex query and fetches results.
However, as product numbers and sales increased, the query took longer. Dave then suggested creating a materialized view.
What is a materialized view?
A materialized view stores pre-computed results in a table. Thus, querying a materialized view fetches these results quickly.
However, the results' freshness depends on the materialized view's refresh frequency. If the base tables update daily but the materialized view refreshes every two days, then the results aren't always fresh.
Depending on the database, a materialized view can be refreshed manually or set to a schedule.
When to use materialized views
- The same complex query needs to be repeatedly recalculated over a large amount of data.
- Low end-to-end latency is required but access to up-to-the-moment data is not a critical requirement (for traditional databases but not streaming databases).
- Storage space is not a major concern.
Materialized Views in Amazon Redshift
In data warehouse environments, complex queries on large tables are frequently encountered. These include SELECT statements that perform multi-table joins and aggregations on tables containing billions of rows. The processing of these queries can be resource-intensive and time-consuming.
Amazon Redshift offers materialized views as a solution to these challenges. A materialized view consists of a precomputed result set, derived from an SQL query over one or more base tables. SELECT statements can be issued to query a materialized view, similar to querying other tables or views in the database. This yields precomputed results from the materialized view, bypassing the need to access the base tables, thereby improving query speed.
Materialized views are particularly beneficial for accelerating predictable and repetitive queries. Instead of executing resource-demanding queries against large tables, applications can query a materialized view and obtain a precomputed result set. A typical use case for a materialized view is populating dashboards, such as Amazon QuickSight, where the queries are predictable and repeated frequently.
Benifits
Materialized views in Redshift come with several noteworthy features. Here are a few:
- More than just tables: Redshift allows you to create a materialized view to reference data in external tables (such as AWS S3) or even define one using an existing view. This is useful if you have files in AWS S3 that you want to reference, or if you need a new materialized view with additional data from an existing one.
- Materialized view on materialized view: You can create materialized views based on existing ones in Redshift. This is similar to reading data from a table and helps to avoid duplicating costly table joins and aggregations.
- Adding columns: While there are more DDL (Data Definition Language) limitations on creating materialized views, you can add columns to internal tables without impacting existing materialized views.
- Automatic query rewriting: Redshift automatically rewrites your SQL query to use a materialized view (if one exists), even if you don't explicitly use it, thereby improving performance.
- Incremental refresh: Despite certain limitations, Redshift allows you to perform an incremental refresh on a materialized view, as opposed to a full refresh. This can save you time.
Resources
For more information about query scheduling, see Scheduling a query on the Amazon Redshift console.
For information on how to create materialized views, see CREATE MATERIALIZED VIEW.
For information on how to query materialized views, see Querying a materialized view.
For information on how to refresh materialized views, see REFRESH MATERIALIZED VIEW.
For details about SQL commands used to create and manage materialized views, see the following command topics:
For information about system tables and views to monitor materialized views, see the following topics:
>
>
>
Amazon Redshift’s materialized views support many features, like automatic query rewriting and materialized views on materialized views. Its refresh is incremental refresh with some limitations, both manually and automatically. In summary, Redshift materialized views can save both development and execution time. However, understanding how and when to use them is crucial. Make sure you have a firm grasp of the key areas. > >