MongoDB Materialized Views: An Overview

MongoDB Materialized Views: An Overview

To gain an in-depth understanding of materialized views, consider reading the first article in this series. Today, we will discuss the specific implementations and features of materialized views in MongoDB, a renowned document database. It provides a similar concept called on-demand materialized views.

What is MongoDB?

MongoDB is a document database designed on a horizontal scale-out architecture. This structure enables numerous small machines to collectively build fast systems and manage vast amounts of data. Unlike SQL databases, which store data in rows or columns, each record in a MongoDB database is a document represented in BSON, a binary form of the data. This information can then be retrieved in a JSON format by applications.

Document databases offer high flexibility, accommodating variations in document structures and storing partially completed documents. A document can have other documents embedded within it. Fields in a document function similarly to columns in a SQL database and can be indexed to enhance search performance.

What is on-demand materialized views?

MongoDB provides two varieties of views: standard views and on-demand materialized views. Each type produces results using an aggregation pipeline.

  • The results of standard views are calculated at read time and don't have a footprint on the disk.
  • On the other hand, on-demand materialized views are pre-computed outputs of an aggregation pipeline that are stored and accessed from the disk.

On-demand materialized views offer improved read performance compared to traditional views, as they're read from disk rather than calculated within the query. This advantage grows with the complexity of the pipeline and the volume of the data being summarized. Additionally, traditional views utilize the indexes of the base collection. Consequently, it's not possible to directly create, remove, or reconstruct indexes on a standard view, or to obtain an index list for the view. On-demand materialized views, being disk-stored, allow for direct index creation.

When to use on-demand materialized views in MongoDB?

In the fast-paced world of data management, quick and efficient data access and processing are key. Materialized Views shine in this aspect. So, when should you opt for them? The materialized view is useful in the below scenario:

  • The underlying base table of the Materialized View does not change frequently.
  • The query is on an external table (i.e. data sets stored in files in an external stage), which might have slower performance compared to querying native database tables.
  • Query results contain results that require significant processing, including: analysis of semi-structured data and aggregates that take a long time to calculate.
  • If a query is based on a large table, but you frequently only need a few rows or columns based on certain criteria, an MV can be beneficial.

How to use on-demand materialized views in MongoDB?

Assume that the collection fruits contains sales data broken down by item around the end of January 2024:

db.fruits.insertMany([
{ date: new ISODate("2023-12-01"), item: "Apple", quantity: 2, amount: new NumberDecimal("60") },
{ date: new ISODate("2023-12-02"), item: "Banana", quantity: 5, amount: new NumberDecimal("90") },
{ date: new ISODate("2023-12-02"), item: "Orange", quantity: 10, amount: new NumberDecimal("200") },
{ date: new ISODate("2023-12-04"), item: "Peach", quantity: 20, amount: new NumberDecimal("80") },
{ date: new ISODate("2023-12-04"), item: "Cherry", quantity: 1, amount: new NumberDecimal("16") },
...
{ date: new ISODate("2024-01-28"), item: "Grape", quantity: 5, amount: new NumberDecimal("100") },
]);

Define the on-demand materialized view

The following updateMonthlySales function defines a monthlyfruitales materialized view that contains the cumulative monthly sales information. In the example, the function takes a date parameter to only update monthly sales information starting from a particular date.

updateMonthlySales = function(startDate) {
   db.fruitsales.aggregate( [
      { $match: { date: { $gte: startDate } } },
      { $group: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } }, sales_quantity: { $sum: "$quantity"}, sales_amount: { $sum: "$amount" } } },
      { $merge: { into: "monthlyfruitsales", whenMatched: "replace" } }
   ] );
};
  • The $match stage filters the data to process only those sales greater than or equal to the startDate.
  • The $group stage groups the sales information by the year-month. The documents output by this stage have the form: { "_id" : "<YYYY-mm>", "sales_quantity" : <num>, "sales_amount" : <NumberDecimal> }
  • The $merge stage writes the output to the monthlyfruitsales collection.

Perform Initial Run

For the initial run, you can pass in a date of new ISODate("1970-01-01"):

updateMonthlySales(newISODate("1970-01-01"));

After the initial run, the monthlyfruitsales contains the following documents; i.e. db.monthlyfruitsales.find().sort( { _id: 1 } ) returns the following:

{ "_id" : "2018-12", "sales_quantity" : 41, "sales_amount" : NumberDecimal("506") }
{ "_id" : "2019-01", "sales_quantity" : 86, "sales_amount" : NumberDecimal("896") }

Refresh on-demand materialized view

Assume that by the first week in February 2024, the fruitsales collection is updated with newer sales information; specifically, additional January and February sales.

db.fruitsales.insertMany( [
   { date: new ISODate("2024-01-28"), item: "Kiwi", quantity: 3, amount: new NumberDecimal("90") },
   { date: new ISODate("2024-01-28"), item: "Plum", quantity: 2, amount: new NumberDecimal("32") },
   ...
   { date: new ISODate("2024-02-03"), item: "Guava", quantity: 5, amount: new NumberDecimal("100") }
] )

To refresh the monthlyfruitsales data for January and February, run the function again to rerun the aggregation pipeline, starting with new ISODate("2024-01-01").

updateMonthlySales(newISODate("2024-01-01"));

The content of monthlyfruitsales has been updated to reflect the most recent data in the fruitsales collection; i.e. db.monthlyfruitsales.find().sort( { _id: 1 } ) returns the following:

{ "_id" : "2023-12", "sales_quantity" : 41, "sales_amount" : NumberDecimal("506") }
{ "_id" : "2024-01", "sales_quantity" : 102, "sales_amount" : NumberDecimal("1142") }
{ "_id" : "2024-02", "sales_quantity" : 15, "sales_amount" : NumberDecimal("284") }

Use cases

  • Business Intelligence (BI) and Reporting:

For a company needing to generate daily, weekly, and monthly sales reports from a large transactional database, materialized views can precompute and store aggregated data. This approach speeds up report generation and gives users quick access to the latest figures.

  • E-commerce Product Recommendations:

An e-commerce website providing personalized product recommendations can benefit from materialized views. By storing and updating user-product interaction data, the system can quickly generate recommendations without querying the entire user history each time.

  • Geospatial Data Analysis:

A mapping application providing near real-time traffic updates and route recommendations can use materialized views to store geospatial data indexed for quick retrieval. This ensures updates and recommendations are generated quickly, even with vast amounts of dynamic location data.

  • Financial Analysis and Risk Management:

Financial institutions can use materialized views to store and aggregate trading data, enabling rapid risk assessment and financial analysis. This is especially crucial in fast-paced financial markets.

  • Inventory Management and Supply Chain Optimization:

Companies managing large inventories can use materialized views to store inventory data and demand forecasting results. This enables quick decision-making in inventory management and supply chain optimization.

>

>

>

In this blog post, we introduced on-demand materialized views in MongoDB. We started by understanding what on-demand materialized views are and how they differ from regular views. We then delved into the process of creating on-demand materialized views and the method to refresh them as the underlying data changes. Lastly, we introduced some classic use cases for on-demand materialized views. > >

The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.