What Is Incremental View Maintenance (IVM) and Why It Matters

What Is Incremental View Maintenance (IVM) and Why It Matters

·

12 min read

Introduction

You have a materialized view that summarizes millions of orders by region and status. Every hour, a scheduler kicks off a full refresh. It scans every row, recomputes every aggregation, and writes the result back. The process takes 12 minutes and burns through compute credits, even though only a few thousand rows changed since the last refresh.

This is the core problem incremental view maintenance solves. Instead of recomputing a materialized view from scratch each time source data changes, IVM identifies what changed and updates only the affected results. The difference in cost is dramatic: full refresh scales with the total size of your data, while IVM scales with the size of the change.

In this article, you will learn what incremental view maintenance is, why it matters for real-time applications, and how different database systems implement it. You will also see working SQL examples that demonstrate IVM in practice using RisingWave, a streaming database built for continuous incremental computation.

What is incremental view maintenance?

Incremental view maintenance (IVM) is a technique for keeping materialized views up to date by computing and applying only the changes (deltas) caused by new, updated, or deleted data in the underlying tables, rather than recomputing the entire view from scratch. When a source table receives an insert, update, or delete, the IVM engine calculates how that specific change affects the view's stored result and applies the minimal necessary update.

How IVM differs from full refresh

With a traditional full refresh, the database re-executes the view's defining query against the complete dataset every time you want fresh results. If your view aggregates 500 million rows and 200 new rows arrive, the system still processes all 500 million rows. The cost is O(N) where N is the total data size, regardless of how small the actual change is.

IVM flips this equation. It processes only the delta: the 200 new rows. The cost becomes O(delta), where delta is the size of the change. The DBSP paper formalizes this, showing that an incrementalized query can be faster than the original by a factor of O(|DB|/|delta DB|). For a 500-million-row table with 200 new rows, that is a 2.5-million-fold improvement in theory.

In practice, the speedup depends on query complexity. Simple filters and projections see near-theoretical gains. Joins and aggregations carry state that must be maintained, but the cost is still proportional to the change rather than the full dataset.

Academic roots

IVM research stretches back decades, but three frameworks have shaped modern implementations:

  • Timely Dataflow uses epoch-based timestamps to coordinate distributed incremental computation across multiple workers.
  • Differential Dataflow extends timely dataflow by tracking multiple data versions, enabling selective reuse of prior computation results.
  • DBSP (Database Stream Processing) provides a mathematical model that converts any SQL query into an equivalent incremental circuit, using operators borrowed from digital signal processing.

These frameworks underpin production systems today. Materialize builds on differential dataflow. Feldera implements DBSP directly. RisingWave uses its own streaming engine with a dataflow-based architecture optimized for SQL semantics and exactly-once consistency.

Why does incremental view maintenance matter for real-time applications?

Cost: full refresh does not scale

Full refresh is acceptable when data is small or freshness requirements are loose. But as data volumes grow, the economics break down. Refreshing a materialized view over 1 billion rows every 5 minutes means scanning 1 billion rows 288 times per day, even if only a few million rows change in total. You are paying for 288 billion row scans to process a few million actual changes.

IVM makes real-time materialized view maintenance economically feasible. Processing only deltas means compute cost stays proportional to your data change rate, not your total data size.

Freshness: from minutes to milliseconds

Scheduled refreshes introduce inherent staleness. A 10-minute refresh cycle means your data can be up to 10 minutes old at any given moment. For many applications, this is unacceptable:

  • Fraud detection needs to flag suspicious transactions within seconds, not minutes.
  • Real-time dashboards lose trust when users see stale numbers that contradict what they know happened moments ago.
  • Feature stores for ML serving stale features to a model degrades prediction quality.
  • AI agent context requires agents making decisions on current state, not a 10-minute-old snapshot.

IVM enables sub-second view freshness because updates propagate through the computation graph as soon as source data changes arrive.

Scale: making real-time views practical

Without IVM, organizations face an uncomfortable choice: accept stale data with periodic batch refreshes, or build custom application-level logic to maintain derived state incrementally. The custom approach works for simple counters but becomes brittle and error-prone with joins, windowed aggregations, and multi-way transformations.

IVM handled by the database engine removes this burden. You define the view in SQL, and the engine handles the incremental math, state management, and consistency guarantees. This is especially valuable as the number of materialized views grows. In a typical analytics platform, you might have dozens or hundreds of views, each depending on overlapping source tables. Manual incremental maintenance at that scale is not sustainable.

How does RisingWave implement incremental view maintenance?

RisingWave is a streaming database where every materialized view is incrementally maintained by default. There is no REFRESH command, no scheduler to configure, and no batch jobs to monitor. When data arrives at a source, RisingWave propagates the changes through the view's dataflow graph and updates the stored result automatically.

Streaming architecture

Under the hood, RisingWave compiles each CREATE MATERIALIZED VIEW statement into a distributed dataflow graph. Each SQL operator (filter, join, aggregation, window function) becomes a node in this graph. Source changes flow through the graph as a stream of deltas, and each operator knows how to incrementally update its output given an input delta.

This architecture means incremental materialized views are not a feature bolted onto a batch engine. They are the foundation of how RisingWave processes data.

Consistency guarantees

IVM correctness depends on consistency. If a view joins two tables and both receive concurrent updates, the result must reflect a consistent snapshot, not a mix of old and new data from different tables.

RisingWave provides:

  • Exactly-once processing: every source event is reflected in the view result exactly once, even across failures and restarts.
  • Strong consistency across joins: multi-table materialized views produce results equivalent to running the defining query against a consistent snapshot of all source tables.
  • Barrier-based checkpointing: the system periodically snapshots operator state to durable storage, enabling fast recovery without reprocessing the full history.

SQL example: streaming IVM with Kafka

Here is a complete example that creates a Kafka source, defines tables, and builds an incrementally maintained materialized view with a join and aggregation. All SQL is compatible with RisingWave v2.3+.

Step 1: Create a Kafka source for order events

CREATE SOURCE order_events (
    order_id INT,
    customer_id INT,
    amount DECIMAL,
    status VARCHAR,
    region VARCHAR,
    created_at TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

Step 2: Create a customers table

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR,
    tier VARCHAR
);

Insert some reference data:

INSERT INTO customers VALUES
    (101, 'Acme Corp', 'enterprise'),
    (102, 'Bolt Industries', 'startup'),
    (103, 'Cedar Health', 'enterprise'),
    (104, 'Delta Logistics', 'growth'),
    (105, 'Echo Retail', 'startup');

Step 3: Create an incrementally maintained materialized view

This view joins order events with customer data and computes aggregations by customer tier and region:

CREATE MATERIALIZED VIEW revenue_by_tier_region AS
SELECT
    c.tier,
    o.region,
    COUNT(*) AS order_count,
    SUM(o.amount) AS total_revenue,
    AVG(o.amount) AS avg_order_value
FROM order_events o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY c.tier, o.region;

Step 4: Query the view

SELECT * FROM revenue_by_tier_region ORDER BY total_revenue DESC;

Expected output (after some order events have flowed through):

   tier     | region  | order_count | total_revenue | avg_order_value
------------+---------+-------------+---------------+-----------------
 enterprise | us-east |           3 |       4250.00 |         1416.67
 startup    | us-west |           2 |       1850.00 |          925.00
 enterprise | eu-west |           1 |        980.00 |          980.00
 growth     | us-east |           1 |        320.00 |          320.00

This view updates automatically and incrementally as new order events arrive on the Kafka topic. No refresh command is needed. When a new order for customer 101 lands in Kafka, RisingWave joins it with the customers table, checks if the status is 'completed', and if so, increments the relevant counter and sum. The cost of processing that single order is independent of how many millions of orders already exist.

You can also build cascading views, where one materialized view references another:

CREATE MATERIALIZED VIEW top_revenue_regions AS
SELECT
    region,
    SUM(total_revenue) AS region_revenue
FROM revenue_by_tier_region
GROUP BY region
HAVING SUM(total_revenue) > 1000;

RisingWave incrementally maintains the entire chain. A change in order_events propagates through revenue_by_tier_region and then through top_revenue_regions, all within the same streaming pipeline.

For full syntax details, see the CREATE MATERIALIZED VIEW reference in the RisingWave docs.

How RisingWave compares to other approaches

The difference becomes clear when you look at how other systems handle the same use case:

  • PostgreSQL: You would create the materialized view with CREATE MATERIALIZED VIEW, then set up a cron job or pg_cron to run REFRESH MATERIALIZED VIEW periodically. Between refreshes, the view is stale. The pg_ivm extension adds basic IVM support but is not yet production-ready and blocks writes during maintenance.

  • Snowflake: Materialized views refresh automatically but on a micro-batch schedule controlled by the platform. Latency ranges from seconds to minutes depending on warehouse size and workload. Complex joins and subqueries are not supported in Snowflake materialized views.

  • RisingWave: The view definition itself triggers continuous incremental maintenance. No refresh command, no schedule, no restrictions on joins or aggregations. Freshness is sub-second.

How does IVM compare across different database systems?

The following table compares how four systems handle materialized view maintenance. The goal is to give you an honest picture of the tradeoffs.

DimensionPostgreSQLSnowflakeMaterializeRisingWave
Refresh modelManual (REFRESH MATERIALIZED VIEW) or pg_ivm extensionAutomatic micro-batchContinuous IVM (differential dataflow)Continuous IVM (streaming dataflow)
LatencyMinutes to hours (depends on schedule)Seconds to minutesSub-secondSub-second
ConsistencySnapshot at refresh timeEventual (within refresh cycle)Strict serializableStrong consistency with exactly-once
JOIN support in MVsFull SQL (but full-refresh only)Limited (no joins in MVs)Full SQL including multi-way joinsFull SQL including multi-way joins
SQL compatibilityNative PostgreSQLSnowflake SQLPostgreSQL-compatiblePostgreSQL-compatible
DeploymentSelf-hostedCloud-only (SaaS)Cloud-only (SaaS)Self-hosted or cloud
LicensePostgreSQL License (open source)ProprietarySource-available (BSL)Apache 2.0 (open source)
Best forOLTP with occasional analyticsCloud data warehousingStrict-serializable streaming use casesReal-time analytics, event-driven apps, AI agent context

A few things worth noting:

  • PostgreSQL is the most mature and widely deployed, but its materialized views are batch-only without extensions. The pg_ivm extension is promising but still under active development.
  • Snowflake optimizes for cloud warehouse workloads where minute-level freshness is acceptable. If you need sub-second updates or complex joins in views, it is not the right fit.
  • Materialize pioneered streaming SQL materialized views and offers strong consistency guarantees. It operates as a cloud-only service with source-available licensing.
  • RisingWave provides a similar streaming IVM capability with an open-source (Apache 2.0) license and the option to self-host. It also integrates with Apache Iceberg for lakehouse workflows.

The right choice depends on your freshness requirements, SQL complexity, deployment preferences, and budget. For understanding the full landscape of materialized views, including when batch refresh is perfectly fine, see our detailed guide.

FAQ

What is the difference between IVM and full refresh?

Full refresh recomputes a materialized view's entire result set by re-executing the defining query against all source data. IVM computes only the delta: the minimal change to the view result caused by inserts, updates, or deletes in the source tables. Full refresh cost is O(total data size) per refresh cycle, while IVM cost is O(change size) per update. For large datasets with small, frequent changes, IVM can be orders of magnitude more efficient.

Does IVM work with JOINs and aggregations?

Yes, but with varying levels of support depending on the system. RisingWave and Materialize support IVM over complex queries including multi-way joins, aggregations, window functions, and subqueries. PostgreSQL's pg_ivm extension supports inner joins and basic aggregations but has limitations with outer joins and complex expressions. Snowflake does not support joins in materialized views at all. The key challenge with joins in IVM is maintaining the join state incrementally, which requires the engine to store intermediate results and update them as either side of the join changes.

How does RisingWave handle IVM differently from Materialize?

Both RisingWave and Materialize provide continuous, automatic IVM for streaming SQL workloads. The main differences are architectural and operational. Materialize is built on differential dataflow (from research at Microsoft) and operates as a cloud-only SaaS product under a source-available license (BSL). RisingWave uses its own Rust-based streaming engine with a dataflow architecture, is licensed under Apache 2.0, and can be self-hosted or used as a managed cloud service. RisingWave also provides built-in Apache Iceberg integration for lakehouse workflows. Both support PostgreSQL-compatible SQL, so migrating queries between them is relatively straightforward.

Is incremental view maintenance the same as streaming?

IVM and streaming are related but distinct concepts. IVM is a specific technique for maintaining derived results (materialized views) incrementally as source data changes. Streaming is a broader paradigm for processing data continuously as it arrives. IVM can be implemented in batch systems (like PostgreSQL's pg_ivm, which triggers on each transaction), but it reaches its full potential in streaming systems where the entire computation pipeline is designed for continuous, incremental execution. In a streaming database like RisingWave, IVM is the mechanism that powers materialized views, but the system also handles data ingestion, state management, fault tolerance, and serving, which go beyond IVM alone.

Conclusion

Here are the key takeaways:

  • Incremental view maintenance (IVM) updates materialized views by processing only data changes, not the full dataset. This reduces compute cost from O(data size) to O(change size) per update.
  • Full refresh becomes unsustainable as data grows. If your materialized views take minutes to refresh and your users need seconds of freshness, IVM is the path forward.
  • Modern IVM builds on rigorous foundations, including differential dataflow and DBSP, which provide formal guarantees that incremental results match full recomputation.
  • RisingWave implements IVM natively through its streaming dataflow engine. Every materialized view is incrementally maintained by default, with no refresh commands or schedulers required.
  • The right system depends on your requirements. PostgreSQL works for batch analytics, Snowflake for cloud warehousing, and streaming databases like RisingWave or Materialize for sub-second freshness over complex SQL.

Ready to try incremental view maintenance yourself? Try RisingWave Cloud free, with no credit card required. Sign up here.

Join our Slack community to ask questions and connect with other stream processing developers.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.