For years, the data engineering community debated the future of open table formats. Would Delta Lake’s tight integration with Databricks win out? Could Apache Hudi’s early adoption in the streaming world hold its ground? Or would Apache Iceberg quietly emerge as the dominant player?

Figure: Comparison of GitHub stars.

By late 2024, the answer became clear: Apache Iceberg has won.

How did we get here? Databricks acquired Tabular, the company founded by Iceberg’s original creators, signaling a major endorsement of Iceberg’s potential.

Figure: Databricks acquired Tabular in June 2024.

Meanwhile, Snowflake rolled out Polaris, its Iceberg-based catalog offering. With prominent query engine vendors like Starburst and Dremio supporting Polaris, the industry aligned around a common standard.

Figure: Iceberg ecosystem in 2024.

Apache Iceberg is now the de facto open table format. But the real story is just beginning. Looking ahead to 2025, several exciting developments are set to cement Iceberg’s position as the cornerstone of modern data engineering.


What’s Coming for Iceberg in 2025?


1. RBAC Catalog: Fixing Permissions at Scale


Let’s face it — managing permissions in data lakes has always been messy. Without a unified approach, users have had to rely on ad hoc methods like setting permissions at the S3 bucket level or using query engine-specific access controls. This fragmentation is not only inefficient but also introduces security risks.

Figure: With a standardized structure for vended credentials, developers can seamlessly integrate RBAC systems into Iceberg catalogs.

The Iceberg community is solving this problem with a new OpenAPI specification (PR #10722). This spec standardizes the structure of vended credentials, allowing developers to build Role-Based Access Control (RBAC) systems directly into Iceberg catalogs.

For example, an administrator can define fine-grained access policies at the catalog level, independent of the underlying storage or query engine. These capabilities mirror enterprise-grade features like Databricks’ Unity Catalog, but with the flexibility and openness that Iceberg brings.
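To make the idea concrete, here is a minimal, self-contained sketch of how an engine might consume catalog-vended credentials. The field names (`prefix`, `config`, the `s3.access-key-id` key) are illustrative assumptions in the spirit of the REST catalog spec work, not the exact schema from PR #10722:

```python
from dataclasses import dataclass, field


@dataclass
class VendedCredential:
    """A short-lived, scoped storage credential returned by the catalog.

    Field names are illustrative; the actual shape is defined by the
    Iceberg REST catalog OpenAPI work referenced above.
    """
    prefix: str                                 # storage location the credential covers
    config: dict = field(default_factory=dict)  # e.g. temporary object-store keys


def credential_for(credentials: list, path: str) -> dict:
    """Pick the most specific vended credential for a data file path."""
    matches = [c for c in credentials if path.startswith(c.prefix)]
    if not matches:
        raise PermissionError(f"no credential vended for {path}")
    # Longest matching prefix wins, mirroring typical RBAC scoping rules.
    return max(matches, key=lambda c: len(c.prefix)).config


creds = [
    VendedCredential("s3://lake/bronze/", {"s3.access-key-id": "RO-KEY"}),
    VendedCredential("s3://lake/bronze/orders/", {"s3.access-key-id": "RW-KEY"}),
]
print(credential_for(creds, "s3://lake/bronze/orders/part-0.parquet"))
```

The key point is that the scoping decision lives in the catalog's response, not in bucket policies or engine-specific ACLs, so any engine that speaks the spec inherits the same permissions.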


2. Change Data Capture (CDC): Iceberg’s Streaming Evolution


“Iceberg isn’t for streaming” has been a common refrain in the past. And to be fair, Iceberg lacked robust CDC capabilities. While its architecture supported versioned table snapshots (exposed through Spark CDC procedures), it wasn’t optimized for high-frequency data changes or real-time analytics.

That’s changing with Iceberg Spec V3, which introduces a critical feature: Row Lineage.

Figure: Row lineage has already been implemented in several mainstream data systems.

Row Lineage enables Iceberg to track changes to individual rows as they are updated, deleted, or inserted. This makes it possible to implement efficient CDC pipelines directly on Iceberg tables, a huge leap forward for streaming use cases. For instance, materialized view maintenance and data synchronization between systems will become more seamless.

Check out the spec proposal for more details. Once Spec V3 is fully implemented, Iceberg will rival traditional streaming-first systems like Kafka and Hudi for real-time data processing.
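The mechanics above can be sketched in a few lines. In this toy model, each row carries a stable row ID and the sequence number of the commit that last touched it (the two pieces of metadata row lineage adds); diffing two snapshots then yields a change feed. This is a conceptual sketch, not the spec's actual data structures:

```python
def changes_between(old_snapshot: dict, new_snapshot: dict, since_seq: int) -> list:
    """Derive insert/update/delete events by comparing two snapshots.

    Each snapshot maps row_id -> (last_updated_seq, row_data), mimicking
    the per-row lineage fields proposed in Spec V3.
    """
    events = []
    for row_id, (seq, data) in new_snapshot.items():
        if row_id not in old_snapshot:
            events.append(("insert", row_id, data))
        elif seq > since_seq:  # row was rewritten after the cutoff commit
            events.append(("update", row_id, data))
    for row_id in old_snapshot:
        if row_id not in new_snapshot:
            events.append(("delete", row_id, old_snapshot[row_id][1]))
    return events


old = {1: (5, {"qty": 2}), 2: (5, {"qty": 7})}
new = {1: (6, {"qty": 3}), 3: (6, {"qty": 1})}
print(changes_between(old, new, since_seq=5))
```

Without lineage, an engine can only diff whole files between snapshots; with it, updates and deletes become directly attributable to individual rows, which is exactly what incremental consumers like materialized views need.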


3. Materialized Views: Simplifying Derived Data


Data lakes are where raw, historical data — often called “bronze” data — lives. These tables are massive and slow-moving, but the real value comes from derived datasets computed from this raw data. Think aggregations, transformations, and precomputed metrics.

Until now, Iceberg lacked built-in support for materialized views, forcing users to rely on external systems or custom solutions to manage derived data. This created two major challenges:

  • Tracking dependencies between base tables and derived tables was cumbersome.
  • Any updates to the base table required a full recomputation of the derived data.

The proposed materialized views feature (PR #11041) changes this. With materialized views, precomputed results are stored as tables, and Iceberg handles the metadata required to track dependencies. This means faster query performance and automated updates to derived data when the base table changes. This also presents a significant opportunity for materialized view-focused systems like RisingWave. Providing materialized views for Iceberg could dramatically improve the user experience when it comes to extracting insights from dynamic data.
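The core of the proposal is a staleness check: the view's metadata records which base-table snapshot its stored results were derived from, so a reader knows whether to serve the cache or recompute. The sketch below illustrates that idea with hypothetical names; the real metadata layout is defined in the PR #11041 discussion:

```python
class MaterializedView:
    """Toy materialized view that tracks the base snapshot it reflects."""

    def __init__(self, compute):
        self.compute = compute            # function: base rows -> derived result
        self.refresh_snapshot_id = None   # base snapshot the stored results reflect
        self.results = None

    def read(self, base_rows, base_snapshot_id):
        # Recompute only when the base table has advanced to a new snapshot;
        # otherwise serve the precomputed results.
        if base_snapshot_id != self.refresh_snapshot_id:
            self.results = self.compute(base_rows)
            self.refresh_snapshot_id = base_snapshot_id
        return self.results


mv = MaterializedView(lambda rows: sum(r["amount"] for r in rows))
print(mv.read([{"amount": 10}, {"amount": 5}], base_snapshot_id=101))  # recomputes
mv.read([{"amount": 10}, {"amount": 5}], base_snapshot_id=101)         # cache hit
```

Because Iceberg already assigns every commit a snapshot ID, this dependency tracking falls out of existing table metadata rather than requiring an external orchestration system.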


Iceberg’s Expansion


As Iceberg evolves, so does its ecosystem. Here are a few areas to watch in 2025:

  • New Data Types: Support for nanosecond-precision timestamps with time zones will open Iceberg to industries like finance and telecommunications, where high-precision data is critical.
  • Binary Deletion Vectors: Spec V3 introduces a scalable, efficient solution for handling deletions, which is especially useful in regulatory environments or for GDPR compliance.
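The deletion-vector idea is worth a quick illustration: rather than rewriting a data file to physically remove rows, the table stores a compact bitmap of deleted row positions and filters them out at scan time. Production implementations use formats like roaring bitmaps; this sketch uses a plain Python integer as a bitset to show the principle:

```python
class DeletionVector:
    """A minimal positional deletion bitmap for one data file."""

    def __init__(self):
        self.bits = 0

    def delete(self, pos: int):
        self.bits |= 1 << pos             # mark the row position as deleted

    def is_deleted(self, pos: int) -> bool:
        return (self.bits >> pos) & 1 == 1


def scan(rows, dv):
    """Yield only live rows, skipping positions marked in the vector."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


dv = DeletionVector()
dv.delete(1)                              # e.g. honoring a GDPR erasure request
print(scan(["alice", "bob", "carol"], dv))   # ['alice', 'carol']
```

This is why the feature matters for compliance workloads: a single small bitmap write logically erases a row immediately, and the physical rewrite can happen later during compaction.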

Iceberg’s ecosystem is stronger than ever. Today, you can ingest data into Iceberg using Kafka or the PostgreSQL protocol (via RisingWave) and query that data using modern query engines like Trino, Snowflake, Databricks, RisingWave, and more. Exciting developments are on the horizon for 2025.


What’s Missing in Iceberg?


For all that momentum, there’s one glaring gap: lightweight compaction.

Figure: Today, Iceberg compaction still heavily relies on Apache Spark. Source: https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/best-practices-compaction.html.

Today, compaction typically relies on heavy Spark jobs, which can be overkill for smaller teams or workloads. This creates a barrier for SQL and Python users who want a simpler, more resource-efficient way to compact Iceberg tables.

The good news? The community is aware of this issue, and there’s growing interest in building a lightweight, engine-agnostic compaction framework. Hopefully, 2025 will deliver solutions that make Iceberg more accessible for all users.
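To show why this doesn't inherently require a Spark cluster, here is a sketch of the planning step a lightweight compactor would run: greedily bin-pack small data files into rewrite groups near a target size. The 128 MiB target and file names are illustrative assumptions, not any tool's actual defaults:

```python
TARGET = 128 * 1024 * 1024  # illustrative compaction target, in bytes


def plan_compaction(file_sizes: dict, target: int = TARGET) -> list:
    """Group files smaller than the target into rewrite batches.

    file_sizes maps file name -> size in bytes; returns lists of file
    names that should be rewritten together into one larger file.
    """
    small = sorted((size, name) for name, size in file_sizes.items() if size < target)
    groups, current, current_size = [], [], 0
    for size, name in small:
        if current and current_size + size > target:
            groups.append(current)      # close the batch once it would overflow
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


files = {"a.parquet": 10 << 20, "b.parquet": 20 << 20, "c.parquet": 200 << 20}
print(plan_compaction(files))   # c.parquet is already large, so it is left alone
```

The planning itself is cheap; the expensive part is the rewrite I/O, which is exactly what a SQL- or Python-native compactor could schedule incrementally instead of as one monolithic Spark job.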

The Road Ahead

With innovations like RBAC catalogs, streaming capabilities, materialized views, and support for new data types, Apache Iceberg is on track to become the universal table format for data engineering.

2024 proved that Iceberg could win the format wars. In 2025, the focus shifts to making it better, faster, and easier to use for everyone — from small startups to global enterprises. Whether you’re building real-time analytics pipelines, managing petabytes of historical data, or exploring the cutting edge of data lakehouse architectures, Iceberg has something to offer.

The future of data engineering is here, and it’s Iceberg.


Yingjun Wu

Founder and CEO at RisingWave Labs
