Data mesh: What is it all about?


A data mesh is a data architecture and organizational approach designed to handle the challenges of managing and processing large volumes of data in a scalable, efficient, and reliable manner. It was originally introduced by Zhamak Dehghani in 2019 to address the complexities of managing data in modern, decentralized, and scalable systems.

A data mesh approach emphasizes the following key principles:

  1. Domain-Oriented Ownership: Data is owned and managed by domain-specific teams, often referred to as "data product teams," who have a deep understanding of the data's context and business value. These teams are responsible for the quality, reliability, and accessibility of their data products.
  2. Self-Service Platforms: Data platforms and tools are provided in a self-service fashion, empowering data product teams to manage their own data pipelines, processing, and delivery. These platforms typically include components for data ingestion, transformation, enrichment, and consumption.
  3. Data as a Product: Data is treated as a product, and data product teams are responsible for defining clear contracts (e.g., schemas, APIs) for how data is produced and consumed; a sketch of such a contract appears after this list. This enables a clear understanding of data dependencies and promotes data discoverability.
  4. Decentralized Architecture: A data mesh avoids centralizing all data processing in a single, monolithic system. Instead, it encourages a distributed, decentralized approach to data processing to achieve scalability, fault tolerance, and performance.

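To make the "data as a product" principle concrete, here is a minimal sketch of a data contract expressed as SQL DDL. The table name, columns, and owning team are hypothetical stand-ins; a real contract would carry whatever schema and metadata the domain team chooses to publish.

```sql
-- Hypothetical data contract for an "orders" data product, expressed as DDL.
-- The column list and comments are the contract: producers promise this shape,
-- and consumers can depend on it.
CREATE TABLE order_events (
    order_id     BIGINT        NOT NULL,  -- unique order identifier
    customer_id  BIGINT        NOT NULL,  -- customer placing the order
    status       VARCHAR       NOT NULL,  -- e.g. 'created', 'paid', 'shipped'
    amount       NUMERIC(12,2) NOT NULL,  -- order total
    updated_at   TIMESTAMPTZ   NOT NULL   -- event time supplied by the producer
);

COMMENT ON TABLE order_events IS
    'Data product: order lifecycle events, owned by the Orders domain team';
```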

A data mesh helps you handle data flowing in and out of complex systems that span many teams and organizations. It makes data easier to access without heavy reliance on a central team, so your organization can respond quickly and make better-informed business decisions.

Data Mesh is closely related to data governance. Both are concerned with managing data and ensuring its quality within an organization. While Data Mesh focuses on decentralized data organization and domain-oriented ownership, data governance establishes the rules and policies for data quality, security, and compliance. The two concepts complement each other to create a robust data management framework.


Streaming data mesh: What’s new?


Streaming Data Mesh and Data Mesh are related concepts, but they focus on different aspects of managing and processing data. Here are the key differences between the two:

  1. Data Type:
    • Data Mesh: Data Mesh is a broader concept that encompasses all types of data, including batch data, streaming data, and even static data. It is a data organizational and architectural approach that aims to address data management challenges across the entire data landscape.
    • Streaming Data Mesh: Streaming Data Mesh is a subset of Data Mesh that focuses exclusively on streaming data and the challenges of handling real-time data streams.
  2. Scope:
    • Data Mesh: Data Mesh addresses the organization, ownership, and governance of data across various domains and types, aiming to break down data silos and improve data collaboration across the organization.
    • Streaming Data Mesh: Streaming Data Mesh narrows its focus to the management and processing of streaming data, emphasizing the need for decentralized, domain-oriented teams to handle real-time data.
  3. Data Processing:
    • Data Mesh: Data Mesh encompasses both batch and streaming data processing, emphasizing the need for domain-specific data teams to manage both types of data effectively.
    • Streaming Data Mesh: Streaming Data Mesh primarily deals with the challenges specific to streaming data processing, such as low latency, real-time analytics, and event-driven architectures.
  4. Latency:
    • Data Mesh: Data Mesh doesn't inherently prioritize low-latency processing since it covers various data types, including batch data, which can tolerate higher latency.
    • Streaming Data Mesh: Streaming Data Mesh places a strong emphasis on low-latency processing, as streaming data typically requires real-time or near-real-time analysis and action.
  5. Tools and Technologies:
    • Data Mesh: Data Mesh principles can be applied using a wide range of data tools and technologies, including data lakes, data warehouses, ETL pipelines, and more.
    • Streaming Data Mesh: Streaming Data Mesh often involves specific streaming data platforms and technologies, such as Apache Kafka, Apache Flink, and other stream processing frameworks.


Streaming data mesh with RisingWave


Although it was not designed with data mesh in mind, RisingWave turns out to be, perhaps surprisingly, a natural fit for getting started with a streaming data mesh. Let's examine how RisingWave fits the data mesh principles:

  1. Domain-Oriented Ownership. Being compatible with PostgreSQL and its ecosystem, RisingWave naturally inherits powerful data isolation and access control capabilities. RisingWave treats data streams as a concept parallel to tables, the basic building block of a database. You can define a new database or schema for each domain, manage its data streams there, and assign an owner to each stream directly, using your existing PostgreSQL skill set and experience (a SQL sketch follows this list).
  2. Self-Service Platforms. RisingWave is built to democratize stream processing in the cloud era. RisingWave Cloud offers a fully managed stream processing platform that frees users from managing their own infrastructure. Users can build and manage streaming pipelines with just a few lines of SQL: data ingestion, transformation, enrichment, and consumption are all supported through standard SQL, already common practice in data engineering. Data engineers can carry their experience over and manage decentralized streaming data directly in RisingWave Cloud.
  3. Data as a Product. The core design principle behind RisingWave's interface is to simplify the development and management of data streams. Because a data stream is abstracted as a database object, it can be managed smoothly with standard database administration tools and fits directly into existing data mesh toolkits. Data ownership and data contracts can be defined easily and clearly.
  4. Decentralized Architecture. RisingWave was built to integrate with the modern data stack from day one. As a streaming database, it can connect to more than 10 data sources and support more than 10 data sinks. By nature, it bridges decentralized data architectures and puts data in motion to work (see the sink sketch after this list).
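
As a rough illustration of the first three points, the sketch below sets up a hypothetical "orders" domain in RisingWave using plain SQL: a schema owned by the domain team, a Kafka source whose column list doubles as the data contract, a materialized view as the published data product, and a read-only grant for a consuming team. All object names, the topic, and the broker address are made up, and exact DDL and permission syntax may differ across RisingWave versions.

```sql
-- 1. Domain-oriented ownership: give the orders team its own schema.
CREATE SCHEMA orders_domain;
CREATE USER orders_team;
CREATE USER analytics_team;
ALTER SCHEMA orders_domain OWNER TO orders_team;   -- PostgreSQL-style ownership

-- 2. Self-service ingestion: the column list acts as the data contract.
CREATE SOURCE orders_domain.order_events (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DOUBLE PRECISION,
    order_time  TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'order-events',                        -- hypothetical topic
    properties.bootstrap.server = 'kafka:9092',    -- hypothetical broker
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- 3. Data as a product: a continuously maintained, queryable view.
CREATE MATERIALIZED VIEW orders_domain.hourly_revenue AS
SELECT
    window_start,
    SUM(amount) AS revenue
FROM TUMBLE(orders_domain.order_events, order_time, INTERVAL '1 hour')
GROUP BY window_start;

-- 4. Share the product with another domain, read-only.
GRANT SELECT ON MATERIALIZED VIEW orders_domain.hourly_revenue TO analytics_team;
```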

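To illustrate the decentralized-architecture point, the same data product can be pushed back out to the rest of the stack through a sink. Again, the sink name, topic, and broker address are hypothetical, and the exact sink options depend on the RisingWave version and connector.

```sql
-- Deliver the curated data product to a downstream Kafka topic so other
-- domains and tools can consume it outside RisingWave.
CREATE SINK orders_domain.hourly_revenue_sink
FROM orders_domain.hourly_revenue
WITH (
    connector = 'kafka',
    topic = 'orders-hourly-revenue',               -- hypothetical topic
    properties.bootstrap.server = 'kafka:9092',    -- hypothetical broker
    primary_key = 'window_start'
) FORMAT UPSERT ENCODE JSON;
```
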
Conclusion

By adopting a streaming data mesh approach, organizations can effectively harness the power of real-time data streams while avoiding common pitfalls, such as data silos, centralization bottlenecks, and data ownership issues. It promotes a more agile and collaborative data ecosystem, enabling organizations to respond to changing business needs and data requirements more effectively.

RisingWave is the perfect solution to get started on your streaming data mesh journey. To learn more about RisingWave, visit our website, join our community, and start your free trial on our cloud service.
