Marketing Data Mesh Architecture with RisingWave

RisingWave fits naturally into a marketing data mesh as the real-time streaming layer that each domain team uses to publish continuously updated data products. Instead of centralized ETL pipelines, domain teams own their RisingWave materialized views and expose them as low-latency data products — consumed by bidding systems, CRMs, and analytics tools without a central data engineering bottleneck.

What Is a Marketing Data Mesh?

Data mesh is an organizational and technical architecture where domain teams own their data end to end — including ingestion, transformation, and serving. For marketing, this means the paid media team owns ad performance data products, the CRM team owns customer data products, and the attribution team owns conversion data products.

The challenge is freshness. Traditional data mesh implementations route all data through a central warehouse. Queries are fast, but data is hours old. When the paid media team needs real-time campaign signals, the warehouse layer introduces unacceptable latency.

RisingWave solves this by serving as the streaming data product layer. Each domain team manages its own RisingWave sources and materialized views, publishing real-time data products that other teams can query via the standard PostgreSQL interface.

Domain-Owned Data Products

In a marketing data mesh with RisingWave, each domain publishes its own streaming data products. Here is how the paid media domain might own its data:

-- Paid Media Domain: owns ad event ingestion and performance aggregation
CREATE SOURCE paid_media_impressions (
    impression_id  VARCHAR,
    campaign_id    VARCHAR,
    ad_group_id    VARCHAR,
    creative_id    VARCHAR,
    platform       VARCHAR,   -- 'google', 'meta', 'tiktok', 'programmatic'
    cost_usd       DOUBLE PRECISION,
    user_id        VARCHAR,
    event_time     TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'paid-media.impressions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

-- Paid Media Domain: published data product
CREATE MATERIALIZED VIEW paid_media_performance AS
SELECT
    campaign_id,
    ad_group_id,
    platform,
    window_start,
    window_end,
    COUNT(*)           AS impressions,
    SUM(cost_usd)      AS spend_usd,
    COUNT(DISTINCT user_id) AS unique_users_reached
FROM TUMBLE(
    paid_media_impressions,
    event_time,
    INTERVAL '15 minutes'
)
GROUP BY campaign_id, ad_group_id, platform, window_start, window_end;
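
Because the product is just a materialized view, any consumer can derive further metrics from it without touching the paid media domain's pipeline. A sketch of an ad-hoc consumer query (the metric definition is illustrative):

```sql
-- Derived metric: effective CPM per platform over the last day,
-- computed by a consumer against the published data product.
SELECT
    platform,
    SUM(spend_usd) / NULLIF(SUM(impressions), 0) * 1000 AS cpm_usd
FROM paid_media_performance
WHERE window_start >= NOW() - INTERVAL '1 day'
GROUP BY platform;
```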

The CRM domain publishes its own data product independently. Both are queryable by any other domain without requiring centralized ETL coordination.
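
A sketch of what the CRM domain's product might look like. The CDC table, column names, and the lifetime-value projection are illustrative assumptions, not the actual CRM schema:

```sql
-- CRM Domain: independently owned data product.
-- The 12-month projection below is a naive stand-in for a real CLV model.
CREATE TABLE crm_transactions (
    transaction_id VARCHAR PRIMARY KEY,
    user_id        VARCHAR,
    revenue_usd    DOUBLE PRECISION,
    occurred_at    TIMESTAMPTZ
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'crm-postgres',
    port = '5432',
    username = 'rw_reader',
    password = 'secret',
    database.name = 'crm',
    schema.name = 'public',
    table.name = 'transactions'
);

CREATE MATERIALIZED VIEW customer_clv AS
SELECT
    user_id,
    SUM(revenue_usd)        AS revenue_to_date,
    COUNT(*)                AS transaction_count,
    SUM(revenue_usd) * 2.0  AS projected_12m_value  -- placeholder projection
FROM crm_transactions
GROUP BY user_id;
```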

Cross-Domain Data Product Joins

The power of a data mesh emerges when domain data products are joined. The attribution team combines paid media data with CRM conversion data using time-windowed joins, keeping attribution windows consistent across domain boundaries:

-- Attribution Domain: joins paid media + CRM data products
CREATE TABLE crm_conversions (
    conversion_id VARCHAR PRIMARY KEY,
    user_id       VARCHAR,
    revenue_usd   DOUBLE PRECISION,
    converted_at  TIMESTAMPTZ
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'crm-postgres',
    port = '5432',
    username = 'rw_reader',
    password = 'secret',
    database.name = 'crm',
    schema.name = 'public',
    table.name = 'conversion_events'
);

-- The 15-minute aggregate drops user_id, so user-level attribution
-- windows over the impression-level stream directly.
CREATE MATERIALIZED VIEW cross_domain_attribution AS
SELECT
    pm.campaign_id,
    pm.platform,
    pm.window_start,
    COUNT(DISTINCT pm.user_id)         AS users_reached,
    SUM(pm.cost_usd)                   AS total_spend,
    COUNT(cv.conversion_id)            AS attributed_conversions,
    SUM(cv.revenue_usd)                AS attributed_revenue,
    SUM(cv.revenue_usd) / NULLIF(SUM(pm.cost_usd), 0) AS roas
FROM TUMBLE(paid_media_impressions, event_time, INTERVAL '15 minutes') pm
LEFT JOIN crm_conversions cv
    ON pm.user_id = cv.user_id
    AND cv.converted_at BETWEEN pm.window_start AND pm.window_start + INTERVAL '7 days'
GROUP BY pm.campaign_id, pm.platform, pm.window_start;

This cross-domain view is owned by the attribution team. The paid media team and CRM team have no knowledge of it — they just maintain their own data products. Ownership boundaries are clean; data freshness is shared.

Data Product Catalog with Metadata

A data mesh requires a catalog so teams can discover available data products. Use a reference table in RisingWave as a lightweight catalog:

CREATE TABLE data_product_catalog (
    product_id         VARCHAR PRIMARY KEY,
    domain             VARCHAR,
    product_name       VARCHAR,
    description        VARCHAR,
    owning_team        VARCHAR,
    sla_freshness_secs INT,
    update_frequency   VARCHAR
);

INSERT INTO data_product_catalog VALUES
    ('paid_media_perf_15m', 'paid_media', 'paid_media_performance',
     '15-minute aggregate of all paid media impressions and spend by campaign',
     'growth-marketing', 30, '15min tumble window'),
    ('clv_scores', 'crm', 'customer_clv',
     'Per-customer projected 12-month lifetime value, updated on each transaction',
     'crm-team', 5, 'event-driven');
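
Discovery is then a plain SQL query. For example, a team looking for real-time products with sub-minute freshness guarantees might run:

```sql
-- Find data products with freshness SLAs under one minute:
SELECT product_name, owning_team, sla_freshness_secs
FROM data_product_catalog
WHERE sla_freshness_secs < 60
ORDER BY sla_freshness_secs;
```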

Comparison: Data Mesh Architectures for Marketing

| Architecture | Data Freshness | Domain Autonomy | Query Complexity | Operational Overhead |
| --- | --- | --- | --- | --- |
| Centralized data warehouse | Hours | Low (central team bottleneck) | Low (single SQL layer) | Low |
| Lake + warehouse (medallion) | 30–60 min | Medium | Medium | Medium |
| Kafka + custom consumers | Seconds | High | High (bespoke code) | High |
| RisingWave data mesh | Seconds | High | Low (standard SQL) | Low |

Publishing Data Products as Kafka Topics

A key principle of data mesh is interoperability. Domain data products should be accessible to systems beyond SQL clients — including real-time ML feature stores and downstream microservices. RisingWave sinks make this straightforward:

CREATE SINK paid_media_performance_topic
FROM paid_media_performance
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'data-products.paid-media-performance',
    -- UPSERT sinks need a primary key so updates overwrite prior values
    primary_key = 'campaign_id,ad_group_id,platform,window_start'
)
FORMAT UPSERT ENCODE JSON;

Any downstream system — a bidder reading feature values, a Slack bot reporting on spend pacing, an ML system computing bid adjustments — subscribes to this topic and receives updates as they happen.
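
On the consuming side, another RisingWave cluster (or another domain) can re-materialize the published product from the topic. A sketch, assuming the upsert-encoded topic above and mirroring the view's schema:

```sql
-- Re-ingest the published product in a separate cluster; the primary
-- key lets upsert semantics overwrite prior rows per window.
CREATE TABLE paid_media_performance_remote (
    campaign_id          VARCHAR,
    ad_group_id          VARCHAR,
    platform             VARCHAR,
    window_start         TIMESTAMPTZ,
    window_end           TIMESTAMPTZ,
    impressions          BIGINT,
    spend_usd            DOUBLE PRECISION,
    unique_users_reached BIGINT,
    PRIMARY KEY (campaign_id, ad_group_id, platform, window_start)
)
WITH (
    connector = 'kafka',
    topic = 'data-products.paid-media-performance',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT UPSERT ENCODE JSON;
```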

Governance and SLA Monitoring

Data mesh governance requires that each data product meet its SLA. Build a freshness monitor:

CREATE MATERIALIZED VIEW data_product_freshness AS
SELECT
    'paid_media_performance' AS product_name,
    MAX(window_end)          AS last_updated
FROM paid_media_performance;

-- Non-deterministic functions like now()/CURRENT_TIMESTAMP cannot appear
-- in a streaming materialized view, so staleness is computed at query time:
SELECT
    product_name,
    EXTRACT(EPOCH FROM (now() - last_updated)) AS seconds_stale
FROM data_product_freshness;

Alert when seconds_stale exceeds the SLA threshold defined in the catalog. Domain teams own their own freshness SLAs and alerts — consistent with data mesh ownership principles.
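
An alerting job can combine the freshness monitor with the catalog in one query, so each SLA threshold lives in the catalog rather than in the alert code. A sketch:

```sql
-- Flag any product whose staleness exceeds its declared SLA:
SELECT
    c.product_name,
    c.owning_team,
    EXTRACT(EPOCH FROM (now() - f.last_updated)) AS seconds_stale,
    c.sla_freshness_secs
FROM data_product_freshness f
JOIN data_product_catalog c
    ON c.product_name = f.product_name
WHERE EXTRACT(EPOCH FROM (now() - f.last_updated)) > c.sla_freshness_secs;
```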

FAQ

Q: How does RisingWave handle schema evolution in a data mesh? A: RisingWave supports ALTER TABLE for adding columns to reference tables. For Kafka-sourced data, schema evolution is handled at the serialization layer (Avro/Protobuf with schema registry). New fields propagate to downstream materialized views automatically when the source schema is updated.
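
For example, a reference table such as the catalog can gain a column in place (the column name here is illustrative):

```sql
-- Add a governance field to the catalog without recreating it:
ALTER TABLE data_product_catalog ADD COLUMN pii_classification VARCHAR;
```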

Q: Can multiple teams write to the same RisingWave instance, or do they need separate clusters? A: RisingWave supports database-level isolation within a single cluster. Each domain team gets its own schema or database. For stronger isolation, separate clusters with cross-cluster data product access via Kafka topics is the recommended pattern.

Q: How does RisingWave compare to dbt for marketing data products? A: dbt runs batch transformations on a schedule; RisingWave runs continuous transformations on streams. For real-time data products, RisingWave is the right tool. Many teams use both: dbt for historical batch analytics and RisingWave for streaming data products that need freshness under one minute.

Q: Is RisingWave's PostgreSQL interface compatible with standard BI tools? A: Yes. Tableau, Looker, Metabase, Grafana, and any JDBC/ODBC-compatible tool connects directly to RisingWave using the PostgreSQL driver. Materialized views appear as regular tables and update continuously.

Q: Who owns the RisingWave infrastructure in a data mesh model? A: Typically a platform team provisions and operates the RisingWave cluster(s), while domain teams own the SQL definitions — sources, materialized views, and sinks — within their assigned schemas. This mirrors the data mesh principle of separating infrastructure platform from domain data product ownership.

Get Started

Deploy your first streaming data product with the RisingWave quickstart guide.

Connect with architects building marketing data infrastructure in the RisingWave Slack community.
