Programmatic Advertising Data Pipelines with RisingWave

Programmatic advertising data pipelines with RisingWave connect bid request streams, win notifications, impression events, and conversions into continuously updated materialized views. This replaces batch ETL jobs with a single streaming SQL layer that keeps bidding systems, pacing controls, and reporting dashboards updated within seconds of every auction.

Why Programmatic Pipelines Break Under Batch Architectures

Programmatic advertising moves fast. A demand-side platform (DSP) participates in billions of auctions daily, making bid price decisions in under 100 milliseconds. Yet most DSPs still rely on batch pipelines that summarize auction performance every 15–60 minutes. This creates a dangerous gap: your bidding algorithm is flying blind between batch refreshes.

The consequences are real:

  • Overpacing: You blow through daily budget in the morning because yesterday's batch didn't reveal a publisher volume spike
  • Underpacing: Bids drop too low after a false signal suggests overspend
  • Stale frequency caps: Users see the same ad 20 times because the frequency counter updates too slowly
  • Lost optimization opportunities: A placement that's converting at 3x average CPL stays at the same bid for hours

RisingWave solves this by replacing the batch layer with a streaming SQL engine that keeps all these signals current in real time.

Pipeline Architecture

A complete programmatic pipeline in RisingWave has five stages:

  1. Bid stream ingestion — auction results from your DSP's bid log Kafka topic
  2. Win/loss processing — match wins to bids and compute win rates by publisher and placement
  3. Impression and viewability — track served impressions and viewability signals
  4. Frequency capping state — maintain real-time per-user impression counts
  5. Pacing and budget tracking — monitor spend velocity against daily budget caps

Ingesting Bid and Win Events

CREATE SOURCE bid_responses (
    bid_id          VARCHAR,
    auction_id      VARCHAR,
    campaign_id     VARCHAR,
    placement_id    VARCHAR,
    publisher_id    VARCHAR,
    bid_price       DECIMAL,
    bid_floor       DECIMAL,
    user_id         VARCHAR,
    event_time      TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'dsp-bid-responses',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

CREATE SOURCE win_notifications (
    bid_id          VARCHAR,
    auction_id      VARCHAR,
    clearing_price  DECIMAL,
    event_time      TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'dsp-win-notifications',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

Real-Time Win Rate and CPM by Placement

CREATE MATERIALIZED VIEW placement_win_rates AS
SELECT
    b.placement_id,
    b.publisher_id,
    b.campaign_id,
    window_start,
    window_end,
    COUNT(b.bid_id)                                             AS total_bids,
    COUNT(w.bid_id)                                             AS wins,
    ROUND(COUNT(w.bid_id)::DECIMAL / NULLIF(COUNT(b.bid_id), 0) * 100, 2)
                                                                AS win_rate_pct,
    AVG(b.bid_price)                                            AS avg_bid_price,
    AVG(w.clearing_price)                                       AS avg_clearing_price,
    ROUND(AVG(w.clearing_price) * 1000, 2)                     AS effective_cpm
FROM TUMBLE(bid_responses, event_time, INTERVAL '5 MINUTES') b
LEFT JOIN win_notifications w
    ON b.bid_id = w.bid_id
GROUP BY b.placement_id, b.publisher_id, b.campaign_id, window_start, window_end;

Real-Time Frequency Capping State

Track per-user impression counts in a sliding window to enforce frequency caps:

CREATE SOURCE impression_events (
    impression_id   VARCHAR,
    user_id         VARCHAR,
    campaign_id     VARCHAR,
    ad_id           VARCHAR,
    placement_id    VARCHAR,
    event_time      TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'ad-impressions',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

CREATE MATERIALIZED VIEW user_frequency_caps AS
SELECT
    user_id,
    campaign_id,
    window_start,
    window_end,
    COUNT(impression_id)    AS impressions_in_window
FROM HOP(impression_events, event_time, INTERVAL '1 HOUR', INTERVAL '24 HOURS')
GROUP BY user_id, campaign_id, window_start, window_end;

Query this view from your bidding system before each auction to skip users who have already seen the ad the maximum number of times.
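As a sketch, a pre-bid lookup against this view might look like the following. The 10-impression cap and the `$1`/`$2` parameter placeholders are illustrative; your bidder would bind the actual user and campaign IDs:

```sql
-- Hypothetical pre-bid check: has this user already hit a
-- 10-impression cap for this campaign in the most recent window?
SELECT impressions_in_window >= 10 AS cap_reached
FROM user_frequency_caps
WHERE user_id = $1
  AND campaign_id = $2
ORDER BY window_start DESC
LIMIT 1;
```

Because `user_frequency_caps` is a materialized view, this is a point lookup against pre-computed state rather than an aggregation at query time, which keeps it fast enough for the bid path.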

Budget Pacing Monitor

CREATE MATERIALIZED VIEW campaign_spend_pace AS
SELECT
    b.campaign_id,
    window_start,
    window_end,
    SUM(w.clearing_price)       AS spend,
    COUNT(*)                    AS impressions_won
FROM TUMBLE(win_notifications, event_time, INTERVAL '1 HOUR') w
JOIN bid_responses b
    ON w.bid_id = b.bid_id
GROUP BY b.campaign_id, window_start, window_end;
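To turn spend velocity into a pacing signal, compare each hour's spend against an even share of the daily budget. The sketch below assumes a hypothetical `campaign_budgets` table (`campaign_id`, `daily_budget`) loaded into RisingWave, for example via CDC from your campaign database; the 1.2x overpacing threshold is an illustrative choice:

```sql
-- Sketch: flag campaigns spending faster than 120% of an even hourly pace.
-- Assumes a campaign_budgets(campaign_id, daily_budget) table exists.
SELECT
    p.campaign_id,
    p.spend,
    b.daily_budget / 24                     AS even_hourly_budget,
    p.spend > b.daily_budget / 24 * 1.2     AS overpacing
FROM campaign_spend_pace p
JOIN campaign_budgets b
    ON p.campaign_id = b.campaign_id;
```

A downstream pacing controller can poll this result (or consume it via a sink) and throttle bid participation for any campaign where `overpacing` is true.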

Comparison: Programmatic Pipeline Architectures

| Capability | Traditional Batch DSP | Real-Time Streaming (RisingWave) |
| --- | --- | --- |
| Win rate visibility | 15–60 min lag | Sub-10 second lag |
| Frequency cap accuracy | Approximate (stale) | Exact (current window) |
| Budget pacing granularity | Hourly | Per-minute |
| Placement performance feedback | Delayed | Immediate |
| Infrastructure pieces | Kafka + Spark/Flink + Warehouse | Kafka + RisingWave |
| SQL familiarity required | High (complex Spark SQL) | Standard SQL |

Sinking to Downstream Systems

Push real-time placement performance to your bid optimization API via Kafka:

CREATE SINK placement_performance_sink
FROM placement_win_rates
WITH (
    connector = 'kafka',
    topic = 'bid-optimization-signals',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;

Archive all auction data to Iceberg for long-term analysis and model training:

CREATE SINK auction_archive_sink
FROM placement_win_rates
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://your-bucket/programmatic-data',
    s3.region = 'us-east-1',
    database.name = 'adtech',
    table.name = 'placement_win_rates'
);

FAQ

Q: How does RisingWave handle out-of-order bid and win event streams? RisingWave supports event-time processing with configurable watermarks. You define a maximum allowed delay on the event_time column, and RisingWave includes late-arriving events in the correct window up to that threshold. Events arriving after the watermark deadline are handled according to your configured late-data policy.
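As a sketch, the watermark is declared directly in the source definition. The variant below (columns trimmed for brevity) tolerates up to 30 seconds of out-of-order arrival; the delay value is an illustrative choice you would tune to your bid exchange's actual delivery characteristics:

```sql
-- Sketch: a bid source with a 30-second out-of-order tolerance.
CREATE SOURCE bid_responses_wm (
    bid_id          VARCHAR,
    campaign_id     VARCHAR,
    bid_price       DECIMAL,
    event_time      TIMESTAMPTZ,
    WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
)
WITH (
    connector = 'kafka',
    topic = 'dsp-bid-responses',
    properties.bootstrap.server = 'kafka:9092'
)
FORMAT PLAIN ENCODE JSON;
```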

Q: Can RisingWave process billions of bid events per day? Yes. RisingWave is designed for high-throughput streaming workloads. Horizontal scaling across compute nodes, combined with Kafka topic partitioning, allows the pipeline to grow with bid volume. Production deployments process hundreds of millions of events per hour.

Q: How do I connect my existing DSP to RisingWave? If your DSP publishes bid logs and win notifications to Kafka (most modern DSPs do), connect RisingWave using the Kafka source connector. If your DSP writes to a PostgreSQL or MySQL database, use the postgres-cdc or mysql-cdc connector to stream changes into RisingWave.
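For the CDC path, a minimal sketch looks like the following. All connection details are placeholders, and the `campaigns` table schema is assumed for illustration:

```sql
-- Sketch: stream a DSP's campaigns table from Postgres into RisingWave.
-- Hostname, credentials, and schema are placeholders.
CREATE TABLE campaigns (
    campaign_id     VARCHAR,
    daily_budget    DECIMAL,
    PRIMARY KEY (campaign_id)
)
WITH (
    connector = 'postgres-cdc',
    hostname = 'your-postgres-host',
    port = '5432',
    username = 'rw_user',
    password = 'rw_password',
    database.name = 'dsp',
    schema.name = 'public',
    table.name = 'campaigns'
);
```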

Q: Does RisingWave support cookie-based user ID joins? Yes. You can store cookie-to-user-ID mapping in a RisingWave table (loaded via CDC from your identity system) and join it to impression events using a standard SQL join on the materialized view.
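A minimal sketch of that join, assuming a hypothetical `identity_map` table (`cookie_id`, `unified_user_id`) kept in sync from your identity system via CDC:

```sql
-- Sketch: resolve cookie IDs on impressions to a unified user ID.
-- Assumes an identity_map(cookie_id, unified_user_id) table exists.
CREATE MATERIALIZED VIEW resolved_impressions AS
SELECT
    i.impression_id,
    m.unified_user_id,
    i.campaign_id,
    i.event_time
FROM impression_events i
JOIN identity_map m
    ON i.user_id = m.cookie_id;
```

Downstream views such as frequency caps can then aggregate on `unified_user_id` instead of the raw cookie value.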

Get Started

Replace your batch programmatic pipeline with streaming SQL today.
