Real-Time Market Data Feed Processing with RisingWave

Real-time market data feed processing with RisingWave means ingesting price ticks, order book updates, and trade prints from multiple venues simultaneously, then normalizing and aggregating them into unified, continuously updated views — all using standard SQL without custom streaming code.

The Market Data Normalization Challenge

Financial market data is notoriously fragmented. A single equity may trade on NYSE, NASDAQ, BATS, IEX, and a dozen dark pools simultaneously, each publishing quote and trade data in slightly different formats and at different latencies. Consolidated data products like the National Best Bid and Offer (NBBO) require aggregating these feeds in real time.

Beyond consolidation, market participants need derived data: Volume-Weighted Average Price (VWAP), rolling volatility, order book imbalance, and time-bucketed OHLCV (open-high-low-close-volume) bars. Traditionally, this required proprietary tick data platforms (Bloomberg, FactSet, Refinitiv) or custom low-latency C++ systems.

RisingWave offers a third path: standard streaming SQL over Kafka-connected market data feeds, producing always-current derived views accessible via PostgreSQL.

Ingesting Multi-Venue Price Data

-- Consolidated price tick feed from market data aggregator
CREATE SOURCE price_ticks (
    venue          VARCHAR,  -- 'NYSE', 'NASDAQ', 'BATS', 'IEX'
    symbol         VARCHAR,
    bid            NUMERIC,
    ask            NUMERIC,
    last_price     NUMERIC,
    last_size      INTEGER,
    cumulative_vol BIGINT,
    tick_time      TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'market.ticks',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'latest'
)
FORMAT PLAIN ENCODE JSON;

-- Instrument reference data
CREATE TABLE instrument_reference (
    symbol         VARCHAR PRIMARY KEY,
    isin           VARCHAR,
    asset_class    VARCHAR,
    exchange       VARCHAR,
    lot_size       INTEGER,
    tick_size      NUMERIC,
    currency       VARCHAR
);
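With the source and reference table in place, a natural first step is an enrichment view that joins live ticks to static instrument attributes. A minimal sketch — the view name and selected columns are illustrative choices, not part of the feed contract above:

```sql
-- Enrich each tick with static instrument attributes.
-- (View name and column selection are illustrative.)
CREATE MATERIALIZED VIEW enriched_ticks AS
SELECT
    t.venue,
    t.symbol,
    t.bid,
    t.ask,
    t.last_price,
    t.last_size,
    t.tick_time,
    r.isin,
    r.asset_class,
    r.currency
FROM price_ticks AS t
JOIN instrument_reference AS r
  ON t.symbol = r.symbol;
```

Downstream views can then group by asset_class or currency without repeating the join.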

OHLCV Bar Generation with Tumbling Windows

Generate 1-minute OHLCV bars — the foundation for most charting and quant research systems:

CREATE MATERIALIZED VIEW ohlcv_1min AS
SELECT
    symbol,
    venue,
    -- first_value/last_value as ordered aggregates; window-function
    -- syntax (OVER) cannot be mixed with GROUP BY here
    first_value(last_price ORDER BY tick_time) AS open_price,
    MAX(last_price)                            AS high_price,
    MIN(last_price)                            AS low_price,
    last_value(last_price ORDER BY tick_time)  AS close_price,
    SUM(last_size)          AS volume,
    COUNT(*)                AS tick_count,
    AVG(ask - bid)          AS avg_spread,
    window_start,
    window_end
FROM TUMBLE(
    price_ticks,
    tick_time,
    INTERVAL '1 MINUTE'
)
GROUP BY
    symbol,
    venue,
    window_start,
    window_end;
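Because materialized views are served over the PostgreSQL wire protocol, the latest bars can be pulled with an ordinary query from any Postgres client. The symbol and venue values below are illustrative:

```sql
-- Most recent ten 1-minute bars for one instrument on one venue
SELECT window_start, open_price, high_price, low_price,
       close_price, volume, tick_count
FROM ohlcv_1min
WHERE symbol = 'AAPL'
  AND venue  = 'NASDAQ'
ORDER BY window_start DESC
LIMIT 10;
```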

VWAP and Rolling Volatility

Compute VWAP and rolling price metrics using hopping windows for continuous output:

CREATE MATERIALIZED VIEW vwap_and_volatility AS
SELECT
    symbol,
    SUM(last_price * last_size) / NULLIF(SUM(last_size), 0)  AS vwap,
    SUM(last_size)                                            AS total_volume,
    COUNT(*)                                                  AS tick_count,
    MAX(last_price)                                           AS period_high,
    MIN(last_price)                                           AS period_low,
    MAX(last_price) - MIN(last_price)                         AS price_range,
    STDDEV(last_price)                                        AS price_stddev,
    AVG(ask - bid)                                            AS avg_bid_ask_spread,
    window_start,
    window_end
FROM HOP(
    price_ticks,
    tick_time,
    INTERVAL '1 MINUTE',
    INTERVAL '15 MINUTES'
)
GROUP BY
    symbol,
    window_start,
    window_end;

This produces a 15-minute rolling VWAP and volatility estimate, updated every minute — exactly what algorithmic execution systems need to benchmark against.
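An execution system can read the current benchmark with a point query against the view — the symbol below is illustrative:

```sql
-- Latest completed 15-minute window for one symbol
SELECT symbol, vwap, price_stddev, avg_bid_ask_spread,
       window_start, window_end
FROM vwap_and_volatility
WHERE symbol = 'MSFT'
ORDER BY window_end DESC
LIMIT 1;
```

Comparing fill prices against this vwap column gives a running slippage estimate without any additional infrastructure.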

National Best Bid and Offer (NBBO) Consolidation

Consolidate quotes across venues into a best bid/offer view. Note that aggregating the raw tick stream directly would yield all-time extremes rather than the current market, so first reduce the stream to the latest quote per symbol and venue (RisingWave's Top-N/deduplication pattern), then aggregate:

CREATE MATERIALIZED VIEW latest_quotes AS
SELECT symbol, venue, bid, ask, tick_time
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY symbol, venue
            ORDER BY tick_time DESC
        ) AS rn
    FROM price_ticks
) AS t
WHERE rn = 1;

CREATE MATERIALIZED VIEW nbbo AS
SELECT
    symbol,
    MAX(bid)                           AS national_best_bid,
    MIN(ask)                           AS national_best_offer,
    MIN(ask) - MAX(bid)                AS nbbo_spread,
    COUNT(DISTINCT venue)              AS contributing_venues,
    MAX(tick_time)                     AS last_update
FROM latest_quotes
GROUP BY symbol;
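One immediate use of the consolidated view is flagging locked or crossed markets, where the best bid meets or exceeds the best offer across venues. A small sketch — the view name is illustrative:

```sql
-- Symbols whose consolidated book is currently locked or crossed
CREATE MATERIALIZED VIEW locked_or_crossed AS
SELECT symbol, national_best_bid, national_best_offer, last_update
FROM nbbo
WHERE national_best_bid >= national_best_offer;
```

Because nbbo updates incrementally, this view emits and retracts rows as markets lock and unlock, making it a natural alerting feed.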

Distributing Derived Data

Publish processed market data to downstream consumers:

CREATE SINK market_data_analytics_sink
FROM vwap_and_volatility
WITH (
    connector = 'kafka',
    topic = 'market.derived.vwap',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;

Proprietary Platform vs. Streaming SQL

Dimension                 | Proprietary Tick Platform  | Streaming SQL (RisingWave)
--------------------------|----------------------------|---------------------------
Setup time                | Weeks to months            | Hours
Cost                      | Very high (licensing)      | Low (open source core)
Custom derived metrics    | Requires vendor extension  | Pure SQL
Multi-venue consolidation | Built-in (vendor-specific) | SQL aggregation
BI tool integration       | Limited                    | PostgreSQL-compatible
Latency                   | Ultra-low (microseconds)   | Low (milliseconds)
Best fit                  | HFT, co-location           | Analytics, risk, research

FAQ

Q: Can RisingWave handle ultra-low latency requirements for high-frequency trading? A: RisingWave operates in the millisecond range, which suits analytics, risk, and research workflows. For microsecond-latency HFT execution, specialized FPGA or kernel-bypass systems are more appropriate. RisingWave excels at the analytics and risk layer above the execution layer.

Q: How do we handle the sheer volume of tick data from major equity markets? A: RisingWave can ingest millions of events per second. U.S. equity markets generate on the order of 10 billion ticks per day during busy sessions. RisingWave can process this at full speed with appropriate cluster sizing and Kafka partition configuration.

Q: Can we store full tick history in RisingWave? A: RisingWave is optimized for streaming computation, not long-term tick storage. For historical tick archives, sink data to Apache Iceberg via the Iceberg connector and query historical data with an analytical engine like Spark or Trino.

Q: How do we handle clock skew between venues? A: Each price tick carries a tick_time timestamp from the originating venue. RisingWave's watermark configuration can be tuned to accommodate the expected clock skew between venues when performing cross-venue aggregations.
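Watermarks are declared on the event-time column when the source is created. A minimal sketch — the 5-second skew allowance is an illustrative assumption to tune against the worst expected clock drift across venues:

```sql
-- Source with an explicit watermark on the event-time column.
-- (The 5-second allowance is an assumption; tune per deployment.)
CREATE SOURCE price_ticks_wm (
    venue      VARCHAR,
    symbol     VARCHAR,
    bid        NUMERIC,
    ask        NUMERIC,
    last_price NUMERIC,
    last_size  INTEGER,
    tick_time  TIMESTAMPTZ,
    WATERMARK FOR tick_time AS tick_time - INTERVAL '5 seconds'
)
WITH (
    connector = 'kafka',
    topic = 'market.ticks',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON;
```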

Q: Is RisingWave suitable for options data with complex multi-leg instruments? A: Options quote data can be ingested the same way as equity ticks. Multi-leg instrument analytics (spreads, straddles, complex greeks) are typically computed in a pricing library; the resulting risk sensitivities can then be ingested and aggregated in RisingWave.

Get Started

Build your real-time market data platform with streaming SQL.
