High-Frequency Trading Data Pipeline with RisingWave

A high-frequency trading data pipeline with RisingWave uses Kafka for market data ingestion, materialized views for continuous signal computation, and the PostgreSQL-compatible query interface for low-latency reads—giving HFT firms a streaming SQL layer that computes VWAP, order book imbalance, and momentum signals in real time without custom application code.

HFT Data Infrastructure Challenges

High-frequency trading firms face a data engineering problem that is extreme in every dimension: millions of market data events per second, microsecond-sensitive latency requirements, and derived signal computation that must keep pace with the feed. Traditional approaches use custom C++ pipelines or in-memory databases with hand-coded aggregations.

RisingWave occupies a different tier: it is not a nanosecond-latency execution system, but it is an excellent streaming SQL layer for the analytical and signal-generation workloads that sit above raw execution—computing VWAP over sliding windows, detecting order book imbalance, tracking cross-instrument correlations, and feeding risk checks. These are the workloads that traditionally require weeks of custom code.

Market Data Ingestion

-- Level 2 order book updates from exchange feeds
CREATE SOURCE order_book_updates (
    exchange_id         VARCHAR,
    symbol              VARCHAR,
    side                VARCHAR,        -- 'BID' or 'ASK'
    price               DECIMAL(18,8),
    quantity            DECIMAL(18,8),
    update_type         VARCHAR,        -- 'ADD', 'UPDATE', 'DELETE'
    sequence_num        BIGINT,
    event_time          TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'order-book-l2',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- Trade tape: executed trades
CREATE SOURCE trade_tape (
    exchange_id         VARCHAR,
    symbol              VARCHAR,
    trade_id            VARCHAR,
    price               DECIMAL(18,8),
    quantity            DECIMAL(18,8),
    aggressor_side      VARCHAR,        -- 'BUY' or 'SELL'
    trade_time          TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'trade-tape',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

VWAP Computation with Tumbling Windows

Volume-weighted average price is one of the most fundamental HFT signals. With RisingWave's TUMBLE() and HOP() window functions, you can maintain VWAP across multiple time horizons simultaneously:

-- 1-minute VWAP per symbol per exchange
CREATE MATERIALIZED VIEW vwap_1min AS
SELECT
    window_start,
    window_end,
    symbol,
    exchange_id,
    SUM(price * quantity) / NULLIF(SUM(quantity), 0)    AS vwap,
    SUM(quantity)                                        AS total_volume,
    COUNT(*)                                             AS trade_count,
    MIN(price)                                          AS low,
    MAX(price)                                          AS high,
    first_value(price ORDER BY trade_time)              AS open_price
FROM TUMBLE(trade_tape, trade_time, INTERVAL '1 MINUTE')
GROUP BY window_start, window_end, symbol, exchange_id;

-- 5-minute sliding window for momentum signal
CREATE MATERIALIZED VIEW vwap_5min_hop AS
SELECT
    window_start,
    window_end,
    symbol,
    SUM(price * quantity) / NULLIF(SUM(quantity), 0)    AS vwap_5m,
    SUM(quantity)                                        AS volume_5m,
    COUNT(DISTINCT exchange_id)                          AS exchanges_active
FROM HOP(trade_tape, trade_time, INTERVAL '1 MINUTE', INTERVAL '5 MINUTES')
GROUP BY window_start, window_end, symbol;

Order Book Imbalance Signal

Order book imbalance—the ratio of bid quantity to ask quantity at the top of the book—is a classic short-term directional signal. Computing it continuously over a streaming order book requires aggregating the current state of bids and asks:

-- Best bid/ask per symbol per exchange (top-of-book state).
-- Note: aggregating the raw ADD/UPDATE/DELETE stream is a simplification;
-- a production feed handler would maintain full book state upstream.
CREATE MATERIALIZED VIEW best_quotes AS
SELECT
    symbol,
    exchange_id,
    MAX(price) FILTER (WHERE side = 'BID' AND update_type != 'DELETE') AS best_bid,
    MIN(price) FILTER (WHERE side = 'ASK' AND update_type != 'DELETE') AS best_ask,
    MAX(event_time)                                                    AS book_time
FROM order_book_updates
GROUP BY symbol, exchange_id;

-- Depth within 0.1% of the best price on each side. Aggregates cannot be
-- nested inside a FILTER clause, so depth is computed in a second view
-- that joins updates against best_quotes.
CREATE MATERIALIZED VIEW top_of_book AS
SELECT
    u.symbol,
    u.exchange_id,
    q.best_bid,
    q.best_ask,
    SUM(u.quantity) FILTER (WHERE u.side = 'BID' AND u.update_type != 'DELETE'
        AND u.price >= q.best_bid * 0.999)                             AS bid_depth_near,
    SUM(u.quantity) FILTER (WHERE u.side = 'ASK' AND u.update_type != 'DELETE'
        AND u.price <= q.best_ask * 1.001)                             AS ask_depth_near,
    q.book_time
FROM order_book_updates u
JOIN best_quotes q
    ON u.symbol = q.symbol AND u.exchange_id = q.exchange_id
GROUP BY u.symbol, u.exchange_id, q.best_bid, q.best_ask, q.book_time;

-- Compute bid-ask imbalance ratio
CREATE MATERIALIZED VIEW book_imbalance AS
SELECT
    symbol,
    exchange_id,
    best_bid,
    best_ask,
    best_ask - best_bid                                     AS spread,
    bid_depth_near,
    ask_depth_near,
    CASE
        WHEN bid_depth_near + ask_depth_near = 0 THEN 0
        ELSE (bid_depth_near - ask_depth_near)::DECIMAL /
             (bid_depth_near + ask_depth_near)
    END                                                     AS imbalance_ratio,
    book_time
FROM top_of_book;

Sinking Signals to Execution Systems

-- Sink computed signals to Kafka for consumption by execution algorithms
CREATE SINK trading_signals_feed AS
SELECT
    v.symbol,
    v.window_end         AS signal_time,
    v.vwap               AS vwap_1m,
    v5.vwap_5m,
    b.imbalance_ratio,
    b.spread,
    v.total_volume
FROM vwap_1min v
JOIN vwap_5min_hop v5
    -- Match the single 5-minute hop window ending at the same instant;
    -- a BETWEEN range would match all five overlapping windows and
    -- duplicate each signal row.
    ON v.symbol = v5.symbol
    AND v.window_end = v5.window_end
JOIN book_imbalance b
    ON v.symbol = b.symbol
    AND v.exchange_id = b.exchange_id
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'trading-signals',
    -- The joined query produces updates, not an append-only stream,
    -- so the Kafka sink needs a primary key and UPSERT format.
    primary_key = 'symbol,signal_time'
) FORMAT UPSERT ENCODE JSON;
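Because RisingWave speaks the PostgreSQL wire protocol, risk and monitoring components can also query the same views directly instead of consuming the Kafka topic. A minimal sketch of such an ad-hoc read (the 0.6 threshold is illustrative, not a recommendation):

```sql
-- Ad-hoc read over the PostgreSQL wire protocol (any psql-compatible client):
-- flag instruments whose top-of-book is heavily one-sided.
SELECT symbol, exchange_id, imbalance_ratio, spread, book_time
FROM book_imbalance
WHERE ABS(imbalance_ratio) > 0.6
ORDER BY ABS(imbalance_ratio) DESC
LIMIT 20;
```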

Comparison: Custom Code vs. Streaming SQL for HFT Analytics

| Dimension | Custom C++/Java pipeline | Streaming SQL with RisingWave |
|---|---|---|
| Development time | Weeks to months | Days |
| Signal iteration speed | Slow (recompile, redeploy) | Fast (drop and recreate SQL views) |
| Latency tier | Microseconds (execution) | Milliseconds (analytics/signals) |
| Operational complexity | High (custom monitoring) | Low (SQL + Kafka ecosystem) |
| Multi-signal correlation | Complex custom joins | Standard SQL JOINs |
| Risk check integration | Custom integration code | PostgreSQL wire protocol |
| Use case fit | Order routing, execution | Signal generation, risk, analytics |

FAQ

Q: Is RisingWave fast enough for HFT execution? A: RisingWave is not designed for nanosecond-latency order execution—that remains the domain of FPGA-based or custom kernel-bypass systems. RisingWave excels at the analytical layer: computing signals, running risk checks, building real-time surveillance, and feeding decision support systems. Think of it as the real-time analytics brain that informs execution, not the execution engine itself.

Q: How many market data events per second can RisingWave process? A: Throughput depends on cluster size, complexity of materialized views, and hardware. Benchmark results show single-node RisingWave instances processing hundreds of thousands of events per second. For US equity markets (~1M quotes/second at peak), a properly sized cluster with Kafka partitioning handles the load. Colocate RisingWave workers with Kafka brokers for best throughput.

Q: How do we handle out-of-order market data events? A: RisingWave's watermark mechanism handles late-arriving events within configurable bounds. For market data, a watermark of a few hundred milliseconds is sufficient to absorb normal network jitter. Sequence number gaps (missing exchange sequence numbers) should be handled upstream in the feed handler before events reach Kafka.
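As a sketch, a watermark is declared directly in the source schema; the 500 ms bound below is illustrative and should be tuned to observed feed jitter (the `trade_tape_wm` name is hypothetical):

```sql
-- Trade-tape source with a watermark that tolerates up to 500 ms of
-- out-of-order arrival before downstream windows are considered closed.
CREATE SOURCE trade_tape_wm (
    symbol          VARCHAR,
    price           DECIMAL(18,8),
    quantity        DECIMAL(18,8),
    trade_time      TIMESTAMPTZ,
    WATERMARK FOR trade_time AS trade_time - INTERVAL '500 milliseconds'
) WITH (
    connector = 'kafka',
    topic = 'trade-tape',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;
```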

Q: Can RisingWave join trade tape data with options pricing models? A: Yes. Store options pricing parameters (implied volatility, Greeks) in CREATE TABLE reference tables, updated via CDC or periodic upserts. Join trade tape events against these tables using temporal joins (FOR SYSTEM_TIME AS OF) to get point-in-time accurate pricing data.
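A sketch of the pattern, assuming a hypothetical `options_params` reference table keyed by symbol (temporal joins require the table side to be joined on its primary key):

```sql
-- Hypothetical reference table of per-symbol pricing parameters,
-- kept current via CDC or periodic upserts.
CREATE TABLE options_params (
    symbol          VARCHAR PRIMARY KEY,
    implied_vol     DECIMAL(10,6),
    delta           DECIMAL(10,6)
);

-- Temporal join: each trade sees the parameter row as of processing time.
CREATE MATERIALIZED VIEW trades_with_greeks AS
SELECT
    t.symbol,
    t.price,
    t.quantity,
    t.trade_time,
    o.implied_vol,
    o.delta
FROM trade_tape t
JOIN options_params FOR SYSTEM_TIME AS OF PROCTIME() o
    ON t.symbol = o.symbol;
```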

Q: How do we monitor signal freshness in production? A: Query the MAX(signal_time) from your signal materialized views against current time. Alert when the lag exceeds your threshold. RisingWave also exposes internal metrics via a Prometheus endpoint for monitoring processing lag on each source.
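A minimal freshness check against the 1-minute VWAP view, for example; the alerting threshold applied to `lag_seconds` is left to your monitoring system:

```sql
-- Per-symbol signal lag; alert when lag_seconds exceeds your budget.
SELECT
    symbol,
    MAX(window_end)                               AS latest_signal_time,
    EXTRACT(EPOCH FROM (NOW() - MAX(window_end))) AS lag_seconds
FROM vwap_1min
GROUP BY symbol
ORDER BY lag_seconds DESC;
```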

