A Streaming Join is an operation that combines records from two or more continuously flowing data streams (or a stream and a table) based on common attributes (join keys) and, typically, time-based conditions (join windows). Unlike traditional batch joins that operate on static, bounded datasets, streaming joins must handle dynamic, unbounded inputs where data may arrive out of order and the set of records to join against is constantly evolving.
Streaming joins are fundamental for correlating information from different sources in real time, enabling complex event processing, enrichment, and contextual analysis.
RisingWave supports several streaming join types in SQL, for example:
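```sql
-- Example: Stream-Stream Inner Join with a time window (interval join)
-- Matches each order with shipment events for the same order_id whose
-- event_time lies within 10 minutes of the order's event_time (inclusive).
-- To maintain this result continuously, define it as a materialized view
-- (CREATE MATERIALIZED VIEW ... AS SELECT ...).
SELECT
    s1.order_id,
    s1.product_id,
    s2.shipment_status
FROM stream1 AS s1
JOIN stream2 AS s2
    ON s1.order_id = s2.order_id
    AND s1.event_time BETWEEN s2.event_time - INTERVAL '10' MINUTE
                          AND s2.event_time + INTERVAL '10' MINUTE;
```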
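Because the inputs are unbounded, the window condition in the interval join above is evaluated as rows arrive, not over a finished dataset. The following minimal Python sketch illustrates the general technique, a symmetric interval join. Each arriving event is probed against the other input's per-key state and then stored in its own input's state, so a match is found regardless of which side arrives first. The names `Event`, `interval_join`, and `WINDOW` are hypothetical. This is an illustration of the technique, not RisingWave's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical event shape: one class for both inputs of the join.
@dataclass(frozen=True)
class Event:
    side: str            # "left" (stream1) or "right" (stream2)
    order_id: int        # join key
    event_time: datetime
    payload: str         # product_id for stream1, shipment_status for stream2

WINDOW = timedelta(minutes=10)  # mirrors the +/- 10 minute bound in the SQL example

def interval_join(events):
    """Symmetric interval join over events given in arrival order.

    Each event is probed against the other side's state for the same key, then
    stored in its own side's state, so matches are found whichever side arrives first.
    """
    state = {"left": defaultdict(list), "right": defaultdict(list)}
    results = []
    for ev in events:
        other = "right" if ev.side == "left" else "left"
        for match in state[other][ev.order_id]:
            # |t1 - t2| <= WINDOW is the same test as
            # t1 BETWEEN t2 - WINDOW AND t2 + WINDOW (inclusive bounds).
            if abs(ev.event_time - match.event_time) <= WINDOW:
                left, right = (ev, match) if ev.side == "left" else (match, ev)
                results.append((left.order_id, left.payload, right.payload))
        state[ev.side][ev.order_id].append(ev)  # never evicted in this sketch
    return results

t0 = datetime(2024, 1, 1, 12, 0)
events = [
    Event("right", 1, t0 + timedelta(minutes=5), "shipped"),   # arrives before its order
    Event("left", 1, t0, "p-42"),                              # 5 minutes apart: match
    Event("left", 2, t0, "p-7"),
    Event("right", 2, t0 + timedelta(minutes=30), "shipped"),  # 30 minutes apart: no match
]
print(interval_join(events))  # [(1, 'p-42', 'shipped')]
```

In this sketch, state is never evicted, so memory grows without bound. A production engine discards rows once they can no longer fall inside any window. In RisingWave, this cleanup is typically driven by watermarks defined on the inputs.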
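```sql
-- Example: Stream-Table Join (enrichment)
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT
    o.order_id,
    o.order_time,
    c.customer_name,
    c.customer_region
FROM orders_stream AS o
JOIN customers_table AS c  -- customers_table could be a regular table or an MV
    ON o.customer_id = c.customer_id;
-- This is a regular streaming join: RisingWave keeps state for both inputs, so
-- when a customer row changes, previously emitted enriched_orders rows for that
-- customer are retracted and re-emitted with the new values.
-- For process-time temporal join semantics (each order sees the customer row as
-- of the moment it is processed, and later customer changes do not revise past
-- results), RisingWave uses an explicit clause on the table side:
--   JOIN customers_table FOR SYSTEM_TIME AS OF PROCTIME() AS c
-- See the RisingWave documentation for the requirements this places on customers_table.
```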
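The difference between a regular stream-table join and a process-time temporal (lookup) join can be sketched in Python. The hypothetical function `temporal_lookup_join` models only the lookup behavior. Each order reads the current version of its customer row, and later customer changes leave earlier output untouched. The sketch makes simplifying assumptions: an in-memory dict stands in for `customers_table`, and inner-join semantics apply. It is not RisingWave's implementation.

```python
def temporal_lookup_join(changes):
    """Sketch of a process-time lookup (temporal) join with inner-join semantics.

    `changes` is a list of ("customer", row) upserts and ("order", row) events in
    processing order. Each order is joined with the customer row current at the
    moment the order is processed; emitted rows are never revised afterwards.
    """
    customers = {}  # stands in for the current version of customers_table
    output = []
    for kind, row in changes:
        if kind == "customer":
            customers[row["customer_id"]] = row  # upsert: keep only the latest version
        elif kind == "order":
            c = customers.get(row["customer_id"])
            if c is not None:  # inner join: orders without a customer are dropped
                output.append((row["order_id"], c["customer_name"], c["customer_region"]))
    return output

changes = [
    ("customer", {"customer_id": 7, "customer_name": "Ada", "customer_region": "EU"}),
    ("order", {"order_id": 100, "customer_id": 7}),
    ("customer", {"customer_id": 7, "customer_name": "Ada", "customer_region": "US"}),
    ("order", {"order_id": 101, "customer_id": 7}),
]
print(temporal_lookup_join(changes))  # [(100, 'Ada', 'EU'), (101, 'Ada', 'US')]
```

A regular streaming join would handle the update differently. When customer 7 moves to region `US`, it would also retract `(100, 'Ada', 'EU')` and emit `(100, 'Ada', 'US')`.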
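RisingWave's incremental computation engine maintains the state for these joins (the rows from each input that may still find a match) and updates results incrementally. A newly arriving row produces new join output. A retracted row, such as a deletion or the old version of an updated row, produces retractions of the output rows it previously contributed to.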
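Retraction handling is the core of an incremental join. The following minimal Python sketch shows the general technique, a symmetric hash join over a changelog. Changes are modeled as hypothetical `(side, op, key, value)` tuples, where `op` is `1` for an insert and `-1` for a retraction. The function name `incremental_join` is illustrative, and this is not RisingWave's implementation.

```python
from collections import Counter, defaultdict

def incremental_join(changes):
    """Symmetric hash equi-join over a changelog.

    Each change is (side, op, key, value) with op = 1 (insert) or -1 (retract).
    Assumes retractions only target rows that were previously inserted.
    Returns the output changelog as (op, key, left_value, right_value) tuples.
    """
    # Per side: key -> multiset of values currently present (value -> count).
    state = {"left": defaultdict(Counter), "right": defaultdict(Counter)}
    out = []
    for side, op, key, value in changes:
        other = "right" if side == "left" else "left"
        # Apply the change to this side's state.
        state[side][key][value] += op
        if state[side][key][value] == 0:
            del state[side][key][value]
        # Every matching row on the other side yields an output change with the same sign.
        for other_value, count in state[other][key].items():
            lv, rv = (value, other_value) if side == "left" else (other_value, value)
            out.extend([(op, key, lv, rv)] * count)
    return out

changes = [
    ("left", 1, 7, "order-100"),
    ("right", 1, 7, "EU"),   # first match is emitted
    ("right", -1, 7, "EU"),  # customer update, part 1: retract the old version
    ("right", 1, 7, "US"),   # customer update, part 2: insert the new version
]
print(incremental_join(changes))
# [(1, 7, 'order-100', 'EU'), (-1, 7, 'order-100', 'EU'), (1, 7, 'order-100', 'US')]
```

An update to a right-side row is modeled as a retraction of the old version followed by an insertion of the new one. This is why the output contains a `-1` change before the corrected row.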