Join our Streaming Lakehouse Tour!
Register Now.->

Stream-Table Join

A Stream-Table Join (often referred to as a Temporal Join or Dimension Join in streaming contexts) is a type of join operation in stream processing that combines events from a continuously flowing data stream with data from a table. The table typically represents more static or slowly changing "dimension" data, while the stream provides the "fact" or event data.

This join is primarily used for stream enrichment, where incoming stream events are augmented with contextual information from the table.

Core Idea

  • Stream as the "Driving" Input: Events arrive on the stream one by one.
  • Table as the "Lookup" or "Dimension" Input: For each event from the stream, the system looks up corresponding data in the table based on a common join key.
  • Enrichment: The attributes fetched from the table are added to the stream event to create an enriched output event.

Key Characteristics

  • Asymmetric Join: The nature of the join is asymmetric; the stream drives the lookups into the table.
  • State Management for the Table: The stream processing system needs to have efficient access to the table's data.
    • If the table is small, it might be broadcasted and cached entirely in the memory of processing nodes.
    • If the table is large, it needs to be queryable efficiently, often requiring it to be managed by the stream processor's state store or an external, fast-access database.
    • In RisingWave, tables used in stream-table joins are typically managed within its state store, allowing for low-latency lookups.
  • Handling Table Updates: A crucial aspect is how changes to the table data are reflected in the join results.
    • Static Table: If the table is truly static, lookups are straightforward.
    • Slowly Changing Table: If the table updates, the join behavior depends on the system's capabilities.
      • Point-in-Time Correctness (Temporal Join): Ideally, a stream event should be joined with the version of the table row that was valid at the event's timestamp (event time). This requires versioned table state and more complex join logic. RisingWave aims to provide this.
      • Processing Time Join: Joins the stream event with the current version of the table row at the moment the stream event is processed. This is simpler but might lead to inconsistencies if table updates and stream event processing are not perfectly synchronized.
  • Join Semantics (Inner, Outer):
    • INNER JOIN: An output event is produced only if a matching row is found in the table for the stream event's join key.
    • LEFT OUTER JOIN (Common): An output event is always produced for every stream event. If a match is found in the table, the table's attributes are added. If no match is found, NULL values are typically used for the table's attributes in the output. This is very common for enrichment, as you usually want to process every stream event, even if enrichment data is missing.

Use Cases

  • Enriching order events with product details (name, category, price) from a products table.
  • Adding user demographics (age, location) from a users table to clickstream events.
  • Appending sensor metadata (location, type, calibration factor) from a sensors table to raw sensor readings.
  • Decorating financial transactions with customer account information.

Stream-Table Joins in RisingWave

RisingWave provides robust and efficient support for stream-table joins, which are fundamental to its stream enrichment capabilities:

  • SQL Interface: Stream-table joins are expressed using standard SQL JOIN syntax.
  • Table Representation: The "table" side can be:
    • A regular table created in RisingWave (perhaps populated from an external database via CDC or batch load).
    • A materialized view that represents the state of another stream or a transformation.
  • Efficient Lookups: RisingWave manages the table data in its Hummock state store, optimized for fast key-based lookups required by the join.
  • Incremental Updates: When the stream produces new events, or when the underlying table data changes (if the table itself is a materialized view or sourced via CDC), RisingWave incrementally updates the results of the stream-table join (typically materialized into another view).
  • Temporal Correctness (Aim): RisingWave is designed to handle temporal aspects, aiming to join stream events with the appropriate version of table data based on event time, which is crucial for accuracy with changing dimension data.

Example (Conceptual SQL):

CREATE MATERIALIZED VIEW enriched_orders AS
SELECT
    o.order_id,
    o.order_timestamp,
    p.product_name,
    p.category,
    o.quantity * p.price AS total_amount
FROM
    order_stream AS o
LEFT JOIN
    products_table AS p
ON
    o.product_id = p.product_id;

In this example, each event from order_stream is enriched with product_name, category, and price from products_table. If a product ID in an order doesn't exist in products_table, the order still flows through, but product_name, category, and total_amount (if price is NULL) would be NULL.

Related Glossary Terms

The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.