Stream Enrichment

Stream Enrichment is the process of augmenting or enhancing events in a data stream with additional contextual information, typically by looking up and joining data from an external data source or another stream. The goal is to make the raw event data more valuable, understandable, or actionable for subsequent processing, analysis, or storage.

Core Idea

Raw events from sources like sensors, logs, or transaction systems often contain minimal information (e.g., just an ID, a timestamp, and a value). Stream enrichment adds more context, such as:

  • User details: Adding user name, demographics, or preferences to an event with a user_id.
  • Product information: Adding product name, category, or price to an event with a product_id.
  • Location data: Adding city, state, or country based on IP address or coordinates.
  • Sensor metadata: Adding sensor type, location, or calibration data to a raw sensor reading.
  • Threat intelligence: Adding information about known malicious IPs or domains to network traffic events.
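
The core idea can be sketched in a few lines of Python. This is a minimal illustration, not any particular engine's implementation: the `USERS` lookup table, the event field names, and the `enrich` helper are all hypothetical.

```python
# Hypothetical lookup table keyed by user_id (illustrative data only).
USERS = {
    "u1": {"name": "Ada", "country": "UK"},
    "u2": {"name": "Lin", "country": "SG"},
}

def enrich(event, lookup):
    """Return a copy of the event with the matching lookup fields merged in.

    Events with no match pass through unchanged (left-join behavior).
    """
    extra = lookup.get(event["user_id"], {})
    return {**event, **extra}

raw_events = [
    {"user_id": "u1", "ts": 1, "value": 42},
    {"user_id": "u9", "ts": 2, "value": 7},  # no matching user
]

enriched = [enrich(e, USERS) for e in raw_events]
```

Passing unmatched events through unchanged is one possible choice; dropping them instead would correspond to an inner join.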

Common Enrichment Patterns

  1. Stream-Table Join (or Stream-Dimension Table Join):

    • This is the most common pattern. Each event in the input stream is joined with a relatively static or slowly changing dataset (the "dimension table" or "lookup table") based on a common key.
    • Example: An incoming stream of order_events (containing product_id) is enriched by joining with a products table (containing product_id, product_name, category) to add product details to each order.
    • In RisingWave, this is efficiently handled by joining a stream with a table (which could be a regular table or a materialized view representing another stream's state).
  2. Stream-Stream Join:

    • Enriching one stream with data from another active stream. This is more complex because both inputs change continuously, so the join usually uses a time window to bound how much state must be retained.
    • Example: Enriching a stream of ad impressions with a stream of ad clicks, joining on campaign_id and user_id within a short time window.
  3. External Lookup / API Call (Lookup Join):

    • For each event, the system makes a synchronous call to an external database, microservice, or API to fetch enrichment data.
    • Pros: Can access the most up-to-date external data.
    • Cons: Can introduce significant latency, become a performance bottleneck, and create external dependencies that affect the stream processor's reliability. This pattern is often less preferred for high-throughput streaming if alternatives like pre-loading data into a joinable table exist.
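
To make the windowed stream-stream pattern concrete, here is a rough Python sketch of the ad impression and click example. It works on in-memory lists for clarity; a real stream processor would perform the join incrementally and evict state older than the window. The function name, the field names, and the 30-second window are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 30  # assumed window size, chosen only for illustration

def join_impressions_with_clicks(impressions, clicks, window=WINDOW_SECONDS):
    """Pair each impression with clicks on the same (campaign_id, user_id)
    that occur at or after the impression and at most `window` seconds later."""
    # Index clicks by join key so each impression only scans its candidates.
    clicks_by_key = defaultdict(list)
    for c in clicks:
        clicks_by_key[(c["campaign_id"], c["user_id"])].append(c)

    joined = []
    for imp in impressions:
        key = (imp["campaign_id"], imp["user_id"])
        for c in clicks_by_key.get(key, []):
            if 0 <= c["ts"] - imp["ts"] <= window:
                joined.append({**imp, "click_ts": c["ts"]})
    return joined

impressions = [
    {"campaign_id": "c1", "user_id": "u1", "ts": 100},
    {"campaign_id": "c1", "user_id": "u2", "ts": 100},
]
clicks = [
    {"campaign_id": "c1", "user_id": "u1", "ts": 110},  # inside the window
    {"campaign_id": "c1", "user_id": "u2", "ts": 200},  # outside the window
]

result = join_impressions_with_clicks(impressions, clicks)
```

Only the first impression is matched here, because the second click arrives 100 seconds after its impression and falls outside the window.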

Benefits of Stream Enrichment

  • Increased Data Value: Enriched data provides more context, leading to better insights and more effective downstream processing.
  • Simplified Downstream Logic: Downstream applications or queries don't need to perform these lookups themselves, simplifying their logic.
  • Improved Analytics and Reporting: Allows for richer and more detailed analysis by including dimensional attributes.
  • Enhanced Real-time Decision Making: Provides the necessary information for more informed real-time actions.

Stream Enrichment in RisingWave

RisingWave supports stream enrichment primarily through stream-table joins:

  • SQL-based Joins: Enrichment logic is expressed using standard SQL JOIN clauses.
  • Efficient State Management: RisingWave keeps the dimension data (whether it is a regular table or a materialized view over another stream) in its state store, allowing low-latency lookups as stream events arrive.
  • Materialized Views for Dimensions: Dimension data can be ingested into RisingWave as a table or materialized from another source, making it readily available for joins.
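  • Temporal Joins: RisingWave supports temporal joins, in which each stream event is joined with the version of the dimension data that is current when the event is processed (a process-time temporal join), so later changes to the dimension table do not rewrite rows that were already emitted.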
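
The versioned lookup behind a temporal join can be sketched as follows. This is an illustrative Python model, not RisingWave's implementation: the `VersionedDimension` class and its methods are hypothetical, and which clock supplies the timestamp passed to `as_of` is left to the caller.

```python
import bisect

class VersionedDimension:
    """Keeps every version of each dimension row, ordered by effective time."""

    def __init__(self):
        self._times = {}  # key -> sorted list of effective timestamps
        self._rows = {}   # key -> row versions, parallel to self._times[key]

    def upsert(self, key, effective_ts, row):
        times = self._times.setdefault(key, [])
        rows = self._rows.setdefault(key, [])
        i = bisect.bisect_right(times, effective_ts)
        times.insert(i, effective_ts)
        rows.insert(i, row)

    def as_of(self, key, ts):
        """Return the row version in effect at `ts`, or None if none existed yet."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self._rows[key][i - 1] if i > 0 else None

products = VersionedDimension()
products.upsert("p1", 0, {"price": 10})
products.upsert("p1", 50, {"price": 12})  # price change takes effect at ts=50

orders = [{"product_id": "p1", "ts": 20}, {"product_id": "p1", "ts": 70}]
enriched_orders = [
    {**o, **(products.as_of(o["product_id"], o["ts"]) or {})} for o in orders
]
```

The order at `ts=20` picks up the original price and the order at `ts=70` picks up the updated one, so a later price change does not alter how earlier orders were enriched.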

By performing enrichment directly within the streaming database, RisingWave simplifies the data pipeline and ensures that enriched data is consistently and efficiently produced.
