Time Window

A Time Window in stream processing is a mechanism for dividing an unbounded data stream into finite segments or "windows" based on time. This allows for computations like aggregations (e.g., sums, averages, counts) or joins to be performed over specific, bounded portions of the stream.

Since streams are continuous, windows are essential for defining the scope of stateful operations that depend on a collection of events rather than individual ones.

Key Concepts

  1. Window Boundaries: Defined by start and end times.
  2. Window Duration: The length of time a window covers.
  3. Time Characteristic: Windows can be based on:
    • Event Time: Uses timestamps embedded in the data records themselves (when the event actually occurred). This is generally preferred for accurate analytics, because results do not depend on processing delays or out-of-order arrival. It requires a mechanism such as watermarks to decide when a window has received all of its events and can be closed.
    • Processing Time: Uses the clock of the machine processing the data. Simpler to implement but can lead to non-deterministic or inaccurate results if processing speeds vary or data arrives late.
    • Ingestion Time: Uses the time when events are ingested into the streaming system. A compromise between event time and processing time.
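
The difference between event time and processing time can be illustrated with a minimal Python sketch. It is not RisingWave code: the record, the 5-minute window size, the epoch alignment, and the helper names window_start and is_window_closed are all assumptions made for illustration. The watermark rule shown (largest event time seen minus an allowed delay) is one common heuristic, not the only one.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # assumed window size
EPOCH = datetime(1970, 1, 1)    # assumed alignment origin for windows


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of the 5-minute window containing it."""
    return ts - (ts - EPOCH) % WINDOW


def is_window_closed(window_end: datetime,
                     max_event_time_seen: datetime,
                     allowed_delay: timedelta) -> bool:
    """Hypothetical watermark check: the window is treated as complete once
    the watermark (max event time seen minus allowed delay) reaches its end."""
    watermark = max_event_time_seen - allowed_delay
    return watermark >= window_end


# Hypothetical record: it occurred at 10:04:50 but was processed at 10:05:20.
event_time = datetime(2024, 1, 1, 10, 4, 50)
processing_time = datetime(2024, 1, 1, 10, 5, 20)

event_time_window = window_start(event_time)            # window starting 10:00
processing_time_window = window_start(processing_time)  # window starting 10:05

# With a 30-second allowed delay, the 10:00 window is still open when the
# newest event seen is 10:05:20, and closed once an event at 10:05:30 is seen.
window_end = event_time_window + WINDOW
open_at_520 = is_window_closed(window_end, datetime(2024, 1, 1, 10, 5, 20),
                               timedelta(seconds=30))
closed_at_530 = is_window_closed(window_end, datetime(2024, 1, 1, 10, 5, 30),
                                 timedelta(seconds=30))
```

The same record is counted in the [10:00, 10:05) window under event time and in the [10:05, 10:10) window under processing time, which is why processing-time results can change from run to run.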

Common Types of Time Windows

  1. Tumbling Window (Fixed Window)

    • Description: Divides the stream into fixed-size, non-overlapping, and contiguous time intervals. Each event belongs to exactly one window.
    • Example: A 5-minute tumbling window would create windows like [10:00, 10:05), [10:05, 10:10), [10:10, 10:15), etc.
    • Use Cases: Calculating periodic reports (e.g., total sales every hour), fixed-interval monitoring.
    • RisingWave SQL: TUMBLE(table_name, time_col, INTERVAL '5' MINUTE)
  2. Hopping Window (Sliding Window with Fixed Hops)

    • Description: Has a fixed duration (window size) and slides forward by a fixed interval (hop size or slide). Windows can overlap if the hop size is smaller than the window size. If hop size equals window size, it behaves like a tumbling window.
    • Example: A 1-hour window that hops every 10 minutes. The window [09:00, 10:00) is followed by [09:10, 10:10), then [09:20, 10:20), etc.
    • Use Cases: Calculating moving averages, detecting trends over sliding periods, generating frequently updated dashboards.
    • RisingWave SQL: HOP(table_name, time_col, INTERVAL '10' MINUTE, INTERVAL '1' HOUR), where the first interval is the hop size and the second is the window size.
  3. Sliding Window (General - often refers to Hopping Window)

    • Description: The term "Sliding Window" is sometimes used more generally to refer to any window that slides over time. Hopping windows are a specific, common type of sliding window. Some systems might offer more generalized sliding windows where new windows are created upon each event's arrival, covering a preceding fixed duration.
    • Use Cases: Similar to hopping windows, for continuous monitoring of recent activity.
  4. Session Window

    • Description: Groups events based on periods of activity, separated by gaps of inactivity (a timeout period). Sessions are usually tracked per key, such as a user or device ID. The duration of a session window is not fixed but is determined by the data. A session starts with the first event for a key, or with the first event after an inactivity gap, and ends when no new events arrive for that key within the timeout period.
    • Example: Analyzing user clickstreams where a session ends if a user is inactive for 30 minutes.
    • Use Cases: User behavior analysis, tracking user sessions, identifying periods of continuous device activity.
    • RisingWave SQL: RisingWave's time window functions are TUMBLE and HOP; a call such as SESSION(time_col, INTERVAL '30' MINUTE) is conceptual rather than built-in syntax. Session windows are typically constructed with other SQL constructs, such as LAG with a condition that flags the start of a new session whenever the gap since the previous event reaches the timeout.
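
The gap-based logic behind session windows can be sketched in a few lines of Python. This is a conceptual illustration for a single key, not RisingWave code. The function name sessionize and the sample click times are hypothetical, and the sketch assumes that a gap of exactly the timeout already starts a new session; some systems draw that boundary differently.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, gap):
    """Group the event times of one key into sessions.

    A new session starts at the first event and whenever the time since the
    previous event is at least `gap`, mirroring a LAG-style comparison.
    Returns a list of (first_event, last_event, event_count) tuples.
    """
    sessions = []
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous >= gap:
            sessions.append([ts])      # inactivity gap reached: new session
        else:
            sessions[-1].append(ts)    # still within the current session
        previous = ts
    return [(s[0], s[-1], len(s)) for s in sessions]


# Hypothetical clickstream for one user, with a 30-minute inactivity timeout.
day = datetime(2024, 1, 1)
clicks = [
    day.replace(hour=9, minute=0),
    day.replace(hour=9, minute=10),
    day.replace(hour=9, minute=35),
    day.replace(hour=10, minute=20),   # 45 minutes after the previous click
    day.replace(hour=10, minute=25),
]
sessions = sessionize(clicks, timedelta(minutes=30))
```

With this input, the first three clicks form one session and the last two form a second one. In a real pipeline the same comparison would be applied per key rather than over the whole stream.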

Windowing in RisingWave

RisingWave supports time windowing in its streaming SQL primarily through time window table-valued functions (TVFs) such as TUMBLE() and HOP(). Each function takes a table or source, a time column, and the window parameters, and returns the input rows with window_start and window_end columns added. Grouping by those columns in a GROUP BY clause performs aggregations over each time window.

-- Example: Count events in 5-minute tumbling windows
SELECT
    window_start,
    window_end,
    COUNT(*)
FROM TUMBLE(my_stream, time_attr, INTERVAL '5' MINUTE)
GROUP BY window_start, window_end;

-- Example: Sum values in 1-hour hopping windows that slide every 10 minutes
SELECT
    window_start,
    window_end,
    SUM(value_attr)
FROM HOP(my_stream, time_attr, INTERVAL '10' MINUTE, INTERVAL '1' HOUR)
GROUP BY window_start, window_end;
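
To make the window_start and window_end values in these queries concrete, the following Python sketch computes which windows a single timestamp falls into. It illustrates the general tumbling and hopping semantics described above and is not RisingWave's implementation. The function names tumble and hop are hypothetical, and aligning windows to the Unix epoch is an assumption; a real system may use a different origin or offset.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for window boundaries


def tumble(ts, size):
    """Return the single [start, end) tumbling window that contains ts."""
    start = ts - (ts - EPOCH) % size
    return (start, start + size)


def hop(ts, hop_size, window_size):
    """Return every [start, end) hopping window that contains ts,
    ordered from the earliest window to the latest."""
    # Latest hop-aligned window start that is not after ts.
    start = ts - (ts - EPOCH) % hop_size
    windows = []
    # Walk backwards one hop at a time while the window still covers ts.
    while start + window_size > ts:
        windows.append((start, start + window_size))
        start -= hop_size
    return sorted(windows)


# An event at 10:07:30 belongs to exactly one 5-minute tumbling window.
tumbling = tumble(datetime(2024, 1, 1, 10, 7, 30), timedelta(minutes=5))

# An event at 09:55 belongs to six 1-hour windows that hop every 10 minutes.
hopping = hop(datetime(2024, 1, 1, 9, 55),
              timedelta(minutes=10), timedelta(hours=1))
```

Because each event appears in window size divided by hop size windows (six in this case), a hopping aggregation processes each input row several times, which is worth keeping in mind when choosing a small hop relative to the window size.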

Choosing the right type and configuration of time windows is crucial for deriving meaningful insights from streaming data.
