Processing Time
Processing Time in stream processing refers to the time at which an event is observed and processed by the stream processing system itself. It is the system's local wall-clock time when an operation (like a window aggregation, a transformation, or a join) occurs on an event.
This contrasts with Event Time, which is the timestamp embedded within the data record itself, indicating when the actual event occurred in the real world.
Key Characteristics
- System-Dependent: Processing Time is determined by the clock of the machine(s) running the stream processing operators.
- Simplicity: It's generally simpler to implement and reason about than Event Time because it doesn't require mechanisms to handle out-of-order events or late data (like watermarks).
- Order of Arrival: Results are based on the order in which events arrive at and are processed by the system, not necessarily the order in which they occurred.
- Non-Deterministic Results (Potentially): If the same stream of events is replayed or processed under different system loads or network conditions, the processing time for each event might vary. This can lead to different results for time-based operations like windowing if processing time is used, especially if processing delays cause events to fall into different windows than they would have under ideal conditions.
How it Works
When a stream processing system uses Processing Time semantics:
- Events arrive at an operator.
- The operator uses its current system time to assign a timestamp to the event for time-based operations or to determine window boundaries.
- For example, a 1-minute tumbling window based on processing time would collect all events that arrive at the window operator within a specific 1-minute interval of the system's clock.
Use Cases
Processing Time is suitable when:
- Real-time system monitoring: When you need to know what the system is observing right now, regardless of when the events actually happened (e.g., current active users on a website based on incoming requests).
- Simplicity is paramount: And the potential non-determinism or slight inaccuracies due to event arrival order are acceptable.
- Data has no inherent event time: Some data sources might not provide reliable event timestamps.
- Strictly ordered feeds: If the input stream is guaranteed to be strictly ordered by event time and arrives with minimal delay, processing time can closely approximate event time.
Processing Time vs. Event Time
Feature | Processing Time | Event Time |
---|
Timestamp Source | System clock of the processor | Timestamp embedded in the event data |
Ordering | Based on arrival at the processor | Based on when the event actually occurred |
Determinism | Potentially non-deterministic | Deterministic (given same watermarking strategy) |
Complexity | Simpler to implement | More complex (requires watermarks, handling late data) |
Accuracy | Can be skewed by system load/network delays | More accurate for reflecting real-world event occurrence |
Late Data | Not explicitly handled (processed when it arrives) | Requires mechanisms (watermarks) to handle |
Processing Time in RisingWave
RisingWave primarily focuses on Event Time processing for accuracy and consistency in analytics, especially for its core features like materialized views over time-windowed aggregations. Functions like TUMBLE, HOP, SESSION typically expect an event time column.
However, RisingWave does provide functions that can access the current system time (processing time) if needed for specific use cases, such as:
- NOW(): Returns the current transaction timestamp (which acts as processing time in many contexts).
- PROCTIME(): Can be used in CREATE TABLE or CREATE SOURCE definitions to declare a column that will be populated with the system time as data is ingested. This proctime column can then be used in queries, though it's crucial to understand the implications for determinism and windowing.
When designing streaming applications, it's vital to choose the appropriate time characteristic (Processing Time or Event Time) based on the requirements for accuracy, determinism, and simplicity. For most analytical use cases where understanding the timing of real-world events is important, Event Time is preferred.
Related Glossary Terms