A Dataflow Graph (or Streaming Job Graph) is a directed acyclic graph (DAG) commonly used by distributed stream processing systems to represent the logical structure and execution plan of a streaming computation or Continuous Query. It defines how data flows between different processing steps (operators) from sources to sinks.
Nodes in the graph represent operators performing specific tasks (e.g., reading from a source, filtering, aggregating, joining, writing to a sink), and edges represent the data streams flowing between these operators.
Dataflow graphs serve several crucial purposes:
RisingWave internally uses the concept of a dataflow graph (often referred to as the 'streaming plan' or 'fragment graph') to execute Streaming SQL queries, particularly those defined within a 'CREATE MATERIALIZED VIEW' or 'CREATE SINK' statement:
Users typically don't interact directly with the low-level dataflow graph in RisingWave; they define the desired computation using SQL, and RisingWave handles the translation to an efficient distributed execution graph. However, understanding the concept helps in analyzing query performance (e.g., using 'EXPLAIN' commands which might show aspects of the plan) and comprehending how distributed stream processing works internally.