Apache Flink Explained: Architecture, Components, and Use Cases
Apache Flink is a distributed stream processing engine for stateful computations over unbounded (streaming) and bounded (batch) data. Its architecture consists of a JobManager (coordinator) and TaskManagers (workers), with checkpointing for fault tolerance.
Architecture
Client → JobManager (coordinator) → TaskManagers (workers)

JobManager responsibilities: job scheduling, checkpoint coordination, resource management
TaskManager responsibilities: operator execution, state management, data processing
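The JobManager's checkpoint coordination works by injecting checkpoint barriers into the data streams; an operator snapshots its state once the barrier has arrived on all of its inputs. The following toy simulation illustrates that barrier-alignment idea in plain Python — all names are hypothetical, and real Flink additionally buffers records that arrive behind a barrier on an already-aligned channel, which this sketch omits:

```python
# Toy illustration of Flink-style aligned checkpoint barriers.
# Class and method names are made up for illustration; the real
# mechanism lives inside the Flink runtime.

class Operator:
    def __init__(self, name, num_inputs):
        self.name = name
        self.num_inputs = num_inputs
        self.state = 0              # running sum, standing in for operator state
        self.barriers_seen = set()  # input channels whose barrier has arrived
        self.snapshots = {}         # checkpoint_id -> snapshotted state

    def on_record(self, value):
        self.state += value         # normal record processing

    def on_barrier(self, checkpoint_id, channel):
        # Barrier alignment: only snapshot once the barrier for this
        # checkpoint has arrived on every input channel.
        self.barriers_seen.add(channel)
        if len(self.barriers_seen) == self.num_inputs:
            self.snapshots[checkpoint_id] = self.state
            self.barriers_seen.clear()

op = Operator("sum", num_inputs=2)
op.on_record(3)
op.on_record(4)
op.on_barrier(checkpoint_id=1, channel=0)  # barrier on first input only
op.on_record(5)                            # record from the not-yet-barriered input
op.on_barrier(checkpoint_id=1, channel=1)  # alignment complete: snapshot taken
# op.snapshots[1] now holds the consistent state (12)
```

Because the snapshot is taken only after all inputs have delivered the barrier, every operator's snapshot reflects the same consistent point in the stream — the core of the Chandy-Lamport-derived algorithm Flink uses.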
Key Concepts
| Concept | Description |
| --- | --- |
| JobGraph | Logical DAG of operators compiled from user code |
| ExecutionGraph | Physical execution plan with parallelism |
| Checkpoint | Consistent snapshot of all operator state (Chandy-Lamport) |
| Savepoint | Manual checkpoint for upgrades and migrations |
| Watermark | Event-time progress tracker for handling late data |
| State Backend | Where operator state is stored (Heap, RocksDB, ForSt) |
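The watermark concept above can be made concrete with a small simulation. The sketch below mimics Flink's common bounded-out-of-orderness strategy — the watermark trails the largest event timestamp seen by a fixed bound, and any event behind the watermark counts as late. The class and method names are illustrative, not Flink's actual API:

```python
# Minimal sketch of a bounded-out-of-orderness watermark.
# Names are illustrative only; in Flink this logic is configured via
# the WatermarkStrategy API rather than hand-written.

class BoundedOutOfOrdernessWatermark:
    def __init__(self, max_out_of_orderness):
        self.max_out_of_orderness = max_out_of_orderness
        self.max_timestamp_seen = float("-inf")

    def on_event(self, event_timestamp):
        """Process one event; returns True if it is late."""
        # An event is late if it falls behind the current watermark.
        is_late = event_timestamp < self.current_watermark()
        self.max_timestamp_seen = max(self.max_timestamp_seen, event_timestamp)
        return is_late

    def current_watermark(self):
        # The watermark asserts "no event older than this should arrive";
        # it lags the max timestamp by the allowed out-of-orderness.
        return self.max_timestamp_seen - self.max_out_of_orderness

wm = BoundedOutOfOrdernessWatermark(max_out_of_orderness=5)
wm.on_event(100)        # watermark advances to 95
wm.on_event(97)         # within the bound: not late
late = wm.on_event(90)  # behind watermark 95: late
```

Late events like the last one are what Flink's allowed-lateness and side-output mechanisms exist to handle.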
Flink 2.0 Key Changes
- ForSt State Backend: Disaggregated state on S3 (40x faster recovery)
- Materialized Tables: SQL-first table abstraction
- ML_PREDICT: Built-in LLM inference function
- VECTOR_SEARCH: Real-time vector similarity in SQL
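As a rough sketch of how SQL-native inference looks, the snippet below registers a model and applies it with ML_PREDICT. The table, model, column names, and connector options are all invented for illustration, and the exact DDL syntax varies by Flink version — consult the Flink SQL documentation before relying on it:

```sql
-- Hypothetical sketch only: names and options are made up, and the
-- precise syntax may differ in your Flink version.
CREATE MODEL sentiment_model
  INPUT (review STRING)
  OUTPUT (label STRING)
  WITH ('provider' = 'openai', 'task' = 'classification');

SELECT review, label
FROM ML_PREDICT(TABLE reviews, MODEL sentiment_model, DESCRIPTOR(review));
```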
Frequently Asked Questions
Is Flink better than Spark for streaming?
Flink provides true event-at-a-time processing with sub-100ms latency. Spark uses micro-batching with seconds-level latency. For low-latency streaming, Flink is better. For unified batch+streaming in a Spark ecosystem, Spark is simpler.
Is Flink hard to learn?
Yes, Flink has a steep learning curve — especially for Java-based development, state management, and operational concerns. Flink SQL reduces the learning curve but doesn't eliminate operational complexity. For simpler streaming, consider RisingWave.

