Apache Flink Explained: Architecture, Components, and Use Cases
Apache Flink is a distributed stream processing engine for stateful computations over unbounded (streaming) and bounded (batch) data. Its architecture consists of a JobManager (coordinator) and TaskManagers (workers), with checkpointing for fault tolerance.
Architecture
Client → JobManager (coordinator) → TaskManagers (workers)

JobManager responsibilities: job scheduling, checkpoint coordination, resource management
TaskManager responsibilities: operator execution, state management, data processing
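The JobManager's checkpoint coordination works by injecting checkpoint barriers into the data streams; an operator snapshots its state once the barrier has arrived on all of its inputs. The following toy simulation illustrates that barrier-alignment idea in plain Python — all names are hypothetical, and real Flink additionally buffers records that arrive behind a barrier on an already-aligned channel, which this sketch omits:

```python
# Toy illustration of Flink-style aligned checkpoint barriers.
# Class and method names are made up for illustration; the real
# mechanism lives inside the Flink runtime.

class Operator:
    def __init__(self, name, num_inputs):
        self.name = name
        self.num_inputs = num_inputs
        self.state = 0              # running sum, standing in for operator state
        self.barriers_seen = set()  # input channels whose barrier has arrived
        self.snapshots = {}         # checkpoint_id -> snapshotted state

    def on_record(self, value):
        self.state += value         # normal record processing

    def on_barrier(self, checkpoint_id, channel):
        # Barrier alignment: only snapshot once the barrier for this
        # checkpoint has arrived on every input channel.
        self.barriers_seen.add(channel)
        if len(self.barriers_seen) == self.num_inputs:
            self.snapshots[checkpoint_id] = self.state
            self.barriers_seen.clear()

op = Operator("sum", num_inputs=2)
op.on_record(3)
op.on_record(4)
op.on_barrier(checkpoint_id=1, channel=0)  # barrier on first input only
op.on_record(5)                            # record from the not-yet-barriered input
op.on_barrier(checkpoint_id=1, channel=1)  # alignment complete: snapshot taken
# op.snapshots[1] now holds the consistent state (12)
```

Because the snapshot is taken only after all inputs have delivered the barrier, every operator's snapshot reflects the same consistent point in the stream — the core of the Chandy-Lamport-derived algorithm Flink uses.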
Key Concepts
| Concept | Description |
| --- | --- |
| JobGraph | Logical DAG of operators compiled from user code |
| ExecutionGraph | Physical execution plan with parallelism |
| Checkpoint | Consistent snapshot of all operator state (Chandy-Lamport) |
| Savepoint | Manual checkpoint for upgrades and migrations |
| Watermark | Event-time progress tracker for handling late data |
| State Backend | Where operator state is stored (Heap, RocksDB, ForSt) |
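The watermark concept above can be made concrete with a small simulation. The sketch below mimics Flink's common bounded-out-of-orderness strategy — the watermark trails the largest event timestamp seen by a fixed bound, and any event behind the watermark counts as late. The class and method names are illustrative, not Flink's actual API:

```python
# Minimal sketch of a bounded-out-of-orderness watermark.
# Names are illustrative only; in Flink this logic is configured via
# the WatermarkStrategy API rather than hand-written.

class BoundedOutOfOrdernessWatermark:
    def __init__(self, max_out_of_orderness):
        self.max_out_of_orderness = max_out_of_orderness
        self.max_timestamp_seen = float("-inf")

    def on_event(self, event_timestamp):
        """Process one event; returns True if it is late."""
        # An event is late if it falls behind the current watermark.
        is_late = event_timestamp < self.current_watermark()
        self.max_timestamp_seen = max(self.max_timestamp_seen, event_timestamp)
        return is_late

    def current_watermark(self):
        # The watermark asserts "no event older than this should arrive";
        # it lags the max timestamp by the allowed out-of-orderness.
        return self.max_timestamp_seen - self.max_out_of_orderness

wm = BoundedOutOfOrdernessWatermark(max_out_of_orderness=5)
wm.on_event(100)        # watermark advances to 95
wm.on_event(97)         # within the bound: not late
late = wm.on_event(90)  # behind watermark 95: late
```

Late events like the last one are what Flink's allowed-lateness and side-output mechanisms exist to handle.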
Flink 2.0 Key Changes
- ForSt State Backend: Disaggregated state on S3 (40x faster recovery)
- Materialized Tables: SQL-first table abstraction
- ML_PREDICT: Built-in LLM inference function
- VECTOR_SEARCH: Real-time vector similarity in SQL
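As a rough sketch of how SQL-native inference looks, the snippet below registers a model and applies it with ML_PREDICT. The table, model, column names, and connector options are all invented for illustration, and the exact DDL syntax varies by Flink version — consult the Flink SQL documentation before relying on it:

```sql
-- Hypothetical sketch only: names and options are made up, and the
-- precise syntax may differ in your Flink version.
CREATE MODEL sentiment_model
  INPUT (review STRING)
  OUTPUT (label STRING)
  WITH ('provider' = 'openai', 'task' = 'classification');

SELECT review, label
FROM ML_PREDICT(TABLE reviews, MODEL sentiment_model, DESCRIPTOR(review));
```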
Frequently Asked Questions
Is Flink better than Spark for streaming?
Flink provides true event-at-a-time processing with sub-100ms latency. Spark uses micro-batching with seconds-level latency. For low-latency streaming, Flink is better. For unified batch+streaming in a Spark ecosystem, Spark is simpler.
Is Flink hard to learn?
Yes, Flink has a steep learning curve — especially for Java-based development, state management, and operational concerns. Flink SQL reduces the learning curve but doesn't eliminate operational complexity. For simpler streaming, consider RisingWave.

