# Apache Kafka
Apache Kafka is a distributed event streaming platform that stores and transports events as an ordered, immutable log. It consists of brokers (servers), topics (categories), partitions (parallel units), producers (writers), and consumers (readers). Kafka handles trillions of events per day at companies like LinkedIn, Uber, and Netflix.
## Core Architecture
```
Producers → Kafka Cluster (Brokers) → Consumers
                   ↓
         Topics (logical channels)
                   ↓
      Partitions (parallel, ordered logs)
                   ↓
          Segments (files on disk)
```
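The layering above can be sketched as a toy in-memory model: a topic holds several partitions, a record's key is hashed to pick a partition, and each record receives a monotonically increasing offset within that partition's log. The `ToyTopic` class below is purely illustrative, not the Kafka client API, and Python's built-in `hash()` stands in for Kafka's murmur2 partitioner.

```python
# Toy in-memory model of a Kafka topic: events are appended to one of
# several partitions, and each partition is an ordered log with offsets.
# Illustrates the data model only; real clients talk to brokers over the network.

class ToyTopic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Kafka's default partitioner hashes the record key (murmur2);
        # plain hash() stands in here: same key -> same partition.
        p = hash(key) % len(self.partitions)
        offset = len(self.partitions[p])   # next offset in that partition's log
        self.partitions[p].append((offset, key, value))
        return p, offset

topic = ToyTopic(num_partitions=3)
p1, o1 = topic.produce("user-42", "login")
p2, o2 = topic.produce("user-42", "click")   # same key -> same partition
assert p1 == p2 and o2 == o1 + 1             # ordered within that partition
```

This is why ordering in Kafka is a per-partition guarantee: records with the same key land in the same ordered log, while different keys may be spread across partitions for parallelism.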
## Key Components
| Component | Role |
|-----------|------|
| Broker | Server that stores and serves events |
| Topic | Named channel for a category of events |
| Partition | Ordered, immutable sequence within a topic |
| Producer | Application that writes events |
| Consumer | Application that reads events |
| Consumer Group | Set of consumers sharing topic consumption |
| ZooKeeper/KRaft | Cluster coordination (KRaft replaces ZooKeeper in Kafka 4.0+) |
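The consumer-group row deserves a closer look: within a group, each partition is owned by exactly one consumer, so adding consumers (up to the partition count) spreads the load. A simplified assignment sketch, assuming a plain round-robin split (Kafka's real assignors such as range, round-robin, and sticky are more involved but enforce the same one-owner-per-partition rule):

```python
# Simplified sketch of how a consumer group splits a topic's partitions:
# each partition gets exactly one owner within the group.

def assign_partitions(partitions, consumers):
    """Round-robin style assignment; illustrative, not Kafka's actual
    RangeAssignor/StickyAssignor logic."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment

print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# -> {'c1': [0, 2], 'c2': [1, 3]}
```

Note the corollary: consumers beyond the partition count sit idle, which is why the partition count caps a group's parallelism.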
## Why Kafka Matters
- Durability: Events persisted to disk with configurable retention
- Ordering: Guaranteed order within each partition
- Scalability: Add partitions and brokers for higher throughput
- Replay: Consumers can re-read events from any offset
- Ecosystem: 120+ connectors (Confluent), integrates with Flink, RisingWave, Spark
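The replay point above follows directly from the log model: reading does not remove events, and a consumer's position is just an offset it controls, so seeking backwards re-reads history. A minimal sketch of that idea (the `read_from` helper and the event names are hypothetical, not a Kafka API):

```python
# Sketch of replay: a partition is an ordered log, and a consumer's position
# in it is an offset the consumer controls. Seeking to an earlier offset
# re-reads events; consumption destroys nothing.

log = ["created", "paid", "shipped", "delivered"]  # one partition's events

def read_from(log, offset):
    """Yield (offset, event) pairs starting at the given offset."""
    for i in range(offset, len(log)):
        yield i, log[i]

# First pass: consume everything from the beginning.
assert [e for _, e in read_from(log, 0)] == log

# Replay: seek back to offset 1 and re-read, e.g. to rebuild downstream state.
assert list(read_from(log, 1))[0] == (1, "paid")
```

In real deployments this is bounded by the retention settings mentioned above: events older than the configured retention are deleted and can no longer be replayed.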
## Frequently Asked Questions

### Is Kafka a database?
No. Kafka is an event streaming platform — it stores and transports events but doesn't support SQL queries, joins, or aggregations. Use a streaming database (RisingWave) or processing engine (Flink) on top of Kafka for those capabilities.
### What is replacing ZooKeeper in Kafka?
KRaft (Kafka Raft) replaces ZooKeeper for cluster metadata management. Kafka 4.0+ removes ZooKeeper entirely, simplifying deployment and improving performance.

