# Apache Kafka
Apache Kafka is a distributed event streaming platform that stores and transports events as an ordered, immutable log. It consists of brokers (servers), topics (categories), partitions (parallel units), producers (writers), and consumers (readers). Kafka handles trillions of events per day at companies like LinkedIn, Uber, and Netflix.
## Core Architecture
```
Producers → Kafka Cluster (Brokers) → Consumers
                   ↓
         Topics (logical channels)
                   ↓
      Partitions (parallel, ordered logs)
                   ↓
          Segments (files on disk)
```
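The layering above can be sketched as a toy in-memory model: a topic holds several partitions, a record's key is hashed to pick a partition, and each record receives a monotonically increasing offset within that partition's log. The `ToyTopic` class below is purely illustrative, not the Kafka client API, and Python's built-in `hash()` stands in for Kafka's murmur2 partitioner.

```python
# Toy in-memory model of a Kafka topic: events are appended to one of
# several partitions, and each partition is an ordered log with offsets.
# Illustrates the data model only; real clients talk to brokers over the network.

class ToyTopic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Kafka's default partitioner hashes the record key (murmur2);
        # plain hash() stands in here: same key -> same partition.
        p = hash(key) % len(self.partitions)
        offset = len(self.partitions[p])   # next offset in that partition's log
        self.partitions[p].append((offset, key, value))
        return p, offset

topic = ToyTopic(num_partitions=3)
p1, o1 = topic.produce("user-42", "login")
p2, o2 = topic.produce("user-42", "click")   # same key -> same partition
assert p1 == p2 and o2 == o1 + 1             # ordered within that partition
```

This is why ordering in Kafka is a per-partition guarantee: records with the same key land in the same ordered log, while different keys may be spread across partitions for parallelism.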
## Key Components
| Component | Role |
|-----------|------|
| Broker | Server that stores and serves events |
| Topic | Named channel for a category of events |
| Partition | Ordered, immutable sequence within a topic |
| Producer | Application that writes events |
| Consumer | Application that reads events |
| Consumer Group | Set of consumers sharing topic consumption |
| ZooKeeper/KRaft | Cluster coordination (KRaft replaces ZooKeeper in Kafka 4.0+) |
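The consumer-group row deserves a closer look: within a group, each partition is owned by exactly one consumer, so adding consumers (up to the partition count) spreads the load. A simplified assignment sketch, assuming a plain round-robin split (Kafka's real assignors such as range, round-robin, and sticky are more involved but enforce the same one-owner-per-partition rule):

```python
# Simplified sketch of how a consumer group splits a topic's partitions:
# each partition gets exactly one owner within the group.

def assign_partitions(partitions, consumers):
    """Round-robin style assignment; illustrative, not Kafka's actual
    RangeAssignor/StickyAssignor logic."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment

print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# -> {'c1': [0, 2], 'c2': [1, 3]}
```

Note the corollary: consumers beyond the partition count sit idle, which is why the partition count caps a group's parallelism.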
## Why Kafka Matters
- Durability: Events persisted to disk with configurable retention
- Ordering: Guaranteed order within each partition
- Scalability: Add partitions and brokers for higher throughput
- Replay: Consumers can re-read events from any offset
- Ecosystem: 120+ connectors (Confluent), integrates with Flink, RisingWave, Spark
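The replay point above follows directly from the log model: reading does not remove events, and a consumer's position is just an offset it controls, so seeking backwards re-reads history. A minimal sketch of that idea (the `read_from` helper and the event names are hypothetical, not a Kafka API):

```python
# Sketch of replay: a partition is an ordered log, and a consumer's position
# in it is an offset the consumer controls. Seeking to an earlier offset
# re-reads events; consumption destroys nothing.

log = ["created", "paid", "shipped", "delivered"]  # one partition's events

def read_from(log, offset):
    """Yield (offset, event) pairs starting at the given offset."""
    for i in range(offset, len(log)):
        yield i, log[i]

# First pass: consume everything from the beginning.
assert [e for _, e in read_from(log, 0)] == log

# Replay: seek back to offset 1 and re-read, e.g. to rebuild downstream state.
assert list(read_from(log, 1))[0] == (1, "paid")
```

In real deployments this is bounded by the retention settings mentioned above: events older than the configured retention are deleted and can no longer be replayed.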
## Frequently Asked Questions

### Is Kafka a database?
No. Kafka is an event streaming platform — it stores and transports events but doesn't support SQL queries, joins, or aggregations. Use a streaming database (RisingWave) or processing engine (Flink) on top of Kafka for those capabilities.
### What is replacing ZooKeeper in Kafka?
KRaft (Kafka Raft) replaces ZooKeeper for cluster metadata management. Kafka 4.0+ removes ZooKeeper entirely, simplifying deployment and improving performance.

