Kafka Partitioning: Keys, Strategies, and Best Practices
Kafka partitioning determines how events are distributed across partitions within a topic. The partition key controls which partition receives each event. Choosing the right key affects ordering, throughput, and even distribution.
How Partitioning Works
Producer → hash(key) % num_partitions → Partition N → Consumer
Events with the same key always go to the same partition → guaranteed ordering for that key.
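The routing step above can be sketched in a few lines. Kafka's default partitioner hashes the key bytes with murmur2; the md5-based stand-in below is not Kafka's actual hash, but it illustrates the same `hash(key) % num_partitions` behavior deterministically:

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Stand-in for Kafka's default partitioner (which uses murmur2):
    a stable hash of the key bytes, modulo the partition count."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition on every call, so per-key ordering holds
assert pick_partition(b"order-1001", 12) == pick_partition(b"order-1001", 12)
```

Because the mapping is a pure function of the key and the partition count, ordering per key is free as long as neither changes.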
Partitioning Strategies
| Strategy | Key | Ordering | Distribution | Use Case |
|---|---|---|---|---|
| Entity key | user_id, order_id | Per-entity | Depends on cardinality | Most common |
| Round-robin | null (no key) | None | Even | Logs, metrics |
| Time-based | timestamp bucket | Per-time-window | Even if uniform | Time series |
| Composite | region + user_id | Per-entity-region | More even | Multi-tenant |
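The key choices in the table mostly come down to how the key string is built before it is handed to the producer. A minimal sketch, with hypothetical helper names (not part of any Kafka client API):

```python
def entity_key(user_id: str) -> bytes:
    # Per-entity ordering: all of one user's events share a partition
    return f"user:{user_id}".encode()

def time_bucket_key(epoch_s: int, bucket_s: int = 3600) -> bytes:
    # Per-time-window ordering: events in the same hour share a partition
    return str(epoch_s // bucket_s).encode()

def composite_key(region: str, user_id: str) -> bytes:
    # Multi-tenant: combining fields raises cardinality and evens spread
    return f"{region}:{user_id}".encode()

assert composite_key("eu", "42") == b"eu:42"
```

Round-robin needs no helper: passing a `None` key lets the producer spread events across partitions itself.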
Best Practices
- Choose high-cardinality keys — user_id (millions of values) distributes evenly; country (~200 values) may cause hot partitions
- Never use a low-cardinality key as the sole partition key
- Match partitions to consumers — 12 partitions → max 12 consumers in a group
- Over-partition initially — easier to have too many than too few
Frequently Asked Questions
How many Kafka partitions should I have?
Start with 12-24 partitions for moderate workloads. Scale based on consumer parallelism and throughput requirements. Each partition is consumed by at most one consumer in a consumer group.
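A common rule of thumb for sizing is to take the target throughput and divide by what a single producer and a single consumer can each sustain, then keep the larger result. The numbers below are placeholders, not benchmarks:

```python
import math

def min_partitions(target_mb_s: float,
                   per_producer_mb_s: float,
                   per_consumer_mb_s: float) -> int:
    """Rule-of-thumb lower bound: enough partitions that neither the
    producer side nor the consumer side becomes the bottleneck."""
    return math.ceil(max(target_mb_s / per_producer_mb_s,
                         target_mb_s / per_consumer_mb_s))

# 100 MB/s target, 10 MB/s per producer, 5 MB/s per consumer -> 20
assert min_partitions(100, 10, 5) == 20
```

The consumer side usually dominates, since each partition is consumed by at most one consumer in a group.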
What happens if I change the partition count?
Adding partitions doesn't redistribute existing data. New events may route differently. This can break ordering guarantees for existing keys. Avoid changing partition counts in production when ordering matters.
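Why ordering breaks is visible directly in the routing math: `hash(key) % num_partitions` changes for many keys the moment `num_partitions` changes. A sketch with the same md5 stand-in hash:

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    # Deterministic stand-in hash (Kafka itself uses murmur2)
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# After growing a topic from 12 to 16 partitions, many keys move:
keys = [f"user-{i}" for i in range(100)]
moved = sum(pick_partition(k, 12) != pick_partition(k, 16) for k in keys)
```

Events for a moved key now land on a new partition while its older events sit on the old one, so consumers can observe them out of order.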

