Summary: Kafka
The big picture
Kafka's genius is its simplicity. At its core, it's an append-only, distributed commit log -- and by constraining its data structure to sequential appends and reads, it achieves throughput that seems impossible for a disk-based system. Millions of messages per second, stored durably, replayable by any number of consumers at any time.
Kafka didn't just replace traditional message queues -- it created a new category: the event streaming platform. Instead of "deliver this message and forget it," Kafka says "store this event durably and let anyone who cares read it at their own pace." This single shift enables architectures (event sourcing, CQRS, change data capture) that are impractical with traditional messaging.
Key concepts at a glance
| Concept | What it is | Why it matters |
|---|---|---|
| Topic | A named stream of messages | Logical organization -- producers write to topics, consumers read from them |
| Partition | A subset of a topic, stored on one broker | The unit of parallelism -- more partitions = more throughput |
| Broker | A Kafka server | Stores partitions on disk, serves reads and writes |
| Replica | A copy of a partition on another broker | Fault tolerance -- if a broker dies, replicas take over |
| ISR (In-Sync Replicas) | Replicas that are caught up with the leader | Only ISR members can become the new leader |
| Consumer group | A set of consumers that share the work of reading a topic | Each partition is consumed by exactly one member of the group |
| Offset | A message's position in a partition | Consumers track offsets to know where they are in the stream |
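The concepts above fit together mechanically: a topic is a set of append-only partition logs, and an offset is just a message's index in one of those logs. A toy sketch (not the real Kafka client API) makes the relationship concrete:

```python
# Toy model of topics, partitions, and offsets -- not the real Kafka API.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an independent append-only log.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # the new message's offset within this partition

    def read(self, partition, offset):
        # Reads are by position; old messages stay readable until retention expires.
        return self.partitions[partition][offset]

orders = Topic("orders", num_partitions=3)
off0 = orders.append(0, "order-1")
off1 = orders.append(0, "order-2")
assert (off0, off1) == (0, 1)          # offsets are just positions in the log
assert orders.read(0, 0) == "order-1"  # any consumer can re-read any old offset
```

Because each partition is an independent log, ordering is guaranteed only within a partition -- which is exactly why the consumer group model assigns each partition to exactly one group member.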
How Kafka uses system design patterns
| Problem | Pattern | How Kafka uses it |
|---|---|---|
| Durable message storage | Write-ahead Log | Kafka is a distributed write-ahead log -- every message is durably appended |
| Managing log size | Segmented Log | Partitions are split into segments for efficient purging and lookup |
| Tracking replication progress | High-Water Mark | Consumers only see messages up to the high-water mark (replicated to all in-sync replicas) |
| Partition leadership | Leader and Follower | Each partition has one leader that handles all reads/writes |
| Preventing split-brain (zombie controllers) | Epoch Number | The controller epoch lets brokers reject commands from stale controllers |
| Verifying data integrity | Checksum | CRC32 in each message record, verified by brokers and consumers |
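The checksum pattern in the last row is the simplest to illustrate. A minimal sketch (using Python's standard `zlib.crc32` as a stand-in for Kafka's record-level CRC):

```python
import zlib

def make_record(payload: bytes) -> tuple[int, bytes]:
    # Producer side: stamp the payload with a CRC32 so that corruption
    # anywhere downstream (network, disk) is detectable.
    return zlib.crc32(payload), payload

def verify(record: tuple[int, bytes]) -> bool:
    # Broker/consumer side: recompute the checksum and compare.
    crc, payload = record
    return zlib.crc32(payload) == crc

rec = make_record(b"order-42")
assert verify(rec)

corrupted = (rec[0], b"order-43")  # payload altered after the CRC was taken
assert not verify(corrupted)
```

The same check runs at multiple hops -- the broker can verify on append and the consumer on fetch -- so a single bit flip is caught before it silently propagates.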
Kafka's delivery semantics
| Guarantee | How it works | When to use |
|---|---|---|
| At-most-once | Consumer commits offset before processing. If it crashes, the message is skipped. | Low-value messages where losing some is acceptable (metrics, logs) |
| At-least-once | Consumer commits offset after processing. If it crashes, the message is reprocessed. | Most use cases -- idempotent consumers handle duplicates |
| Exactly-once | Idempotent producers + transactional writes. Within Kafka, each message's effect is recorded exactly once. | Financial transactions, inventory counts -- anything where duplicates or losses are unacceptable |
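The difference between the first two rows is purely the *order* of commit and process. A toy simulation (no real Kafka involved) shows what a crash between the two steps does to each guarantee:

```python
# Toy simulation: commit-before-process (at-most-once) vs
# process-before-commit (at-least-once), with a crash between the two steps.
def run(log, start, commit_first, crash_between=False):
    """Consume from `start`; optionally 'crash' between the two steps of
    handling the first message. Returns (processed, committed_offset)."""
    processed, committed = [], start
    for offset in range(start, len(log)):
        if commit_first:
            committed = offset + 1           # step 1: commit the offset
            if crash_between and offset == start:
                return processed, committed  # crash: message never processed
            processed.append(log[offset])    # step 2: process
        else:
            processed.append(log[offset])    # step 1: process
            if crash_between and offset == start:
                return processed, committed  # crash: offset never committed
            committed = offset + 1           # step 2: commit the offset
    return processed, committed

log = ["m0", "m1"]

# At-most-once: crash after commit, before processing -> m0 is lost.
p1, c1 = run(log, 0, commit_first=True, crash_between=True)
p2, _ = run(log, c1, commit_first=True)
assert p1 + p2 == ["m1"]

# At-least-once: crash after processing, before commit -> m0 is reprocessed.
p1, c1 = run(log, 0, commit_first=False, crash_between=True)
p2, _ = run(log, c1, commit_first=False)
assert p1 + p2 == ["m0", "m0", "m1"]
```

This is why "idempotent consumers" appear in the at-least-once row: the duplicate `m0` is harmless only if processing it twice has the same effect as processing it once.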
When asked "How does Kafka guarantee exactly-once delivery?", the answer involves three mechanisms: idempotent producers (each message gets a sequence number, broker deduplicates), transactions (atomic writes across multiple partitions), and consumer offset management (offsets committed atomically with processing results).
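The first mechanism -- broker-side deduplication by producer sequence number -- can be sketched in a few lines (a simplified model, not the real broker logic; real Kafka tracks sequences per producer *and* partition):

```python
# Toy broker-side dedup for an idempotent producer: each producer has an ID
# and stamps every message with a monotonically increasing sequence number.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, message):
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # a retry of something already appended: discard
        self.log.append(message)
        self.last_seq[producer_id] = seq
        return True

b = Broker()
assert b.append("p1", 0, "order-1")
assert b.append("p1", 1, "order-2")
assert not b.append("p1", 1, "order-2")  # network retry of seq 1: deduplicated
assert b.log == ["order-1", "order-2"]
```

This is what lets a producer safely retry on a timed-out acknowledgment: even if the original write actually succeeded, the retry's sequence number exposes it as a duplicate.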
Quick reference card
| Property | Value |
|---|---|
| Type | Distributed streaming platform / commit log |
| CAP classification | CP (within each partition) |
| Consistency | Strong consistency per-partition via ISR |
| Data model | Topics → Partitions → ordered log of messages |
| Partitioning | Topic-level, producer-configured (key-based or round-robin) |
| Replication | Leader + in-sync replicas (ISR) |
| Ordering guarantee | Per-partition only (not across partitions) |
| Storage | Segmented log files on disk, retention-based cleanup |
| Coordination | KRaft (default in newer versions); ZooKeeper in legacy deployments |
| Open source | Yes (Apache) |
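The key-based partitioning noted in the card is what makes per-key ordering work: the same key always hashes to the same partition. A simplified sketch -- Kafka's default partitioner actually uses murmur2 for keyed messages; MD5 here is just a stable stand-in for illustration:

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: bytes) -> int:
    # Simplified: hash(key) % num_partitions. Real Kafka hashes keys with
    # murmur2; MD5 is used here only as a deterministic stand-in.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Same key -> same partition, so all events for one entity stay ordered.
assert partition_for(b"user-42") == partition_for(b"user-42")
assert 0 <= partition_for(b"user-7") < NUM_PARTITIONS
```

A consequence worth remembering: changing the partition count changes where keys land, which is why repartitioning an existing keyed topic breaks ordering assumptions.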