# High-level Architecture
Before diving into Kafka's internals, let's establish the vocabulary.
## Core concepts
| Term | What it is |
|---|---|
| Broker | A Kafka server. Stores data, serves reads and writes. A cluster has multiple brokers. |
| Record | A single message: key + value + timestamp + optional headers. |
| Topic | A named stream of records -- like a table in a database. Identified by a name, which must be unique within the cluster. |
| Producer | An application that publishes records to topics. |
| Consumer | An application that reads records from topics. |
## Topics are not queues
Unlike traditional message queues, Kafka does not delete messages when they are consumed. Records are retained for a configurable retention period (or until a size limit is reached), and consumers can re-read old records at any time by resetting their offset. Because reads and writes perform effectively the same regardless of how much data is retained, long retention is practical.
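The retention and offset semantics above can be illustrated with a toy in-memory log. This is a sketch of the concept, not of Kafka's implementation: reading never removes a record, and a consumer rewinds simply by changing its own offset.

```python
class TopicLog:
    """Append-only log: records are kept after being read."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # offset of the new record

    def read(self, offset):
        return self._records[offset]    # reading does NOT delete

class Consumer:
    """Tracks its own position; the log knows nothing about it."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        record = self.log.read(self.offset)
        self.offset += 1
        return record

    def seek(self, offset):             # reset position to re-read old data
        self.offset = offset

log = TopicLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

c = Consumer(log)
assert [c.poll(), c.poll()] == ["a", "b"]
c.seek(0)                               # rewind to the beginning
assert c.poll() == "a"                  # the old record is still there
```

Note that the offset lives with the consumer, not the log: that is what makes replay cheap.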
## Producers and consumers are fully decoupled
Producers write without knowing who (or whether anyone) reads. Consumers read without knowing who produced. They never interact directly -- Kafka sits between them. This decoupling is what makes Kafka so flexible as an integration backbone.
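A minimal sketch of this decoupling, assuming nothing beyond the idea itself (this is not the Kafka client API): both sides know only the topic name, and the broker stands between them.

```python
from collections import defaultdict

# The "broker": topic name -> list of records. Producers and consumers
# never reference each other; they only reference a topic by name.
broker = defaultdict(list)

def produce(topic, value):
    broker[topic].append(value)         # producer never sees consumers

def consume(topic, offset):
    return broker[topic][offset:]       # consumer never sees producers

produce("page-views", {"url": "/home"})
produce("page-views", {"url": "/docs"})

# Two independent consumers read the same topic from their own offsets.
assert consume("page-views", 0) == [{"url": "/home"}, {"url": "/docs"}]
assert consume("page-views", 1) == [{"url": "/docs"}]
```

Adding a new consumer requires no change to any producer, which is why Kafka works well as an integration backbone.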
## Cluster architecture
A Kafka deployment consists of:
| Component | Role |
|---|---|
| Kafka cluster | One or more brokers working together |
| ZooKeeper | Distributed coordination service that stores cluster metadata: the broker list, topic configurations, and partition leadership. (Early Kafka versions also kept consumer offsets here; modern clients commit offsets to an internal Kafka topic instead.) Optimized for read-heavy workloads. |
Newer Kafka versions introduce KRaft mode (early access in 2.8, production-ready since 3.3), which replaces ZooKeeper with an internal Raft-based consensus protocol. This simplifies deployment and removes the external dependency. The concepts in this course apply to both modes -- the coordination logic is the same, just managed differently.
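For a concrete sense of what KRaft mode looks like, here is a minimal `server.properties` sketch for a single node acting as both broker and controller. The host/port values are placeholders; these property names are the real KRaft configuration keys:

```properties
# This node serves both roles; in production, controllers are often dedicated.
process.roles=broker,controller
node.id=1

# The Raft quorum: node.id@host:port of every controller.
controller.quorum.voters=1@localhost:9093

# Separate listeners for client traffic and controller traffic.
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
```

No ZooKeeper connection string appears anywhere, which is the whole point: cluster metadata lives in an internal Raft-replicated log managed by the controllers themselves.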