High-level Architecture

Before diving into Kafka's internals, let's establish the vocabulary.

Core concepts

Term	What it is
Broker	A Kafka server. Stores data, serves reads and writes. A cluster has multiple brokers.
Record	A single message: key + value + timestamp + optional headers
Topic	A named stream of records -- like a table in a database. Identified by a name that must be unique within the cluster.
Producer	An application that publishes records to topics
Consumer	An application that reads records from topics
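To make the Record row concrete, here is a minimal sketch of a record as a plain data structure. The class and field names (`Record`, `timestamp_ms`, `headers`) and the sample values are illustrative assumptions, not a Kafka client API; real clients define their own record types.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a record: key + value + timestamp + optional headers."""
    key: Optional[bytes]   # assumption: a record may be sent without a key
    value: bytes
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: tuple = ()    # optional (name, value) pairs


# A hypothetical "order paid" event with one header.
order = Record(
    key=b"order-42",
    value=b'{"status": "paid"}',
    headers=(("source", b"checkout"),),
)
```

A producer would publish objects shaped like `order` to a topic, and a consumer would read them back with the same fields.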

Think first

Traditional message queues delete messages after they are consumed. What advantages and disadvantages would there be if a messaging system kept messages around even after consumption? Think about debugging, replaying events, and storage costs.

Topics are not queues

Unlike traditional message queues, Kafka topics retain messages after consumption. Messages stay for a configurable retention period (or until a size limit is hit). Consumers can re-read any message that is still retained by resetting their offset. Kafka's performance is effectively constant with respect to data size, because records are appended to and read sequentially from a log, so long retention is practical.
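The retain-and-replay behavior can be illustrated with a toy in-memory log. This is a sketch under simplifying assumptions, not how Kafka is implemented: `TopicLog` and `LogReader` are hypothetical names, there is a single log with no retention limit, and nothing is persisted.

```python
class TopicLog:
    """Toy model of one topic as an append-only log (not real Kafka)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return records starting at offset; reading never deletes anything."""
        return self._records[offset:offset + max_records]


class LogReader:
    """Hypothetical consumer that only remembers its own offset."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        """Fetch the next batch and advance the offset past it."""
        batch = self._log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Rewind (or skip ahead) by moving the offset."""
        self.offset = offset


log = TopicLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

reader = LogReader(log)
first_pass = reader.poll()   # reads all three records
reader.seek(0)               # reset the offset, e.g. after a bug fix
replayed = reader.poll()     # the same three records again
```

Because consuming only moves the reader's offset, `replayed` equals `first_pass`. A queue that deleted on read would have nothing left to return after the first pass.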

Producers and consumers are fully decoupled

Producers write without knowing who (or whether anyone) reads. Consumers read without knowing who produced. They never interact directly -- Kafka sits between them. This decoupling is what makes Kafka so flexible as an integration backbone.
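A small sketch of this decoupling, assuming a hypothetical `ToyBroker` that keeps each topic as a list and tracks a read position per consumer name. It is a teaching model only: real Kafka adds partitions, replication, and consumer groups, none of which appear here.

```python
from collections import defaultdict


class ToyBroker:
    """Toy intermediary: producers and consumers share only a topic name."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> list of records
        self._offsets = defaultdict(int)   # (consumer, topic) -> next offset

    def publish(self, topic, value):
        """Producer side: append without knowing who, if anyone, will read."""
        self._topics[topic].append(value)

    def fetch(self, consumer, topic):
        """Consumer side: read everything not yet seen by this consumer."""
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


broker = ToyBroker()

# The producer writes before any consumer exists.
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

billing = broker.fetch("billing", "orders")       # first consumer catches up
broker.publish("orders", "order-3")
analytics = broker.fetch("analytics", "orders")   # a later consumer sees everything
billing_next = broker.fetch("billing", "orders")  # billing only gets the new record
```

Neither call site refers to the other side: `publish` takes no consumer argument and `fetch` takes no producer argument. The two consumers also progress independently, because each has its own offset.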

Think first

If producers and consumers are fully decoupled and never interact directly, how does a consumer know where to find its messages? What component must sit in the middle, and what responsibilities does it need?

Cluster architecture

A Kafka deployment consists of:

Component	Role
Kafka cluster	One or more brokers working together
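ZooKeeper	Distributed coordination service that stores cluster metadata (broker list, topic configs, partition leaders). Highly optimized for reads. Very old Kafka versions also kept consumer offsets here; current versions store them in an internal Kafka topic instead.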

ZooKeeper is being replaced

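KRaft mode replaces ZooKeeper with an internal Raft-based consensus protocol. It first shipped as early access in Kafka 2.8, was declared production-ready in 3.3, and Kafka 4.0 removed ZooKeeper mode entirely. This simplifies deployment and removes the external dependency. The concepts in this course apply to both modes: the same coordination responsibilities exist, but in KRaft they are handled by controller nodes inside Kafka rather than by an external service.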

Quiz

What would happen if Kafka deleted messages immediately after a consumer reads them, like a traditional message queue?

Nothing would change -- consumers rarely need to re-read messages.

New consumers joining later could never process historical data, and replaying events after a bug fix would be impossible.

It would improve performance because the broker stores less data.

Producers would need to resend messages whenever a new consumer subscribes.
