
Role of ZooKeeper

Kafka brokers are stateless with respect to cluster coordination -- they store partition data on disk, but they rely on ZooKeeper (a distributed coordination service, similar to Google's Chubby) for cluster metadata and coordination.

Think first
Kafka brokers are described as "stateless." But someone needs to know which brokers are alive, which broker leads each partition, and where consumer offsets are. If the brokers themselves do not store this coordination state, what kind of external system would you need, and what properties should it have?

What ZooKeeper stores for Kafka

| Metadata | Purpose |
|---|---|
| Broker registry | Which brokers are alive and their addresses |
| Topic configuration | Topics, their partition counts, replication factors |
| Partition leadership | Which broker is the leader for each partition |
| Consumer offsets | Last committed offset per consumer group per partition (legacy; modern clients use an internal Kafka topic) |
| ACLs | Access control lists for topic authorization |
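
As a rough illustration, the table above can be pictured as a tree of znodes. The sketch below models that tree as a plain Python dictionary. The paths mirror the layout commonly seen in ZooKeeper-based Kafka deployments, but the hostnames, topic names, and values are made-up assumptions, and the helpers `live_broker_ids` and `partition_leader` are hypothetical; this is not a real ZooKeeper client.

```python
# Simplified, in-memory stand-in for the znode tree Kafka keeps in ZooKeeper.
# All values are illustrative assumptions, not real cluster data.
znodes = {
    "/brokers/ids/1": {"host": "broker1.example", "port": 9092},
    "/brokers/ids/2": {"host": "broker2.example", "port": 9092},
    "/brokers/topics/orders": {"partitions": {"0": [1, 2], "1": [2, 1]}},
    "/brokers/topics/orders/partitions/0/state": {"leader": 1, "isr": [1, 2]},
    "/brokers/topics/orders/partitions/1/state": {"leader": 2, "isr": [2, 1]},
    "/consumers/billing/offsets/orders/0": 42,  # legacy offset storage
    "/kafka-acl/Topic/orders": {"acls": []},
}


def live_broker_ids(tree):
    """Return the sorted IDs of brokers registered under /brokers/ids."""
    prefix = "/brokers/ids/"
    return sorted(int(path[len(prefix):]) for path in tree if path.startswith(prefix))


def partition_leader(tree, topic, partition):
    """Look up the leader broker ID for one partition."""
    return tree[f"/brokers/topics/{topic}/partitions/{partition}/state"]["leader"]
```

Here `live_broker_ids(znodes)` answers "which brokers are alive?" and `partition_leader(znodes, "orders", 1)` answers "who leads this partition?", the two lookups the rest of this lesson depends on.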

How producers find the partition leader

In modern Kafka, clients no longer talk directly to ZooKeeper. Instead:

  1. Producer connects to any broker and asks: "Who is the leader for Partition 1?"
  2. The broker (which gets this info from ZooKeeper) responds with the leader broker's address
  3. Producer connects to the leader broker directly and publishes the message
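
The three steps above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `Broker` class, the `produce` function, and the addresses are hypothetical, and a real client uses Kafka's wire protocol rather than in-process method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy broker: holds a metadata view and the messages it has accepted."""
    broker_id: int
    address: str
    leaders: dict  # (topic, partition) -> leader broker_id
    log: list = field(default_factory=list)

    def leader_for(self, topic, partition):
        # Step 2: any broker can answer from its copy of the cluster metadata.
        return self.leaders[(topic, partition)]

    def append(self, topic, partition, message):
        # Only the partition leader accepts writes.
        if self.leaders[(topic, partition)] != self.broker_id:
            raise RuntimeError("not the leader for this partition")
        self.log.append((topic, partition, message))


def produce(cluster, bootstrap_id, topic, partition, message):
    """Ask any broker for the leader, then send the message to that leader."""
    bootstrap = cluster[bootstrap_id]                      # step 1
    leader_id = bootstrap.leader_for(topic, partition)     # step 2
    cluster[leader_id].append(topic, partition, message)   # step 3
    return leader_id


leaders = {("orders", 0): 1, ("orders", 1): 2}
cluster = {
    1: Broker(1, "broker1.example:9092", leaders),
    2: Broker(2, "broker2.example:9092", leaders),
}
sent_to = produce(cluster, bootstrap_id=1, topic="orders", partition=1, message="hello")
```

The producer bootstraps through broker 1, learns that broker 2 leads partition 1, and writes there; broker 1 never sees the message.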

Think first
If ZooKeeper goes down temporarily, should Kafka stop accepting reads and writes entirely? Or could it continue operating with stale metadata? What are the risks of each approach?

Fault tolerance

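ZooKeeper replicates its data across its own cluster (an ensemble), so the failure of a Kafka broker or of a single ZooKeeper node does not lose cluster state. If ZooKeeper becomes temporarily unavailable, Kafka continues serving reads and writes using its last-known metadata, but changes that require coordination, such as electing a new partition leader or creating a topic, cannot proceed until ZooKeeper is reachable again. When ZooKeeper recovers, brokers reconnect and coordination resumes from the replicated state.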
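
One way to picture "operating with last-known state" is a cache that falls back to its most recent snapshot when the coordination service cannot be reached. This is a minimal sketch under stated assumptions: `MetadataCache`, `CoordinationUnavailable`, and `fetch` are hypothetical names, not Kafka or ZooKeeper APIs.

```python
class CoordinationUnavailable(Exception):
    """Raised by the (hypothetical) coordination client when it cannot be reached."""


class MetadataCache:
    def __init__(self, fetch):
        self._fetch = fetch      # callable returning the current metadata dict
        self._last_known = None  # most recent successful snapshot
        self.stale = False

    def get(self):
        try:
            self._last_known = self._fetch()
            self.stale = False
        except CoordinationUnavailable:
            if self._last_known is None:
                raise  # nothing cached yet, so there is no state to fall back on
            self.stale = True
        return self._last_known


# Simulated outage: the fetch succeeds once, then the service goes away.
service = {"up": True}


def fetch():
    if not service["up"]:
        raise CoordinationUnavailable()
    return {("orders", 0): 1}


cache = MetadataCache(fetch)
first = cache.get()
service["up"] = False
second = cache.get()  # served from the last-known snapshot
```

The trade-off is visible in the `stale` flag: the snapshot keeps the system available, but it may name a leader that has since failed, and nothing in this toy model would detect that.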

ZooKeeper also triggers partition leader election when a broker fails -- the broker's registration in ZooKeeper disappears once its session expires, ZooKeeper notifies the controller broker, and the controller then assigns new leaders for the failed broker's partitions.
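
The controller's reaction to a broker failure can be sketched as follows. This is a simplified model, not Kafka's controller code: it assumes each partition tracks a list of in-sync replicas (`isr`) and promotes the first surviving one, and the function name `reassign_leaders` is hypothetical.

```python
def reassign_leaders(partitions, failed_broker):
    """Pick a new leader for every partition the failed broker was leading.

    partitions maps (topic, partition) -> {"leader": int, "isr": [int, ...]}.
    Returns the partitions left without any in-sync replica to promote.
    """
    offline = []
    for key, state in partitions.items():
        # The failed broker drops out of every in-sync replica list.
        state["isr"] = [b for b in state["isr"] if b != failed_broker]
        if state["leader"] != failed_broker:
            continue  # this partition's leader is unaffected
        if state["isr"]:
            state["leader"] = state["isr"][0]  # promote the first surviving replica
        else:
            state["leader"] = None             # no candidate: partition is offline
            offline.append(key)
    return offline


partitions = {
    ("orders", 0): {"leader": 1, "isr": [1, 2]},
    ("orders", 1): {"leader": 2, "isr": [2, 1]},
    ("audit", 0): {"leader": 1, "isr": [1]},
}
offline = reassign_leaders(partitions, failed_broker=1)
```

After broker 1 fails, both `orders` partitions are led by broker 2, while `audit` partition 0 has no surviving replica and stays offline until broker 1 returns.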

KRaft: ZooKeeper's replacement

Apache Kafka is moving away from ZooKeeper with KRaft mode (Kafka Raft). In KRaft, a quorum of Kafka nodes running in the controller role (which can be co-located with brokers) acts as the coordination layer using the Raft consensus protocol, eliminating the external ZooKeeper dependency. The metadata and coordination concepts remain the same -- only the implementation changes.
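
The central idea Raft contributes is majority agreement: a metadata change counts as committed only once more than half of the voting nodes have acknowledged it. The sketch below shows just that rule. It is heavily simplified (no terms, leader election, or log matching), and the function names are hypothetical rather than part of Kafka.

```python
def majority(voter_count):
    """Smallest number of voters that forms a majority."""
    return voter_count // 2 + 1


def is_committed(acks, voters):
    """A record is committed once a majority of voters have acknowledged it.

    Acknowledgements from nodes outside the voter set are ignored.
    """
    return len(set(acks) & set(voters)) >= majority(len(voters))


voters = {1, 2, 3}
two_acks = is_committed({1, 2}, voters)  # 2 of 3 voters acknowledged
one_ack = is_committed({3}, voters)      # only 1 of 3 voters acknowledged
```

With three voters the quorum tolerates one failure; with five it tolerates two, which is why such quorums are usually sized with an odd number of nodes.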

Quiz
What would happen if ZooKeeper permanently lost its data (broker registry, partition leaders, topic configs) and could not recover?