Role of ZooKeeper

Kafka brokers are often described as stateless with respect to coordination: they store message logs on disk, but they rely on ZooKeeper (a distributed coordination service, similar to Google's Chubby) for cluster metadata and coordination.

Think first

Kafka brokers are described as "stateless." But someone needs to know which brokers are alive, which broker leads each partition, and where consumer offsets are. If the brokers themselves do not store this coordination state, what kind of external system would you need, and what properties should it have?

What ZooKeeper stores for Kafka

Metadata	Purpose
Broker registry	Which brokers are alive and their addresses
Topic configuration	Topics, their partition counts, replication factors
Partition leadership	Which broker is the leader for each partition
Consumer offsets	Last committed offset per consumer group per partition (legacy; modern clients commit offsets to the internal Kafka topic __consumer_offsets)
ACLs	Access control lists for topic authorization
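
As a rough illustration, each row of the table can be pictured as a path in ZooKeeper's tree of znodes. The sketch below assumes the layout commonly seen in ZooKeeper-based Kafka clusters; exact paths can differ between Kafka versions, and the helper name `znode_paths` and the sample topic, group, and broker ID are hypothetical.

```python
def znode_paths(broker_id: int, topic: str, partition: int, group: str) -> dict:
    """Return illustrative znode paths for each kind of metadata in the table.

    The paths are an assumption based on the usual ZooKeeper-mode layout,
    not a guarantee for every Kafka version.
    """
    return {
        # Ephemeral node that exists only while the broker's session is alive.
        "broker_registry": f"/brokers/ids/{broker_id}",
        # Partition count and replica assignment for the topic.
        "topic_configuration": f"/brokers/topics/{topic}",
        # Current leader (and related state) for one partition.
        "partition_leadership": f"/brokers/topics/{topic}/partitions/{partition}/state",
        # Legacy offset storage used by old consumers.
        "consumer_offsets_legacy": f"/consumers/{group}/offsets/{topic}/{partition}",
        # Access control lists for the topic.
        "acls": f"/kafka-acl/Topic/{topic}",
    }


for kind, path in znode_paths(1, "orders", 0, "billing").items():
    print(f"{kind:26} {path}")
```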

How producers find the partition leader

In modern Kafka, clients no longer talk directly to ZooKeeper. Instead:

1. Producer connects to any broker and asks: "Who is the leader for Partition 1?"
2. The broker (which gets this info from ZooKeeper) responds with the leader broker's address.
3. Producer connects to the leader broker directly and publishes the message.
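
The lookup described above can be sketched as a small in-memory simulation. This is not the Kafka client API: `Broker`, `Producer`, `find_leader`, and `append` are hypothetical names, and the sketch assumes every broker holds a cached copy of the partition-to-leader map.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    broker_id: int
    address: str
    # Cached copy of (topic, partition) -> leader broker ID (assumed to be
    # kept up to date from the cluster's coordination state).
    leader_cache: dict = field(default_factory=dict)
    # Messages this broker has accepted as leader.
    log: list = field(default_factory=list)

    def find_leader(self, topic: str, partition: int) -> int:
        """Answer a metadata request from the local cache."""
        return self.leader_cache[(topic, partition)]

    def append(self, topic: str, partition: int, message: str) -> None:
        """Accept a write only if this broker leads the partition."""
        if self.leader_cache[(topic, partition)] != self.broker_id:
            raise RuntimeError("not the leader for this partition")
        self.log.append((topic, partition, message))


class Producer:
    def __init__(self, cluster: dict, bootstrap_id: int):
        self.cluster = cluster
        self.bootstrap = cluster[bootstrap_id]  # "any broker"

    def send(self, topic: str, partition: int, message: str) -> int:
        # Ask the bootstrap broker who leads the partition.
        leader_id = self.bootstrap.find_leader(topic, partition)
        # Publish directly to the leader.
        self.cluster[leader_id].append(topic, partition, message)
        return leader_id


leaders = {("orders", 1): 2}
cluster = {i: Broker(i, f"broker-{i}:9092", dict(leaders)) for i in (1, 2, 3)}
producer = Producer(cluster, bootstrap_id=3)
leader = producer.send("orders", 1, "hello")
print(f"published to broker {leader} at {cluster[leader].address}")
```

Note that the producer never touches the coordination service itself; it only needs one reachable broker to bootstrap from.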

Think first

If ZooKeeper goes down temporarily, should Kafka stop accepting reads and writes entirely? Or could it continue operating with stale metadata? What are the risks of each approach?

Fault tolerance

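ZooKeeper replicates its data across its own cluster (an ensemble), so the failure of a Kafka broker or of a single ZooKeeper node doesn't lose any cluster state. If ZooKeeper becomes temporarily unavailable, brokers keep serving existing producers and consumers from their last-known metadata, but anything that needs coordination (leader election, topic creation, detecting failed brokers) stalls until it returns. When ZooKeeper recovers, brokers reconnect and coordination resumes from the replicated state.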

ZooKeeper also triggers partition leader election when a broker fails: the failed broker's session expires, ZooKeeper notifies the controller broker, and the controller assigns new leaders for the failed broker's partitions.
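
A minimal sketch of the controller's reassignment step, under simplifying assumptions: the function name `elect_new_leaders` and the sample data are hypothetical, and the sketch picks the first live replica in assignment order. Real Kafka applies further eligibility rules (for example, whether a replica is caught up), which are ignored here.

```python
from typing import Optional


def elect_new_leaders(
    leaders: dict,       # partition name -> current leader broker ID
    replicas: dict,      # partition name -> ordered list of replica broker IDs
    live_brokers: set,   # broker IDs currently registered as alive
) -> dict:
    """Reassign leadership for partitions whose leader is no longer alive."""
    updated = {}
    for partition, leader in leaders.items():
        if leader in live_brokers:
            # Leader is still alive: nothing to do.
            updated[partition] = leader
            continue
        # Pick the first surviving replica; None means the partition is offline.
        candidate: Optional[int] = next(
            (b for b in replicas[partition] if b in live_brokers), None
        )
        updated[partition] = candidate
    return updated


leaders = {"orders-0": 1, "orders-1": 2, "orders-2": 1}
replicas = {"orders-0": [1, 2], "orders-1": [2, 3], "orders-2": [1]}

# Broker 1 has failed, so only brokers 2 and 3 remain registered.
new_leaders = elect_new_leaders(leaders, replicas, live_brokers={2, 3})
print(new_leaders)
```

In this example `orders-2` has no surviving replica, so it stays without a leader until broker 1 returns.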

KRaft: ZooKeeper's replacement

Apache Kafka is moving away from ZooKeeper with KRaft mode (Kafka Raft). In KRaft, a quorum of Kafka nodes acting as controllers forms the coordination layer using the Raft consensus protocol, eliminating the external ZooKeeper dependency. The metadata and coordination concepts remain the same -- only the implementation changes.
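
Because Raft commits a change only once a majority of voters has acknowledged it, the size of the coordination layer determines how many node failures it can survive. The arithmetic can be sketched as follows; the function names are hypothetical and the calculation is generic majority-quorum math, not a Kafka API.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority remains."""
    return voters - quorum_size(voters)


for n in (1, 3, 4, 5):
    print(f"{n} voters: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four voters tolerate no more failures than three, which is why odd-sized quorums are the usual choice.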

Quiz

What would happen if ZooKeeper permanently lost its data (broker registry, partition leaders, topic configs) and could not recover?

Kafka would continue normally since brokers cache all metadata locally.

Producers and consumers would continue with existing connections, but no new topic creation, no partition leader elections, and no failover would be possible -- the cluster would be frozen and unable to recover from any broker failure.

Kafka would automatically rebuild all metadata from the brokers' local state.

Only consumer offset tracking would be affected; everything else would work fine.
