4 Leader and Follower
In a peer-to-peer system, every node is equal -- and every node must coordinate with every other node. This makes consistency hard: with no central authority, who decides the order of operations? Who breaks ties? Who coordinates failover?
The leader-follower pattern offers a simpler alternative: elect one node to coordinate, and have everyone else follow its lead.
Background
Systems like Dynamo and Cassandra prove that peer-to-peer, leaderless architectures work well for eventually consistent systems. But for systems that need strong consistency -- where every read must see the latest write -- a leader simplifies things enormously.
With a leader:
- All writes go through a single node, which defines the order of operations
- Followers replicate the leader's log, guaranteeing they see the same sequence
- Reads can go to the leader (for strong consistency) or followers (for read scaling)
The trade-off is clear: the leader is a potential bottleneck and single point of failure. But for many systems, that trade-off is worth the simplicity.
Definition
Designate one node as the leader (also called primary or master). The leader handles all write operations and coordinates replication. Followers (also called secondaries or replicas) replicate the leader's data and can optionally serve read requests.
How it works
- Leader election: When the system starts or the current leader fails, nodes elect a new leader (often using a consensus protocol like Paxos or Raft, or a coordination service like Chubby/ZooKeeper)
- Write path: All writes go to the leader → leader writes to its write-ahead log → leader replicates to followers
- Read path: Reads can go to the leader (consistent) or followers (faster, potentially stale)
- Failover: If the leader fails, a follower is promoted. The new leader's generation number prevents the old leader from causing confusion.
Trade-offs
| Advantage | Disadvantage |
|---|---|
| Simple consistency model -- leader defines write order | Leader is a bottleneck for writes |
| Easy to reason about correctness | Single point of failure (requires failover mechanism) |
| Followers can serve reads for load distribution | Followers may serve stale data if replication is async |
| Well-established patterns for failover | Failover introduces risk of split-brain |
This is one of the most fundamental architecture decisions in distributed systems:
- Leader-follower (BigTable, GFS, HDFS, Kafka, Chubby): Simple consistency, but the leader is a potential bottleneck
- Peer-to-peer (Dynamo, Cassandra): No bottleneck, better write scalability, but consistency is much harder
Neither is universally better. The choice depends on whether your system prioritizes consistency or availability.
Examples
Kafka
Each Kafka partition has a designated leader broker responsible for all reads and writes to that partition. Followers replicate the leader's log. If the leader fails, one of the in-sync replicas (ISR) takes over. Additionally, the Kafka controller broker is a cluster-wide leader responsible for administrative operations (creating topics, assigning partition leaders, detecting broker failures).
GFS
The GFS master is the leader for all metadata operations. ChunkServers are followers that store data. The master doesn't handle data flow directly -- it only tells clients which ChunkServers to talk to. This keeps the leader from becoming a data bandwidth bottleneck.
HDFS
The NameNode is the leader, DataNodes are followers. HDFS adds a standby NameNode for high availability -- a hot spare that can take over if the active NameNode fails.
Chubby
Chubby elects a leader (master) using Paxos at startup. The master handles all reads and writes. If the master fails, Paxos elects a new one. Chubby's leadership mechanism is then used by other systems (GFS, BigTable) for their own leader election.
When designing a system in an interview, explicitly state whether you're using leader-follower or peer-to-peer, and explain why. If the interviewer asks "What happens when the leader fails?", walk through: detection (heartbeat timeout), election (Paxos/Raft/ZooKeeper), prevention of split-brain (generation numbers), and fencing of the old leader.