13 Fencing

You've detected a split-brain situation -- a zombie leader is still running after a new leader was elected. Generation numbers ensure nodes will eventually reject the zombie's commands. But what about the window between the new leader taking over and the zombie learning it's been replaced? During that window, the zombie might still issue writes, modify shared storage, or corrupt data.

Fencing ensures the zombie can't do any damage during this dangerous transition period.

Think first
Generation numbers ensure followers reject stale commands. But what about the window between the new leader taking over and the zombie leader learning it's been replaced? During that window, the zombie can still write to shared storage. How do you close this gap?

Background

In a leader-follower setup, detecting that the old leader is stale (via generation numbers) isn't enough. Consider this timeline:

  1. Leader A (generation 1) is serving writes to shared storage
  2. Leader A becomes unresponsive (network partition or GC pause)
  3. The cluster elects Leader B (generation 2)
  4. Leader B starts writing to shared storage
  5. Leader A comes back, doesn't yet know it's been replaced, and writes to shared storage

From step 5 onward, both leaders are writing to the same shared storage. Even though generation numbers will eventually resolve this, the concurrent writes during the overlap window can cause corruption.

Fencing closes this window by proactively blocking the old leader's access to shared resources.

Definition

Fencing puts a "fence" around the previously active leader, preventing it from accessing cluster resources or serving any read/write requests.

Fencing techniques

Resource fencing

Block the old leader's access to the specific resources it needs:

  • Revoke shared storage access -- Change NFS permissions or storage ACLs so the old leader's credentials no longer work
  • Disable network ports -- Use remote management commands to block the old leader's network access to critical services
  • Invalidate tokens -- If the storage system uses tokens/leases, revoke the old leader's token

The advantage: surgical and targeted. The old leader node stays running (useful for debugging) but can't affect shared state.
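The token-revocation variant above can be sketched with an in-memory stand-in for shared storage. Everything here (`SharedStorage`, the token strings) is hypothetical illustration, not a real storage API:

```python
class SharedStorage:
    """In-memory stand-in for shared storage that checks access tokens."""

    def __init__(self):
        self.valid_tokens = set()
        self.data = {}

    def grant(self, token):
        self.valid_tokens.add(token)

    def revoke(self, token):
        # Resource fencing: the old leader's credential stops working.
        self.valid_tokens.discard(token)

    def write(self, token, key, value):
        if token not in self.valid_tokens:
            raise PermissionError("token revoked: writer is fenced")
        self.data[key] = value


storage = SharedStorage()
storage.grant("leader-A-token")
storage.write("leader-A-token", "x", 1)       # old leader writes normally

# Failover: before leader B starts writing, fence A by revoking its token.
storage.revoke("leader-A-token")
storage.grant("leader-B-token")
storage.write("leader-B-token", "x", 2)       # new leader proceeds

try:
    storage.write("leader-A-token", "x", 99)  # zombie's write is blocked
except PermissionError as exc:
    print(exc)
```

Note that the revoke happens before the new leader's first write; reversing that order reopens the overlap window.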

Node fencing (STONITH)

Stop the old leader entirely:

  • Power off the node -- Via IPMI, iLO, or cloud provider API
  • Force restart -- Hard reset the machine
  • Kill the process -- If the node is a VM or container, terminate it

This is the nuclear option, known as STONITH -- "Shoot The Other Node In The Head." It's aggressive but unambiguous: a powered-off node definitely can't issue conflicting commands.
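A minimal sketch of the SSH-kill flavor of STONITH, with a hypothetical host and process name. By default it only builds the command rather than running it; a production deployment would more likely power-cycle via IPMI or a cloud provider API, since SSH assumes the suspect node's OS is still reachable and cooperative:

```python
import subprocess


def stonith_ssh(host, process_name, dry_run=True):
    """Node fencing sketch: SSH to the suspect node and hard-kill the process.

    host and process_name are hypothetical placeholders. SSH-based fencing
    only works when the node still answers SSH; when it doesn't, fall back
    to an out-of-band power-off (IPMI, iLO, cloud API).
    """
    cmd = ["ssh", host, f"pkill -9 -f {process_name}"]
    if dry_run:
        return cmd  # let callers inspect the command without executing it
    return subprocess.run(cmd, check=False).returncode


cmd = stonith_ssh("old-leader.example.com", "namenode")
```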

| Technique | Precision | Certainty | Recovery |
| --- | --- | --- | --- |
| Resource fencing | High (blocks specific access) | Medium (process still runs, might find another path) | Node stays up, can be re-added |
| Node fencing (STONITH) | Low (kills everything) | High (node is definitely stopped) | Requires full restart |

When to use which

Resource fencing is preferred when you can reliably enumerate all the resources the old leader needs. Node fencing is the fallback when you can't be sure you've blocked every path -- when in doubt, kill the node.

How fencing works with generation numbers

Fencing and split-brain detection are complementary:

  1. Generation numbers ensure that followers reject stale commands -- they solve the problem from the recipient side
  2. Fencing ensures the zombie leader can't issue commands -- it solves the problem from the sender side
  3. Together, they close both sides of the window: even if fencing is slightly delayed, generation numbers provide a safety net, and vice versa
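The recipient-side defense can be sketched as a follower that tracks the highest generation number it has seen. This is a toy model of the idea, not any particular system's protocol:

```python
class Follower:
    """Recipient-side defense: reject commands from stale generations."""

    def __init__(self):
        self.highest_generation = 0
        self.log = []

    def handle_write(self, generation, command):
        if generation < self.highest_generation:
            return False  # stale leader: reject, even if fencing was delayed
        self.highest_generation = generation
        self.log.append(command)
        return True


f = Follower()
assert f.handle_write(1, "A: set x=1")       # leader A, generation 1
assert f.handle_write(2, "B: set x=2")       # leader B takes over, generation 2
assert not f.handle_write(1, "A: set x=99")  # zombie A's command is rejected
```

Even if fencing hasn't caught the zombie yet, its generation-1 command bounces off every follower.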

Examples

HDFS

HDFS is the textbook example of fencing in production. When a standby NameNode takes over as the active NameNode:

  1. It uses STONITH to fence the previously active NameNode -- typically by SSHing to the machine and killing the NameNode process
  2. It revokes the old NameNode's access to the shared edit log storage (resource fencing)
  3. Only then does it begin serving as the new active NameNode
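The ordering of those three steps is the crux: both fencing actions complete before the new node serves. A sketch with hypothetical fencing hooks (`stonith`, `revoke_edit_log_access`, and `become_active` are stand-ins, not real HDFS APIs):

```python
events = []  # records the order in which failover actions run


class Node:
    """Toy node exposing hypothetical fencing hooks."""

    def __init__(self, name):
        self.name = name

    def stonith(self):
        events.append(f"stonith {self.name}")

    def revoke_edit_log_access(self):
        events.append(f"revoke {self.name}")

    def become_active(self):
        events.append(f"active {self.name}")


def failover(new_leader, old_leader):
    # Order matters: fence first, serve last, so there is never a
    # moment with two active writers on the shared edit log.
    old_leader.stonith()                 # 1. node fencing (kill the process)
    old_leader.revoke_edit_log_access()  # 2. resource fencing (edit log)
    new_leader.become_active()           # 3. only now start serving


failover(Node("nn2"), Node("nn1"))
```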

Without fencing, the old NameNode could write to the edit log concurrently with the new one, corrupting the file system's metadata.

Chubby / ZooKeeper

Chubby's lease mechanism is a form of time-based fencing. When a session lease expires, all locks held by that session are automatically released. The old leader loses its locks and therefore can't access the resources those locks protected. This is "soft fencing" -- it relies on all participants respecting the lease protocol.
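Soft fencing can be sketched as a lease object that the holder must check before touching protected resources. This is a toy model with an injected clock, not Chubby's or ZooKeeper's actual API:

```python
import time


class Lease:
    """Time-based 'soft fencing': the lock is only valid until expiry."""

    def __init__(self, holder, duration, clock=time.monotonic):
        self.holder = holder
        self.clock = clock
        self.expires_at = clock() + duration

    def is_valid(self):
        return self.clock() < self.expires_at


# Simulated clock so the example doesn't actually sleep.
now = [0.0]
lease = Lease("leader-A", duration=10.0, clock=lambda: now[0])

assert lease.is_valid()      # within the lease period, A may act
now[0] = 11.0                # time passes; the session lease expires
assert not lease.is_valid()  # A must stop using the protected resource
```

The "soft" caveat is visible here: nothing physically stops the holder from ignoring `is_valid()`, which is why lease-based fencing only works when every participant respects the protocol.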

Interview angle

Fencing comes up as a follow-up to split-brain: "OK, you have generation numbers. But what about the window before the zombie knows it's been replaced?" The answer: fence the old leader -- either block its resource access or kill the node. Always mention both the generation number (logical safety) and fencing (physical safety) as complementary defenses.

Quiz
HDFS uses STONITH (node fencing) to kill the old NameNode during failover. What would happen if the new NameNode started serving before the STONITH operation completed?