Master Election and Chubby Events
When a Chubby master fails, the cell must elect a new one, and the new master must reconstruct its state without losing client sessions or violating lock guarantees. The sequence of steps a new master follows is one of the most operationally critical paths in Chubby.
New master initialization sequence
A newly elected master proceeds through these steps in order:
| Step | Action | Purpose |
|---|---|---|
| 1 | Pick a new epoch number | Distinguishes this master from the previous one. Clients must present the epoch on every call. The master rejects calls that carry an older epoch and supplies the new one. This prevents the new master from acting on delayed packets that were sent to a previous master, even one that ran on the same machine. It is the same idea as epoch-based fencing against split-brain. |
| 2 | Respond to master-location requests only | The master announces itself but does not yet handle session operations. |
| 3 | Rebuild in-memory state | Reconstruct session and lock data structures from the database. Extend session leases to the maximum the previous master may have granted, because the new master cannot know exactly what each client was promised. |
| 4 | Allow KeepAlives only | Clients can maintain their sessions but cannot perform other operations yet. |
| 5 | Emit failover event to every session | Clients flush their caches (they may have missed invalidations) and warn applications that other events may have been lost. |
| 6 | Wait for acknowledgments | The master waits until every session acknowledges the failover event or lets its session expire. |
| 7 | Allow all operations | Normal service resumes. |
| 8 | Honor pre-failover handles | If a client presents a handle created before the failover (recognized by a sequence number in the handle), the master recreates the in-memory handle representation and processes the request. |
| 9 | Delete stale ephemeral files | After ~1 minute, ephemeral files with no open handles are cleaned up. Clients must refresh ephemeral file handles within this window. |
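The ordering in the table can be read as a small state machine that controls which requests the master accepts. Below is a minimal Python illustration. The phase names, the request kinds in `REQUIRED_PHASE`, and the `NewMaster` class are all hypothetical and are not Chubby's actual code. Steps 8 and 9 are left out.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical names for the serving phases in steps 2, 4 and 7."""
    LOCATION_ONLY = 1   # step 2: answer master-location requests only
    KEEPALIVE_ONLY = 2  # step 4: also accept KeepAlives
    SERVING = 3         # step 7: accept all operations


# Minimum phase at which each (hypothetical) request kind is accepted.
REQUIRED_PHASE = {
    "master_location": Phase.LOCATION_ONLY,
    "keepalive": Phase.KEEPALIVE_ONLY,
    "open": Phase.SERVING,
    "read": Phase.SERVING,
    "acquire_lock": Phase.SERVING,
}


class NewMaster:
    def __init__(self, epoch, session_ids):
        self.epoch = epoch                  # step 1: new epoch number
        self.phase = Phase.LOCATION_ONLY    # step 2
        self.sessions = set(session_ids)    # step 3: sessions rebuilt from the database
        self.pending_acks = set()

    def allows(self, kind):
        return self.phase >= REQUIRED_PHASE[kind]

    def begin_failover_notification(self):
        # Steps 4-5: accept KeepAlives and send every session a failover event.
        self.phase = Phase.KEEPALIVE_ONLY
        self.pending_acks = set(self.sessions)
        self._maybe_open()

    def on_failover_ack(self, session_id):
        self.pending_acks.discard(session_id)
        self._maybe_open()

    def on_session_expired(self, session_id):
        # Step 6 also completes for a session that expires instead of acknowledging.
        self.sessions.discard(session_id)
        self.pending_acks.discard(session_id)
        self._maybe_open()

    def _maybe_open(self):
        # Step 7: open for all operations once no acknowledgment is outstanding.
        if self.phase == Phase.KEEPALIVE_ONLY and not self.pending_acks:
            self.phase = Phase.SERVING
```

Modeling the phases as an ordered `IntEnum` makes each gate a single comparison. A request kind is accepted from its required phase onward.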
The epoch number works like a fencing token for the master itself. When explaining Chubby's failover, emphasize: "Each new master gets a strictly increasing epoch, and any call carrying an old epoch is rejected. A delayed request meant for the previous master can never be served as if it were current." The epoch works alongside the master lease. Replicas will not elect a new master until the previous master's lease has expired, so two masters never serve at the same time.
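Below is a minimal sketch of the epoch check. It assumes a hypothetical `MasterEpoch` class that receives the previous master's epoch; in a real cell that value would have to come from replicated state. Calls stamped with an older epoch are rejected, and the rejection carries the current epoch so that the client can retry with it. `StaleEpochError` and `call_with_epoch` are also illustrative names.

```python
class StaleEpochError(Exception):
    """Rejection of a call stamped with an older master epoch."""

    def __init__(self, call_epoch: int, current_epoch: int):
        super().__init__(f"call epoch {call_epoch} is older than current epoch {current_epoch}")
        self.current_epoch = current_epoch  # returned so the client can adopt it


class MasterEpoch:
    def __init__(self, previous_epoch: int):
        # Strictly greater than any epoch a previous master used.
        self.current = previous_epoch + 1

    def check(self, call_epoch: int) -> None:
        if call_epoch < self.current:
            raise StaleEpochError(call_epoch, self.current)


def call_with_epoch(master: MasterEpoch, client_epoch: int) -> int:
    """Hypothetical client helper: retry once with the epoch the master reports."""
    try:
        master.check(client_epoch)
        return client_epoch
    except StaleEpochError as err:
        master.check(err.current_epoch)
        return err.current_epoch
```

A packet that was in flight to the old master still carries the old epoch, so the new master rejects it instead of acting on it.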
Chubby events
Chubby provides a simple event mechanism. Clients subscribe to events when creating a handle, and events are delivered asynchronously via callbacks in the Chubby library.
File and lock events
| Event | Triggered when... |
|---|---|
| File contents modified | A file's data changes |
| Child node added/removed/modified | A directory's children change |
| Master failover | A new master takes over |
| Handle invalidated | A handle (and its associated lock) becomes invalid |
| Lock acquired | A lock transitions from free to held |
| Conflicting lock request | Another client requests a lock held by this client |
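Per-handle subscription with asynchronous delivery can be modeled as two parts: an event bit mask chosen when the handle is opened, and a single background thread that runs the callbacks. This is a hypothetical sketch, not the Chubby library's API. `Event`, `EventDispatcher`, `open_handle`, and `publish` are invented names, and the enum members simply mirror the table above.

```python
import queue
import threading
from enum import Flag, auto


class Event(Flag):
    FILE_CONTENTS_MODIFIED = auto()
    CHILD_NODE_CHANGED = auto()        # child added, removed, or modified
    MASTER_FAILOVER = auto()
    HANDLE_INVALID = auto()
    LOCK_ACQUIRED = auto()
    CONFLICTING_LOCK_REQUEST = auto()


class EventDispatcher:
    """Runs subscribed callbacks on a background thread (asynchronous up-calls)."""

    def __init__(self):
        self._queue = queue.Queue()
        self._subscriptions = {}  # handle id -> (event mask, callback)
        threading.Thread(target=self._run, daemon=True).start()

    def open_handle(self, handle_id, events, callback):
        # Subscriptions are fixed when the handle is created.
        self._subscriptions[handle_id] = (events, callback)

    def publish(self, handle_id, event, payload=None):
        mask, callback = self._subscriptions.get(handle_id, (Event(0), None))
        if callback is not None and event in mask:
            self._queue.put((callback, event, payload))  # unsubscribed events are dropped

    def drain(self):
        # Block until every queued callback has run (useful in tests).
        self._queue.join()

    def _run(self):
        while True:
            callback, event, payload = self._queue.get()
            try:
                callback(event, payload)
            finally:
                self._queue.task_done()
```

A single worker thread drains a FIFO queue, so in this sketch callbacks run in the order their events were published. A real library could make different choices.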
Session events (sent to the application)
| Event | Meaning |
|---|---|
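| Jeopardy | The client's local lease timeout expired and the grace period has begun. The client library disables its cache because cached data can no longer be trusted. |
| Safe | A KeepAlive reached the master before the grace period ended, so the session survived the communication problem. The cache is re-enabled. |
| Expired | The grace period ended without contact with the master. The session is gone, and all handles, locks, and cached data are invalidated. |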
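The three session events amount to a client-side state machine driven by the local lease timeout and the grace period. The sketch below uses a hypothetical `ClientSession` class. It takes an explicit clock value in `tick` so it can be tested deterministically. A realistic grace period is 45 seconds, the default reported in the Chubby paper.

```python
from enum import Enum


class SessionState(Enum):
    HEALTHY = "healthy"
    JEOPARDY = "jeopardy"
    EXPIRED = "expired"


class ClientSession:
    """Hypothetical client-side session tracker emitting jeopardy/safe/expired."""

    def __init__(self, lease_expiry, grace_period, notify):
        self.lease_expiry = lease_expiry  # client's local (conservative) lease timeout
        self.grace_period = grace_period  # extra wait before declaring expiry
        self.notify = notify              # application callback for session events
        self.state = SessionState.HEALTHY
        self.cache_enabled = True

    def tick(self, now):
        if self.state is SessionState.HEALTHY and now >= self.lease_expiry:
            self.state = SessionState.JEOPARDY
            self.cache_enabled = False    # cached data may be stale now
            self.notify("jeopardy")
        if (self.state is SessionState.JEOPARDY
                and now >= self.lease_expiry + self.grace_period):
            self.state = SessionState.EXPIRED
            self.notify("expired")

    def on_keepalive_reply(self, new_lease_expiry):
        if self.state is SessionState.EXPIRED:
            return                        # too late: the session cannot be revived
        was_in_jeopardy = self.state is SessionState.JEOPARDY
        self.lease_expiry = new_lease_expiry
        self.state = SessionState.HEALTHY
        self.cache_enabled = True
        if was_in_jeopardy:
            self.notify("safe")
```

"Safe" is emitted only when a KeepAlive succeeds during jeopardy. A routine lease extension while the session is healthy produces no event.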
Events can be lost during failover. The failover event (step 5 above) warns clients of this explicitly. Applications must treat the failover event as a signal to re-validate all assumptions -- re-read files, re-check lock ownership, and re-register for events. Treating the failover event as purely informational is a common mistake.
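Below is a minimal sketch of an application that treats the failover event as "assume nothing". `FakeChubbyClient` is a hypothetical in-memory stand-in, not the real Chubby client interface, and the event names and paths are made up. When the failover event arrives, the handler re-reads the file, re-checks lock ownership, and re-subscribes instead of merely logging.

```python
class FakeChubbyClient:
    """Hypothetical stand-in for a Chubby client library; not a real API."""

    def __init__(self, files, held_locks):
        self.files = files                # path -> contents
        self.held_locks = held_locks      # set of lock paths this client holds
        self.subscriptions = []

    def read(self, path):
        return self.files[path]

    def holds_lock(self, path):
        return path in self.held_locks

    def subscribe(self, path, events):
        self.subscriptions.append((path, frozenset(events)))


class ConfigWatcher:
    def __init__(self, client, config_path, lock_path):
        self.client = client
        self.config_path = config_path
        self.lock_path = lock_path
        self.cached_config = None
        self.is_primary = False
        self.revalidate()

    def revalidate(self):
        # Rebuild every assumption from scratch instead of trusting local state.
        self.cached_config = self.client.read(self.config_path)
        self.is_primary = self.client.holds_lock(self.lock_path)
        self.client.subscribe(self.config_path, {"file_contents_modified"})

    def on_event(self, event):
        if event == "master_failover":
            # Other events may have been lost, so treat this as a full reset.
            self.revalidate()
        elif event == "file_contents_modified":
            self.cached_config = self.client.read(self.config_path)
```

The file change and the lost lock below happen with no events delivered. This is the situation where an application that ignores the failover event keeps acting on stale state.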