Locks, Sequencers, and Lock-delays
A distributed lock by itself is not enough. What happens when a lock holder crashes, the lock is reassigned, but the old holder's in-flight messages still arrive at downstream servers? This is the stale leader (or "zombie leader") problem, and Chubby solves it with two mechanisms: sequencers and lock-delay.
Lock modes
Each Chubby node can act as a reader-writer lock, which a client holds in one of two modes:
| Mode | Behavior |
|---|---|
| Exclusive (write) | Exactly one client holds the lock |
| Shared (read) | Any number of clients hold the lock simultaneously |
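The compatibility rule in the table can be written as a small check. This is a minimal sketch, not Chubby's API: the function name `can_grant` and the mode strings are hypothetical, chosen only for illustration.

```python
def can_grant(requested_mode, held_modes):
    """Return True if a lock request is compatible with the modes already held.

    requested_mode: "exclusive" or "shared" (hypothetical labels).
    held_modes: list of modes currently held by other clients.
    """
    if not held_modes:
        # Nobody holds the lock, so either mode can be granted.
        return True
    # Otherwise only shared requests are compatible, and only with shared holders.
    return requested_mode == "shared" and all(m == "shared" for m in held_modes)


readers_can_join = can_grant("shared", ["shared", "shared"])
writer_blocked_by_readers = not can_grant("exclusive", ["shared"])
reader_blocked_by_writer = not can_grant("shared", ["exclusive"])
```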
Sequencers
After acquiring a lock, a client can request a sequencer -- an opaque byte string that captures the lock's state:
Sequencer = Lock name + Lock mode (exclusive or shared) + Lock generation number
The workflow:
- Application master acquires a Chubby lock and obtains a sequencer.
- Master attaches the sequencer to every command it sends to worker servers.
- Workers validate the sequencer with Chubby before executing the command.
- If the sequencer belongs to a stale master (lock generation is outdated), Chubby rejects it.
This is the same concept as a fencing token -- a monotonically increasing token that lets downstream services reject commands from superseded leaders.
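The sequencer workflow above can be sketched with a toy in-memory lock service. Everything here is an assumption made for illustration: `ToyLockService`, `Worker`, `check_sequencer`, the lock path, and the client names are hypothetical and do not reflect Chubby's real interface. The sketch also assumes the previous holder has already lost the lock by the time a new client acquires it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    lock_name: str
    mode: str        # "exclusive" or "shared"
    generation: int  # bumped each time the lock is acquired anew


class ToyLockService:
    """Hypothetical in-memory stand-in for a lock service (not Chubby's API)."""

    def __init__(self):
        self._generation = {}  # lock name -> current generation number

    def acquire_exclusive(self, lock_name, client):
        # Assumption: the previous holder's session has already expired.
        self._generation[lock_name] = self._generation.get(lock_name, 0) + 1
        return Sequencer(lock_name, "exclusive", self._generation[lock_name])

    def check_sequencer(self, seq):
        # A sequencer is valid only if it carries the lock's current generation.
        return self._generation.get(seq.lock_name) == seq.generation


class Worker:
    def __init__(self, lock_service):
        self._locks = lock_service
        self.applied = []

    def handle(self, command, seq):
        # Validate the sequencer before executing; reject stale masters.
        if not self._locks.check_sequencer(seq):
            return False
        self.applied.append(command)
        return True


service = ToyLockService()
worker = Worker(service)
lock_name = "/ls/example-cell/master"  # hypothetical lock path

old_seq = service.acquire_exclusive(lock_name, "master-A")
accepted_first = worker.handle("write x=1", old_seq)

# master-A is presumed dead; master-B takes over with a newer generation.
new_seq = service.acquire_exclusive(lock_name, "master-B")
accepted_stale = worker.handle("write x=2", old_seq)  # delayed message from master-A
accepted_new = worker.handle("write x=3", new_seq)

print(accepted_first, accepted_stale, accepted_new)  # True False True
```

The delayed command from the old master is rejected purely by comparing generation numbers, so correctness does not depend on clocks or message timing.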
When discussing leader election in interviews, always mention fencing. "What if the old leader doesn't know it's been replaced?" is a classic follow-up. The answer: every command carries a sequencer (fencing token). Downstream services check the token and reject stale commands. Reference Chubby's sequencer or ZooKeeper's zxid as concrete examples.
Lock-delay
Not all servers support sequencer validation. For these legacy systems, Chubby provides lock-delay: a grace period during which a freed lock cannot be re-acquired by a different client.
| Scenario | Behavior |
|---|---|
| Normal release | Lock is immediately available to other clients |
| Holder fails or becomes unreachable | Lock server prevents others from claiming the lock for the lock-delay period |
Key details:
- Clients can specify any lock-delay up to an upper bound (one minute in the original Chubby design).
- The upper bound prevents a faulty client from making a resource unavailable indefinitely.
- Lock-delay is imperfect -- it relies on timing assumptions rather than logical ordering -- but it protects unmodified servers from everyday problems caused by message delays and restarts.
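The lock-delay behavior described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Chubby's implementation: the class `LockDelayLock`, its method names, and the explicit `now` parameter (seconds, passed in so the example is deterministic) are all hypothetical.

```python
class LockDelayLock:
    MAX_LOCK_DELAY = 60.0  # upper bound in seconds (one minute, per the text)

    def __init__(self):
        self.holder = None
        self._delay = 0.0          # lock-delay requested by the current holder
        self._blocked_until = 0.0  # time before which other clients are refused
        self._failed_holder = None

    def acquire(self, client, now, lock_delay=0.0):
        if self.holder is not None:
            return False
        # During the lock-delay window, only the failed holder may reclaim the lock.
        if now < self._blocked_until and client != self._failed_holder:
            return False
        self.holder = client
        # Clamp the requested delay so a faulty client cannot block forever.
        self._delay = min(lock_delay, self.MAX_LOCK_DELAY)
        self._failed_holder = None
        return True

    def release(self, now):
        # Normal release: the lock is immediately available to others.
        self.holder = None
        self._blocked_until = now

    def holder_failed(self, now):
        # Holder crashed or became unreachable: start the lock-delay window.
        self._failed_holder = self.holder
        self.holder = None
        self._blocked_until = now + self._delay


lock = LockDelayLock()
got_a = lock.acquire("A", now=0.0, lock_delay=300.0)  # request is clamped to 60 s
lock.holder_failed(now=10.0)                          # others blocked until t=70
early_b = lock.acquire("B", now=30.0)                 # refused: inside the window
late_b = lock.acquire("B", now=70.0)                  # granted: window has passed
lock.release(now=80.0)
immediate_c = lock.acquire("C", now=80.0)             # granted: normal release
```

Note that the protection here rests entirely on the window being longer than any message delay from the failed holder, which is the timing assumption the text calls out.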
Lock-delay is a best-effort safeguard. It handles common cases (message reordering, slow restarts) but cannot prevent all split-brain scenarios. If your system can support sequencer validation, always prefer sequencers over lock-delay. In interview answers, present sequencers as the primary solution and lock-delay as the fallback for legacy integration.