Files, Directories, and Handles

Every coordination primitive in Chubby -- locks, leader election tokens, configuration data -- maps onto a file-system abstraction. Understanding how Chubby models files, directories, and handles clarifies why its API is so compact.

Think first
In a distributed coordination service, you need a way for servers to automatically deregister when they crash. How would you design a mechanism that makes a server's registration disappear without requiring explicit cleanup?

File system structure

Chubby's namespace is a tree of nodes, where each node is either a file or a directory. Directories contain lists of child nodes.
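
The tree model above can be sketched in a few lines. This is a minimal in-memory illustration, not Chubby's implementation; the `Node`, `create`, and `lookup` names and the slash-separated paths are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical in-memory node: either a file or a directory."""
    name: str
    is_directory: bool
    content: bytes = b""
    children: Dict[str, "Node"] = field(default_factory=dict)


def _parts(path: str) -> list:
    """Split '/a/b/c' into ['a', 'b', 'c'], ignoring empty components."""
    return [p for p in path.split("/") if p]


def create(root: Node, path: str, is_directory: bool = False) -> Node:
    """Create the node at `path`; every parent must already be a directory."""
    *parents, leaf = _parts(path)
    current = root
    for part in parents:
        current = current.children[part]  # KeyError if a parent is missing
        if not current.is_directory:
            raise NotADirectoryError(part)
    node = Node(leaf, is_directory)
    current.children[leaf] = node
    return node


def lookup(root: Node, path: str) -> Optional[Node]:
    """Walk the tree from the root; return None if any component is absent."""
    current = root
    for part in _parts(path):
        if not current.is_directory or part not in current.children:
            return None
        current = current.children[part]
    return current
```

For example, after `create(root, "/services", is_directory=True)` and `create(root, "/services/leader")`, calling `lookup(root, "/services/leader")` returns the file node, while a lookup of a missing name returns `None`.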

Nodes

Property | Detail
Locking | Any node can act as an advisory reader/writer lock
Ephemeral nodes | Deleted automatically when no client has them open (and, for directories, only once they are also empty). Used as liveness indicators -- if the client that created an ephemeral node dies, the node disappears.
Permanent nodes | Persist until explicitly deleted
Explicit deletion | Any node (ephemeral or permanent) can be deleted by a client

Interview angle

Ephemeral nodes are the key mechanism behind service discovery and health checking in both Chubby and ZooKeeper. A server registers an ephemeral node on startup; if the server crashes, its session expires and the node vanishes -- automatically deregistering the server. The application needs no heartbeat protocol of its own.
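
To make the deregistration behaviour concrete, here is a toy single-process sketch. `ToyCell`, `ToyNode`, and the session identifiers are hypothetical names invented for the illustration; a real service detects expiry through session keep-alives, which this sketch replaces with an explicit `expire_session` call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyNode:
    ephemeral: bool
    open_by: Set[str] = field(default_factory=set)  # sessions holding the node open


class ToyCell:
    """Hypothetical stand-in for a coordination service (not a real API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, ToyNode] = {}

    def open(self, session: str, path: str, ephemeral: bool = False) -> None:
        """Create the node if needed and record that `session` has it open."""
        node = self.nodes.setdefault(path, ToyNode(ephemeral))
        node.open_by.add(session)

    def expire_session(self, session: str) -> None:
        """Drop the session's opens, then delete ephemeral nodes nobody holds."""
        for path in list(self.nodes):
            node = self.nodes[path]
            node.open_by.discard(session)
            if node.ephemeral and not node.open_by:
                del self.nodes[path]

    def list_prefix(self, prefix: str) -> List[str]:
        """Return the sorted paths that start with `prefix`."""
        return sorted(p for p in self.nodes if p.startswith(prefix))
```

If sessions `s1` and `s2` each register an ephemeral node under `/servers/` and `s1` then expires, `list_prefix("/servers/")` shows only the node owned by `s2`, while any permanent node that `s1` had open stays in place.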

Metadata

Each node carries three categories of metadata:

Access Control Lists (ACLs)

  • Three ACL names per node: for reading, writing, and changing ACLs
  • Nodes inherit ACL names from their parent directory at creation time
  • ACL definitions are themselves files stored in a well-known ACL directory within the cell
  • Users are authenticated via the RPC system's built-in mechanism
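
The ACL rules above might be modelled as follows. This is a sketch under stated assumptions: the `ACL_FILES` dictionary stands in for the ACL directory, and the ACL names and user names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical contents of the ACL directory: ACL name -> permitted principals.
ACL_FILES: Dict[str, Set[str]] = {
    "ops-read": {"alice", "bob"},
    "ops-write": {"alice"},
    "ops-admin": {"root"},
}


@dataclass(frozen=True)
class AclNames:
    """The three ACL names a node carries."""
    read: str
    write: str
    change_acl: str


@dataclass
class AclNode:
    acls: AclNames


def create_child(parent: AclNode) -> AclNode:
    """A new node takes its parent directory's ACL names at creation time."""
    return AclNode(parent.acls)


def is_allowed(user: str, node: AclNode, operation: str) -> bool:
    """`operation` is one of 'read', 'write', or 'change_acl'."""
    acl_name = getattr(node.acls, operation)
    return user in ACL_FILES.get(acl_name, set())
```

Because a node stores ACL names rather than member lists, editing one ACL file changes the permissions of every node that refers to it.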

Monotonically increasing 64-bit counters

These counters allow clients to detect changes without reading file contents:

Counter | Incremented when...
Instance number | A new node replaces a previously deleted node with the same name (always higher than predecessor)
Content generation number | File contents are written (files only)
Lock generation number | Lock transitions from free to held
ACL generation number | ACL names are written
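
A rough sketch of when three of these counters advance is shown below. `CounterCell` and `Meta` are hypothetical names, and the ACL generation number is omitted for brevity since it follows the same pattern as the content generation number.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Meta:
    instance: int
    content_generation: int = 0
    lock_generation: int = 0
    content: bytes = b""
    lock_holder: Optional[str] = None


class CounterCell:
    """Toy model of per-node counters (illustrative only)."""

    def __init__(self) -> None:
        self.live: Dict[str, Meta] = {}
        self.last_instance: Dict[str, int] = {}  # remembered across deletions

    def create(self, path: str) -> Meta:
        if path in self.live:
            raise FileExistsError(path)
        # A re-created name always gets a higher instance number.
        instance = self.last_instance.get(path, 0) + 1
        self.last_instance[path] = instance
        self.live[path] = Meta(instance)
        return self.live[path]

    def delete(self, path: str) -> None:
        del self.live[path]

    def write(self, path: str, data: bytes) -> None:
        meta = self.live[path]
        meta.content = data
        meta.content_generation += 1  # bumped on every write

    def acquire(self, path: str, holder: str) -> bool:
        meta = self.live[path]
        if meta.lock_holder is not None:
            return False
        meta.lock_holder = holder
        meta.lock_generation += 1  # bumped only on free -> held
        return True

    def release(self, path: str) -> None:
        self.live[path].lock_holder = None  # held -> free: no increment
```

A client that remembers the counters it last saw can compare them with the current values to notice a rewrite, a change of lock holder, or a delete-and-recreate without fetching the file contents.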

Checksum

Each file exposes a 64-bit checksum of its contents, so clients can tell whether two files differ without reading them in full.
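
As an illustration of checksum-based comparison, the sketch below derives a 64-bit value with BLAKE2b from Python's standard library. The choice of hash is an assumption made for the example; the text does not say which algorithm Chubby uses.

```python
import hashlib


def checksum64(content: bytes) -> int:
    """Return a 64-bit digest of `content`.

    BLAKE2b is an assumption for this sketch, not Chubby's actual algorithm.
    """
    digest = hashlib.blake2b(content, digest_size=8).digest()  # 8 bytes = 64 bits
    return int.from_bytes(digest, "big")


def probably_same(local: bytes, remote_checksum: int) -> bool:
    """Compare a local copy against a checksum reported by the service."""
    return checksum64(local) == remote_checksum
```

Differing checksums prove the contents differ; matching checksums only make equality very likely, so a client that needs certainty still has to read the file.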

Handles

Opening a node returns a handle (analogous to a Unix file descriptor). Handles contain three components:

Component | Purpose
Check digits | Prevent clients from forging or guessing handles; full access-control checks happen only at handle creation
Sequence number | Lets the master distinguish handles it created from handles created by a previous master
Mode information | Recorded at open time; enables a new master to reconstruct state when an old handle is presented after failover
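
One way to picture the three components is the sketch below. It is a hedged illustration rather than Chubby's wire format: the HMAC-based check digits, the `MASTER_SECRET` key, the `master_epoch` field standing in for the sequence number, and the example path are all assumptions.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Assumption: a secret known only to the masters of the cell.
MASTER_SECRET = b"hypothetical-cell-secret"


@dataclass(frozen=True)
class Handle:
    path: str
    mode: str          # e.g. "read" or "write", recorded at open time
    master_epoch: int  # sequence number: which master issued the handle
    check: str         # check digits


def _digits(path: str, mode: str, epoch: int) -> str:
    """Derive check digits that a client cannot compute without the secret."""
    message = f"{path}|{mode}|{epoch}".encode()
    return hmac.new(MASTER_SECRET, message, hashlib.sha256).hexdigest()[:16]


def issue(path: str, mode: str, epoch: int) -> Handle:
    """Issue a handle; the ACL check at open time is not shown here."""
    return Handle(path, mode, epoch, _digits(path, mode, epoch))


def accept(handle: Handle, current_epoch: int) -> str:
    """Classify a presented handle as 'rejected', 'recreated', or 'valid'."""
    expected = _digits(handle.path, handle.mode, handle.master_epoch)
    if not hmac.compare_digest(expected, handle.check):
        return "rejected"   # forged or corrupted handle
    if handle.master_epoch < current_epoch:
        return "recreated"  # issued by a previous master: rebuild state from mode
    return "valid"
```

A handle issued with `issue("/ls/cell/leader", "write", epoch=3)` is accepted as `"valid"` by the master at epoch 3, reported as `"recreated"` by a successor at epoch 4, and a copy with altered check digits is `"rejected"`.
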
Warning

Handles are tied to sessions. If a client's session expires (e.g., during a prolonged network partition beyond the grace period), all its handles become invalid -- and with them, all locks. Applications must handle this scenario gracefully, typically by re-acquiring locks and re-reading state after session recovery.
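
The recovery pattern described above might look like the following sketch. `ToyService`, `SessionExpired`, and `read_as_leader` are hypothetical names; a real client library would surface session loss through its own events or errors.

```python
from typing import Optional, Set, Tuple


class SessionExpired(Exception):
    """Hypothetical error: the session behind a handle no longer exists."""


class ToyService:
    """Toy service with one lock and one value (illustrative only)."""

    def __init__(self) -> None:
        self.sessions: Set[int] = set()
        self.next_id = 0
        self.lock_owner: Optional[int] = None
        self.value = "v1"

    def open_session(self) -> int:
        self.next_id += 1
        self.sessions.add(self.next_id)
        return self.next_id

    def expire(self, session: int) -> None:
        self.sessions.discard(session)
        if self.lock_owner == session:
            self.lock_owner = None  # locks die with the session

    def _check(self, session: int) -> None:
        if session not in self.sessions:
            raise SessionExpired()

    def try_lock(self, session: int) -> bool:
        self._check(session)
        if self.lock_owner in (None, session):
            self.lock_owner = session
            return True
        return False

    def read(self, session: int) -> str:
        self._check(session)
        return self.value


def read_as_leader(service: ToyService, session: int) -> Tuple[int, bool, str]:
    """Return (session, is_leader, value), recovering once from expiry."""
    try:
        return session, service.try_lock(session), service.read(session)
    except SessionExpired:
        session = service.open_session()       # new session, new handles
        is_leader = service.try_lock(session)  # old lock is gone: compete again
        return session, is_leader, service.read(session)  # re-read the state
```

The point of the sketch is that recovery never assumes the old lock is still held: after expiry another client may have taken the lock and changed the state, so the recovering client competes again and re-reads before acting.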

Quiz
What would happen if Chubby used permanent nodes instead of ephemeral nodes for tablet server registration in BigTable?