Files, Directories, and Handles
Every coordination primitive in Chubby -- locks, leader election tokens, configuration data -- maps onto a file-system abstraction. Understanding how Chubby models files, directories, and handles clarifies why its API is so compact.
File system structure
Chubby's namespace is a tree of nodes, where each node is either a file or a directory. Directories contain lists of child nodes.
Nodes
| Property | Detail |
|---|---|
| Locking | Any node can act as an advisory reader/writer lock |
| Ephemeral nodes | Deleted automatically when no client has them open (or when empty, for directories). Used as liveness indicators -- if the client that created an ephemeral node dies, the node disappears. |
| Permanent nodes | Persist until explicitly deleted |
| Explicit deletion | Any node (ephemeral or permanent) can be deleted by a client |
Ephemeral nodes are the key mechanism behind service discovery and health checking in both Chubby and ZooKeeper. A server registers an ephemeral node on startup; if the server crashes, its session expires and the node vanishes, which automatically deregisters the server. The application needs no heartbeat protocol of its own.
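The registration pattern can be illustrated with a toy in-memory model. This is only a sketch: the `Namespace` class, its methods, the session ids, and the paths are hypothetical names invented for illustration and are not Chubby's client API.

```python
class Namespace:
    """Toy in-memory namespace (hypothetical; not Chubby's real API)."""

    def __init__(self):
        # path -> {"ephemeral": bool, "openers": set of session ids}
        self.nodes = {}

    def open(self, path, session_id, ephemeral=False):
        """Create the node if needed and record that the session has it open."""
        node = self.nodes.setdefault(path, {"ephemeral": ephemeral, "openers": set()})
        node["openers"].add(session_id)

    def expire_session(self, session_id):
        """Drop the session's opens; delete ephemeral nodes nobody holds open."""
        for path in list(self.nodes):
            node = self.nodes[path]
            node["openers"].discard(session_id)
            if node["ephemeral"] and not node["openers"]:
                del self.nodes[path]

    def list_children(self, prefix):
        return sorted(p for p in self.nodes if p.startswith(prefix))


ns = Namespace()
ns.open("/ls/cell/service/backend-1", session_id="s1", ephemeral=True)
ns.open("/ls/cell/service/backend-2", session_id="s2", ephemeral=True)
ns.open("/ls/cell/service/config", session_id="s1")  # permanent node

ns.expire_session("s1")  # backend-1 crashed and its session timed out
live = ns.list_children("/ls/cell/service/")
```

After the session expires, `backend-1` is gone while the permanent `config` node survives, even though no client has it open any longer.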
Metadata
Each node carries three categories of metadata:
Access Control Lists (ACLs)
- Three ACL names per node: for reading, writing, and changing ACLs
- Nodes inherit ACL names from their parent directory at creation time
- ACL definitions are themselves files stored in a well-known ACL directory within the cell
- Users are authenticated via the RPC system's built-in mechanism
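The ACL scheme described above might be modeled roughly as follows. The `ACL_DIR` mapping, the `Node` class, and the principal names are assumptions made for illustration; real ACL files live in the cell's ACL directory, and authentication is handled by the RPC layer rather than by code like this.

```python
# Hypothetical stand-in for the ACL directory: ACL name -> principals listed in that file.
ACL_DIR = {
    "admins": {"alice"},
    "readers": {"alice", "bob"},
}


class Node:
    """A node that stores three ACL *names*, not the principal lists themselves."""

    def __init__(self, read_acl, write_acl, change_acl):
        self.acl_names = {"read": read_acl, "write": write_acl, "change_acl": change_acl}

    def create_child(self):
        # A new node copies its parent's ACL names at creation time.
        return Node(
            self.acl_names["read"],
            self.acl_names["write"],
            self.acl_names["change_acl"],
        )


def is_allowed(node, user, op):
    """Resolve the node's ACL name for `op` and check membership."""
    acl_name = node.acl_names[op]
    return user in ACL_DIR.get(acl_name, set())


parent = Node(read_acl="readers", write_acl="admins", change_acl="admins")
child = parent.create_child()

# Changing the parent afterwards does not affect the already-created child.
parent.acl_names["read"] = "admins"
```

Because nodes store ACL names rather than principal lists, editing one ACL file changes the permissions of every node that refers to it.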
Monotonically increasing 64-bit counters
These counters allow clients to detect changes without reading file contents:
| Counter | Incremented when... |
|---|---|
| Instance number | A new node replaces a previously deleted node with the same name (always higher than predecessor) |
| Content generation number | File contents are written (files only) |
| Lock generation number | Lock transitions from free to held |
| ACL generation number | ACL names are written |
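As a sketch of how a client could use these counters, the snippet below compares a cached metadata snapshot with a fresh one and reports which counters moved. The `NodeMeta` dataclass, its field names, and `what_changed` are hypothetical; the source only says that the four counters exist and when they increase.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class NodeMeta:
    """Hypothetical snapshot of a node's four 64-bit counters."""
    instance: int
    content_generation: int
    lock_generation: int
    acl_generation: int


def what_changed(cached, current):
    """Return the names of counters that differ between two snapshots."""
    old, new = asdict(cached), asdict(current)
    return [name for name in old if new[name] != old[name]]


cached = NodeMeta(instance=3, content_generation=7, lock_generation=2, acl_generation=1)

# Later: the file was rewritten once and the lock went from free to held once.
current = replace(cached, content_generation=8, lock_generation=3)

changes = what_changed(cached, current)
```

A changed `instance` value would tell the client that the node was deleted and recreated, so anything cached about the old node should be discarded.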
Checksum
A 64-bit file-content checksum exposed to clients, enabling quick file comparison without reading full contents.
Handles
Opening a node returns a handle (analogous to a Unix file descriptor). Handles contain three components:
| Component | Purpose |
|---|---|
| Check digits | Prevent clients from forging or guessing handles, so the master needs to perform full access-control checks only when a handle is created |
| Sequence number | Lets the master distinguish handles it created from handles created by a previous master |
| Mode information | Recorded at open time; enables a new master to reconstruct state when an old handle is presented after failover |
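One way to picture the three components is the sketch below. It rests on stated assumptions: the source does not say how check digits are computed, so a truncated HMAC stands in for them here, and `SECRET`, `make_handle`, `validate`, and the dictionary layout are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"master-secret"  # hypothetical key known only to the masters


def _check_digits(path, mode, sequence):
    payload = f"{path}|{mode}|{sequence}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()[:16]


def make_handle(path, mode, master_sequence):
    """Issue a handle; access-control checks would run here, once."""
    return {
        "path": path,
        "mode": mode,                 # recorded at open time
        "sequence": master_sequence,  # identifies the issuing master
        "check": _check_digits(path, mode, master_sequence),
    }


def validate(handle, current_sequence):
    """Classify a presented handle without repeating the ACL check."""
    expected = _check_digits(handle["path"], handle["mode"], handle["sequence"])
    if not hmac.compare_digest(expected, handle["check"]):
        return "forged"
    if handle["sequence"] != current_sequence:
        # A new master can rebuild its state from the recorded mode.
        return "from-previous-master"
    return "ok"


h = make_handle("/ls/cell/lock", "write", master_sequence=1)
tampered = dict(h, mode="read-write-acl")  # client edits the mode field
```

In this sketch a handle issued under sequence 1 still verifies after failover to sequence 2, but it is flagged so the new master knows to reconstruct the open state from the mode field.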
Handles are tied to sessions. If a client's session expires (e.g., during a prolonged network partition beyond the grace period), all its handles become invalid -- and with them, all locks. Applications must handle this scenario gracefully, typically by re-acquiring locks and re-reading state after session recovery.
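The recovery pattern might look like the following sketch. `ToyClient`, `SessionExpired`, and `ensure_leader` are hypothetical names; a real client library reports session loss through its own mechanism, and re-acquisition can fail if another client has taken the lock in the meantime.

```python
class SessionExpired(Exception):
    pass


class ToyClient:
    """Hypothetical client that loses every lock when its session expires."""

    def __init__(self):
        self.session_id = 1
        self.locks = set()

    def acquire(self, path):
        self.locks.add(path)

    def expire_session(self):
        # A new session starts with no handles and no locks.
        self.session_id += 1
        self.locks.clear()

    def require_lock(self, path):
        if path not in self.locks:
            raise SessionExpired(f"lock on {path} lost")


def ensure_leader(client, path, reload_state):
    """Return freshly loaded state if the lock had to be re-acquired, else None."""
    try:
        client.require_lock(path)
        return None
    except SessionExpired:
        client.acquire(path)    # in this toy model, acquisition always succeeds
        return reload_state()   # cached state may be stale, so re-read it


client = ToyClient()
lock_path = "/ls/cell/service/leader"
client.acquire(lock_path)

before = ensure_leader(client, lock_path, lambda: "fresh-state")
client.expire_session()  # e.g. a partition outlasted the grace period
after = ensure_leader(client, lock_path, lambda: "fresh-state")
```

The important design point is that the application re-reads its state after recovery instead of trusting whatever it cached while it held the old lock.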