High-level Architecture

How does a single lock service handle coordination for thousands of distributed processes without becoming a bottleneck itself? Chubby's architecture answers this with a compact, replicated design built on top of Paxos consensus.

Think first
You need to build a coordination service for thousands of distributed processes. Should it use an even or odd number of replicas, and why?

Core terminology

  • Chubby cell -- A Chubby cluster, typically 5 replicas in the same data center (though some cells span thousands of kilometers)
  • Master -- The single replica, elected via Paxos, that handles all client requests
  • Replica -- A server in the cell that participates in consensus and stores a copy of the data
  • Client library -- An application-linked library that communicates with the master via RPC

Server architecture

  • A Chubby cell consists of a small set of servers (typically 5) known as replicas.
  • One replica is elected master using Paxos -- it handles all client requests.
  • If the master fails, another replica takes over (see Split-brain for how epoch numbers prevent conflicts).
  • Writes are propagated through the consensus protocol and acknowledged only after reaching a majority of replicas -- ensuring the data survives a master failover.
  • Replicas are placed on different racks for fault tolerance.
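The "small odd-sized set" choice above comes straight from majority-quorum arithmetic: a cell of 2f+1 replicas tolerates f failures, and adding a replica to make the count even buys no extra fault tolerance. A small sketch of that math (plain Python, no Chubby-specific API):

```python
# Majority-quorum arithmetic behind the "typically 5 replicas" choice.
# A write is durable once a majority (quorum) of replicas have it,
# so a cell of n replicas survives n - quorum failures.

def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Replicas that can fail while a majority remains reachable."""
    return n - quorum(n)

for n in range(1, 8):
    print(f"{n} replicas: quorum={quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Running this shows that 4 replicas tolerate only 1 failure, the same as 3, while 5 replicas tolerate 2 -- which is why coordination services use odd-sized cells.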

Client architecture

Client applications communicate with the master through a Chubby client library using RPC.
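Before it can send RPCs, the library must locate the master: per the Chubby paper, clients query the replicas listed in DNS, and any replica (master or not) replies with the current master's identity. A toy sketch of that discovery loop -- the Replica class and its fields are illustrative, not Chubby's real API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Replica:
    """Toy stand-in for one server in the cell (not Chubby's real API)."""
    name: str
    master_name: str  # every live replica knows the current master
    up: bool = True

def find_master(replicas: list[Replica]) -> Optional[str]:
    """Ask replicas (as if listed in DNS) until one names the master."""
    for replica in replicas:
        if replica.up:  # stands in for an RPC that may time out
            return replica.master_name
    return None  # no replica reachable

cell = [Replica("r1", "r3", up=False), Replica("r2", "r3"), Replica("r3", "r3")]
print(find_master(cell))  # -> r3
```

Once found, the client directs all requests to that master until it stops responding or denies being the master, at which point discovery repeats.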

Interview angle

When asked "how would you implement distributed coordination?", sketch Chubby's architecture: a small odd-sized replica set (5 nodes), a Paxos-elected master, and clients that talk only to the master. ZooKeeper follows essentially the same design (with its own consensus protocol, ZAB), which makes this the foundation for any coordination-service answer.

Chubby APIs

Chubby exports a file system interface similar to POSIX, but much simpler. The namespace is a strict tree of files and directories with slash-separated paths:

Path format: /ls/chubby_cell/directory_name/.../file_name

  • /ls designates the lock service (Chubby system)
  • chubby_cell identifies the specific Chubby instance
  • The remainder is a series of directories ending in a file name
  • The special name /ls/local resolves to the nearest cell relative to the caller
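Because the namespace is a strict slash-separated tree, splitting a path needs only ordinary string handling. This sketch (a hypothetical helper, not part of the client library) pulls out the cell name and the path within it:

```python
def parse_chubby_path(path: str) -> tuple[str, str]:
    """Split '/ls/<cell>/<rest>' into (cell, rest). Hypothetical helper."""
    parts = path.split("/")
    # parts[0] is '' from the leading slash; parts[1] must be 'ls'
    if len(parts) < 4 or parts[0] != "" or parts[1] != "ls":
        raise ValueError(f"not a Chubby path: {path!r}")
    cell = parts[2]               # e.g. 'foo', or 'local' for nearest cell
    rest = "/".join(parts[3:])    # directories ending in a file name
    return cell, rest

print(parse_chubby_path("/ls/foo/wombat/pouch"))  # -> ('foo', 'wombat/pouch')
```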

Chubby was originally a pure lock service, but its creators discovered that associating small amounts of data with each lock entity was extremely useful. Each entity can now serve as a lock, a small file, or both.

API groups

General
  • Open() -- Opens a named file or directory and returns a handle
  • Close() -- Closes an open handle
  • Poison() -- Cancels all Chubby calls made by other threads without deallocating the memory they may be using
  • Delete() -- Deletes the file or directory

File
  • GetContentsAndStat() -- Atomically returns the entire file contents plus metadata (whole-file reads discourage large files)
  • GetStat() -- Returns metadata only
  • ReadDir() -- Returns the names and metadata of all children in a directory
  • SetContents() -- Atomically writes the entire file contents
  • SetACL() -- Writes new access control list information

Locking
  • Acquire() -- Acquires a lock on a file (blocking)
  • TryAcquire() -- Non-blocking lock acquisition attempt
  • Release() -- Releases a lock

Sequencer
  • GetSequencer() -- Gets a string representation of a lock's state
  • SetSequencer() -- Associates a sequencer with a handle
  • CheckSequencer() -- Validates whether a sequencer is still current
Warning

Chubby deliberately omits append, seek, move, symbolic links, and hard links. Files can only be read or written in their entirety. This is by design -- Chubby is a coordination service, not a file system. If you need partial reads/writes, use GFS or BigTable instead.
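The sequencer APIs exist so a downstream server can reject requests from a client whose lock has since been lost and re-acquired by someone else. A minimal sketch of that check, assuming a sequencer carries the lock name, mode, and a generation number (the classes here are illustrative; Chubby's real sequencer is an opaque string):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sequencer:
    """Illustrative stand-in: lock name, mode, and generation number."""
    lock_name: str
    mode: str        # "exclusive" or "shared"
    generation: int

class FileServer:
    """Toy downstream server that accepts requests only from the
    current lock holder, mimicking what CheckSequencer() enables."""
    def __init__(self) -> None:
        self.latest_seen: dict[str, int] = {}

    def handle(self, seq: Sequencer, payload: str) -> bool:
        latest = self.latest_seen.get(seq.lock_name, -1)
        if seq.generation < latest:
            return False  # stale holder: its lock was lost and re-acquired
        self.latest_seen[seq.lock_name] = seq.generation
        return True

server = FileServer()
old = Sequencer("/ls/foo/job-lock", "exclusive", 7)
new = Sequencer("/ls/foo/job-lock", "exclusive", 8)
print(server.handle(new, "write A"))  # True: current generation
print(server.handle(old, "write B"))  # False: stale sequencer rejected
```

This is why a lock service plus sequencers beats a bare mutex: even if a paused client wakes up still believing it holds the lock, its stale sequencer is rejected downstream.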

Quiz
Chubby routes all client requests through a single elected master. What would happen if Chubby instead allowed clients to read from any replica (not just the master)?