
Summary: Chubby

The big picture

Chubby solves one of the most fundamental problems in distributed computing: how do distributed nodes agree on anything? Leader election, configuration management, service discovery, distributed locking -- all of these reduce to consensus, and consensus is hard.

Chubby's contribution isn't algorithmic (it uses Paxos, which was already well-known). Its contribution is practical: wrap a complex consensus algorithm in a simple, familiar interface that looks like a file system with locks. This made distributed consensus accessible to every team at Google without requiring them to implement Paxos themselves.

The same insight drove Apache ZooKeeper, Chubby's open-source spiritual successor.

Architecture at a glance

| Component | Role |
| --- | --- |
| Chubby cell | A cluster of typically 5 replicas, one of which is the master |
| Master | Handles all reads and writes; elected via Paxos |
| Replicas | Participate in Paxos consensus; take over if the master fails |
| Clients | Maintain sessions with the master via KeepAlive (lease) messages |

Key property: All writes go through Paxos (majority must agree). Reads are served by the master alone (fast, since the master is always up-to-date).
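This write/read split can be sketched with a toy model. The `Cell` class and its methods are illustrative, not Chubby's actual implementation; real Paxos handles replica failures and log ordering, which this sketch omits:

```python
class Cell:
    """Toy model of a 5-replica cell with a single master (replica 0)."""

    def __init__(self, n_replicas=5):
        self.replicas = [{} for _ in range(n_replicas)]  # each replica's key-value store
        self.master = 0                                  # index of the current master
        self.quorum = n_replicas // 2 + 1                # majority: 3 of 5

    def write(self, key, value):
        # A write commits only once a majority of replicas acknowledge it.
        acks = 0
        for store in self.replicas:
            store[key] = value  # real Paxos acceptance can fail; here every replica accepts
            acks += 1
            if acks >= self.quorum:
                break  # committed: a majority agrees
        return acks >= self.quorum

    def read(self, key):
        # Reads are served by the master alone -- no quorum round-trip needed,
        # because the master participated in every committed write.
        return self.replicas[self.master].get(key)
```

The asymmetry is the point: writes pay the cost of a majority round, so reads can stay local to the master.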

How Chubby uses system design patterns

| Problem | Pattern | How Chubby uses it |
| --- | --- | --- |
| Surviving master crashes | Write-ahead log | All transactions logged before being applied |
| Ensuring write consistency | Quorum | Writes require majority acknowledgment (Paxos) |
| Preventing zombie masters | Split-brain prevention (epoch number) | Each new master gets a higher epoch; the old master's requests are rejected |
| Managing client sessions | Lease | Time-bound session leases; expired sessions lose all locks and cached data |
| Preventing stale master access | Fencing | Lease expiry acts as soft fencing -- the old master loses authority |
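The lease row can be sketched as follows. The lease length and field names are illustrative assumptions, not Chubby's real parameters:

```python
import time


class Session:
    """Toy client session: a time-bound lease renewed by KeepAlives."""

    LEASE_SECONDS = 12  # hypothetical lease length

    def __init__(self):
        self.lease_expiry = time.monotonic() + self.LEASE_SECONDS
        self.locks = {"master-lock"}               # locks held under this session
        self.cache = {"/ls/cell/config": b"v1"}    # locally cached file data

    def keep_alive(self):
        # Each KeepAlive response from the master extends the lease.
        self.lease_expiry = time.monotonic() + self.LEASE_SECONDS

    def check(self):
        # On expiry, the session loses all locks and cached data -- the master
        # may have already granted those locks to someone else.
        if time.monotonic() > self.lease_expiry:
            self.locks.clear()
            self.cache.clear()
            return False
        return True
```

Dropping locks and cache together on expiry is what makes the lease safe: a client that lost its session can never act on stale authority.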

Chubby's four roles

| Role | How Chubby serves it | Who uses it |
| --- | --- | --- |
| Leader election | Nodes compete for a lock; the winner is leader | GFS master, BigTable master |
| Naming service | Consistent, instantly updated file hierarchy replaces DNS | Google-internal service discovery |
| Metadata storage | Unix-style small-file storage | BigTable schema, GFS metadata, ACLs |
| Distributed locking | Coarse-grained locks with sequencers | Cross-service coordination |

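Leader election via lock competition can be sketched like this. The `LockService` class, `try_acquire` method, and lock path are illustrative stand-ins, not the real Chubby client API:

```python
import threading


class LockService:
    """Stand-in for a Chubby cell: one exclusive lock per file path."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def try_acquire(self, path, node):
        # The first node to grab the lock wins; later attempts fail, and
        # those nodes would watch for the lock's release instead.
        with self._guard:
            if path not in self._locks:
                self._locks[path] = node
                return True
            return False

    def holder(self, path):
        return self._locks.get(path)


def elect(service, nodes, path="/ls/cell/master-lock"):
    # Every candidate races for the same lock file; the winner's identity
    # is readable by everyone else, so the lock doubles as a name lookup.
    for node in nodes:
        service.try_acquire(path, node)
    return service.holder(path)
```

This is how the GFS and BigTable masters use Chubby: the lock decides who leads, and the lock file tells everyone else who won.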
Why coarse-grained?

Chubby is designed for locks held for hours or days (like "who is the leader"), not milliseconds (like "lock this database row"). Fine-grained locking at Chubby's scale would generate too much traffic and latency. If you need fine-grained locks, use a different mechanism closer to the data.
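A sequencer is essentially a fencing token: the lock holder passes it to downstream services, which reject any request carrying a token older than one they have already seen. A minimal sketch, with illustrative names:

```python
class Resource:
    """Downstream service that validates sequencers (fencing tokens)."""

    def __init__(self):
        self.highest_seen = -1

    def request(self, sequencer, payload):
        # Reject requests from a superseded lock generation: a delayed
        # message from a previous lock holder cannot corrupt state.
        if sequencer < self.highest_seen:
            raise PermissionError("stale sequencer: lock generation superseded")
        self.highest_seen = sequencer
        return f"applied {payload} under generation {sequencer}"
```

Because each lock acquisition yields a higher sequencer, a paused or partitioned former holder is fenced out the moment the new holder's first request arrives.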

Quick reference card

| Property | Value |
| --- | --- |
| Type | Distributed coordination / lock service |
| CAP classification | CP -- linearizable consistency via Paxos |
| Architecture | 5-replica cell, single master |
| Consistency | Linearizable (reads always see the latest write) |
| Data size | Small objects only (kilobytes) |
| Lock granularity | Coarse-grained (held for long periods) |
| Session management | Time-bound leases with KeepAlive renewals |
| Consensus algorithm | Paxos |
| Open-source equivalent | Apache ZooKeeper |

Design Challenge

Design a service registry for microservices

You need to design a service registry for a platform with 500 microservices. Each service must discover the endpoints of services it depends on. Stateful services (databases, caches) need leader election. Small configuration data (feature flags, routing rules) must be stored consistently and pushed to services in real time.