Database
Where does Chubby actually store its data? The answer changed over time -- and the migration story reveals important lessons about dependency management in distributed systems.
Evolution of Chubby's storage
Chubby initially used a replicated version of Berkeley DB. The Chubby team eventually replaced it with a custom-built database, citing the risk of depending on a third-party storage engine for such a critical service.
| Property | Berkeley DB (original) | Custom DB (replacement) |
|---|---|---|
| Data model | B-tree based | Simple key/value store |
| Transaction support | Full ACID transactions | Atomic operations only (no general transactions) |
| Replication | Berkeley DB's built-in replication | Database log distributed via Paxos |
| Durability | Write-ahead logging | Write-ahead logging + snapshotting |
| Maintenance risk | External dependency | Fully controlled by the Chubby team |
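The "atomic operations only" row can be made concrete with a toy sketch: a key/value store that supports single-key atomic operations such as compare-and-swap, but no multi-key transactions. The class and method names below are illustrative assumptions, not Chubby's actual API:

```python
import threading

class AtomicKVStore:
    """Toy key/value store: single-key atomic ops, no multi-key transactions.

    Illustrative only -- not Chubby's real interface.
    """

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def compare_and_swap(self, key, expected, new):
        """Atomically set `key` to `new` only if its current value is `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

store = AtomicKVStore()
store.put("/ls/cell/master", "replica-1")
# Succeeds: the current value matches the expectation.
ok = store.compare_and_swap("/ls/cell/master", "replica-1", "replica-2")
# Fails: the expectation is now stale.
stale = store.compare_and_swap("/ls/cell/master", "replica-1", "replica-3")
```

Single-key atomicity like this is enough for Chubby's workload (small files, leader-election locks), which is part of why the replacement database could drop general transactions.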
Chubby's migration from Berkeley DB to a custom store illustrates a recurring theme: critical infrastructure eventually needs to own its dependencies. The same reasoning drove Google to build Colossus (replacing GFS's single-master design) and Spanner (replacing ad-hoc sharding). When interviewing, mention this trade-off: external dependencies reduce initial effort but create long-term risk for foundational services.
Backup strategy
Chubby uses a write-ahead log (WAL) for durability: every database transaction is recorded in the log before it is applied. To prevent unbounded log growth:
- Every few hours, the master writes a snapshot of its database to a GFS server in a different building.
- After a successful snapshot, the previous transaction log is deleted.
- At any point, the complete system state = last snapshot + subsequent transaction log entries.
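The snapshot-plus-log invariant above can be sketched in a few lines. The record format and function name here are hypothetical, chosen only to show the replay logic:

```python
import json

def recover(snapshot, log_entries):
    """Rebuild state as: last snapshot + replay of subsequent WAL entries."""
    state = dict(snapshot)            # start from the most recent snapshot
    for entry in log_entries:         # replay log entries in write order
        op = json.loads(entry)
        if op["type"] == "put":
            state[op["key"]] = op["value"]
        elif op["type"] == "delete":
            state.pop(op["key"], None)
    return state

snapshot = {"/config/a": "1"}
log = [
    json.dumps({"type": "put", "key": "/config/b", "value": "2"}),
    json.dumps({"type": "delete", "key": "/config/a"}),
]
state = recover(snapshot, log)  # → {"/config/b": "2"}
```

Once a new snapshot is written successfully, every log entry it covers becomes redundant, which is exactly why the previous log can be deleted.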
| Backup concern | How Chubby addresses it |
|---|---|
| Building-level failures | Snapshots stored in a separate building |
| Cyclic dependencies | GFS cell in the same building might depend on this Chubby cell for master election -- so backups go to a different building's GFS |
| Replica initialization | New replicas bootstrap from backup snapshots instead of loading from other replicas |
| Disaster recovery | Backups enable full state reconstruction from scratch |
The cyclic dependency concern is subtle but critical. Chubby elects leaders for GFS and BigTable. If Chubby backed up to a GFS cell that depended on the same Chubby cell, a failure could create a deadlock where neither system can recover. Always map your dependency graph when designing backup strategies.
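Mapping the dependency graph can even be automated with a simple cycle check. The graph below is a hypothetical example of the deadlock described above, not Google's real topology:

```python
def find_cycle(graph):
    """DFS-based cycle detection over a dependency graph {node: [deps]}.

    Returns one cycle as a list of nodes, or None if the graph is acyclic.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    path = []

    def dfs(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                return path[path.index(dep):] + [dep]  # back edge: cycle found
            if color.get(dep, WHITE) == WHITE and dep in graph:
                cycle = dfs(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None

# Backing Chubby up to a GFS cell that depends on the same Chubby cell
# closes a loop -- exactly the deadlock the text warns about.
deps = {
    "chubby-cell": ["gfs-backup-cell"],  # Chubby backs up to this GFS cell
    "gfs-backup-cell": ["chubby-cell"],  # ...which elects its master via Chubby
    "gfs-other-building": [],            # safe target: no edge back to Chubby
}
print(find_cycle(deps))  # ['chubby-cell', 'gfs-backup-cell', 'chubby-cell']
```

Pointing the backup at the other building's GFS cell removes the back edge, and the check returns no cycle.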
Mirroring
Chubby supports automatic file mirroring across cells -- copying collections of files from one cell to another.
| Mirroring property | Detail |
|---|---|
| Speed | Changes reflected in dozens of mirrors worldwide in under a second (files are small) |
| Trigger | Event mechanism notifies mirrors immediately on file add/delete/modify |
| Network partitions | Unreachable mirrors remain unchanged; on reconnection, checksums identify stale files |
| Primary use case | Distributing configuration files to computing clusters worldwide |
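The reconnection behavior in the table above amounts to a per-file checksum comparison. A minimal sketch, assuming the primary and mirror expose their files as path-to-bytes maps (the function names and data are illustrative):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Content fingerprint for a single file."""
    return hashlib.sha256(data).hexdigest()

def stale_files(primary: dict, mirror: dict) -> set:
    """Return paths whose mirror copy is missing or differs from the primary,
    by comparing per-file checksums -- as a mirror reconnecting after a
    partition would."""
    return {
        path
        for path, data in primary.items()
        if path not in mirror or checksum(mirror[path]) != checksum(data)
    }

primary = {"/config/quota": b"limit=10", "/config/acl": b"group=eng"}
mirror = {"/config/quota": b"limit=5"}  # fell behind during a partition
print(sorted(stale_files(primary, mirror)))  # ['/config/acl', '/config/quota']
```

Only the stale paths need to be re-copied, which keeps catch-up cheap even for mirrors that were partitioned for a long time.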
Global cell
A special global cell has replicas distributed across widely separated geographic locations. It mirrors the subtree /ls/global/master to the subtree /ls/cell/replica in every other Chubby cell.
The global cell stores:
- Chubby's own ACLs
- Files where Chubby cells and other systems advertise their presence to monitoring services
- Pointers to large data sets (e.g., BigTable cells) and configuration files for other systems