
Caching

Chubby's read-to-write ratio is heavily skewed toward reads. Without caching, every read would hit the master -- and at Google's scale (90,000+ clients per cell), that would be unsustainable. Chubby's caching strategy must deliver performance while maintaining strict consistency -- a much harder problem than caching in eventually consistent systems.

Think first
Chubby serves 90,000+ clients per cell, and reads vastly outnumber writes. How would you design a caching strategy that maintains strict consistency (every read sees the latest write) while reducing master load?

What gets cached?

Chubby clients maintain a consistent, write-through cache in the client's memory. The cache holds:

| Cached item | Benefit |
| --- | --- |
| File contents | Avoids repeated reads of rarely-changing config files |
| Node metadata | Saves round trips for ACL checks, generation numbers |
| Open handles | Subsequent Open() calls for previously-opened files skip the master |
| Absence of files | Prevents repeated lookups for files that don't exist |

Cache entries are governed by a lease mechanism -- when the lease expires, the cache is flushed.
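The table and lease behavior above can be sketched as a small client-side cache. This is a toy model, not Chubby's actual client library; the class name, method names, and the 12-second default lease are all assumptions made for illustration.

```python
import time

class ClientCache:
    """Toy model of Chubby's client cache: write-through, held in client
    memory, and governed by a lease -- when the lease expires, the whole
    cache is flushed. (Illustrative only; not the real client library.)"""

    def __init__(self, lease_seconds=12.0):
        self.lease_seconds = lease_seconds
        self.lease_expiry = time.monotonic() + lease_seconds
        self.entries = {}   # path -> file contents / metadata / open handle
        self.absent = set() # negative cache: paths known not to exist

    def renew_lease(self):
        """Called on each KeepAlive reply from the master."""
        self.lease_expiry = time.monotonic() + self.lease_seconds

    def lookup(self, path):
        """Return a cached entry, or None if the master must be asked."""
        if time.monotonic() >= self.lease_expiry:
            self.entries.clear()   # lease expired: flush the cache
            self.absent.clear()
        if path in self.absent:
            raise FileNotFoundError(path)  # absence of files is cached too
        return self.entries.get(path)

    def store(self, path, value):
        self.entries[path] = value
        self.absent.discard(path)

    def invalidate(self, path):
        """Master-initiated invalidation: drop the entry, re-fetch later."""
        self.entries.pop(path, None)
        self.absent.discard(path)
```

Note the negative cache: caching the *absence* of a file matters because clients often poll for files that are created later (e.g. a new master's address file).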

Cache invalidation protocol

When a write arrives, Chubby must invalidate every client's cached copy before applying the change. The protocol:

  1. Master receives a request to modify file contents or node metadata.
  2. Master blocks the modification and sends cache invalidation messages to all clients that have cached the affected data. (The master maintains a list of each client's cache contents.)
  3. Invalidations are piggybacked onto KeepAlive replies for efficiency.
  4. Each client flushes the cached entry and acknowledges the invalidation in its next KeepAlive call.
  5. Once all active clients acknowledge, the master applies the modification to its local database.
  6. The master replicates the update to other replicas via Paxos (quorum acknowledgment required).
  7. After majority acknowledgment from replicas, the master confirms the write to the originating client.
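The seven steps above can be traced in a short synchronous simulation. Everything here is hypothetical scaffolding: `replicate` stands in for the Paxos round, and acknowledgments that in reality arrive asynchronously via KeepAlive are driven inline.

```python
class Client:
    def __init__(self):
        self.cache = {}
        self.pending = set()  # invalidations awaiting the next KeepAlive

    def keepalive(self):
        """Flush invalidated entries; the ack rides on this KeepAlive."""
        for path in self.pending:
            self.cache.pop(path, None)
        acked, self.pending = self.pending, set()
        return acked

class Master:
    def __init__(self, clients, replicas=5):
        self.db = {}
        self.clients = clients
        self.replicas = replicas
        self.uncacheable = set()  # paths mid-invalidation

    def replicate(self, path, value):
        """Stand-in for the Paxos round: needs a majority of replicas."""
        acks = self.replicas              # assume all replicas answer
        return acks > self.replicas // 2

    def write(self, path, value):         # step 1: modification arrives
        holders = [c for c in self.clients if path in c.cache]
        self.uncacheable.add(path)        # step 2: block; reads still served
        for c in holders:
            c.pending.add(path)           # step 3: piggyback on KeepAlive reply
        for c in holders:
            assert path in c.keepalive()  # step 4: each holder flushes + acks
        self.db[path] = value             # step 5: apply to local database
        assert self.replicate(path, value)  # step 6: quorum of replicas
        self.uncacheable.discard(path)
        return "ok"                       # step 7: confirm to the writer
```

The key property to notice: `self.db[path] = value` cannot execute until every holder's ack has been collected, which is exactly what makes the protocol linearizable.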

Interview angle

Chubby's invalidation protocol is a textbook example of write-through, invalidate-on-write caching in a distributed system. The key insight: the master blocks the write until all cached copies are invalidated. This guarantees linearizability but adds write latency. In interviews, contrast this with eventually consistent caches (like Dynamo) where writes proceed immediately but readers may see stale data.

Common questions

Q: Can clients read files while the master waits for invalidation acknowledgments?

Yes. During the invalidation window, the file is marked uncacheable. Clients can still read it (the read goes directly to the master) but will not cache the result. This ensures reads are never blocked -- important because reads vastly outnumber writes.
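A sketch of that read path. The `MasterState`/`ClientState` stubs and all names here are assumptions for illustration, not Chubby's API:

```python
class MasterState:
    """Minimal stand-in for the master's view of one cell."""
    def __init__(self, db=None):
        self.db = db or {}
        self.uncacheable = set()  # paths currently mid-invalidation

class ClientState:
    def __init__(self):
        self.cache = {}

def read(master, client, path):
    """Reads are never blocked: the master always answers. But while
    `path` sits in the invalidation window (marked uncacheable), the
    client must not store the result, so no stale copy can form."""
    value = master.db.get(path)
    if path not in master.uncacheable:
        client.cache[path] = value  # normal case: cache fills on read
    return value                    # mid-invalidation: read-through only
```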

Q: Can clients cache locks?

Yes. A client may hold a lock longer than strictly necessary, anticipating reuse. This avoids redundant lock acquisition round trips.

Q: Can clients cache open handles?

Yes. If a client re-opens a previously opened file, only the first Open() call reaches the master. Subsequent opens are served from the handle cache.

Warning

The invalidation protocol means write latency scales with the number of caching clients. A file cached by thousands of clients will take longer to update than one cached by ten. This is acceptable because Chubby is designed for rarely-changing data. If you find yourself writing frequently to Chubby, you are misusing the service -- consider Kafka or a database instead.
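One way to see why write latency grows with the holder count: invalidation acks ride on KeepAlive replies, so the blocked write waits for the *slowest* holder's next KeepAlive. A toy model (my assumptions, not from the paper: KeepAlive times uniform over the period, no failed or slow clients):

```python
import random

def invalidation_wait(n_holders, keepalive_period=12.0, seed=0):
    """Toy model: each caching client's next KeepAlive falls uniformly
    in [0, period). The blocked write waits for the maximum across all
    holders, whose expectation is period * n / (n + 1) -- it creeps
    toward the full period as the holder count grows, and real-world
    stragglers make the tail far worse than this idealized model."""
    rng = random.Random(seed)
    return max(rng.uniform(0, keepalive_period) for _ in range(n_holders))
```

With a fixed seed, the 10-holder sample is a prefix of the 10,000-holder stream, so the larger holder set always waits at least as long.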

Quiz
A configuration file in Chubby is cached by 10,000 clients. What would happen if this file is updated 100 times per second?