Single Master and Large Chunk Size

Two of GFS's most consequential design decisions -- a single master and a 64MB chunk size -- were controversial even at the time. Both trade flexibility for simplicity, and understanding why those tradeoffs work reveals a lot about practical distributed systems design.

Think first
How could you keep a single-master design without making it a performance bottleneck? What kinds of operations should be kept off the master?

Single master

A single master simplifies everything: it has a global view of the cluster and can make optimal chunk placement and replication decisions without distributed consensus. The risk is obvious -- it could become a bottleneck.

GFS mitigates this by keeping the master off the data path. Clients never read or write file data through the master. The interaction looks like this:

  1. Client asks the master: "Which ChunkServers hold chunk X?"
  2. Master replies with ChunkServer locations
  3. Client caches this metadata (with a timeout) and talks directly to ChunkServers for all subsequent data operations

The master handles only metadata -- the lightweight control plane. All heavy data transfer happens on the separate data plane between clients and ChunkServers.
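The control-plane/data-plane split above can be sketched in a few lines. This is a simplified model, not the real GFS API: the class names, the `get_chunk_locations` call, and the cache TTL are all illustrative stand-ins.

```python
import time

CHUNK_SIZE = 64 * 1024 * 1024
CACHE_TTL = 60.0  # seconds a client may cache chunk locations (illustrative)

# In-memory stand-ins for the master and a ChunkServer (names hypothetical).
class Master:
    def __init__(self):
        self.lookups = 0     # counts control-plane traffic
        self.locations = {}  # (path, chunk_index) -> list of ChunkServers

    def get_chunk_locations(self, path, chunk_index):
        self.lookups += 1
        return self.locations[(path, chunk_index)]

class ChunkServer:
    def __init__(self):
        self.chunks = {}  # (path, chunk_index) -> bytes

    def read(self, path, chunk_index, offset, length):
        return self.chunks[(path, chunk_index)][offset:offset + length]

class Client:
    def __init__(self, master):
        self.master = master
        self.cache = {}  # (path, chunk_index) -> (locations, expiry)

    def read(self, path, offset, length):
        chunk_index = offset // CHUNK_SIZE   # which 64MB chunk holds the offset
        key = (path, chunk_index)
        hit = self.cache.get(key)
        if hit and hit[1] > time.monotonic():
            locations = hit[0]               # cache hit: master not contacted
        else:
            locations = self.master.get_chunk_locations(path, chunk_index)
            self.cache[key] = (locations, time.monotonic() + CACHE_TTL)
        # Data plane: read directly from a replica; the master never sees data.
        return locations[0].read(path, chunk_index, offset % CHUNK_SIZE, length)

# Usage: two reads from the same chunk cost only one master lookup.
master, server = Master(), ChunkServer()
server.chunks[("/logs/a", 0)] = b"hello world"
master.locations[("/logs/a", 0)] = [server]
client = Client(master)
print(client.read("/logs/a", 0, 5))  # b'hello'
print(client.read("/logs/a", 6, 5))  # b'world'
print(master.lookups)                # 1
```

Note that both reads land in chunk 0, so the second read is served entirely from the client's cache -- the master sees exactly one metadata request.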

Interview angle

If an interviewer asks "Why not use multiple masters?", the answer is: GFS chose simplicity. A single master avoids the need for distributed consensus (like Paxos or Raft) on metadata operations. The tradeoff is a potential bottleneck, which Google eventually hit at scale -- leading to Colossus. In an interview, acknowledge the tradeoff and explain how you'd mitigate it (caching, batching, keeping the master off the data path).

Think first
If you used 4KB blocks instead of 64MB chunks, how would that affect the master's memory usage and the number of client-to-master interactions?

Chunk size

64MB per chunk is roughly 16,000x larger than a typical filesystem block (4KB). This choice has cascading benefits:

| Benefit | Explanation |
| --- | --- |
| Less metadata | Fewer chunks per file means the master stores less metadata, keeping everything in memory |
| Fewer master interactions | Clients cache chunk locations; with large chunks, a single lookup covers more data |
| Long-lived TCP connections | A client reading a large chunk can reuse the same connection, amortizing TCP setup cost |
| Simpler ChunkServer management | Fewer chunks to track means easier capacity monitoring and load balancing |
| Efficient sequential I/O | Large chunks align well with the large, sequential reads and appends that dominate GFS workloads |
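The "less metadata" row is worth quantifying, and it also answers the Think-first question above. The GFS paper reports the master keeping less than 64 bytes of metadata per chunk; taking 64 bytes per chunk as a rough, illustrative figure, a quick back-of-envelope calculation shows why 64MB chunks keep the master's state in RAM while 4KB blocks would not:

```python
# Back-of-envelope: master metadata needed to track 1 PB of file data,
# assuming ~64 bytes of in-memory metadata per chunk (a ballpark figure;
# treat it as illustrative, not exact).

PB = 10**15
BYTES_PER_CHUNK_METADATA = 64

def master_metadata_bytes(total_data, chunk_size):
    num_chunks = total_data // chunk_size
    return num_chunks * BYTES_PER_CHUNK_METADATA

gfs = master_metadata_bytes(PB, 64 * 2**20)   # 64MB chunks
tiny = master_metadata_bytes(PB, 4 * 2**10)   # 4KB blocks

print(f"64MB chunks: {gfs / 2**30:.2f} GiB of metadata")  # under 1 GiB
print(f"4KB blocks:  {tiny / 2**40:.2f} TiB of metadata") # tens of TiB
```

Under these assumptions, 64MB chunks put the whole petabyte's metadata comfortably inside a single machine's memory, while 4KB blocks multiply it by 16,384x into the tens of terabytes -- far beyond what one master could hold.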

Lazy space allocation

GFS does not allocate the full 64MB on disk when creating a chunk. Instead, the ChunkServer lazily extends the chunk as the client appends data. This avoids internal fragmentation -- if a chunk only holds 20MB of data, only 20MB of disk is consumed.
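A minimal sketch of this behavior, with sizes scaled down from 64MB for illustration (the `Chunk` class and its capacity check are hypothetical, not the real ChunkServer code):

```python
import os
import tempfile

CHUNK_CAPACITY = 64  # stand-in for the 64MB limit

class Chunk:
    def __init__(self, directory, handle):
        # Create the chunk file empty; no space is reserved up front.
        self.path = os.path.join(directory, f"chunk-{handle}")
        open(self.path, "wb").close()

    def append(self, data):
        # Lazily extend the file as appends arrive, up to the chunk limit.
        if os.path.getsize(self.path) + len(data) > CHUNK_CAPACITY:
            raise ValueError("chunk full; a new chunk must be allocated")
        with open(self.path, "ab") as f:
            f.write(data)

with tempfile.TemporaryDirectory() as d:
    chunk = Chunk(d, handle=42)
    chunk.append(b"x" * 20)             # 20 bytes of a 64-byte chunk
    print(os.path.getsize(chunk.path))  # 20: disk used equals data written
```

The file on disk occupies only the 20 bytes actually appended, never the full capacity -- the scaled-down analogue of a 20MB chunk consuming 20MB rather than 64MB.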

warning

Large chunk size creates a hotspot problem for small files. A file smaller than 64MB occupies a single chunk, and if many clients read that file, the ChunkServers hosting that one chunk become overloaded. GFS handles this with two workarounds:

  • Store small, popular files with a higher replication factor
  • Add a random delay to application start times to stagger access
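Both workarounds are a few lines of policy, sketched here under stated assumptions: the replication factor of 10 and the 30-second jitter window are illustrative numbers, not values from the paper.

```python
import random

DEFAULT_REPLICATION = 3
HOT_FILE_REPLICATION = 10  # illustrative; the paper doesn't fix a number

def replication_factor(is_small_and_hot):
    # Workaround 1: store small, popular files on more ChunkServers.
    return HOT_FILE_REPLICATION if is_small_and_hot else DEFAULT_REPLICATION

def pick_replica(replicas):
    # Spreading reads across replicas is what the extra copies buy you.
    return random.choice(replicas)

def startup_delay(max_jitter_seconds=30.0):
    # Workaround 2: each client sleeps a random amount before its first
    # read, staggering the burst of requests against the hot chunk.
    return random.uniform(0.0, max_jitter_seconds)
```

Neither fix removes the hotspot; more replicas dilute the load and jitter spreads it over time, which is usually enough for batch workloads.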

This hotspot problem is worth mentioning in interviews -- it shows you understand that design decisions have downsides.

Quiz
What would happen if GFS reduced its chunk size from 64MB to 4KB (typical filesystem block size) to eliminate the small-file hotspot problem?