High-level Architecture

How do you store a multi-terabyte file across thousands of commodity machines that are constantly failing? GFS answers this with a three-component architecture: a single master, many ChunkServers, and clients that talk to both.

Think first
If you had to store a 1TB file across many commodity servers, what fundamental problems would you need to solve?

Chunks

GFS splits every file into fixed-size chunks of 64MB. This is orders of magnitude larger than a typical filesystem block (4KB), and the choice is deliberate -- large chunks mean fewer chunks per file, less metadata, and fewer round-trips to the master.
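A quick back-of-envelope calculation makes the metadata savings concrete. This sketch compares how many pieces the master must track for a 1 TiB file under 64 MiB chunks versus 4 KiB blocks (the numbers are illustrative, not GFS internals):

```python
# Why 64 MiB chunks keep master metadata small: count the pieces
# the master must track for a single 1 TiB file.
FILE_SIZE = 1 * 1024**4     # 1 TiB file
CHUNK_SIZE = 64 * 1024**2   # 64 MiB GFS chunk
BLOCK_SIZE = 4 * 1024       # 4 KiB typical filesystem block

chunks = FILE_SIZE // CHUNK_SIZE   # entries under the GFS scheme
blocks = FILE_SIZE // BLOCK_SIZE   # entries under a 4 KiB scheme

print(chunks)  # 16384
print(blocks)  # 268435456 -- 16384x more metadata entries
```

At roughly 64 bytes of metadata per chunk, a 1 TiB file costs the master about 1 MiB of memory, which is what makes an all-in-memory master feasible.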

Chunk handles

Each chunk gets an immutable, globally unique 64-bit chunk handle assigned by the master at creation time. With 64-bit IDs and 64MB chunks, the handle space can name 2^64 chunks -- on the order of 2^90 bytes, vastly more data than any cluster will ever store.

Because files are split into chunks, GFS's core job is maintaining a mapping from files to chunks and translating file operations into operations on individual chunks.
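The file-to-chunk translation is simple arithmetic. This sketch (names are illustrative, not the real client API) shows how a byte offset within a file maps to a chunk index plus an offset within that chunk:

```python
# Sketch of the translation a GFS client performs before contacting
# the master: file byte offset -> (chunk index, offset within chunk).
CHUNK_SIZE = 64 * 1024**2  # 64 MiB

def locate(file_offset: int) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a file byte offset."""
    return file_offset // CHUNK_SIZE, file_offset % CHUNK_SIZE

# Reading at byte 200 MiB lands in chunk 3, 8 MiB into that chunk.
print(locate(200 * 1024**2))  # (3, 8388608)
```

The client sends the file name and chunk index to the master, which returns the chunk handle and replica locations; the byte offset within the chunk never needs to reach the master at all.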

Cluster topology

A GFS cluster contains exactly three kinds of entities:

| Component | Count | Responsibility |
| --- | --- | --- |
| Master | 1 | Stores all metadata -- namespace, file-to-chunk mappings, chunk locations, access control |
| ChunkServers | Many | Store chunks as regular Linux files on local disks; serve read/write requests |
| Clients | Many | Application-linked library that coordinates with the master (metadata) and ChunkServers (data) |
Interview angle

The separation of control flow (client to master) from data flow (client to ChunkServer) is the single most important architectural decision in GFS. It keeps the master lightweight and prevents it from becoming a bandwidth bottleneck. When you design any distributed storage system in an interview, call out this separation explicitly.

ChunkServers

ChunkServers store chunks on local disks as plain Linux files and handle reads/writes specified by chunk handle and byte range. For reliability, each chunk is replicated to multiple ChunkServers -- three replicas by default, though per-file replication factors are configurable.
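A toy sketch of the default 3-way placement decision: the master picks three distinct ChunkServers for each new chunk. Real GFS also weighs disk utilization and rack placement; this illustrative version (all names are assumptions) just samples uniformly:

```python
import random

# Illustrative replica placement: pick REPLICATION_FACTOR distinct
# ChunkServers for a new chunk. Real GFS also considers disk usage
# and rack topology; this toy version samples uniformly at random.
REPLICATION_FACTOR = 3

def place_chunk(chunkservers: list[str]) -> list[str]:
    """Choose distinct ChunkServers to host a new chunk's replicas."""
    return random.sample(chunkservers, REPLICATION_FACTOR)

servers = [f"cs{i:02d}" for i in range(10)]
replicas = place_chunk(servers)
print(replicas)  # e.g. ['cs07', 'cs02', 'cs09']
```

`random.sample` guarantees the three replicas land on three different machines, which is the property that matters: losing any one ChunkServer leaves two live copies.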

warning

Don't confuse chunk replication with erasure coding. GFS uses full replication (3 complete copies), which triples storage cost but keeps the read/recovery path simple. Erasure coding reduces overhead but adds computational complexity -- a tradeoff GFS deliberately avoided to optimize for sequential throughput over storage efficiency.

Master

The master coordinates the entire cluster. Its responsibilities include:

  1. Metadata management -- file/directory names, file-to-chunk mappings, chunk locations, and access control
  2. System-wide coordination -- chunk lease management, garbage collection of orphaned chunks, and chunk migration between ChunkServers
  3. Health monitoring -- periodic HeartBeat exchanges with every ChunkServer to collect state and issue instructions
  4. In-memory metadata -- the entire namespace and all mappings live in main memory for fast random access
  5. Durability via operation log -- all metadata changes are written to a persistent write-ahead log, replicated to remote machines. On crash recovery, the master replays this log to reconstruct state
  6. Fault tolerance -- the master replicates its state to remote machines so it can be restored quickly after failure
  7. Global optimization -- a single centralized master has a global view, enabling optimal decisions about chunk placement and load balancing
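Responsibilities 4 and 5 above can be sketched together: in-memory maps for fast lookups, plus a write-ahead operation log that is appended to before any mutation is applied, so a restarted master can rebuild its state by replay. This is a minimal illustrative model, not GFS's actual data structures (real GFS also checkpoints the log and replicates it remotely):

```python
import itertools

class Master:
    """Toy master: in-memory metadata + replayable operation log."""

    def __init__(self):
        self.op_log = []              # stand-in for the replicated WAL
        self.file_chunks = {}         # path -> list of chunk handles
        self._handles = itertools.count(1)

    def create_chunk(self, path: str) -> int:
        handle = next(self._handles)  # immutable, globally unique ID
        # Log the mutation BEFORE applying it, so recovery can replay it.
        self.op_log.append(("add_chunk", path, handle))
        self.file_chunks.setdefault(path, []).append(handle)
        return handle

    @classmethod
    def recover(cls, op_log):
        """Rebuild in-memory state by replaying the operation log."""
        m = cls()
        for op, path, handle in op_log:
            if op == "add_chunk":
                m.file_chunks.setdefault(path, []).append(handle)
        return m

m = Master()
m.create_chunk("/logs/web.0")
m.create_chunk("/logs/web.0")
restored = Master.recover(m.op_log)
print(restored.file_chunks)  # {'/logs/web.0': [1, 2]}
```

Note one deliberate omission mirrored from the paper: chunk *locations* are not logged; the master rebuilds them by asking ChunkServers what they hold at startup and via HeartBeats.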
Think first
Why does GFS have clients talk directly to ChunkServers for data, instead of routing everything through the master?

Client

The GFS client library links into every application that uses GFS. It handles metadata operations (create, delete, lookup) by talking to the master, and data operations (read, write) by talking directly to ChunkServers.
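The read path ties the two roles together: one metadata round-trip to the master for chunk locations, then the data fetched directly from a ChunkServer. Everything below is a toy stand-in for the real RPC interfaces, assuming the illustrative names shown:

```python
CHUNK_SIZE = 64 * 1024**2  # 64 MiB

class ToyMaster:
    def __init__(self, file_chunks, chunk_locations):
        self.file_chunks = file_chunks          # path -> [chunk handles]
        self.chunk_locations = chunk_locations  # handle -> [server names]

    def lookup(self, path, chunk_index):
        handle = self.file_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]

class ToyChunkServer:
    def __init__(self, chunks):
        self.chunks = chunks                    # handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(master, servers, path, offset, length):
    idx = offset // CHUNK_SIZE
    handle, replicas = master.lookup(path, idx)  # control flow: master
    server = servers[replicas[0]]                # data flow: ChunkServer
    return server.read(handle, offset % CHUNK_SIZE, length)

master = ToyMaster({"/logs/a": [7]}, {7: ["cs1"]})
servers = {"cs1": ToyChunkServer({7: b"hello world"})}
print(gfs_read(master, servers, "/logs/a", 0, 5))  # b'hello'
```

Note that the bytes themselves never pass through the master; it only answers "which chunk, and where". Real clients also cache these lookup results for a while, shrinking master traffic further.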

Neither clients nor ChunkServers cache file data:

| Component | Caches data? | Why not? |
| --- | --- | --- |
| Client | No | Workloads stream through huge files; working sets are too large to fit in cache |
| ChunkServer | No (relies on Linux buffer cache) | The OS already caches frequently accessed data in memory |

This "no application-level data caching" design eliminates cache coherence problems entirely -- a significant simplification, and one that HDFS later adopted as well.

Quiz
What would happen if the GFS master stored data AND metadata, instead of separating the control and data planes?