High-level Architecture
How do you store a multi-terabyte file across thousands of commodity machines that are constantly failing? GFS answers this with a three-component architecture: a single master, many ChunkServers, and clients that talk to both.
Chunks
GFS splits every file into fixed-size chunks of 64MB. This is orders of magnitude larger than a typical filesystem block (4KB), and the choice is deliberate -- large chunks mean fewer chunks per file, less metadata, and fewer round-trips to the master.
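A quick back-of-envelope calculation makes the metadata savings concrete (file size here is just an example):

```python
# Metadata entries needed to track a 1 TiB file at GFS's 64 MB chunk
# size vs. a traditional 4 KB filesystem block size.
FILE_SIZE = 1 * 2**40                     # 1 TiB, for illustration

chunks_64mb = FILE_SIZE // (64 * 2**20)   # GFS chunk size
blocks_4kb = FILE_SIZE // (4 * 2**10)     # typical filesystem block size

print(chunks_64mb)   # 16384 chunks to track
print(blocks_4kb)    # 268435456 blocks -- 16384x more metadata entries
```

Fewer entries per file is what makes it feasible for the master to hold all metadata in memory.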
Chunk handles
Each chunk gets an immutable, globally unique 64-bit chunk handle assigned by the master at creation time. With 64-bit IDs and 64MB chunks, the address space can reference more than an exabyte of data.
Because files are split into chunks, GFS's core job is maintaining a mapping from files to chunks and translating file operations into operations on individual chunks.
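The translation from a file byte offset to a chunk is plain arithmetic; a minimal sketch (function name is illustrative):

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB

def to_chunk_coords(byte_offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within chunk)."""
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE

# A read at byte 200,000,000 of a file falls in the file's third chunk:
idx, off = to_chunk_coords(200_000_000)   # idx == 2
```

The client computes the chunk index itself, then asks the master only for the handle and replica locations of that chunk.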
Cluster topology
A GFS cluster contains exactly three kinds of entities:
| Component | Count | Responsibility |
|---|---|---|
| Master | 1 | Stores all metadata -- namespace, file-to-chunk mappings, chunk locations, access control |
| ChunkServers | Many | Store chunks as regular Linux files on local disks; serve read/write requests |
| Clients | Many | Application-linked library that coordinates with master (metadata) and ChunkServers (data) |
The separation of control flow (client to master) from data flow (client to ChunkServer) is the single most important architectural decision in GFS. It keeps the master lightweight and prevents it from becoming a bandwidth bottleneck. When you design any distributed storage system in an interview, call out this separation explicitly.
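The read path makes this separation visible. Here is a minimal sketch; the stub classes stand in for the real RPC interfaces, which are not shown in this document:

```python
# Sketch of a GFS read: a small metadata RPC to the master (control flow),
# then a bulk read directly from a ChunkServer (data flow). Stub classes
# and method names are illustrative, not the actual GFS interfaces.
CHUNK_SIZE = 64 * 2**20

class StubChunkServer:
    def __init__(self, chunks):           # chunk handle -> bytes
        self.chunks = chunks
    def read_chunk(self, handle, chunk_offset, length):
        return self.chunks[handle][chunk_offset:chunk_offset + length]

class StubMaster:
    def __init__(self, mapping):          # (filename, chunk index) -> (handle, replicas)
        self.mapping = mapping
    def lookup(self, filename, chunk_index):
        return self.mapping[(filename, chunk_index)]

def gfs_read(master, filename, offset, length):
    chunk_index = offset // CHUNK_SIZE
    # Control flow: tiny metadata request to the master.
    handle, replicas = master.lookup(filename, chunk_index)
    # Data flow: bulk bytes move directly between client and ChunkServer;
    # the master never touches file data.
    return replicas[0].read_chunk(handle, offset % CHUNK_SIZE, length)

server = StubChunkServer({0xABC: b"hello world"})
master = StubMaster({("/logs/a", 0): (0xABC, [server])})
gfs_read(master, "/logs/a", 6, 5)   # b"world"
```

Because the master only ever handles the small `lookup` messages, thousands of clients can stream data concurrently without saturating it.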
ChunkServers
ChunkServers store chunks on local disks as plain Linux files and handle reads/writes specified by chunk handle and byte range. For reliability, each chunk is replicated to multiple ChunkServers -- three replicas by default, though per-file replication factors are configurable.
Don't confuse chunk replication with erasure coding. GFS uses full replication (3 complete copies), which triples storage cost but keeps the read/recovery path simple. Erasure coding reduces overhead but adds computational complexity -- a tradeoff GFS deliberately avoided to optimize for sequential throughput over storage efficiency.
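The cost difference is easy to quantify. Reed-Solomon (10, 4) below is a common erasure-coding layout in later systems (e.g. HDFS), used here purely for comparison; GFS itself does not erasure-code:

```python
# Raw storage consumed per logical byte: 3-way replication vs. a common
# Reed-Solomon (10, 4) erasure-coding layout (RS parameters illustrative).
replication_overhead = 3.0                            # 3 full copies
rs_data, rs_parity = 10, 4
erasure_overhead = (rs_data + rs_parity) / rs_data    # 1.4x

print(replication_overhead)  # 3.0x -- any single replica can serve a read
print(erasure_overhead)      # 1.4x -- but reconstruction after a failure
                             # must read and decode across 10 machines
```

Replication pays ~2x more in disk for a read/recovery path that is a simple byte copy, which suits GFS's throughput-oriented workloads.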
Master
The master coordinates the entire cluster. Its responsibilities include:
- Metadata management -- file/directory names, file-to-chunk mappings, chunk locations, and access control
- System-wide coordination -- chunk lease management, garbage collection of orphaned chunks, and chunk migration between ChunkServers
- Health monitoring -- periodic HeartBeat exchanges with every ChunkServer to collect state and issue instructions
- In-memory metadata -- the entire namespace and all mappings live in main memory for fast random access
- Durability via operation log -- namespace and file-to-chunk changes are appended to a persistent operation log before being applied, and the log is replicated to remote machines. Chunk locations are the exception: they are never logged, because the master can rebuild them by polling ChunkServers at startup
- Fault tolerance -- on crash recovery, a master (on the same machine or another) restores state by loading its latest checkpoint and replaying the operation log
- Global optimization -- a single centralized master has a global view, enabling optimal decisions about chunk placement and load balancing
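The operation-log-then-apply pattern can be sketched in a few lines; the record format, field names, and class below are illustrative, not GFS's actual on-disk format:

```python
# Minimal sketch of write-ahead logging for master metadata: every mutation
# is appended to a log before being applied in memory, so a recovering
# master can rebuild its state by replaying the log from the start.
import json

class MasterMetadata:
    def __init__(self):
        self.file_to_chunks = {}   # filename -> [chunk handles]
        self.log = []              # stands in for the replicated on-disk log

    def add_chunk(self, filename, handle):
        record = {"op": "add_chunk", "file": filename, "handle": handle}
        self.log.append(json.dumps(record))   # 1. durably log the mutation
        self._apply(record)                   # 2. only then apply in memory

    def _apply(self, record):
        self.file_to_chunks.setdefault(record["file"], []).append(record["handle"])

    @classmethod
    def recover(cls, log):
        """Crash recovery: replay the log to reconstruct in-memory state."""
        m = cls()
        m.log = list(log)
        for line in log:
            m._apply(json.loads(line))
        return m
```

In the real system, log replay starts from the most recent checkpoint rather than from an empty state, which keeps recovery time short.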
Client
The GFS client library links into every application that uses GFS. It handles metadata operations (create, delete, lookup) by talking to the master, and data operations (read, write) by talking directly to ChunkServers.
Neither clients nor ChunkServers cache file data:
| Component | Caches data? | Why not? |
|---|---|---|
| Client | No | Workloads stream through huge files; working sets are too large to fit in cache |
| ChunkServer | No (relies on Linux buffer cache) | The OS already caches frequently accessed data in memory |
This "no application-level data caching" design eliminates cache coherence problems entirely -- a significant simplification, and one that HDFS later adopted as well.