High-level Architecture

How do you store a multi-terabyte file across thousands of commodity machines that are constantly failing? GFS answers this with a three-component architecture: a single master, many ChunkServers, and clients that talk to both.

Think first
If you had to store a 1TB file across many commodity servers, what fundamental problems would you need to solve?

Chunks

GFS splits every file into fixed-size chunks of 64MB. This is orders of magnitude larger than a typical filesystem block (4KB), and the choice is deliberate -- large chunks mean fewer chunks per file, less metadata, and fewer round-trips to the master.
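A quick back-of-envelope calculation makes the metadata savings concrete. This sketch compares how many pieces the master must track for a 1 TiB file under 64 MiB chunks versus 4 KiB blocks (the numbers are illustrative, not GFS internals):

```python
# Why 64 MiB chunks keep master metadata small: count the pieces
# the master must track for a single 1 TiB file.
FILE_SIZE = 1 * 1024**4     # 1 TiB file
CHUNK_SIZE = 64 * 1024**2   # 64 MiB GFS chunk
BLOCK_SIZE = 4 * 1024       # 4 KiB typical filesystem block

chunks = FILE_SIZE // CHUNK_SIZE   # entries under the GFS scheme
blocks = FILE_SIZE // BLOCK_SIZE   # entries under a 4 KiB scheme

print(chunks)  # 16384
print(blocks)  # 268435456 -- 16384x more metadata entries
```

At roughly 64 bytes of metadata per chunk, a 1 TiB file costs the master about 1 MiB of memory, which is what makes an all-in-memory master feasible.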

Chunk handles

Each chunk gets an immutable, globally unique 64-bit chunk handle assigned by the master at creation time. With 64-bit IDs and 64MB chunks, the handle space can name 2^64 chunks -- on the order of 2^90 bytes, vastly more data than any cluster will ever store.

Because files are split into chunks, GFS's core job is maintaining a mapping from files to chunks and translating file operations into operations on individual chunks.
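The file-to-chunk translation is simple arithmetic. This sketch (names are illustrative, not the real client API) shows how a byte offset within a file maps to a chunk index plus an offset within that chunk:

```python
# Sketch of the translation a GFS client performs before contacting
# the master: file byte offset -> (chunk index, offset within chunk).
CHUNK_SIZE = 64 * 1024**2  # 64 MiB

def locate(file_offset: int) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a file byte offset."""
    return file_offset // CHUNK_SIZE, file_offset % CHUNK_SIZE

# Reading at byte 200 MiB lands in chunk 3, 8 MiB into that chunk.
print(locate(200 * 1024**2))  # (3, 8388608)
```

The client sends the file name and chunk index to the master, which returns the chunk handle and replica locations; the byte offset within the chunk never needs to reach the master at all.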

Cluster topology

A GFS cluster contains exactly three kinds of entities:

| Component | Count | Responsibility |
| --- | --- | --- |
| Master | 1 | Stores all metadata -- namespace, file-to-chunk mappings, chunk locations, access control |
| ChunkServers | Many | Store chunks as regular Linux files on local disks; serve read/write requests |
| Clients | Many | Application-linked library that coordinates with the master (metadata) and ChunkServers (data) |
Interview angle

The separation of control flow (client to master) from data flow (client to ChunkServer) is the single most important architectural decision in GFS. It keeps the master lightweight and prevents it from becoming a bandwidth bottleneck. When you design any distributed storage system in an interview, call out this separation explicitly.

ChunkServers

ChunkServers store chunks on local disks as plain Linux files and handle reads/writes specified by chunk handle and byte range. For reliability, each chunk is replicated to multiple ChunkServers -- three replicas by default, though per-file replication factors are configurable.
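A toy sketch of the default 3-way placement decision: the master picks three distinct ChunkServers for each new chunk. Real GFS also weighs disk utilization and rack placement; this illustrative version (all names are assumptions) just samples uniformly:

```python
import random

# Illustrative replica placement: pick REPLICATION_FACTOR distinct
# ChunkServers for a new chunk. Real GFS also considers disk usage
# and rack topology; this toy version samples uniformly at random.
REPLICATION_FACTOR = 3

def place_chunk(chunkservers: list[str]) -> list[str]:
    """Choose distinct ChunkServers to host a new chunk's replicas."""
    return random.sample(chunkservers, REPLICATION_FACTOR)

servers = [f"cs{i:02d}" for i in range(10)]
replicas = place_chunk(servers)
print(replicas)  # e.g. ['cs07', 'cs02', 'cs09']
```

`random.sample` guarantees the three replicas land on three different machines, which is the property that matters: losing any one ChunkServer leaves two live copies.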

warning

Don't confuse chunk replication with erasure coding. GFS uses full replication (3 complete copies), which triples storage cost but keeps the read/recovery path simple. Erasure coding reduces overhead but adds computational complexity -- a tradeoff GFS deliberately avoided to optimize for sequential throughput over storage efficiency.

Master

The master coordinates the entire cluster. Its responsibilities include:

  1. Metadata management -- file/directory names, file-to-chunk mappings, chunk locations, and access control
  2. System-wide coordination -- chunk lease management, garbage collection of orphaned chunks, and chunk migration between ChunkServers
  3. Health monitoring -- periodic HeartBeat exchanges with every ChunkServer to collect state and issue instructions
  4. In-memory metadata -- the entire namespace and all mappings live in main memory for fast random access
  5. Durability via operation log -- all metadata changes are written to a persistent write-ahead log, replicated to remote machines. On crash recovery, the master replays this log to reconstruct state
  6. Fault tolerance -- the master replicates its state to remote machines so it can be restored quickly after failure
  7. Global optimization -- a single centralized master has a global view, enabling optimal decisions about chunk placement and load balancing
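Responsibilities 4 and 5 above can be sketched together: in-memory maps for fast lookups, plus a write-ahead operation log that is appended to before any mutation is applied, so a restarted master can rebuild its state by replay. This is a minimal illustrative model, not GFS's actual data structures (real GFS also checkpoints the log and replicates it remotely):

```python
import itertools

class Master:
    """Toy master: in-memory metadata + replayable operation log."""

    def __init__(self):
        self.op_log = []              # stand-in for the replicated WAL
        self.file_chunks = {}         # path -> list of chunk handles
        self._handles = itertools.count(1)

    def create_chunk(self, path: str) -> int:
        handle = next(self._handles)  # immutable, globally unique ID
        # Log the mutation BEFORE applying it, so recovery can replay it.
        self.op_log.append(("add_chunk", path, handle))
        self.file_chunks.setdefault(path, []).append(handle)
        return handle

    @classmethod
    def recover(cls, op_log):
        """Rebuild in-memory state by replaying the operation log."""
        m = cls()
        for op, path, handle in op_log:
            if op == "add_chunk":
                m.file_chunks.setdefault(path, []).append(handle)
        return m

m = Master()
m.create_chunk("/logs/web.0")
m.create_chunk("/logs/web.0")
restored = Master.recover(m.op_log)
print(restored.file_chunks)  # {'/logs/web.0': [1, 2]}
```

Note one deliberate omission mirrored from the paper: chunk *locations* are not logged; the master rebuilds them by asking ChunkServers what they hold at startup and via HeartBeats.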
Think first
Why does GFS have clients talk directly to ChunkServers for data, instead of routing everything through the master?

Client

The GFS client library links into every application that uses GFS. It handles metadata operations (create, delete, lookup) by talking to the master, and data operations (read, write) by talking directly to ChunkServers.
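The read path ties the two roles together: one metadata round-trip to the master for chunk locations, then the data fetched directly from a ChunkServer. Everything below is a toy stand-in for the real RPC interfaces, assuming the illustrative names shown:

```python
CHUNK_SIZE = 64 * 1024**2  # 64 MiB

class ToyMaster:
    def __init__(self, file_chunks, chunk_locations):
        self.file_chunks = file_chunks          # path -> [chunk handles]
        self.chunk_locations = chunk_locations  # handle -> [server names]

    def lookup(self, path, chunk_index):
        handle = self.file_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]

class ToyChunkServer:
    def __init__(self, chunks):
        self.chunks = chunks                    # handle -> bytes

    def read(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

def gfs_read(master, servers, path, offset, length):
    idx = offset // CHUNK_SIZE
    handle, replicas = master.lookup(path, idx)  # control flow: master
    server = servers[replicas[0]]                # data flow: ChunkServer
    return server.read(handle, offset % CHUNK_SIZE, length)

master = ToyMaster({"/logs/a": [7]}, {7: ["cs1"]})
servers = {"cs1": ToyChunkServer({7: b"hello world"})}
print(gfs_read(master, servers, "/logs/a", 0, 5))  # b'hello'
```

Note that the bytes themselves never pass through the master; it only answers "which chunk, and where". Real clients also cache these lookup results for a while, shrinking master traffic further.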

Neither clients nor ChunkServers cache file data:

| Component | Caches data? | Why not? |
| --- | --- | --- |
| Client | No | Workloads stream through huge files; working sets are too large to fit in cache |
| ChunkServer | No (relies on Linux buffer cache) | The OS already caches frequently accessed data in memory |

This "no application-level data caching" design eliminates cache coherence problems entirely -- a significant simplification, and one that HDFS later adopted as well.

Quiz
What would happen if the GFS master stored data AND metadata, instead of separating the control and data planes?