Criticism of GFS

GFS solved Google's storage problem in 2003. But as Google's data and workloads grew by orders of magnitude, the very design choices that made GFS simple started to break. Understanding these failure modes is just as important as understanding the original design -- they reveal the limits of specific architectural tradeoffs.

Think first
As Google grew by orders of magnitude, which GFS design decision do you think broke first -- the single master, the large chunk size, or the relaxed consistency model? Why?

Problems with the single master

The single master was GFS's most deliberate simplification, and it became its biggest liability at scale:

| Problem | Root cause | Consequence |
| --- | --- | --- |
| CPU bottleneck | All metadata operations route through one machine | As client count grows, the master's CPU saturates and it cannot serve them all |
| Memory limit | All metadata must fit in the master's RAM | Even with the large chunk size keeping metadata compact, the sheer volume of data eventually outgrows what a single machine can hold |

The separation of control and data flow helped, but it only delays the bottleneck -- it doesn't eliminate it. Every file open, every chunk lookup, every lease grant still hits the single master.
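The memory limit lends itself to a back-of-envelope calculation. The sketch below assumes the ~64 bytes of chunk metadata per chunk cited later in this section and the default 64 MB chunk size; the workload sizes are illustrative, not Google's actual numbers:

```python
# Back-of-envelope: how much master RAM does chunk metadata need?
# Assumes ~64 bytes of metadata per chunk and 64 MB chunks.

CHUNK_SIZE = 64 * 1024**2    # 64 MB per chunk
METADATA_PER_CHUNK = 64      # ~64 bytes per chunk entry in master RAM

def master_ram_needed(total_data_bytes: int) -> int:
    """RAM (bytes) the master needs just for chunk metadata."""
    num_chunks = -(-total_data_bytes // CHUNK_SIZE)  # ceiling division
    return num_chunks * METADATA_PER_CHUNK

# 10 PB of large, sequentially written files: metadata is tiny.
print(master_ram_needed(10 * 1024**5) / 1024**3)  # 10.0 GiB -- fits in RAM

# The same 10 PB stored as 1 KB files: each file needs its own chunk entry.
num_small_files = (10 * 1024**5) // 1024
print(num_small_files * METADATA_PER_CHUNK / 1024**4)  # 640.0 TiB -- impossible
```

The same data volume can differ by four orders of magnitude in metadata footprint depending on file sizes, which is why the memory limit and the small-file problem below are two faces of the same issue.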

Interview angle

The single-master bottleneck is one of the most frequently discussed GFS limitations in interviews. Google's successor system, Colossus, addresses this by distributing metadata across multiple servers. HDFS tackled the same problem with HDFS Federation (multiple NameNodes, each managing a portion of the namespace). When designing a distributed file system in an interview, explain that a single master works at moderate scale but needs partitioning or federation at large scale. Reference the leader and follower pattern and discuss when a single leader stops being sufficient.
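To make "partitioning or federation" concrete, here is a minimal sketch of hash-partitioning a file namespace across several metadata servers. This is an illustration only -- it is not Colossus's or HDFS Federation's actual scheme, and the server names are hypothetical:

```python
# Illustrative sketch: split the namespace across metadata servers by
# hashing the top-level directory, so no single master owns all metadata.
# NOT the actual Colossus / HDFS Federation design.

import hashlib

METADATA_SERVERS = ["meta-0", "meta-1", "meta-2", "meta-3"]  # hypothetical

def metadata_server_for(path: str) -> str:
    """Route a path to the metadata server that owns its top-level directory.

    Hashing the directory (not the full path) keeps all entries of a
    directory on one server, so listings and renames within it stay
    single-server operations.
    """
    top_dir = path.strip("/").split("/")[0]
    digest = hashlib.sha256(top_dir.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(METADATA_SERVERS)
    return METADATA_SERVERS[index]

# Every file under /logs/... routes to the same metadata server:
assert metadata_server_for("/logs/2003/a.dat") == metadata_server_for("/logs/2004/b.dat")
```

The design choice to hash at directory granularity trades balance for locality: a single hot directory can still overload one server, which is why real systems add finer-grained splitting or rebalancing on top.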

Problems with the large chunk size

The 64MB chunk size optimized for large sequential I/O but created problems for small files:

| Problem | Why it happens | Workaround |
| --- | --- | --- |
| Hotspots | A small file occupies a single chunk on one set of ChunkServers, so heavy read traffic concentrates on those servers | Store small, popular files with a higher replication factor |
| Wasted metadata overhead | A 1KB file still requires a full chunk entry in the master's metadata | Minimal in practice -- metadata per chunk is only ~64 bytes |
| Startup bursts | Even with extra replicas, many applications starting at once can overwhelm the ChunkServers holding a popular file | GFS staggers application start times to spread the load |
warning

The hotspot problem is not unique to GFS. Any system that maps small objects to large fixed-size storage units risks uneven load. HDFS inherited this problem and recommends storing small files in archive formats (HAR files, SequenceFiles) rather than as individual entries. In your designs, consider whether your chunk/block size matches your expected file size distribution.

The broader lesson

GFS's limitations are not design flaws -- they are the natural consequences of optimizing for a specific workload at a specific scale. When that workload and scale changed, the design needed to evolve. This is the lifecycle of every distributed system: design for today's constraints, measure where the design breaks, and iterate.

Google's response was not to patch GFS, but to build Colossus -- a ground-up redesign that distributed metadata, supported smaller files, and scaled to exabytes. The lesson for interviews: every design has a shelf life. The mark of a good engineer is knowing when the current design will break, not just how it works today.

Quiz
If you were redesigning GFS to handle 1000x more files (many of them small), which change would have the most impact?