Summary: GFS

The big picture

GFS is a masterclass in designing for your actual workload. Google looked at how their applications actually used storage -- large sequential reads, append-heavy writes, constant hardware failures -- and built a file system that optimized ruthlessly for those patterns, discarding decades of file system conventions (POSIX compliance, small-block random access, strong consistency everywhere) that didn't serve them.

The result: a system that could store and serve petabytes on commodity hardware with high throughput and reasonable fault tolerance. GFS became the foundation that Google's entire data infrastructure was built upon -- BigTable, MapReduce, and indirectly HDFS and the entire Hadoop ecosystem.

Architecture at a glance

| Component | Role |
|---|---|
| Master | Single node that manages all metadata (namespace, chunk locations, access control). Metadata lives in memory for speed. |
| ChunkServers | Store 64 MB chunks as Linux files on local disks. Each chunk is replicated 3x. |
| Clients | Application code that talks to the master for metadata and directly to ChunkServers for data. |

Key design choice: The master never handles data flow -- it only directs clients to the right ChunkServers. This prevents the single master from becoming a bandwidth bottleneck.
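This separation of control flow and data flow can be sketched in a few lines. The class and method names below (`Master.lookup`, `ChunkServer.read_chunk`) are invented for illustration, not GFS's actual RPC interface -- the point is that the master answers only the metadata question, and bytes travel directly between client and ChunkServer.

```python
# Illustrative sketch of the GFS read path: one metadata round trip to the
# master, then data flows directly from a ChunkServer. Names are hypothetical.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

class Master:
    """Holds metadata only; never touches file data."""
    def __init__(self):
        # (file path, chunk index) -> (chunk handle, replica addresses)
        self.chunk_table = {}

    def lookup(self, path, chunk_index):
        return self.chunk_table[(path, chunk_index)]

class ChunkServer:
    """Stores chunks; serves reads without involving the master."""
    def __init__(self):
        self.chunks = {}  # chunk handle -> bytes

    def read_chunk(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Client:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers  # address -> ChunkServer

    def read(self, path, offset, length):
        # 1. Metadata request: translate the byte offset to a chunk index
        #    and ask the master which replicas hold that chunk.
        chunk_index = offset // CHUNK_SIZE
        handle, replicas = self.master.lookup(path, chunk_index)
        # 2. Data request: fetch bytes directly from a replica.
        #    The master is bypassed entirely for the data transfer.
        server = self.chunkservers[replicas[0]]
        return server.read_chunk(handle, offset % CHUNK_SIZE, length)
```

Because the master's reply is small and cacheable, one metadata lookup can cover many subsequent data reads against the same chunk.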

How GFS uses system design patterns

| Problem | Pattern | How GFS uses it |
|---|---|---|
| Surviving master crashes | Write-ahead log | All metadata changes are written to an operation log, replicated to remote machines |
| Monitoring ChunkServers | Heartbeat | The master sends periodic heartbeats to collect state and issue instructions |
| Detecting data corruption | Checksum | 32-bit checksum per 64 KB block within each chunk; verified on reads |
| Coordinating chunk writes | Lease | The master grants 60-second mutation leases to primary replicas |
| One master coordinating all | Leader and follower | A single master (leader) directs all coordination; ChunkServers (followers) store data |
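The checksum row above is concrete enough to sketch. The GFS paper specifies a 32-bit checksum per 64 KB block but not the algorithm; CRC32 is used here as a stand-in, and the function names are invented for illustration.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # GFS checksums each 64 KB block independently

def checksum_blocks(chunk: bytes) -> list:
    """Compute a 32-bit checksum for every 64 KB block of a chunk.
    CRC32 is an assumption; the paper does not name the algorithm."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]

def verify_read(chunk: bytes, checksums: list, block_index: int) -> bytes:
    """Return one block, raising if it no longer matches its stored checksum.
    A real ChunkServer would report the mismatch to the master, which
    re-replicates the chunk from a healthy replica."""
    block = chunk[block_index * BLOCK_SIZE:(block_index + 1) * BLOCK_SIZE]
    if zlib.crc32(block) != checksums[block_index]:
        raise IOError(f"corrupt block {block_index}")
    return block
```

Per-block (rather than per-chunk) checksums mean a read of a few kilobytes only has to verify the blocks it touches, not the whole 64 MB chunk.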

GFS consistency model

GFS has a relaxed consistency model -- different operations provide different guarantees:

| Operation | Guarantee |
|---|---|
| Record append | At-least-once, atomic -- data may be written more than once, but each append is atomic |
| Random write | Undefined after concurrent writes -- replicas may differ |
| File namespace operations | Strongly consistent -- protected by the master's locking |

Why "at-least-once"?

GFS chose at-least-once over exactly-once for appends because it dramatically simplifies the write path. Duplicate records are handled at the application level using checksums and sequence numbers in the data itself. This pushes complexity from the infrastructure to the application, which has more context to handle it correctly.
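Application-level deduplication can be this simple when each record carries a unique sequence number. The convention below (records as `(sequence, payload)` pairs) is one the application would choose for itself; GFS provides nothing beyond the at-least-once append.

```python
def deduplicate(records):
    """Drop duplicates from a GFS-style at-least-once append stream.

    Assumes each application-level record embeds its own sequence
    number -- an application convention, not a GFS feature. A retried
    append may have written the same record to the file twice.
    """
    seen = set()
    unique = []
    for seq, payload in records:
        if seq in seen:
            continue  # duplicate from a retried append; skip it
        seen.add(seq)
        unique.append((seq, payload))
    return unique
```

A reader scanning the file applies this filter on the fly, so the duplicates never leak past the ingestion layer.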

Quick reference card

| Property | Value |
|---|---|
| Type | Distributed file system |
| CAP classification | CP (relaxed in practice) |
| Architecture | Single master + multiple ChunkServers |
| Chunk size | 64 MB |
| Replication | 3 replicas per chunk (default) |
| Consistency | Relaxed -- at-least-once for appends, undefined for concurrent writes |
| Optimized for | Large sequential reads/writes, append-heavy workloads |
| Metadata | In-memory on master, persisted via write-ahead log + checkpoints |
| Client caching | Metadata only (not file data) |
| Open source | No (internal to Google; HDFS is the open-source equivalent) |
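The metadata row deserves a closer look: the master keeps all state in RAM but survives crashes by logging every mutation before applying it, then recovering from the latest checkpoint plus a replay of the log tail. A minimal sketch, with invented method names and an in-memory "log" standing in for the replicated on-disk operation log:

```python
import json

class MetadataStore:
    """Illustrative sketch of the GFS master's persistence scheme:
    in-memory namespace, write-ahead operation log, checkpoint-based
    recovery. All names here are hypothetical."""

    def __init__(self):
        self.namespace = {}  # path -> list of chunk handles
        self.log = []        # stand-in for the replicated on-disk log

    def apply(self, op):
        """Apply one logged operation to the in-memory state."""
        if op["type"] == "create":
            self.namespace[op["path"]] = []
        elif op["type"] == "add_chunk":
            self.namespace[op["path"]].append(op["handle"])

    def mutate(self, op):
        self.log.append(json.dumps(op))  # log first (write-ahead)...
        self.apply(op)                   # ...then apply in memory

    def recover(self, checkpoint, log):
        """Rebuild state after a crash: load the checkpoint, then
        replay only the log entries written after it."""
        self.namespace = {path: list(handles)
                          for path, handles in checkpoint.items()}
        for entry in log:
            self.apply(json.loads(entry))
```

Checkpoints exist purely to bound recovery time: without them, a restarted master would have to replay the log from the beginning of history.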

GFS's limitations and legacy

  • Single master is a bottleneck for metadata operations and a single point of failure (mitigated by checkpointing and log replication, but not eliminated)
  • Relaxed consistency requires applications to handle duplicates and inconsistencies
  • Google eventually replaced GFS with Colossus, a next-generation distributed file system that addresses the single-master limitation

Despite its limitations, GFS's influence is enormous. Its core ideas live on in HDFS, and its design philosophy -- study your workload, then build the simplest system that serves it -- remains the gold standard for distributed systems design.

Design Challenge

Design a video transcoding pipeline storage layer

You need to design the storage layer for a video transcoding service. Users upload raw video files (1-50 GB each). Each upload is written once, then read by multiple transcoding workers that produce different resolutions. The system must survive machine failures without data loss.