Summary: GFS
The big picture
GFS is a masterclass in designing for your actual workload. Google looked at how their applications actually used storage -- large sequential reads, append-heavy writes, constant hardware failures -- and built a file system that optimized ruthlessly for those patterns, discarding decades of file system conventions (POSIX compliance, small-block random access, strong consistency everywhere) that didn't serve them.
The result: a system that could store and serve petabytes on commodity hardware with high throughput and reasonable fault tolerance. GFS became the foundation that Google's entire data infrastructure was built upon -- BigTable, MapReduce, and indirectly HDFS and the entire Hadoop ecosystem.
Architecture at a glance
| Component | Role |
|---|---|
| Master | Single node that manages all metadata (namespace, chunk locations, access control). Metadata lives in memory for speed. |
| ChunkServers | Store 64MB chunks as Linux files on local disks. Each chunk replicated 3x. |
| Clients | Application code that talks to the master for metadata and directly to ChunkServers for data. |
Key design choice: The master never handles data flow -- it only directs clients to the right ChunkServers. This prevents the single master from becoming a bandwidth bottleneck.
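This split between metadata flow and data flow can be sketched in a toy model. All class and function names below are invented for illustration; the point is only that the client makes one metadata hop to the master, then reads bytes directly from a ChunkServer replica:

```python
# Illustrative sketch of the GFS read path (all names hypothetical).
# The client asks the master only for chunk locations, then fetches
# data directly from a ChunkServer -- the master never touches data.

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB chunks

class Master:
    """Holds metadata only: which ChunkServers hold each chunk."""
    def __init__(self):
        # (file_name, chunk_index) -> list of replica addresses
        self.chunk_locations = {}

    def lookup(self, file_name, offset):
        chunk_index = offset // CHUNK_SIZE
        return chunk_index, self.chunk_locations[(file_name, chunk_index)]

class ChunkServer:
    """Stores chunk bytes; serves reads directly to clients."""
    def __init__(self):
        self.chunks = {}  # (file_name, chunk_index) -> bytes

    def read(self, file_name, chunk_index, start, length):
        return self.chunks[(file_name, chunk_index)][start:start + length]

def client_read(master, servers, file_name, offset, length):
    # 1. Metadata hop: ask the master which replicas hold the chunk.
    chunk_index, replicas = master.lookup(file_name, offset)
    # 2. Data hop: read from any replica (here, simply the first).
    server = servers[replicas[0]]
    return server.read(file_name, chunk_index, offset % CHUNK_SIZE, length)

# Wire up a toy cluster: one chunk, replicated on three servers.
master = Master()
master.chunk_locations[("logs/web.log", 0)] = ["cs1", "cs2", "cs3"]
servers = {name: ChunkServer() for name in ["cs1", "cs2", "cs3"]}
for s in servers.values():
    s.chunks[("logs/web.log", 0)] = b"GET /index.html 200\n"

print(client_read(master, servers, "logs/web.log", 0, 3))  # b'GET'
```

Because the master answers only small lookups, its bandwidth stays tiny regardless of how much data clients move; clients also cache these lookups, cutting master traffic further.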
How GFS uses system design patterns
| Problem | Pattern | How GFS uses it |
|---|---|---|
| Surviving master crashes | Write-ahead Log | All metadata changes written to an operation log, replicated to remote machines |
| Monitoring ChunkServers | Heartbeat | Master sends periodic heartbeats to collect state and issue instructions |
| Detecting data corruption | Checksum | 32-bit checksum per 64KB block within each chunk; verified on reads |
| Coordinating chunk writes | Lease | Master grants 60-second mutation leases to primary replicas |
| One master coordinating all | Leader and Follower | Single master (leader) directs all coordination; ChunkServers (followers) store data |
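The checksum pattern from the table is easy to sketch: each chunk is divided into 64KB blocks, each block carries a 32-bit checksum, and a ChunkServer re-verifies blocks before returning data. A minimal sketch, using CRC-32 as a stand-in for whatever 32-bit checksum GFS actually used (the helper names are assumptions):

```python
# Hedged sketch of per-block checksumming within a chunk.
# Each 64KB block gets a 32-bit checksum, verified on every read.
import zlib

BLOCK_SIZE = 64 * 1024  # 64KB checksum blocks within a 64MB chunk

def checksum_blocks(chunk_data: bytes) -> list:
    """Compute one 32-bit CRC per 64KB block of the chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify(chunk_data: bytes, checksums: list) -> bool:
    """Re-checksum on read; any mismatch signals on-disk corruption."""
    return checksum_blocks(chunk_data) == checksums

data = b"x" * (3 * BLOCK_SIZE)
sums = checksum_blocks(data)
assert verify(data, sums)

corrupted = b"y" + data[1:]          # flip a single byte
assert not verify(corrupted, sums)   # corruption detected on read
```

On a detected mismatch, a real ChunkServer would return an error to the reader and report to the master, which re-replicates the chunk from a healthy replica.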
GFS consistency model
GFS has a relaxed consistency model -- different operations provide different guarantees:
| Operation | Guarantee |
|---|---|
| Record append | At-least-once, atomic -- data may be written more than once, but each append is atomic |
| Random write | Undefined after concurrent writes -- replicas may differ |
| File namespace operations | Strongly consistent -- protected by the master's locking |
GFS chose at-least-once over exactly-once for appends because it dramatically simplifies the write path. Duplicate records are handled at the application level using checksums and sequence numbers in the data itself. This pushes complexity from the infrastructure to the application, which has more context to handle it correctly.
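The application-level dedup described above can be sketched as follows. The record format here is invented for illustration; the idea is simply that the writer embeds a sequence number in each record, and the reader skips any sequence number it has already seen:

```python
# Hedged sketch: deduplicating at-least-once record appends.
# A retried append may leave the same record in the file twice;
# the reader filters duplicates by writer-assigned sequence number.
# (The "seq|payload" record format is invented for this example.)

def make_record(seq: int, payload: str) -> str:
    return f"{seq}|{payload}"

def read_unique(records):
    """Yield payloads, skipping records whose seq was already seen."""
    seen = set()
    for rec in records:
        seq, payload = rec.split("|", 1)
        if seq in seen:
            continue          # duplicate left by a retried append
        seen.add(seq)
        yield payload

# A retried append wrote record 1 twice:
log = [make_record(0, "a"), make_record(1, "b"),
       make_record(1, "b"), make_record(2, "c")]
print(list(read_unique(log)))  # ['a', 'b', 'c']
```

The writer-side retry loop is the mirror image: on any append failure, simply append the same record again; the reader's filter makes the retry safe.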
Quick reference card
| Property | Value |
|---|---|
| Type | Distributed file system |
| CAP classification | CP (relaxed in practice) |
| Architecture | Single master + multiple ChunkServers |
| Chunk size | 64MB |
| Replication | 3 replicas per chunk (default) |
| Consistency | Relaxed -- at-least-once for appends, undefined for concurrent writes |
| Optimized for | Large sequential reads/writes, append-heavy workloads |
| Metadata | In-memory on master, persisted via write-ahead log + checkpoints |
| Client caching | Metadata only (not file data) |
| Open source | No (internal Google; HDFS is the open-source equivalent) |
GFS's limitations and legacy
- Single master is a bottleneck for metadata operations and a single point of failure (mitigated by checkpointing and log replication, but not eliminated)
- Relaxed consistency requires applications to handle duplicates and inconsistencies
- Google eventually replaced GFS with Colossus, a next-generation distributed file system that addresses the single-master limitation
Despite its limitations, GFS's influence is enormous. Its core ideas live on in HDFS, and its design philosophy -- study your workload, then build the simplest system that serves it -- remains the gold standard for distributed systems design.
References and further reading
- GFS paper -- the original 2003 paper
- GFS: Evolution on Fast-forward -- retrospective
- BigTable paper -- GFS's most important consumer