GFS Consistency Model and Snapshotting
Every distributed storage system must answer: what guarantees do clients get about the data they read? GFS's answer is deliberately relaxed -- it trades strong consistency for simplicity and throughput, pushing edge-case handling to applications.
GFS consistency model
Metadata consistency
Metadata operations (file creation, deletion, renaming) are strongly consistent. They are handled exclusively by the single master, serialized through namespace locking, and ordered by the master's operation log.
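To make the serialization concrete, here is a minimal Python sketch of a master that takes a namespace lock and logs each mutation before applying it. All names (`Master`, `create_file`, `op_log`) are illustrative, not GFS's actual interfaces, and the real master uses fine-grained per-path read/write locks rather than a single lock.

```python
# Hypothetical sketch: a GFS-style master serializes metadata mutations by
# taking a namespace lock, appending to the operation log for durability,
# and only then updating in-memory state. Simplified to one global lock.
import threading

class Master:
    def __init__(self):
        self.namespace = {}                  # path -> file metadata
        self.op_log = []                     # append-only operation log
        self.ns_lock = threading.Lock()      # simplified: one lock, not per-path

    def create_file(self, path):
        with self.ns_lock:                   # serialize with other metadata ops
            if path in self.namespace:
                raise FileExistsError(path)
            self.op_log.append(("create", path))  # log before applying
            self.namespace[path] = {"chunks": []}

m = Master()
m.create_file("/logs/a")
m.create_file("/logs/b")
# op_log now records both creations in a single, well-defined order
```

Because every mutation passes through the log under the lock, replaying the log reconstructs exactly the same namespace after a master restart.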
Data mutation consistency
Data mutations are where things get nuanced. The guarantees differ by operation type:
| Operation | Concurrent behavior | Consistency guarantee |
|---|---|---|
| Write (client-specified offset) | Concurrent writes to the same region may interleave; large writes can be split across chunk boundaries | Consistent but undefined -- all replicas agree, but the region may contain mixed fragments from multiple clients |
| Record append | GFS picks the offset; each append is atomic | At-least-once, atomic -- data may appear more than once, but each copy is a contiguous byte sequence |
| Namespace operations | Serialized by master locks | Strongly consistent |
The key distinction: for writes, the client specifies the offset, so concurrent writes can collide. For record appends, the primary replica chooses the offset, so appends can be serialized without any coordination among clients.
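The contrast can be sketched in a few lines of Python. This is a toy, single-replica model: `Chunk`, `write_at`, and `record_append` are invented names, and the tiny chunk size and padding rule are simplifications of GFS's 64 MB chunks.

```python
# Toy contrast between a client-specified write and a record append in a
# GFS-like chunk. Single replica, no leases; purely illustrative.
CHUNK_SIZE = 64  # tiny stand-in for GFS's 64 MB chunks

class Chunk:
    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, payload):
        """Client chooses the offset; concurrent writers can collide."""
        self.data[offset:offset + len(payload)] = payload

    def record_append(self, payload):
        """The system chooses the offset, so appends never overlap."""
        if len(self.data) + len(payload) > CHUNK_SIZE:
            # GFS pads the rest of the chunk and retries on the next chunk;
            # here we just signal the condition
            raise ValueError("retry on next chunk")
        offset = len(self.data)   # offset picked by the chunk, not the client
        self.data += payload
        return offset             # returned to the client

c = Chunk()
off1 = c.record_append(b"rec1")  # lands at offset 0
off2 = c.record_append(b"rec2")  # lands at offset 4
```

Each append gets a distinct, system-chosen offset, which is why appends compose safely under concurrency while client-positioned writes do not.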
GFS does not guarantee that all replicas of a chunk are byte-wise identical. After failed mutations and retries, replicas may contain different data (padding, duplicates, or partial records in different positions). Applications must use checksums and record IDs to validate data on read. This is the price of relaxed consistency.
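A reader-side sketch of that validation might look like the following. The record framing (length, CRC, record ID) is an assumption for illustration, not GFS's actual on-disk format; the point is that checksums let readers skip padding and corruption, and record IDs let them drop at-least-once duplicates.

```python
# Hypothetical application-level record framing for a GFS-style file:
# [4-byte length][4-byte CRC32][4-byte record ID][payload]. Readers verify
# the checksum to skip garbage and use the record ID to drop duplicates.
import struct
import zlib

def pack_record(rec_id: int, payload: bytes) -> bytes:
    body = struct.pack(">I", rec_id) + payload
    return struct.pack(">II", len(body), zlib.crc32(body)) + body

def read_valid_records(blob: bytes):
    seen, out, i = set(), [], 0
    while i + 8 <= len(blob):
        length, crc = struct.unpack_from(">II", blob, i)
        body = blob[i + 8 : i + 8 + length]
        if length >= 4 and len(body) == length and zlib.crc32(body) == crc:
            rec_id = struct.unpack_from(">I", body)[0]
            if rec_id not in seen:       # drop at-least-once duplicates
                seen.add(rec_id)
                out.append(body[4:])
            i += 8 + length
        else:
            i += 1                       # corrupt or padding byte: resync
    return out

blob = pack_record(1, b"a") + pack_record(2, b"b")
blob += pack_record(2, b"b")             # duplicate from a retried append
blob += b"\x00" * 5                      # trailing padding
recs = read_valid_records(blob)          # -> [b"a", b"b"]
```

The duplicate and the padding both disappear at read time, which is exactly how GFS applications tolerate divergent replicas without system-level coordination.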
When discussing consistency models in interviews, use GFS as a concrete example of a system that intentionally weakens consistency for performance. Compare it with systems that make different tradeoffs:
- HDFS follows a similar model (single-writer, append-only) but enforces stricter replica consistency
- Dynamo and Cassandra use eventual consistency with conflict resolution
- Chubby/ZooKeeper provide strong consistency via consensus, but for small amounts of metadata
The CAP theorem and PACELC theorem provide the theoretical framework for these tradeoffs.
Snapshotting
GFS supports snapshots -- efficient copies of a file or directory subtree at a point in time. Snapshots enable cheap branching of large datasets for checkpointing or experimentation.
Copy-on-write design
Snapshots are initially zero-copy. No data is duplicated until someone tries to modify a snapshotted chunk. The mechanism:
- Revoke leases: The master revokes any outstanding leases on chunks covered by the snapshot (waiting for a lease to expire if its ChunkServer is unreachable), so that any subsequent write must first contact the master
- Log the operation: The master writes the snapshot operation to the operation log for durability
- Duplicate metadata only: The master duplicates the metadata (namespace entries and file-to-chunk mappings) for the source tree. The new snapshot files point to the original chunks -- no data is copied
- Copy on first write: When a client later writes to one of these shared chunks, the master detects it via a reference count > 1. It then:
- Instructs each ChunkServer holding the chunk to make a local copy (no network transfer needed)
- Issues a lease for the new copy
- The write proceeds on the new copy, leaving the original chunk untouched
This copy-on-write scheme means snapshots of multi-terabyte directory trees complete almost instantly -- the only cost is metadata duplication. Data copying is deferred and amortized across subsequent writes.
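The whole scheme can be sketched as a toy master with reference-counted chunks. `CowMaster`, `snapshot`, and the in-memory `chunks` dict (standing in for ChunkServers) are all invented for illustration; lease revocation and the operation log are omitted.

```python
# Toy sketch of GFS-style copy-on-write snapshots: snapshotting duplicates
# only file-to-chunk mappings and bumps reference counts; the first write
# to a shared chunk triggers a local copy. Purely illustrative.
class CowMaster:
    def __init__(self):
        self.files = {}        # path -> list of chunk handles
        self.refcount = {}     # chunk handle -> number of referencing files
        self.chunks = {}       # chunk handle -> bytes (stand-in for ChunkServers)
        self.next_handle = 0

    def _new_chunk(self, data=b""):
        h = self.next_handle
        self.next_handle += 1
        self.chunks[h] = data
        self.refcount[h] = 1
        return h

    def create(self, path, data):
        self.files[path] = [self._new_chunk(data)]

    def snapshot(self, src, dst):
        """Metadata-only copy: share chunk handles, bump refcounts."""
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def write(self, path, idx, data):
        h = self.files[path][idx]
        if self.refcount[h] > 1:                  # shared with a snapshot
            self.refcount[h] -= 1
            h = self._new_chunk(self.chunks[h])   # ChunkServer-local copy
            self.files[path][idx] = h
        self.chunks[h] = data                     # write hits the private copy

m = CowMaster()
m.create("/data", b"v1")
m.snapshot("/data", "/snap")   # instant: no chunk data copied
m.write("/data", 0, b"v2")     # refcount > 1 forces a copy first
```

After the write, `/snap` still resolves to the original chunk holding `b"v1"`, while `/data` points at a fresh chunk holding `b"v2"` -- the deferred, amortized copying described above.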
Copy-on-write snapshots appear across many systems: Linux fork(), ZFS snapshots, Docker image layers, and database MVCC. If asked about efficient snapshots in an interview, explain the three-step pattern: (1) share the original data, (2) track references, (3) copy only on mutation. GFS adds the lease-revocation step to prevent writes during snapshot creation.