GFS Consistency Model and Snapshotting
Every distributed storage system must answer: what guarantees do clients get about the data they read? GFS's answer is deliberately relaxed -- it trades strong consistency for simplicity and throughput, pushing edge-case handling to applications.
GFS consistency model
Metadata consistency
Metadata operations (file creation, deletion, renaming) are strongly consistent. They are handled exclusively by the single master, serialized through namespace locking, and ordered by the master's operation log.
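To make the serialization concrete, here is a minimal Python sketch of a master that takes a namespace lock and logs each mutation before applying it. All names (`Master`, `create_file`, `op_log`) are illustrative, not GFS's actual interfaces, and the real master uses fine-grained per-path read/write locks rather than a single lock.

```python
# Hypothetical sketch: a GFS-style master serializes metadata mutations by
# taking a namespace lock, appending to the operation log for durability,
# and only then updating in-memory state. Simplified to one global lock.
import threading

class Master:
    def __init__(self):
        self.namespace = {}                  # path -> file metadata
        self.op_log = []                     # append-only operation log
        self.ns_lock = threading.Lock()      # simplified: one lock, not per-path

    def create_file(self, path):
        with self.ns_lock:                   # serialize with other metadata ops
            if path in self.namespace:
                raise FileExistsError(path)
            self.op_log.append(("create", path))  # log before applying
            self.namespace[path] = {"chunks": []}

m = Master()
m.create_file("/logs/a")
m.create_file("/logs/b")
# op_log now records both creations in a single, well-defined order
```

Because every mutation passes through the log under the lock, replaying the log reconstructs exactly the same namespace after a master restart.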
Data mutation consistency
Data mutations are where things get nuanced. The guarantees differ by operation type:
| Operation | Concurrent behavior | Consistency guarantee |
|---|---|---|
| Write (client-specified offset) | Concurrent writes to the same region may interleave; large writes can be split across chunk boundaries | Consistent but undefined -- all replicas agree, but the region may contain mixed fragments from multiple clients |
| Record append | GFS picks the offset; each append is atomic | At-least-once, atomic -- data may appear more than once, but each copy is a contiguous byte sequence |
| Namespace operations | Serialized by master locks | Strongly consistent |
The key distinction: for writes, the client specifies the offset, so concurrent writes can collide. For record appends, the primary replica chooses the offset, so appends can be serialized without any coordination among clients.
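The contrast can be sketched in a few lines of Python. This is a toy, single-replica model: `Chunk`, `write_at`, and `record_append` are invented names, and the tiny chunk size and padding rule are simplifications of GFS's 64 MB chunks.

```python
# Toy contrast between a client-specified write and a record append in a
# GFS-like chunk. Single replica, no leases; purely illustrative.
CHUNK_SIZE = 64  # tiny stand-in for GFS's 64 MB chunks

class Chunk:
    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, payload):
        """Client chooses the offset; concurrent writers can collide."""
        self.data[offset:offset + len(payload)] = payload

    def record_append(self, payload):
        """The system chooses the offset, so appends never overlap."""
        if len(self.data) + len(payload) > CHUNK_SIZE:
            # GFS pads the rest of the chunk and retries on the next chunk;
            # here we just signal the condition
            raise ValueError("retry on next chunk")
        offset = len(self.data)   # offset picked by the chunk, not the client
        self.data += payload
        return offset             # returned to the client

c = Chunk()
off1 = c.record_append(b"rec1")  # lands at offset 0
off2 = c.record_append(b"rec2")  # lands at offset 4
```

Each append gets a distinct, system-chosen offset, which is why appends compose safely under concurrency while client-positioned writes do not.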
GFS does not guarantee that all replicas of a chunk are byte-wise identical. After failed mutations and retries, replicas may contain different data (padding, duplicates, or partial records in different positions). Applications must use checksums and record IDs to validate data on read. This is the price of relaxed consistency.
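A reader-side sketch of that validation might look like the following. The record framing (length, CRC, record ID) is an assumption for illustration, not GFS's actual on-disk format; the point is that checksums let readers skip padding and corruption, and record IDs let them drop at-least-once duplicates.

```python
# Hypothetical application-level record framing for a GFS-style file:
# [4-byte length][4-byte CRC32][4-byte record ID][payload]. Readers verify
# the checksum to skip garbage and use the record ID to drop duplicates.
import struct
import zlib

def pack_record(rec_id: int, payload: bytes) -> bytes:
    body = struct.pack(">I", rec_id) + payload
    return struct.pack(">II", len(body), zlib.crc32(body)) + body

def read_valid_records(blob: bytes):
    seen, out, i = set(), [], 0
    while i + 8 <= len(blob):
        length, crc = struct.unpack_from(">II", blob, i)
        body = blob[i + 8 : i + 8 + length]
        if length >= 4 and len(body) == length and zlib.crc32(body) == crc:
            rec_id = struct.unpack_from(">I", body)[0]
            if rec_id not in seen:       # drop at-least-once duplicates
                seen.add(rec_id)
                out.append(body[4:])
            i += 8 + length
        else:
            i += 1                       # corrupt or padding byte: resync
    return out

blob = pack_record(1, b"a") + pack_record(2, b"b")
blob += pack_record(2, b"b")             # duplicate from a retried append
blob += b"\x00" * 5                      # trailing padding
recs = read_valid_records(blob)          # -> [b"a", b"b"]
```

The duplicate and the padding both disappear at read time, which is exactly how GFS applications tolerate divergent replicas without system-level coordination.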
When discussing consistency models in interviews, use GFS as a concrete example of a system that intentionally weakens consistency for performance. Compare it with systems that make different tradeoffs:
- HDFS follows a similar model (single-writer, append-only) but enforces stricter replica consistency
- Dynamo and Cassandra use eventual consistency with conflict resolution
- Chubby/ZooKeeper provide strong consistency via consensus, but for small amounts of metadata
The CAP theorem and PACELC theorem provide the theoretical framework for these tradeoffs.
Snapshotting
GFS supports snapshots -- efficient copies of a file or directory subtree at a point in time. Snapshots enable cheap branching of large datasets for checkpointing or experimentation.
Copy-on-write design
Snapshots are initially zero-copy. No data is duplicated until someone tries to modify a snapshotted chunk. The mechanism:
- Revoke leases: The master revokes any outstanding leases on chunks covered by the snapshot (waiting for a lease to expire if its ChunkServer is unreachable), so that any subsequent write must first contact the master
- Log the operation: The master writes the snapshot operation to the operation log for durability
- Duplicate metadata only: The master duplicates the metadata (namespace entries and file-to-chunk mappings) for the source tree. The new snapshot files point to the original chunks -- no data is copied
- Copy on first write: When a client later writes to one of these shared chunks, the master detects it via a reference count > 1. It then:
- Instructs each ChunkServer holding the chunk to make a local copy (no network transfer needed)
- Issues a lease for the new copy
- The write proceeds on the new copy, leaving the original chunk untouched
This copy-on-write scheme means snapshots of multi-terabyte directory trees complete almost instantly -- the only cost is metadata duplication. Data copying is deferred and amortized across subsequent writes.
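The whole scheme can be sketched as a toy master with reference-counted chunks. `CowMaster`, `snapshot`, and the in-memory `chunks` dict (standing in for ChunkServers) are all invented for illustration; lease revocation and the operation log are omitted.

```python
# Toy sketch of GFS-style copy-on-write snapshots: snapshotting duplicates
# only file-to-chunk mappings and bumps reference counts; the first write
# to a shared chunk triggers a local copy. Purely illustrative.
class CowMaster:
    def __init__(self):
        self.files = {}        # path -> list of chunk handles
        self.refcount = {}     # chunk handle -> number of referencing files
        self.chunks = {}       # chunk handle -> bytes (stand-in for ChunkServers)
        self.next_handle = 0

    def _new_chunk(self, data=b""):
        h = self.next_handle
        self.next_handle += 1
        self.chunks[h] = data
        self.refcount[h] = 1
        return h

    def create(self, path, data):
        self.files[path] = [self._new_chunk(data)]

    def snapshot(self, src, dst):
        """Metadata-only copy: share chunk handles, bump refcounts."""
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def write(self, path, idx, data):
        h = self.files[path][idx]
        if self.refcount[h] > 1:                  # shared with a snapshot
            self.refcount[h] -= 1
            h = self._new_chunk(self.chunks[h])   # ChunkServer-local copy
            self.files[path][idx] = h
        self.chunks[h] = data                     # write hits the private copy

m = CowMaster()
m.create("/data", b"v1")
m.snapshot("/data", "/snap")   # instant: no chunk data copied
m.write("/data", 0, b"v2")     # refcount > 1 forces a copy first
```

After the write, `/snap` still resolves to the original chunk holding `b"v1"`, while `/data` points at a fresh chunk holding `b"v2"` -- the deferred, amortized copying described above.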
Copy-on-write snapshots appear across many systems: Linux fork(), ZFS snapshots, Docker image layers, and database MVCC. If asked about efficient snapshots in an interview, explain the three-step pattern: (1) share the original data, (2) track references, (3) copy only on mutation. GFS adds the lease-revocation step to prevent writes during snapshot creation.