GFS Consistency Model and Snapshotting

Every distributed storage system must answer: what guarantees do clients get about the data they read? GFS's answer is deliberately relaxed -- it trades strong consistency for simplicity and throughput, pushing edge-case handling to applications.

Think first
Given that GFS prioritizes throughput and supports concurrent appends, would you expect it to offer strong or relaxed consistency for data mutations? Why?

GFS consistency model

Metadata consistency

Metadata operations (file creation, deletion, renaming) are strongly consistent. They are handled exclusively by the single master, serialized through namespace locking, and ordered by the master's operation log.
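A minimal sketch of this serialization idea (illustrative only, not GFS's actual implementation): a toy master guards the namespace with a lock and records every mutation in an operation log before applying it, so all clients observe a single total order of metadata operations.

```python
import threading

class Master:
    """Toy master: metadata ops are serialized through one lock and
    appended to an operation log before being applied, giving every
    client the same ordering (sketch; names are illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._namespace = {}   # path -> file metadata
        self.op_log = []       # ordered record of metadata mutations

    def create(self, path):
        with self._lock:                        # serialize metadata ops
            if path in self._namespace:
                return False
            self.op_log.append(("create", path))  # log before applying
            self._namespace[path] = {"chunks": []}
            return True

    def delete(self, path):
        with self._lock:
            if path not in self._namespace:
                return False
            self.op_log.append(("delete", path))
            del self._namespace[path]
            return True
```

Because every operation holds the same lock and logs before applying, replaying `op_log` reproduces the namespace exactly, which is how the real master recovers after a crash.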

Data mutation consistency

Data mutations are where things get nuanced. The guarantees differ by operation type:

| Operation | Concurrent behavior | Consistency guarantee |
| --- | --- | --- |
| Write (client-specified offset) | Concurrent writes to the same region are not serialized | Undefined -- region may contain mixed fragments from multiple clients |
| Record append | GFS picks the offset; each append is atomic | At-least-once, atomic -- data may appear more than once, but each copy is a contiguous byte sequence |
| Namespace operations | Serialized by master locks | Strongly consistent |

The key distinction: for writes, the client specifies the offset, so concurrent writes can collide. For appends, the system chooses the offset, so it can serialize them without coordination among clients.
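The distinction can be made concrete with a toy chunk model (a sketch, not GFS's wire protocol): for writes the caller supplies the offset, so overlapping writes clobber each other; for record appends the chunk assigns the offset, so records never overlap.

```python
class Chunk:
    """Toy chunk contrasting client-chosen offsets (writes) with
    system-chosen offsets (record appends). Illustrative only."""

    def __init__(self):
        self.data = bytearray()

    def write(self, offset, payload):
        # Client picks the offset: two clients targeting the same
        # region overwrite each other in arrival order -- undefined.
        end = offset + len(payload)
        if end > len(self.data):
            self.data.extend(b"\x00" * (end - len(self.data)))
        self.data[offset:end] = payload

    def record_append(self, payload):
        # The system picks the offset: appends are serialized, and
        # each record lands as one contiguous byte sequence.
        offset = len(self.data)
        self.data.extend(payload)
        return offset

chunk = Chunk()
chunk.write(0, b"AAAA")
chunk.write(2, b"BBBB")            # overlaps the first write
# region now holds mixed fragments: neither client's data is intact

o1 = chunk.record_append(b"rec1")  # offset chosen by the chunk
o2 = chunk.record_append(b"rec2")  # never overlaps the previous record
```

The second write leaves `AABBBB` in the region, illustrating "undefined" contents, while the two appends receive disjoint offsets without any coordination between the callers.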

warning

GFS does not guarantee that all replicas of a chunk are byte-wise identical. After failed mutations and retries, replicas may contain different data (padding, duplicates, or partial records in different positions). Applications must use checksums and record IDs to validate data on read. This is the price of relaxed consistency.
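What that application-side validation might look like (a sketch with an illustrative framing format, not GFS's actual record layout): the writer tags each record with an ID and a checksum; the reader drops records whose checksum fails (padding, partial writes) and de-duplicates by ID (at-least-once appends).

```python
import zlib

def make_record(record_id, payload):
    # Writer side: frame the record as [crc32][id][payload].
    # Illustrative format -- GFS leaves this to applications.
    body = record_id.to_bytes(8, "big") + payload
    return zlib.crc32(body).to_bytes(4, "big") + body

def read_valid_records(records):
    # Reader side: skip corrupt records, drop duplicate IDs.
    seen, out = set(), []
    for rec in records:
        crc, body = int.from_bytes(rec[:4], "big"), rec[4:]
        if zlib.crc32(body) != crc:
            continue                  # padding or partial record: skip
        rid = int.from_bytes(body[:8], "big")
        if rid in seen:
            continue                  # duplicate from a retried append
        seen.add(rid)
        out.append(body[8:])
    return out

good = make_record(1, b"hello")
dup = good                            # a retried append duplicated it
junk = b"\x00\x00\x00\x00" + b"garbage-padding"  # fails the checksum
clean = read_valid_records([good, dup, junk, make_record(2, b"world")])
```

`clean` contains exactly one copy of each valid record, which is how applications recover a consistent view on top of GFS's relaxed guarantees.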

Interview angle

When discussing consistency models in interviews, use GFS as a concrete example of a system that intentionally weakens consistency for performance. Compare it with systems that make different tradeoffs:

  • HDFS follows a similar model (single-writer, append-only) but enforces stricter replica consistency
  • Dynamo and Cassandra use eventual consistency with conflict resolution
  • Chubby/ZooKeeper provide strong consistency via consensus, but for small amounts of metadata

The CAP theorem and PACELC theorem provide the theoretical framework for these tradeoffs.

Snapshotting

GFS supports snapshots -- efficient copies of a file or directory subtree at a point in time. Snapshots enable cheap branching of large datasets for checkpointing or experimentation.

Copy-on-write design

Snapshots are initially zero-copy. No data is duplicated until someone tries to modify a snapshotted chunk. The mechanism:

  1. Revoke leases: The master revokes any outstanding leases on chunks covered by the snapshot (or, if revocation fails, waits for the leases to expire), so any subsequent write must first contact the master
  2. Log the operation: The master writes the snapshot operation to its operation log for durability
  3. Duplicate metadata only: The master duplicates the metadata (namespace entries and file-to-chunk mappings) for the source tree. The new snapshot files point to the original chunks -- no data is copied
  4. Copy on first write: When a client later writes to one of these shared chunks, the master detects it via a reference count > 1. It then:
    • Instructs each ChunkServer holding the chunk to make a local copy (no network transfer needed)
    • Issues a lease for the new copy
    • The write proceeds on the new copy, leaving the original chunk untouched

This copy-on-write scheme means snapshots of multi-terabyte directory trees complete almost instantly -- the only cost is metadata duplication. Data copying is deferred and amortized across subsequent writes.
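The reference-counting logic above can be sketched as follows (a toy model; the real master also handles lease revocation, operation logging, and ChunkServer RPCs, all omitted here):

```python
class SnapshotMaster:
    """Toy master tracking chunk reference counts for copy-on-write
    snapshots. Illustrative sketch, not GFS's implementation."""

    def __init__(self):
        self.files = {}      # path -> list of chunk IDs
        self.refcount = {}   # chunk ID -> number of files sharing it
        self._next_chunk = 0

    def _new_chunk(self):
        cid = self._next_chunk
        self._next_chunk += 1
        self.refcount[cid] = 1
        return cid

    def create(self, path, num_chunks):
        self.files[path] = [self._new_chunk() for _ in range(num_chunks)]

    def snapshot(self, src, dst):
        # Metadata-only copy: the snapshot shares the source's chunks,
        # so no chunk data moves anywhere -- only refcounts change.
        self.files[dst] = list(self.files[src])
        for cid in self.files[dst]:
            self.refcount[cid] += 1

    def write(self, path, index):
        cid = self.files[path][index]
        if self.refcount[cid] > 1:        # shared: copy before writing
            self.refcount[cid] -= 1
            new_cid = self._new_chunk()
            # (Each ChunkServer holding `cid` would copy it locally.)
            self.files[path][index] = new_cid
            cid = new_cid
        return cid                        # write proceeds on this chunk
```

A snapshot of a file with thousands of chunks touches only metadata; the first write to each shared chunk pays for its copy, and later writes to the now-private chunk are free.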

Interview angle

Copy-on-write snapshots appear across many systems: Linux fork(), ZFS snapshots, Docker image layers, and database MVCC. If asked about efficient snapshots in an interview, explain the three-step pattern: (1) share the original data, (2) track references, (3) copy only on mutation. GFS adds the lease-revocation step to prevent writes during snapshot creation.

Quiz
If GFS enforced strong consistency (all replicas byte-wise identical at all times) for data mutations, what would be the primary cost?