Garbage Collection

What happens when you delete a file in GFS? Nothing -- at least not immediately. GFS uses lazy deletion, a design choice that trades immediate space reclamation for simplicity, reliability, and safety.

Think first
When a client deletes a file, should GFS immediately free the disk space? What happens if the deletion message to a ChunkServer is lost?

How lazy deletion works

When a client deletes a file, GFS does two things and stops:

Step | Action
1 | The master logs the deletion to the operation log
2 | The file is renamed to a hidden name that includes a deletion timestamp

The file still exists. It can be read under its hidden name and undeleted by renaming it back. This provides a recovery window for accidental deletions.
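To make the two-step delete concrete, here is a minimal sketch of a master that logs the operation and then renames the file. The `Master` class, the `HIDDEN_MARKER` naming scheme, and the dictionary-based namespace are all assumptions made for illustration; GFS's actual hidden-name format is not described here.

```python
import time

# Hypothetical marker; the real hidden-name format is not specified in the text.
HIDDEN_MARKER = ".__deleted__."


class Master:
    def __init__(self):
        self.namespace = {}      # path -> list of chunk handles
        self.operation_log = []  # append-only record of namespace mutations

    def delete(self, path, now=None):
        """Lazy delete: log the operation, then rename to a hidden name."""
        now = time.time() if now is None else now
        self.operation_log.append(("delete", path, int(now)))
        hidden = f"{path}{HIDDEN_MARKER}{int(now)}"
        # The chunk list is untouched; only the name changes.
        self.namespace[hidden] = self.namespace.pop(path)
        return hidden

    def undelete(self, hidden, path):
        """Recover a file by renaming it back to a normal name."""
        self.operation_log.append(("rename", hidden, path))
        self.namespace[path] = self.namespace.pop(hidden)
```

Note that neither method touches chunk data, which is why undelete is cheap for as long as the hidden file still exists.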

Actual physical reclamation happens later, in three stages:

  1. File cleanup: During regular namespace scans, the master removes hidden files older than three days (configurable) and deletes their in-memory metadata
  2. Chunk metadata cleanup: During chunk namespace scans, the master deletes metadata for orphaned chunks (chunks not referenced by any file)
  3. ChunkServer cleanup: During regular HeartBeat exchanges, each ChunkServer reports a subset of its chunks. The master replies with a list of chunks that are no longer in its database. The ChunkServer then deletes those chunks from local disk.
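The three stages above can be sketched as three small functions. This is a simplified model under stated assumptions: the hidden-name marker, the function names, and the plain dictionaries standing in for the master's tables are all hypothetical.

```python
GRACE_SECONDS = 3 * 24 * 3600    # three-day default from the text; configurable
HIDDEN_MARKER = ".__deleted__."  # hypothetical hidden-name marker


def scan_namespace(namespace, now, grace=GRACE_SECONDS):
    """Stage 1: remove hidden files whose deletion timestamp is older than grace."""
    for name in list(namespace):
        if HIDDEN_MARKER in name:
            deleted_at = int(name.rsplit(HIDDEN_MARKER, 1)[1])
            if now - deleted_at > grace:
                del namespace[name]


def scan_chunks(namespace, chunk_table):
    """Stage 2: remove metadata for chunks that no file references any more."""
    live = {handle for chunks in namespace.values() for handle in chunks}
    for handle in list(chunk_table):
        if handle not in live:
            del chunk_table[handle]


def heartbeat_reply(reported_handles, chunk_table):
    """Stage 3: list the reported chunks the master no longer knows about."""
    return [h for h in reported_handles if h not in chunk_table]
```

A short usage example, with a file deleted at time 0 and a scan just after the grace period:

```python
namespace = {"/a": ["c1"], "/b.__deleted__.0": ["c2"]}
chunk_table = {"c1": {}, "c2": {}}

scan_namespace(namespace, now=GRACE_SECONDS + 1)
scan_chunks(namespace, chunk_table)
print(heartbeat_reply(["c1", "c2"], chunk_table))  # ['c2']
```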

Advantages of lazy deletion

Advantage | Explanation
Reliability | If a chunk deletion message is lost, no retry is needed -- the next HeartBeat exchange catches it
Batching | Storage reclamation merges into existing background activities (namespace scans, HeartBeat exchanges), amortizing the cost
Off-peak execution | Garbage collection runs when the master is relatively idle
Accidental deletion safety | The three-day window lets users recover mistakenly deleted files

Interview angle

Lazy deletion is a pattern that appears across many systems. Cassandra uses tombstones and compaction to defer deletion. Cloud storage services (S3, GCS) offer versioning and soft-delete for similar safety. In an interview, framing deletion as a two-phase process (mark, then sweep) shows you understand the reliability benefits of deferred cleanup.
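The reliability benefit of deferred cleanup can be seen in a toy simulation. The sketch below is an assumption-laden simplification: `reconcile` is a hypothetical function, the server reports all of its chunks rather than a subset, and message loss is modeled with a fixed delivery pattern. The point is that the master keeps no per-message retry state, because each round recomputes the stale set from current state.

```python
def reconcile(server_chunks, master_chunks, reply_delivered):
    """One HeartBeat round: the server reports, the master names unknown chunks.

    If the reply is lost, nothing is deleted and nothing has to be remembered;
    the next round derives the same answer again.
    """
    stale = server_chunks - master_chunks
    if reply_delivered:
        server_chunks.difference_update(stale)  # server deletes stale replicas
    return stale


master_chunks = {"c1"}
server_chunks = {"c1", "c2", "c3"}  # c2 and c3 are garbage

# Hypothetical delivery pattern: the first two replies are dropped.
for delivered in [False, False, True]:
    reconcile(server_chunks, master_chunks, delivered)

print(server_chunks)  # {'c1'}
```

With eager deletion, the master would instead have to track which delete messages were acknowledged and resend the rest across failures.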

Disadvantages of lazy deletion

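Deleted files continue to consume storage for the whole grace period (three days by default), plus the time it takes for the follow-up scans and HeartBeat exchanges to run. Applications that rapidly create and delete temporary files cannot reuse that space immediately, which matters most when storage is tight.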

GFS offers workarounds:

  • Expedited reclamation: Explicitly deleting an already-deleted file a second time expedites its reclamation instead of waiting out the grace period
  • No-replication directories: Users can designate directories where files are stored without replication, reducing waste
  • Immediate deletion directories: Users can specify directories where deletion bypasses the lazy phase entirely
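One way to picture the per-directory workarounds is a policy table keyed by directory prefix. This is a hypothetical sketch, not GFS's actual configuration interface: the `Policy` fields, the `POLICIES` table, and `policy_for` are invented for illustration, and "without replication" is modeled as a single replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    replicas: int = 3          # three replicas is the GFS default
    lazy_delete: bool = True   # False: remove immediately and irrevocably


# Hypothetical policy table keyed by directory prefix.
POLICIES = {
    "/": Policy(),
    "/tmp/scratch/": Policy(replicas=1, lazy_delete=False),
}


def policy_for(path):
    """Return the policy of the longest directory prefix that contains path."""
    best = max((p for p in POLICIES if path.startswith(p)), key=len)
    return POLICIES[best]


print(policy_for("/tmp/scratch/job42.tmp").lazy_delete)  # False
print(policy_for("/data/events.log").replicas)           # 3
```

Longest-prefix matching lets a scratch subtree opt out of the grace period while the rest of the namespace keeps the safe default.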

Warning

The three-day default delay means cluster capacity planning must account for "dead but not yet collected" data. In a system with high file churn, this overhead can be significant. Always factor garbage collection lag into storage capacity estimates.
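A rough way to size this overhead is to multiply the daily deletion volume by the grace period and the replication factor. The function below is a back-of-the-envelope sketch: the name `dead_data_bytes` and the 2 TB/day churn figure are made up for illustration, and the result is a lower bound because it ignores the extra delay before scans and HeartBeat exchanges actually free the space.

```python
def dead_data_bytes(deleted_bytes_per_day, grace_days=3, replicas=3):
    """Steady-state raw storage held by deleted-but-uncollected files."""
    return deleted_bytes_per_day * grace_days * replicas


TB = 10**12
overhead = dead_data_bytes(2 * TB)  # hypothetical churn: 2 TB deleted per day
print(overhead / TB)  # 18.0
```

In this example, 2 TB of logical deletions per day ties up about 18 TB of raw disk at steady state, which is the figure to add on top of live-data estimates.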

Quiz
What would happen if GFS used immediate (eager) deletion instead of lazy garbage collection?