Garbage Collection
What happens when you delete a file in GFS? Nothing -- at least not immediately. GFS uses lazy deletion, a design choice that trades immediate space reclamation for simplicity, reliability, and safety.
How lazy deletion works
When a client deletes a file, GFS does two things and stops:
| Step | Action |
|---|---|
| 1 | The master logs the deletion to the operation log |
| 2 | The file is renamed to a hidden name that includes a deletion timestamp |
The file still exists. It can be read under its hidden name and undeleted by renaming it back. This provides a recovery window for accidental deletions.
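The mark phase can be sketched in a few lines. This is a toy model, not GFS's actual implementation: the hidden-name convention, function names, and in-memory `namespace` dict are all illustrative.

```python
import time

# Toy sketch of lazy deletion's "mark" phase: a delete is just an operation-log
# entry plus a rename to a hidden, timestamped name. No chunk data is touched.
def delete_file(namespace, op_log, path):
    hidden = f"{path}.__deleted__.{int(time.time())}"  # hidden name embeds deletion time
    op_log.append(("delete", path, hidden))            # step 1: log the deletion
    namespace[hidden] = namespace.pop(path)            # step 2: rename to hidden name
    return hidden

def undelete_file(namespace, op_log, hidden):
    # The recovery window: renaming the hidden file back restores it fully.
    original = hidden.split(".__deleted__.")[0]
    op_log.append(("undelete", hidden, original))
    namespace[original] = namespace.pop(hidden)

ns = {"/logs/app.log": ["chunk-1", "chunk-2"]}  # path -> chunk handles
log = []
h = delete_file(ns, log, "/logs/app.log")  # file now hidden, chunks untouched
undelete_file(ns, log, h)                  # accidental deletion recovered
```

Note that neither operation communicates with any ChunkServer: the chunks themselves are only reclaimed later, by the sweep stages described next.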
Actual physical reclamation happens later, in three stages:
- File cleanup: During regular namespace scans, the master removes hidden files older than three days (configurable) and deletes their in-memory metadata
- Chunk metadata cleanup: During chunk namespace scans, the master deletes metadata for orphaned chunks (chunks not referenced by any file)
- ChunkServer cleanup: During regular HeartBeat exchanges, each ChunkServer reports a subset of the chunks it holds. The master replies with the handles of any chunks it no longer tracks in its metadata. The ChunkServer is then free to delete those chunks from local disk.
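The third stage, the HeartBeat-driven sweep, reduces to a set difference. A minimal sketch, with illustrative names (GFS's actual RPC format differs):

```python
# Toy sketch of the HeartBeat sweep: the ChunkServer reports a subset of its
# chunk handles; the master replies with those it no longer references, and
# the server deletes them from local disk. Losing one reply is harmless --
# the same orphans simply show up again in a later HeartBeat.
def heartbeat_sweep(master_chunks, server_chunks, reported_subset):
    orphans = [c for c in reported_subset if c not in master_chunks]
    for c in orphans:
        server_chunks.discard(c)  # physical deletion on the ChunkServer
    return orphans

master = {"c1", "c3"}            # chunks still referenced by some file
server = {"c1", "c2", "c3"}      # chunks actually on this server's disk
deleted = heartbeat_sweep(master, server, ["c1", "c2"])
# "c2" is orphaned: no file references it, so the server reclaims it
```

Because the master is the sole authority on which chunks are live, the protocol needs no per-deletion acknowledgements: eventual convergence falls out of the periodic exchange.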
Advantages of lazy deletion
| Advantage | Explanation |
|---|---|
| Reliability | If a chunk deletion message is lost, no retry is needed -- the next HeartBeat exchange catches it |
| Batching | Storage reclamation merges into existing background activities (namespace scans, HeartBeat exchanges), amortizing the cost |
| Off-peak execution | Garbage collection runs when the master is relatively idle |
| Accidental deletion safety | The three-day window lets users recover mistakenly deleted files |
Lazy deletion is a pattern that appears across many systems. Cassandra uses tombstones and compaction to defer deletion. Cloud storage services (S3, GCS) offer versioning and soft-delete for similar safety. In an interview, framing deletion as a two-phase process (mark, then sweep) shows you understand the reliability benefits of deferred cleanup.
Disadvantages of lazy deletion
Deleted files continue to consume storage for the full three-day window, plus the lag until the next namespace scan and HeartBeat exchange actually reclaim the chunks. Applications that rapidly create and delete temporary files cannot reuse that space immediately.
GFS offers workarounds:
- Expedited reclamation: Deleting an already-deleted file (re-issuing delete on the hidden name) triggers immediate cleanup
- No-replication directories: Users can designate directories where files are stored without replication, reducing waste
- Immediate deletion directories: Users can specify directories where deletion bypasses the lazy phase entirely
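The first workaround falls out naturally if the delete path checks for an already-hidden name. A toy sketch (the hidden-name marker and function names are illustrative, not GFS's):

```python
# Toy sketch of expedited reclamation: the first delete hides the file;
# a second delete aimed at the hidden name reclaims it immediately,
# skipping the three-day window.
HIDDEN_MARKER = ".__deleted__"

def delete(namespace, path):
    if HIDDEN_MARKER in path:
        namespace.pop(path, None)       # re-issued delete: reclaim space now
        return "reclaimed"
    namespace[path + HIDDEN_MARKER] = namespace.pop(path)  # normal lazy delete
    return "hidden"

ns = {"/tmp/scratch": ["chunk-9"]}
delete(ns, "/tmp/scratch")                   # first delete: file is only hidden
delete(ns, "/tmp/scratch" + HIDDEN_MARKER)   # second delete: immediate cleanup
```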
The three-day default delay means cluster capacity planning must account for "dead but not yet collected" data. In a system with high file churn, this overhead can be significant. Always factor garbage collection lag into storage capacity estimates.
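A back-of-the-envelope estimate makes the overhead concrete. The cluster numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Steady-state "dead but not yet collected" data is roughly the deletion
# rate multiplied by the garbage-collection delay. Numbers are illustrative.
delete_rate_tb_per_day = 20   # hypothetical: data deleted per day
gc_delay_days = 3             # default lazy-deletion window
live_data_tb = 500            # hypothetical: live data in the cluster

dead_data_tb = delete_rate_tb_per_day * gc_delay_days
overhead_pct = 100 * dead_data_tb / live_data_tb

print(dead_data_tb)            # 60 TB of dead data at steady state
print(round(overhead_pct, 1))  # 12.0 -- percent extra capacity to provision
```

At high churn this term can dominate: doubling the deletion rate doubles the dead-data footprint even though live data is unchanged.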