Google File System Introduction

Goal

Design a distributed file system that stores huge files (many gigabytes to terabytes) across thousands of machines. The system should be scalable, reliable, and highly available.

Why GFS matters

In the early 2000s, Google was crawling the entire internet and building a search index over it. The data volumes were staggering -- petabytes of web pages, constantly being recrawled and reprocessed. No commercial file system could handle it, and buying a proprietary solution at this scale would have been astronomically expensive.

So Google did what Google does: they built their own. But GFS wasn't just a bigger file system. Google studied their actual workloads and made deliberate, unconventional design choices that violated traditional file system assumptions:

  • Files are enormous (multi-GB), not small
  • Reads are mostly large and sequential, not small and random
  • Files are written once and appended to, rarely modified in place
  • Hardware failure is the norm, not the exception -- when you have thousands of commodity machines, something is always broken

These observations led to a design that looks nothing like a traditional file system -- and that's exactly why it worked.

Interview insight

GFS is the gold standard case study for designing around your workload. In an interview, when you're asked to design a storage system, the first thing to ask is: "What does the access pattern look like?" GFS shows how radically different the design becomes when you optimize for large sequential I/O instead of small random access.

What is GFS?

GFS is a scalable distributed file system built by Google for large, data-intensive applications. It runs on thousands of commodity machines, tolerates frequent hardware failures, and delivers high aggregate throughput to large numbers of clients.

| Design principle | What it means | Why |
| --- | --- | --- |
| Large chunk size (64 MB) | Files are split into 64 MB chunks, not 4 KB blocks | Reduces metadata overhead; aligns with large sequential I/O pattern |
| Single master | One master node manages all metadata | Simplifies coordination; master doesn't handle data flow |
| Replication (3x) | Every chunk is stored on 3 different machines | Hardware failure is constant; 3 copies ensure durability |
| Append-optimized | Concurrent appends are a first-class operation | Google's workloads produce data streams, not random edits |
| Relaxed consistency | Some operations have weaker guarantees | Simplifies design; applications handle edge cases |
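To see why the 64 MB chunk size matters, here is a minimal sketch of the offset-to-chunk translation a GFS client performs before asking the master where a chunk lives. The function names and constants are illustrative, not the actual Google client library:

```python
# Sketch of the offset-to-chunk translation a GFS-style client performs
# before asking the master for chunk locations. Names are illustrative.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, vs. ~4 KB blocks in a local FS

def chunk_index(byte_offset: int) -> int:
    """Which chunk of the file holds this byte offset."""
    return byte_offset // CHUNK_SIZE

def chunk_span(byte_offset: int, length: int) -> range:
    """All chunk indices a read of `length` bytes at `byte_offset` touches."""
    first = chunk_index(byte_offset)
    last = chunk_index(byte_offset + length - 1)
    return range(first, last + 1)

# A 1 GB file occupies only 16 chunks, so the master tracks 16 metadata
# entries instead of ~262,144 entries for 4 KB blocks.
print(chunk_index(200 * 1024 * 1024))  # offset 200 MB falls in chunk 3
print(list(chunk_span(60 * 1024 * 1024, 10 * 1024 * 1024)))  # [0, 1]
```

Large sequential reads touch few chunks, so a client needs only a handful of round trips to the master per file, and can cache the results.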

GFS use cases

  • Web crawling and indexing -- GFS was originally built to store data from Google's web crawler and serve it to the indexing pipeline
  • BigTable storage -- BigTable uses GFS as its underlying storage layer for log and data files
  • Large-scale data processing -- Gmail, YouTube, and Google Earth all use GFS for bulk data storage
  • MapReduce jobs -- GFS is the storage substrate for Google's MapReduce framework

APIs

GFS does not provide standard POSIX-like APIs -- another deliberate choice that freed the designers from legacy constraints. Instead, it exposes user-level APIs:

| Operation | Description |
| --- | --- |
| create | Create a new file |
| delete | Delete a file |
| open | Open a file, return a handle |
| close | Close a file handle |
| read | Read data from a file at a given offset |
| write | Write data to a file at a given offset |
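A toy sketch of what using such a user-level API might look like. The class and method names below are assumptions made for illustration (a real client talks to the master and chunkservers over the network; here an in-memory dict stands in for the cluster):

```python
# Hypothetical GFS-style user-level client. Names are illustrative
# assumptions, not the actual GFS client interface; an in-memory dict
# stands in for the distributed cluster state.

class GFSClient:
    def __init__(self):
        self._files = {}  # path -> file contents (toy stand-in)

    def create(self, path):
        self._files[path] = bytearray()

    def open(self, path):
        return path  # a real client would return an opaque handle

    def write(self, handle, offset, data):
        buf = self._files[handle]
        buf[offset:offset + len(data)] = data

    def read(self, handle, offset, length):
        return bytes(self._files[handle][offset:offset + length])

    def close(self, handle):
        pass

    def delete(self, path):
        del self._files[path]

client = GFSClient()
client.create("/crawl/pages-00001")
h = client.open("/crawl/pages-00001")
client.write(h, 0, b"<html>...</html>")
print(client.read(h, 0, 6))  # b'<html>'
client.close(h)
```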

Plus two special operations that reflect GFS's unique design priorities:

  • Snapshot -- Efficiently copy a file or directory tree. Used for checkpointing and branching large datasets.
  • Record Append -- Allows multiple clients to append data to the same file concurrently while guaranteeing atomicity. This is the operation GFS is most heavily optimized for -- it powers producer-consumer queues and multi-way merge results without requiring external locking.
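The key property of record append is that the *system*, not the client, chooses the offset, and each record is written as one atomic unit. A minimal single-process sketch of these semantics, with a lock standing in for the primary chunkserver that serializes appends (all names are illustrative assumptions):

```python
# Minimal sketch of record-append semantics: many producers append to one
# file concurrently, and each record lands atomically at an offset chosen
# by the system, not the client. A lock stands in for the primary
# chunkserver that serializes appends; names are illustrative.

import threading

class AppendOnlyFile:
    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        with self._lock:
            offset = len(self._data)   # the system picks the offset
            self._data.extend(record)  # the record is written as one unit
            return offset

f = AppendOnlyFile()
offsets = []

def producer(tag: bytes):
    for i in range(100):
        offsets.append(f.record_append(tag + b":%d;" % i))

threads = [threading.Thread(target=producer, args=(t,)) for t in (b"A", b"B", b"C")]
for t in threads: t.start()
for t in threads: t.join()

# Records from A, B, and C interleave in some order, but none is torn:
# each record appears contiguously at the offset returned to its producer.
print(len(offsets))  # 300 appends completed
```

In real GFS the guarantee is *at-least-once*: a failed replica can force a retry, leaving a duplicate or padding in the file, which is why readers are expected to tolerate duplicates (e.g. via record checksums or IDs).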

Why no POSIX?

POSIX compliance would have forced GFS to support semantics that Google's workloads don't need -- hard links, file locking, and strict consistency for small random writes. By dropping POSIX, GFS gained the freedom to optimize purely for its actual access patterns. The lesson: don't pay for abstractions you won't use.

What's next

In the following chapters, we'll explore: