Google File System Introduction
Goal
Design a distributed file system to store huge files (multi-gigabyte files, with aggregate data in the terabytes and beyond). The system should be scalable, reliable, and highly available.
Why GFS matters
In the early 2000s, Google was crawling the entire internet and building a search index over it. The data volumes were staggering -- petabytes of web pages, constantly being recrawled and reprocessed. No commercial file system could handle it, and buying a proprietary solution at this scale would have been astronomically expensive.
So Google did what Google does: they built their own. But GFS wasn't just a bigger file system. Google studied their actual workloads and made deliberate, unconventional design choices that violated traditional file system assumptions:
- Files are enormous (multi-GB), not small
- Reads are mostly large and sequential, not small and random
- Files are written once and appended to, rarely modified in place
- Hardware failure is the norm, not the exception -- when you have thousands of commodity machines, something is always broken
These observations led to a design that looks nothing like a traditional file system -- and that's exactly why it worked.
GFS is the gold standard case study for designing around your workload. In an interview, when you're asked to design a storage system, the first thing to ask is: "What does the access pattern look like?" GFS shows how radically different the design becomes when you optimize for large sequential I/O instead of small random access.
What is GFS?
GFS is a scalable distributed file system built by Google for large, data-intensive applications. It runs on thousands of commodity machines, tolerates frequent hardware failures, and delivers high aggregate throughput to large numbers of clients.
| Design principle | What it means | Why |
|---|---|---|
| Large chunk size (64MB) | Files are split into 64MB chunks, not 4KB blocks | Reduces metadata overhead; aligns with large sequential I/O pattern |
| Single master | One master node manages all metadata | Simplifies coordination; master doesn't handle data flow |
| Replication (3x) | Every chunk is stored on 3 different machines | Hardware failure is constant; three copies provide durability |
| Append-optimized | Concurrent appends are a first-class operation | Google's workloads produce data streams, not random edits |
| Relaxed consistency | Some operations have weaker guarantees | Simplifies design; applications handle edge cases |
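To see why the 64MB chunk size matters, a quick back-of-the-envelope calculation helps (illustrative numbers; the per-chunk bookkeeping detail is an assumption, not a figure from the paper). The master tracks metadata per chunk, so fewer, larger chunks mean dramatically less state to manage:

```python
# Back-of-the-envelope: metadata entries needed to track a 1 TB file.
# Illustrative only -- exact per-chunk metadata sizes are assumptions.

TB = 1024 ** 4

def num_chunks(file_size: int, chunk_size: int) -> int:
    """Number of fixed-size chunks (or blocks) needed to cover a file."""
    return -(-file_size // chunk_size)  # ceiling division

gfs_chunks = num_chunks(TB, 64 * 1024 ** 2)  # 64 MB chunks
fs_blocks = num_chunks(TB, 4 * 1024)         # 4 KB blocks

print(gfs_chunks)                # 16384 chunks to track in GFS
print(fs_blocks)                 # 268435456 blocks in a 4 KB-block file system
print(fs_blocks // gfs_chunks)   # 16384x fewer metadata entries
```

A 16,384x reduction in metadata entries is what makes it feasible for a single master to keep all metadata in memory, which the later chapters cover.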
GFS use cases
- Web crawling and indexing -- GFS was originally built to store data from Google's web crawler and serve it to the indexing pipeline
- BigTable storage -- BigTable uses GFS as its underlying storage layer for log and data files
- Large-scale data processing -- Gmail, YouTube, and Google Earth all use GFS for bulk data storage
- MapReduce jobs -- GFS is the storage substrate for Google's MapReduce framework
APIs
GFS does not provide a standard POSIX file system API -- another deliberate choice that freed the designers from legacy constraints. Instead, it exposes user-level APIs:
| Operation | Description |
|---|---|
| create | Create a new file |
| delete | Delete a file |
| open | Open a file, return a handle |
| close | Close a file handle |
| read | Read data from a file at a given offset |
| write | Write data to a file at a given offset |
Plus two special operations that reflect GFS's unique design priorities:
- Snapshot -- Efficiently copy a file or directory tree. Used for checkpointing and branching large datasets.
- Record Append -- Allows multiple clients to append data to the same file concurrently while guaranteeing atomicity. This is the operation GFS is most heavily optimized for -- it powers producer-consumer queues and multi-way merge results without requiring external locking.
POSIX compliance would have forced GFS to support operations (like random writes, hard links, file locking semantics) that Google's workloads don't need. By dropping POSIX, GFS gained the freedom to optimize purely for its actual access patterns. The lesson: don't pay for abstractions you won't use.
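GFS's client library is not public, so as a rough sketch of what record append looks like from application code, here is a minimal in-memory stand-in (all class and method names are hypothetical, not Google's actual API). The key property it models: each append lands atomically at an offset the system chooses, so concurrent producers never coordinate offsets themselves:

```python
# Hypothetical sketch of record-append semantics -- names are illustrative,
# not GFS's real client library. The lock stands in for the primary
# chunkserver serializing appends to a chunk.
import threading

class FakeGFSFile:
    """In-memory stand-in for a GFS file supporting atomic record append."""

    def __init__(self) -> None:
        self._records: list[bytes] = []
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        # Atomic with respect to other appenders: the system picks the
        # offset and guarantees the record is written as one unit.
        with self._lock:
            offset = sum(len(r) for r in self._records)
            self._records.append(record)
            return offset  # GFS reports back where the record landed

# Many producers append concurrently without any external locking.
log = FakeGFSFile()
threads = [
    threading.Thread(target=log.record_append, args=(f"event-{i}\n".encode(),))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every record arrives intact; only the interleaving order is system-chosen.
print(len(log._records))  # 8
```

This is why record append powers producer-consumer queues: the application gets atomicity per record without distributed locks, and readers tolerate the system-chosen ordering.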
What's next
In the following chapters, we'll explore:
- GFS's high-level architecture -- master, chunkservers, and clients
- The single master design and why it works (despite being a potential bottleneck)
- How metadata is managed and kept in memory
- The anatomy of read, write, and append operations
- GFS's consistency model and its deliberate relaxations
- Fault tolerance -- what happens when things break