Summary: HDFS
The big picture
HDFS is the open-source world's answer to GFS. It takes GFS's core architecture (single master, distributed storage nodes, large block sizes, replication) and adapts it for the Hadoop ecosystem, where the primary workload is MapReduce-style batch processing.
HDFS made a key simplifying trade-off compared to GFS: no concurrent writers. GFS supported multiple clients appending to the same file simultaneously (a complex feature). HDFS restricts each file to a single writer with append-only access. This dramatically simplified the consistency model and implementation, at the cost of ruling out some use cases.
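The single-writer rule in the table below is enforced with leases: a client must hold the lease on a file to write to it, and an expired lease frees the file for another writer. A minimal sketch of that idea (the real NameNode lease manager is more involved; the class, timeout value, and method names here are made up for illustration):

```python
import time

LEASE_TIMEOUT = 60.0  # seconds; hypothetical value, not HDFS's actual setting

class LeaseManager:
    """Toy single-writer enforcement via per-file leases."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.leases = {}  # path -> (client_id, expiry_time)

    def acquire(self, path, client_id):
        holder = self.leases.get(path)
        now = self.clock()
        # Reject if a *different* client holds an unexpired lease.
        if holder and holder[0] != client_id and holder[1] > now:
            raise PermissionError("file already has a writer")
        # Grant (or renew) the lease for LEASE_TIMEOUT seconds.
        self.leases[path] = (client_id, now + LEASE_TIMEOUT)

    def renew(self, path, client_id):
        # Renewal is just re-acquisition by the current holder.
        self.acquire(path, client_id)
```

A second client attempting to open the same file for writing fails until the first client's lease expires (e.g., because it crashed and stopped renewing).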
Understanding HDFS alongside GFS reveals what's essential in the distributed file system design (master/worker split, large blocks, replication) versus what's workload-specific (concurrent appends, relaxed consistency).
Architecture at a glance
| Component | Role | GFS equivalent |
|---|---|---|
| NameNode | Single master; manages all metadata in memory | Master |
| DataNodes | Store 128MB blocks as local files | ChunkServers |
| Clients | Talk to NameNode for metadata, directly to DataNodes for data | Clients |
| EditLog | Write-ahead log for metadata changes | Operation log |
| FsImage | Periodic checkpoint of NameNode state | Checkpoint |
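The key consequence of this split is the read path: clients contact the NameNode only for block locations, then stream data directly from DataNodes, keeping bulk data off the master. A minimal sketch of that flow, assuming toy in-process stand-ins (none of these class or method names are the real HDFS API):

```python
class NameNode:
    """Holds only metadata: which blocks make up a file, and where they live."""

    def __init__(self):
        # file path -> ordered list of (block_id, [datanode indices])
        self.block_map = {}

    def get_block_locations(self, path):
        return self.block_map[path]

class DataNode:
    """Stores block contents; serves them directly to clients."""

    def __init__(self):
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

def read_file(namenode, datanodes, path):
    """One metadata round-trip, then data flows DataNode -> client."""
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        node = datanodes[replicas[0]]  # real clients prefer the closest replica
        data += node.read_block(block_id)
    return data
```

Because the NameNode never touches file contents, its load scales with the number of metadata operations, not with the volume of data read.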
How HDFS uses system design patterns
| Problem | Pattern | How HDFS uses it |
|---|---|---|
| Surviving NameNode crashes | Write-ahead Log | All metadata changes written to EditLog before being applied |
| Monitoring DataNodes | Heartbeat | DataNodes send heartbeats every few seconds; missed heartbeats trigger re-replication |
| Detecting data corruption | Checksum | Checksums computed per 512 bytes; verified on reads, corrupt blocks re-replicated |
| Coordinating reads/writes | Leader and Follower | NameNode (leader) directs all metadata operations; DataNodes (followers) store data |
| Preventing split-brain | Fencing | ZooKeeper ensures only one NameNode is active; STONITH fences the old active |
| Write access control | Lease | Clients hold leases for file write access; expired leases free the file |
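The write-ahead log pattern from the first row is the backbone of NameNode durability: every metadata mutation is appended to a durable log before the in-memory namespace is updated, so a crash can be recovered by replaying the log. A minimal sketch of the idea (this is not Hadoop's actual EditLog format; record layout and names are illustrative):

```python
import json

class MetadataStore:
    """WAL sketch: log first, apply second, recover by replay."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.namespace = {}  # in-memory metadata, like the NameNode's

    def create_file(self, path, replication):
        record = {"op": "create", "path": path, "replication": replication}
        with open(self.log_path, "a") as log:
            log.write(json.dumps(record) + "\n")  # 1. durable log entry first...
            log.flush()
        self.namespace[path] = record             # 2. ...then the in-memory update

    @classmethod
    def recover(cls, log_path):
        """Rebuild in-memory state after a crash by replaying the log."""
        store = cls(log_path)
        with open(log_path) as log:
            for line in log:
                record = json.loads(line)
                store.namespace[record["path"]] = record
        return store
```

The FsImage checkpoint exists precisely to bound this replay: recovery loads the latest checkpoint and replays only the EditLog entries written after it.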
HDFS vs. GFS
| Aspect | GFS | HDFS |
|---|---|---|
| Block/chunk size | 64MB | 128MB |
| Concurrent appends | Yes (multiple writers) | No (single writer) |
| Random writes | Limited | None (append-only) |
| Consistency | Relaxed | Strong (single-writer simplification) |
| Master HA | Log replication + manual failover | ZooKeeper-based automatic failover |
| Open source | No | Yes |
The NameNode problem (and the solution)
HDFS's biggest limitation is the single NameNode: all metadata lives in one node's memory, creating both a scalability ceiling (limited by RAM) and a single point of failure.
HDFS High Availability solves the failure problem:
- An Active NameNode and a Standby NameNode run simultaneously
- Both share access to the EditLog (via shared NFS or a quorum journal)
- If the active fails, the standby takes over automatically
- Fencing (STONITH) ensures the old active can't corrupt shared state
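One way the quorum journal achieves fencing is with epoch numbers: each failover bumps an epoch, and journal nodes reject writes stamped with a stale epoch, so a deposed active NameNode physically cannot append edits. A simplified model of that mechanism (an assumption-level sketch, not the real JournalNode protocol):

```python
class JournalNode:
    """Accepts edits only from the writer holding the newest epoch."""

    def __init__(self):
        self.promised_epoch = 0
        self.entries = []

    def new_epoch(self, epoch):
        # A new active NameNode must claim a strictly higher epoch.
        if epoch <= self.promised_epoch:
            raise RuntimeError("stale epoch")
        self.promised_epoch = epoch

    def journal(self, epoch, entry):
        # Writes from an older epoch are rejected: the old active is fenced.
        if epoch < self.promised_epoch:
            raise RuntimeError("fenced: a newer NameNode is active")
        self.entries.append(entry)
```

Once the standby claims epoch N+1 on a quorum of journal nodes, any in-flight write from the old active (still at epoch N) fails, which is what prevents it from corrupting shared state even if STONITH is slow to take effect.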
The scalability ceiling (number of files limited by NameNode memory) remains a fundamental limitation of HDFS's design. HDFS Federation partially addresses this by allowing multiple independent NameNodes, each managing a portion of the namespace.
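To see why the memory ceiling bites, a back-of-envelope estimate helps. A commonly cited rule of thumb is roughly 150 bytes of NameNode heap per namespace object (file, directory, or block); treat that constant as an approximation, not a spec:

```python
BYTES_PER_OBJECT = 150  # rough rule of thumb for NameNode heap per object

def max_objects(heap_bytes):
    """Approximate namespace-object capacity for a given heap size."""
    return heap_bytes // BYTES_PER_OBJECT

heap = 64 * 1024**3  # a 64 GB heap
print(max_objects(heap))  # on the order of ~458 million objects
```

Since each small file costs at least two objects (one file entry plus one block), a 64 GB heap tops out around a couple hundred million small files, which is why HDFS struggles with many-small-files workloads and why Federation shards the namespace across NameNodes.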
Quick reference card
| Property | Value |
|---|---|
| Type | Distributed file system |
| CAP classification | CP -- strongly consistent |
| Architecture | Single NameNode + multiple DataNodes |
| Block size | 128MB (configurable) |
| Replication | 3 replicas per block (configurable), rack-aware placement |
| Consistency | Strong; write-once, read-many; single writer per file |
| Write model | Append-only, no random writes |
| Metadata | In-memory on NameNode, persisted via EditLog + FsImage |
| Failure recovery | ZooKeeper-based HA with automatic failover |
| Open source | Yes (Apache Hadoop) |