
Summary: HDFS

The big picture

HDFS is the open-source world's answer to GFS. It takes GFS's core architecture (single master, distributed storage nodes, large block sizes, replication) and adapts it for the Hadoop ecosystem, where the primary workload is MapReduce-style batch processing.

HDFS made a key simplifying trade-off compared to GFS: no concurrent writers. GFS supported multiple clients appending to the same file simultaneously (a complex feature). HDFS restricts files to single-writer, append-only access. This dramatically simplified the consistency model and implementation, at the cost of limiting some use cases.
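Single-writer access is enforced with leases (see the patterns table below). The sketch here is a toy model, not the real NameNode API: a client must hold a file's lease to write, and a lease that is not renewed before the timeout expires, freeing the file for the next writer. The timeout constant is illustrative; real HDFS distinguishes soft and hard lease limits.

```python
import time

LEASE_TIMEOUT = 60.0  # seconds; illustrative, not an HDFS default

class LeaseManager:
    """Toy model of single-writer enforcement via leases."""

    def __init__(self):
        self._leases = {}  # path -> (client_id, last_renewed_timestamp)

    def acquire(self, path, client_id, now=None):
        now = time.monotonic() if now is None else now
        holder = self._leases.get(path)
        if holder is not None:
            held_by, renewed_at = holder
            # Refuse only if a *different* client holds a lease that is still live.
            if held_by != client_id and now - renewed_at < LEASE_TIMEOUT:
                return False
        # Grant (or renew) the lease for this client.
        self._leases[path] = (client_id, now)
        return True
```

A second client's `acquire` fails while the first client's lease is live, but succeeds once the lease has sat unrenewed past the timeout.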

Understanding HDFS alongside GFS reveals what's essential in the distributed file system design (master/worker split, large blocks, replication) versus what's workload-specific (concurrent appends, relaxed consistency).

Architecture at a glance

Component | Role | GFS equivalent
NameNode | Single master; manages all metadata in memory | Master
DataNodes | Store 128MB blocks as local files | ChunkServers
Clients | Talk to NameNode for metadata, directly to DataNodes for data | Clients
EditLog | Write-ahead log for metadata changes | Operation log
FsImage | Periodic checkpoint of NameNode state | Checkpoint
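The division of labor in the table can be sketched as a read path: the client asks the NameNode only for metadata (block IDs and their locations), then fetches the bytes directly from DataNodes. All classes here are simplified stand-ins, not the real Hadoop client API.

```python
class NameNode:
    """Metadata only: maps each path to its ordered blocks and their locations."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, [datanode addresses])

    def get_block_locations(self, path):
        return self.block_map[path]

class DataNode:
    """Data only: serves raw block bytes."""

    def __init__(self):
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

def read_file(namenode, datanodes, path):
    """One metadata call, then direct block reads from DataNodes."""
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        replica = datanodes[locations[0]]  # real clients prefer the closest replica
        data += replica.read_block(block_id)
    return data
```

Keeping bulk data transfer off the NameNode is what lets a single metadata server front petabytes of storage.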

How HDFS uses system design patterns

Problem | Pattern | How HDFS uses it
Surviving NameNode crashes | Write-ahead Log | All metadata changes written to EditLog before being applied
Monitoring DataNodes | Heartbeat | DataNodes send heartbeats every few seconds; missed heartbeats trigger re-replication
Detecting data corruption | Checksum | Checksums computed per 512 bytes; verified on reads, corrupt blocks re-replicated
Coordinating reads/writes | Leader and Follower | NameNode (leader) directs all metadata operations; DataNodes (followers) store data
Preventing split-brain | Fencing | ZooKeeper ensures one active NameNode; STONITH fences the old one
Write access control | Lease | Clients hold leases for file write access; expired leases free the file
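The Heartbeat row can be sketched in a few lines: the NameNode records the last report time per DataNode, declares silent nodes dead after a threshold, and queues re-replication for any block with a replica on a dead node. The interval and threshold below are illustrative assumptions, not HDFS defaults.

```python
HEARTBEAT_INTERVAL = 3.0               # DataNodes report every few seconds
DEAD_AFTER = 10 * HEARTBEAT_INTERVAL   # enough missed beats -> declared dead

def find_dead_nodes(last_heartbeat, now):
    """last_heartbeat: dict of datanode -> time of its last report."""
    return [dn for dn, t in last_heartbeat.items() if now - t > DEAD_AFTER]

def blocks_to_rereplicate(dead_nodes, block_locations):
    """block_locations: dict of block_id -> set of datanodes holding a replica."""
    dead = set(dead_nodes)
    return [b for b, nodes in block_locations.items() if nodes & dead]
```

Re-replication copies the affected blocks from their surviving replicas to healthy nodes, restoring the target replica count.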

HDFS vs. GFS

Aspect | GFS | HDFS
Block/chunk size | 64MB | 128MB
Concurrent appends | Yes (multiple writers) | No (single writer)
Random writes | Limited | None (append-only)
Consistency | Relaxed | Strong (single-writer simplification)
Master HA | Log replication + manual failover | ZooKeeper-based automatic failover
Open source | No | Yes

The NameNode problem (and the solution)

HDFS's biggest limitation is the single NameNode: all metadata lives in one node's memory, creating both a scalability ceiling (limited by RAM) and a single point of failure.

HDFS High Availability solves the failure problem:

  • An Active NameNode and a Standby NameNode run simultaneously
  • Both share access to the EditLog (via shared NFS or a quorum journal)
  • If the active fails, the standby takes over automatically
  • Fencing (STONITH) ensures the old active can't corrupt shared state
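The fencing step can be sketched with epoch numbers, the mechanism the quorum journal uses: each failover assigns the newly promoted NameNode a higher epoch, and the journal rejects writes stamped with any older epoch, so a deposed active cannot corrupt the shared EditLog even if it is still running. This is a single-journal simplification; the real protocol runs against a quorum of JournalNodes.

```python
class Journal:
    """Simplified shared EditLog with epoch-based write fencing."""

    def __init__(self):
        self.promised_epoch = 0
        self.entries = []

    def new_epoch(self, epoch):
        """Called by a newly promoted active NameNode; must strictly increase."""
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def append(self, epoch, entry):
        """Writes from any epoch older than the promised one are rejected."""
        if epoch < self.promised_epoch:
            return False  # fenced: a newer active has taken over
        self.entries.append(entry)
        return True
```

Once the standby claims epoch 2, any late write from the old active (still at epoch 1) is silently fenced out.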

The scalability ceiling (number of files limited by NameNode memory) remains a fundamental limitation of HDFS's design. HDFS Federation partially addresses this by allowing multiple independent NameNodes, each managing a portion of the namespace.
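The ceiling is easy to estimate. A common Hadoop rule of thumb is roughly 150 bytes of NameNode heap per namespace object (file inode or block); the exact constant varies by version, so treat this as a back-of-the-envelope sketch rather than a sizing formula.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb heap cost per inode or block; approximate

def namenode_heap_bytes(num_files, blocks_per_file):
    """Rough NameNode heap needed for metadata alone."""
    objects = num_files * (1 + blocks_per_file)  # one inode plus its blocks
    return objects * BYTES_PER_OBJECT

# 100 million single-block files -> about 30 GB of heap just for metadata.
heap = namenode_heap_bytes(100_000_000, 1)
```

This is also why HDFS favors a few large files over many small ones: a billion tiny files exhausts NameNode memory long before the DataNodes run out of disk.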

Quick reference card

Property | Value
Type | Distributed file system
CAP classification | CP -- strongly consistent
Architecture | Single NameNode + multiple DataNodes
Block size | 128MB (configurable)
Replication | 3 replicas per block (configurable), rack-aware placement
Consistency | Strong; write-once, read-many; single writer per file
Write model | Append-only, no random writes
Metadata | In-memory on NameNode, persisted via EditLog + FsImage
Failure recovery | ZooKeeper-based HA with automatic failover
Open source | Yes (Apache Hadoop)

Design Challenge

Design a data lake for a social media platform

You need to design a data lake that stores user activity logs (clicks, views, posts) for a social media platform with 200 million daily active users. Logs are ingested continuously and processed by daily MapReduce analytics jobs. The system must scale to petabytes and tolerate rack-level failures.
