High-level Architecture

How do you store a 10TB file on commodity machines with 1TB disks? You split it into fixed-size blocks, scatter those blocks across a cluster, and keep a single node responsible for tracking where everything lives. That is HDFS in one sentence.

Think first
HDFS was inspired by GFS. If you were building an open-source version of GFS, what would you keep the same and what might you simplify? Think about the concurrency model -- do most Hadoop workloads need concurrent writers to the same file?

HDFS architecture

Every file in HDFS is split into fixed-size blocks (128MB by default, configurable per file). The system stores two things for each file: the actual data (blocks on DataNodes) and the metadata (block locations, file size, replication info on the NameNode).
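The split described above is simple arithmetic: every block is full-size except possibly the last. A minimal sketch (illustrative Python, not Hadoop code):

```python
# Sketch (not HDFS code): how a file maps onto fixed-size blocks.
# All blocks are full-size except possibly the last one.

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return the byte length of each block for a file of file_size bytes."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the final, possibly smaller, block
    return sizes

# A 10 TB file becomes 81,920 blocks of 128 MB each -- small enough to
# scatter across commodity machines with 1 TB disks.
print(len(split_into_blocks(10 * 1024**4)))  # 81920
```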

| Component | Role |
| --- | --- |
| NameNode | Single master that manages all file system metadata |
| DataNodes | Worker nodes that store actual data blocks as local files |
| HDFS Client | Application-facing library; talks to the NameNode for metadata and directly to DataNodes for data |

Key design details:

  • All blocks of a file are the same size except the last one.
  • HDFS uses large block sizes because it targets extremely large files processed by MapReduce jobs -- large blocks reduce metadata overhead and improve sequential throughput.
  • Each block is identified by a unique 64-bit BlockID.
  • All read/write operations operate at the block level.
  • DataNodes store each block as a separate local file and serve read/write requests.
  • On startup, each DataNode scans its local file system and sends a BlockReport (the list of blocks it hosts) to the NameNode; it repeats this report periodically so the NameNode's view stays current.
  • The NameNode persists file system state using two on-disk structures: FsImage (a checkpoint of metadata at a point in time) and EditLog (a write-ahead log of all metadata changes since the last checkpoint). Together they enable crash recovery.
  • The client never sends data through the NameNode -- all data transfers happen directly between the client and DataNodes.
  • HDFS replicates every block to multiple DataNodes (default: 3) for high availability.
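The read path implied by these details can be sketched as a toy model: the client asks the NameNode where a file's blocks live, then streams each block directly from a DataNode. All class and method names below are illustrative, not the real Hadoop API.

```python
# Toy model of the HDFS read path. Data never flows through the NameNode;
# it only hands out block locations.

class NameNode:
    def __init__(self):
        # Metadata only: path -> ordered list of (block_id, [replica DataNode, ...])
        self.files = {}

    def get_block_locations(self, path):
        return self.files[path]

class DataNode:
    def __init__(self):
        # block_id -> bytes; real DataNodes store each block as a local file
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]

def read_file(namenode, datanodes, path):
    data = b""
    for block_id, replicas in namenode.get_block_locations(path):
        dn = datanodes[replicas[0]]       # pick the first (closest) replica
        data += dn.read_block(block_id)   # direct client-to-DataNode transfer
    return data

# Usage: a two-DataNode "cluster" holding one two-block file.
nn = NameNode()
dns = {"dn1": DataNode(), "dn2": DataNode()}
dns["dn1"].blocks["blk_1"] = b"hello "
dns["dn2"].blocks["blk_2"] = b"world"
nn.files["/greeting.txt"] = [("blk_1", ["dn1"]), ("blk_2", ["dn2"])]
print(read_file(nn, dns, "/greeting.txt"))  # b'hello world'
```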
Interview angle

When asked "how would you design a distributed file system," start with this architecture: a single metadata server (NameNode) plus many storage servers (DataNodes). Then immediately call out the trade-off -- the NameNode is a single point of failure and a scalability bottleneck (all metadata must fit in its memory). This shows you understand the design's strengths and limits.

warning

The NameNode holds all metadata in memory. Every file, directory, and block consumes approximately 150 bytes. At billions of small files, the NameNode runs out of RAM -- this is a fundamental scalability ceiling, not a bug.
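A quick back-of-the-envelope calculation makes the ceiling concrete, using the approximate 150-bytes-per-object figure above:

```python
# Back-of-the-envelope NameNode memory estimate. The 150-byte figure is the
# commonly cited approximation for one metadata object in the NameNode heap.

BYTES_PER_OBJECT = 150  # file, directory, or block entry

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    # Each file costs roughly one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# One billion single-block (small) files: roughly 280 GB of heap
# just for metadata -- the small-files problem in one number.
gb = namenode_heap_bytes(1_000_000_000) / 1024**3
print(f"{gb:.0f} GB")
```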

Think first
HDFS uses 128MB blocks (double GFS's 64MB chunks). Both systems also use a single metadata server. Given what you've learned about GFS's limitations, which problems do you expect HDFS to share with GFS, and which might it avoid by making different design choices?

Comparison between GFS and HDFS

HDFS's architecture mirrors GFS closely, but the terminology and some design choices differ:

| Feature | GFS | HDFS |
| --- | --- | --- |
| Full Name | Google File System | Hadoop Distributed File System |
| Storage Node | ChunkServer | DataNode |
| File Unit | Chunk | Block |
| Default Size | 64 MB, adjustable | 128 MB, adjustable |
| Metadata Checkpoint | Checkpoint image | FsImage |
| Write-Ahead Log | Operation log | EditLog |
| Platform | Linux | Cross-platform |
| Language | C++ | Java |
| Availability | Internal to Google | Open source |
| Monitoring | Master receives HeartBeats from ChunkServers | NameNode receives HeartBeats from DataNodes |
| Concurrency Model | Multiple writers, multiple readers | Write-once, read-many; no concurrent writers |
| File Operations | Appends and random writes supported | Appends only |
| Garbage Collection | Deleted files renamed into a special folder for later GC | Deleted files renamed to a hidden name for later GC |
| Master Communication | RPC over TCP | RPC over TCP |
| Data Transfer | Pipelining and streaming over TCP | Pipelining and streaming over TCP |
| Cache Management | Clients cache metadata; ChunkServers rely on the Linux buffer cache | Distributed cache; off-heap block cache on DataNodes |
| Replication Strategy | Replicas spread across racks; master re-replicates when replicas fall below threshold | Rack-aware: 2 copies in one rack, 1 in a different rack by default |
| Default Replication | User-configurable | 3 by default, configurable |
| Namespace | Hierarchical directory structure | Hierarchical directory structure; also supports S3 and CloudStore backends |
| Database Built on It | Bigtable | HBase |
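HDFS's rack-aware placement can be sketched in a few lines: one replica in the writer's rack, and the remaining two on different nodes within a single other rack, so a whole-rack failure still leaves at least one copy. This is a simplification; the real BlockPlacementPolicyDefault handles many more corner cases.

```python
# Simplified sketch of HDFS's default rack-aware placement for 3 replicas.
import random

def place_replicas(racks: dict, writer_rack: str) -> list:
    """racks maps rack name -> list of node names. Returns 3 chosen nodes."""
    # Replica 1: a node in the writer's own rack (the writer's node in real HDFS).
    first = random.choice(racks[writer_rack])
    # Replicas 2 and 3: two distinct nodes in a single different rack,
    # so losing any one rack never loses all copies of a block.
    other_rack = random.choice([r for r in racks if r != writer_rack])
    second, third = random.sample(racks[other_rack], 2)
    return [first, second, third]
```

This placement trades a little cross-rack write bandwidth (one inter-rack transfer per block) for rack-level fault tolerance, which is why it is the default rather than spreading all three replicas across three racks.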
Quiz
HDFS uses a write-once, read-many model while GFS supports concurrent writers. What would happen if HDFS allowed multiple concurrent writers to the same file?