HDFS Characteristics
Every distributed system makes trade-offs. HDFS optimizes for throughput on large files at the cost of latency, small-file efficiency, and POSIX compliance. Understanding these trade-offs tells you when HDFS is the right tool -- and when it is not.
Security and permissions
HDFS implements a POSIX-like permissions model:
| Permission | Files | Directories |
|---|---|---|
| Read (r) | Required to read the file | Required to list directory contents |
| Write (w) | Required to write or append | Required to create or delete children |
| Execute (x) | Ignored (files cannot be executed on HDFS) | Required to access a child of the directory |
Each file and directory has an owner, a group, and separate permission bits for the owner, group members, and all other users. HDFS also supports optional POSIX ACLs (Access Control Lists) for finer-grained rules targeting specific named users or groups.
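The evaluation order described above (owner class first, then group, then everyone else) can be sketched as follows. This is an illustrative model of POSIX-style permission checking, not HDFS source code; the function name and arguments are made up for this example.

```python
# Illustrative sketch of POSIX-style permission evaluation (not HDFS internals).
# Mode bits are grouped as (owner, group, other), each holding r/w/x flags.

def check_access(user, user_groups, owner, group, mode, want):
    """Return True if `user` has permission `want` ('r', 'w', or 'x')."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:                    # owner class is checked first
        return bool((mode >> 6) & bit)
    if group in user_groups:             # then the owning group
        return bool((mode >> 3) & bit)
    return bool(mode & bit)              # finally, everyone else

# 0o640: owner rw-, group r--, other ---
print(check_access("alice", {"staff"}, "alice", "staff", 0o640, "w"))  # True
print(check_access("bob", {"staff"}, "alice", "staff", 0o640, "r"))    # True
print(check_access("eve", {"guests"}, "alice", "staff", 0o640, "r"))   # False
```

Note that only one class applies per request: if the user is the owner, the group and other bits are never consulted, matching POSIX semantics. ACL entries, when enabled, add named-user and named-group classes between owner and group.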
HDFS federation
The NameNode keeps all metadata in memory. On clusters with hundreds of millions of files, memory becomes the scaling bottleneck. A single NameNode also becomes a performance bottleneck when it must serve all metadata requests.
HDFS Federation (introduced in Hadoop 2.x) addresses both problems by allowing multiple independent NameNodes, each managing a portion of the namespace:
| Property | Detail |
|---|---|
| Independence | NameNodes operate independently -- no coordination required between them |
| Shared storage | All NameNodes share the same pool of DataNodes |
| Isolation | A NameNode failure affects only its namespace, not others |
| Client routing | Clients use client-side mount tables to map file paths to the correct NameNode |
For example, one NameNode might manage /user while another handles /share.
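The /user and /share split above can be modeled with a longest-prefix lookup. The real client-side mount table (ViewFs) is configured in XML; the dictionary, hostnames, and `resolve` helper below are assumptions made for a minimal sketch of the routing logic.

```python
# Hypothetical mount table mapping namespace prefixes to NameNodes.
# The hostnames are placeholders, not real configuration.
MOUNT_TABLE = {
    "/user":  "namenode1.example.com:8020",
    "/share": "namenode2.example.com:8020",
}

def resolve(path):
    """Route an HDFS path to its NameNode via longest-prefix match."""
    matches = [p for p in MOUNT_TABLE if path == p or path.startswith(p + "/")]
    if not matches:
        raise FileNotFoundError(f"no mount point for {path}")
    return MOUNT_TABLE[max(matches, key=len)]

print(resolve("/user/alice/data.csv"))  # namenode1.example.com:8020
print(resolve("/share/datasets/logs"))  # namenode2.example.com:8020
```

Longest-prefix matching matters when mount points nest (e.g., /user and /user/archive could route to different NameNodes).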
Block Pool IDs
Multiple independent NameNodes could generate the same 64-bit BlockID. HDFS solves this with Block Pools: each namespace owns one or more block pools, identified by a unique Block Pool ID. The extended block ID is a tuple of (Block Pool ID, Block ID), guaranteeing uniqueness across the federated cluster.
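The uniqueness guarantee can be seen in a two-line sketch. The Block Pool ID strings below are invented for illustration; only the tuple structure matches the description above.

```python
# Two independent NameNodes may mint the same raw 64-bit Block ID,
# but pairing it with each namespace's unique Block Pool ID keeps
# the extended IDs distinct. (Pool ID strings here are made up.)

block_a = ("BP-1073741825-ns1", 4242)  # (Block Pool ID, Block ID) from NameNode 1
block_b = ("BP-2147483649-ns2", 4242)  # same raw Block ID from NameNode 2

assert block_a[1] == block_b[1]  # raw block IDs collide...
assert block_a != block_b        # ...but extended block IDs stay unique
```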
Federation is the answer to "how does HDFS scale metadata beyond one machine's RAM?" Each NameNode manages a disjoint portion of the namespace, so total metadata capacity scales linearly with the number of NameNodes. This is a horizontal scaling solution for metadata, while the data layer (DataNodes) was already horizontally scalable.
Erasure coding
Default 3x replication imposes a 200% storage overhead. Erasure Coding (EC) achieves the same fault tolerance with roughly 50% overhead -- effectively doubling usable storage capacity.
| Approach | Storage overhead | Fault tolerance | CPU cost |
|---|---|---|---|
| 3x replication | 200% | Tolerates 2 replica losses | Minimal |
| Erasure coding (e.g., RS-6-3) | ~50% | Tolerates up to 3 fragment losses | Higher -- encoding/decoding requires computation |
Under EC, data is broken into fragments, encoded with redundant parity pieces, and distributed across DataNodes. If fragments are lost (disk failure, corruption), the original data is reconstructed from the surviving fragments.
Erasure coding trades CPU for storage. It works well for cold or archival data that is rarely read, but the computational overhead of encoding/decoding makes it less suitable for hot data with frequent access. HDFS lets you set EC policies at the directory level, so you can mix replication (for hot data) and EC (for cold data) in the same cluster.
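The overhead figures in the table follow from simple arithmetic: replication stores `factor - 1` extra copies per byte, while RS-style coding stores `parity / data` extra bytes. A quick sketch (helper names are ours, not an HDFS API):

```python
# Storage-overhead arithmetic for the two redundancy schemes.
# Overhead = extra bytes stored per user byte, as a percentage.

def replication_overhead(factor):
    """3x replication stores 2 extra copies -> 200% overhead."""
    return (factor - 1) * 100

def ec_overhead(data_units, parity_units):
    """RS-6-3 stores 3 parity units per 6 data units -> 50% overhead."""
    return parity_units / data_units * 100

print(replication_overhead(3))  # 200
print(ec_overhead(6, 3))        # 50.0

# Usable capacity on 1000 TB of raw disk:
print(1000 / 3)      # ~333 TB under 3x replication
print(1000 * 6 / 9)  # ~667 TB under RS-6-3 -- roughly double
```

This is where the "effectively doubling usable storage" claim comes from: the same raw disks hold about twice as much user data under RS-6-3 as under 3x replication.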
HDFS in practice
HDFS is used across the Hadoop ecosystem -- Pig, Hive, HBase, Giraph, Spark -- and beyond (e.g., GraphLab).
Advantages
| Strength | Why |
|---|---|
| High bandwidth | Large clusters sustain up to 1 TB/s of continuous writes |
| High reliability | Replication (or EC) across racks ensures availability even during disk and server failures |
| Low cost per byte | Runs on commodity hardware with software-managed redundancy -- no SAN or enterprise disk arrays needed |
| Horizontal scalability | Add DataNodes to a running cluster and rebalance without downtime |
Disadvantages
| Limitation | Why |
|---|---|
| Small file inefficiency | Each file consumes ~150 bytes of NameNode memory. Millions of small files exhaust the NameNode. Workaround: combine small files into sequence files (binary key-value containers where the file name is the key and the contents are the value) |
| POSIX non-compliance | Applications must use the HDFS client or Hadoop API. A FUSE driver exists but does not support writes after close |
| Write-once model | A file has a single writer at a time and data can only be appended, never modified in place. Append support arrived in later versions, but random writes remain impossible |
| Latency | Optimized for throughput, not latency. For millisecond reads, use BigTable/HBase instead |
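The small-file math from the table above is worth making concrete. A rough model, assuming ~150 bytes per namespace object and that each file costs at least one file object plus one block object (exact per-object sizes vary by Hadoop version):

```python
# Back-of-the-envelope NameNode heap estimate. The 150-bytes-per-object
# figure is the commonly cited approximation; real sizes vary by version.

BYTES_PER_OBJECT = 150

def namenode_memory_gb(num_files, blocks_per_file=1):
    """Estimate NameNode heap in GB: one file object + N block objects per file."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 1e9

print(namenode_memory_gb(1_000_000))      # 0.3 GB -- a million small files is fine
print(namenode_memory_gb(1_000_000_000))  # 300 GB -- a billion exhausts typical heaps
```

The estimate shows why the bottleneck is file *count*, not total bytes: a billion 1 KB files (~1 TB of data) needs far more NameNode memory than a thousand 1 TB files, which is exactly what the sequence-file workaround exploits.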
When an interviewer asks "what are HDFS's limitations?", lead with the NameNode memory bottleneck (it caps the total number of files) and the write-once model (no random writes, no concurrent writers). These are the two constraints most likely to disqualify HDFS for a given use case. Then mention that HDFS Federation and erasure coding address the first and storage-cost concerns respectively.
In short, HDFS excels when storing a moderate number of very large files for batch processing. It struggles with billions of small files, low-latency access, and workloads requiring concurrent or random writes.