A Beginner’s Guide to HDFS: The Heart of Hadoop Storage

Anamika Patel

The Hadoop Distributed File System (HDFS) is designed to handle huge volumes of data by distributing it across multiple machines in a cluster.

  1. Data Splitting into Blocks

    When a user uploads or copies a file into HDFS (e.g., using hdfs dfs -put), the system doesn’t store the entire file as one unit. Instead, the file is split into fixed-size blocks, usually 128 MB or 256 MB. The block size can be configured in hdfs-site.xml using dfs.blocksize. For example, a 500 MB file will be split into four blocks: 128 MB, 128 MB, 128 MB, and 116 MB.
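
    As a quick sketch (the file names and HDFS paths here are only placeholders), an upload and a block-size check look like this with the standard Hadoop CLI:

    ```bash
    # Copy a local file into HDFS
    hdfs dfs -put sample.csv /data/sample.csv

    # Print the cluster's configured default block size, in bytes (134217728 = 128 MB)
    hdfs getconf -confKey dfs.blocksize

    # Override the block size for a single upload (268435456 bytes = 256 MB)
    hdfs dfs -D dfs.blocksize=268435456 -put big_file.csv /data/big_file.csv
    ```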

  2. Where are These Blocks Stored?

    The blocks are stored on Datanodes (worker machines in the cluster).

    The Namenode (master node) doesn’t store the actual data. Instead, it keeps track of:

    * Which blocks each file has been broken into

    * Where each block (and its replicas) is stored across the Datanodes; you can inspect this mapping yourself, as shown below
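
    To see this mapping for a file, `hdfs fsck` lists every block along with the Datanodes holding its replicas (the path is the placeholder file from step 1):

    ```bash
    # Show the blocks of a file and where each replica is stored
    hdfs fsck /data/sample.csv -files -blocks -locations
    ```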

  3. Replication of Data (for Fault Tolerance)

    To make sure data is not lost if a node fails, HDFS automatically creates multiple copies of each block:

    * Default replication factor: 3

    * This means each block is stored on 3 different Datanodes (the factor can be changed per file or globally; see the commands after the example below)

    Example:

    * Block A: stored on Datanodes 1, 3, and 5

    * Block B: stored on Datanodes 2, 4, and 6
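
    As a small sketch (again using the placeholder path from step 1), the replication factor of a file can be checked with `hdfs dfs -stat` and changed with `hdfs dfs -setrep`:

    ```bash
    # %r = replication factor, %o = block size, %n = file name
    hdfs dfs -stat "%r %o %n" /data/sample.csv

    # Raise the replication factor of this one file to 5 and wait (-w) until it finishes
    hdfs dfs -setrep -w 5 /data/sample.csv
    ```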

    The replication follows the rack-awareness policy:

    * One copy on the node (and rack) where the write happens

    * One on a Datanode in a different rack

    * One on another Datanode in that same remote rack, so that a single rack failure never takes out every copy (the command below shows how the Namenode maps Datanodes to racks)
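
    To check how the Namenode has grouped Datanodes into racks (this typically needs HDFS admin privileges, and on a single-rack test cluster everything appears under /default-rack):

    ```bash
    # Print the rack -> Datanode topology known to the Namenode
    hdfs dfsadmin -printTopology
    ```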

  4. Why All This?

    * If a Datanode fails, HDFS still has 2 other copies of each block.

    * If the replication count drops (e.g., due to a failure), HDFS re-replicates the affected blocks automatically; the commands below show how to monitor this.
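
    Both situations can be watched from the command line: `hdfs dfsadmin -report` summarizes live and dead Datanodes, and a filesystem check reports any blocks that are currently under-replicated:

    ```bash
    # Cluster summary: capacity plus live and dead Datanodes (admin command)
    hdfs dfsadmin -report

    # Namespace health check; the summary includes an under-replicated blocks count
    hdfs fsck /
    ```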

  5. Summary Flow:

    1. User copies a file to HDFS

    2. HDFS splits the file into blocks (128 MB by default)

    3. Each block is stored across multiple Datanodes.

    4. The Namenode keeps track of the block locations.

    5. Blocks are replicated (usually 3 copies) across different machines for fault tolerance.


Written by

Anamika Patel

I'm a Software Engineer with 3 years of experience building scalable web apps using React.js, Redux, and MUI. At Philips, I contributed to healthcare platforms involving DICOM images, scanner integration, and real-time protocol management. I've also worked on Java backends and am currently exploring Data Engineering and AI/ML with tools like Hadoop, MapReduce, and Python.