NameNode vs DataNode: Responsibilities and Failure Handling

Anamika Patel

When you copy data to HDFS (Hadoop Distributed File System), it is split into fixed-size blocks (128 MB by default, often configured to 256 MB) and stored across DataNodes. The NameNode does not store the data itself; instead, it stores the metadata: the filesystem namespace and information about where each block is stored.
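For illustration, here is a minimal Java sketch of writing a file to HDFS with the standard Hadoop client (the hdfs://namenode:9000 URI and the /user/demo path are placeholders, not values from this article):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder URI; point this at your cluster's NameNode.
            conf.set("fs.defaultFS", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/sample.txt");

            // As the client writes, the stream is cut into blocks of
            // dfs.blocksize (128 MB by default); DataNodes store the blocks,
            // while the NameNode records only the block-to-node mapping.
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeUTF("hello HDFS");
            }
            System.out.println("Block size used: " + fs.getDefaultBlockSize(file));
        }
    }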

Responsibilities

NameNode

  • Stores metadata (filenames, block locations, permissions, timestamps)

  • Knows which DataNode holds which block (see the sketch after this list)

  • Handles client metadata requests for reading/writing files; clients then transfer block data directly with DataNodes

  • Coordinates replication, block placement, and cluster health
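To see the NameNode's metadata role in action, a small sketch (same hypothetical file as above) can ask for each block's locations. This is a pure metadata call answered by the NameNode; no block data moves:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/user/demo/sample.txt"));

            // The NameNode returns, for every block, which DataNodes hold a replica.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                    + " -> hosts " + String.join(", ", block.getHosts()));
            }
        }
    }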

DataNode

  • Stores the actual data blocks

  • Sends heartbeats and block reports to the NameNode

  • Serves block reads and writes for clients

  • Replicates blocks to other nodes when instructed by the NameNode

Example:

Let’s say you have 30 nodes of 1 TB each and a replication factor of 3. Even though the total raw capacity is 30 TB, each file block is stored three times, so the effective storage is:

  • 30 TB / 3 = 10 TB usable HDFS space

So, yes, we can upload 10 TB of data to HDFS even though each node only has 1 TB.
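The same arithmetic as a tiny sketch (real clusters also reserve some disk for non-HDFS use, so the practical number is a bit lower):

    public class UsableCapacity {
        public static void main(String[] args) {
            int nodes = 30;               // 30 DataNodes
            double tbPerNode = 1.0;       // 1 TB each
            int replicationFactor = 3;    // every block stored 3 times

            double rawTb = nodes * tbPerNode;                 // 30 TB
            double usableTb = rawTb / replicationFactor;      // 10 TB
            System.out.println("Usable HDFS space: " + usableTb + " TB");
        }
    }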

Failure Handling

DataNode Failure

  • The NameNode detects the failure via missing heartbeats (by default, a node that sends no heartbeat for about 10 minutes is marked dead)

  • It triggers re-replication of the lost blocks to restore the replication factor (see the sketch after this list)

  • Replicas are usually spread across different racks for fault tolerance
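Re-replication can also be triggered deliberately. As a sketch (placeholder path; FileSystem.setReplication is the client API), raising a file's replication factor makes the NameNode schedule new copies through exactly this mechanism:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/demo/sample.txt");

            // Ask the NameNode to raise this file's replication factor.
            // The NameNode then instructs DataNodes to copy the
            // under-replicated blocks, just as it does after a node failure.
            boolean accepted = fs.setReplication(file, (short) 3);
            System.out.println("Replication change accepted: " + accepted);
        }
    }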

NameNode Failure

  • This is critical, as the NameNode holds the only copy of the metadata

  • If the NameNode fails, the cluster becomes non-functional

  • To solve this, Hadoop introduced:

High Availability (HA)

In Hadoop 2+, we can use an HA architecture:

  • One Active NameNode (serves all requests)

  • One Standby NameNode (keeps an up-to-date copy of the metadata via a shared edit log, typically a quorum of JournalNodes)

  • They sync metadata continuously

  • If the active node fails, the standby takes over with minimal downtime; failover can be automated with ZooKeeper (see the sketch after this list)
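From a client's point of view, HA is mostly configuration. A minimal sketch, assuming a nameservice called mycluster with NameNodes nn1/nn2 on host1/host2 (all placeholder names that must match your cluster's hdfs-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HaClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Logical nameservice instead of a single NameNode host.
            conf.set("fs.defaultFS", "hdfs://mycluster");
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1", "host1:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2", "host2:8020");
            // The failover proxy provider finds whichever NameNode is
            // currently active and retries against the other on failure.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Root exists: " + fs.exists(new Path("/")));
        }
    }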

Note: The older Secondary NameNode is often misunderstood; it is not a backup. It merely checkpoints the NameNode's metadata by merging the edit log into the fsimage, preventing the log from growing unbounded.

Performance & Data Placement Strategy

  • HDFS is designed for write-once, read-many workloads

  • Writing to multiple racks increases write time but enhances fault tolerance and read efficiency (the default policy places one replica on the writer's node and the other two on a different rack)

  • When reading, data is fetched from the nearest replica to reduce latency

  • Trade-offs are made between write latency and read bandwidth optimization

Real-World Analogy

Think of the NameNode as a librarian who does not store the books but keeps a detailed catalog of where every book (block) is located on different bookshelves (DataNodes).

If one bookshelf collapses (a DataNode fails), the librarian knows where other copies of the book are. But if the librarian disappears (a NameNode failure), nobody knows where anything is, unless there's a second librarian (the Standby NameNode) ready to take over.

Summary

  • NameNode stores metadata; DataNode stores the actual data

  • Data is split into blocks and replicated

  • NameNode failure is critical, so use High Availability (HA)

  • HDFS prefers a write-once, read-many pattern

  • Data is stored across racks to balance fault tolerance and performance
