NameNode vs DataNode: Responsibilities and Failure Handling

When you copy data into HDFS (Hadoop Distributed File System), it is split into fixed-size blocks (128 MB by default, often configured to 256 MB) and stored across DataNodes. The NameNode does not store the data itself; instead, it stores the metadata: file names, permissions, and, for every block, which DataNodes hold its replicas.
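As a concrete illustration, here is a minimal Java sketch that writes a file with an explicit block size and replication factor through the standard Hadoop FileSystem API. The NameNode address and file path are placeholders; in a real deployment these values usually come from core-site.xml and hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; this URI is a placeholder for your cluster
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/events.log");  // placeholder path
        boolean overwrite = true;
        int bufferSize = 4096;
        short replication = 3;                     // each block ends up on 3 DataNodes
        long blockSize = 128L * 1024 * 1024;       // 128 MB blocks

        // The client streams the data; HDFS cuts it into 128 MB blocks, and the
        // NameNode records only the metadata (which DataNodes hold each block).
        try (FSDataOutputStream out =
                 fs.create(path, overwrite, bufferSize, replication, blockSize)) {
            out.writeBytes("example payload\n");
        }
    }
}
```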
Responsibilities
NameNode
Stores metadata (file names, block locations, permissions, timestamps)
Knows which DataNode holds which block
Handles clients' metadata requests when they read or write files (the data itself flows directly between clients and DataNodes)
Coordinates replication, block placement, and cluster health
DataNode
Stores the actual data blocks
Sends heartbeats and block reports to the NameNode (see the sketch after this list)
Serves block reads and writes for clients
Replicates blocks to other nodes when instructed by the NameNode
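The NameNode's picture of the cluster is built entirely from those heartbeats and block reports. A minimal sketch of how a client can inspect that picture through the DistributedFileSystem API (the NameNode address is again a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ListDataNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // placeholder NameNode address

        FileSystem fs = FileSystem.get(conf);
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // The NameNode builds this view from DataNode heartbeats and block reports.
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.printf("%s capacity=%d used=%d remaining=%d%n",
                        dn.getHostName(), dn.getCapacity(),
                        dn.getDfsUsed(), dn.getRemaining());
            }
        }
    }
}
```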
Example:
Let's say you have 30 nodes of 1 TB each and a replication factor of 3. Even though the total raw capacity is 30 TB, each file block is stored three times, so the effective storage is:
- 30 TB / 3 = 10 TB of usable HDFS space
So yes, we can store roughly 10 TB of data in HDFS even though each individual node only has 1 TB, because every file is spread in blocks across the whole cluster.
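The same arithmetic as a tiny sketch, ignoring non-DFS overhead such as reserved disk space and temporary files, which reduce the usable figure somewhat in practice:

```java
public class HdfsCapacity {
    public static void main(String[] args) {
        int nodes = 30;
        double perNodeTb = 1.0;    // raw disk per node, in TB
        int replicationFactor = 3; // every block is stored 3 times

        double rawTb = nodes * perNodeTb;            // 30 TB of raw disk
        double usableTb = rawTb / replicationFactor; // ~10 TB of logical HDFS space

        System.out.printf("Raw: %.0f TB, usable (before overhead): %.0f TB%n",
                rawTb, usableTb);
    }
}
```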
Failure Handling
DataNode Failure
The NameNode detects the failure via missing heartbeats
It triggers re-replication of the lost blocks to restore the replication factor
Replicas are usually placed on different racks for fault tolerance (see the sketch after this list)
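As a rough sketch of how that detection works: the dead-node timeout is derived from two standard configuration keys, dfs.heartbeat.interval and dfs.namenode.heartbeat.recheck-interval. The defaults below are the stock Hadoop values, which work out to roughly 10.5 minutes before re-replication is scheduled.

```java
import org.apache.hadoop.conf.Configuration;

public class DeadNodeTimeout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Stock defaults: a heartbeat every 3 s, a recheck sweep every 5 min.
        long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3);
        long recheckMs = conf.getLong("dfs.namenode.heartbeat.recheck-interval", 300_000);

        // The NameNode marks a DataNode dead after roughly
        // 2 * recheck-interval + 10 * heartbeat-interval (~630 s by default),
        // and only then schedules re-replication of its blocks.
        long timeoutMs = 2 * recheckMs + 10 * heartbeatSec * 1000;
        System.out.println("DataNode declared dead after ~" + timeoutMs / 1000 + " s");
    }
}
```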
NameNode Failure
This is critical, because in a basic setup the NameNode holds the only copy of the metadata
If the NameNode fails, the cluster becomes non-functional
To solve this, Hadoop introduces:
High Availability (HA)
In Hadoop 2+, we use an HA architecture:
One Active NameNode (serves all requests)
One Standby NameNode (keeps an updated copy of the metadata via shared edit logs)
They sync metadata continuously
If the active node fails, the standby takes over with minimal downtime (a client-side sketch follows below)
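Here is a minimal client-side sketch of what HA looks like from an application's point of view: the client addresses a logical nameservice rather than a single NameNode host, and a failover proxy provider redirects it to whichever NameNode is currently active. The nameservice and host names are placeholders; in practice these settings live in hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Logical nameservice "mycluster" and its two NameNodes (names are placeholders).
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1-host:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2-host:8020");

        // The failover proxy provider lets the client switch to the standby automatically.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Clients address the nameservice, not a specific NameNode host.
        conf.set("fs.defaultFS", "hdfs://mycluster");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Root exists: " + fs.exists(new Path("/")));
    }
}
```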
Note: The older Secondary NameNode is often misunderstood — it is not a backup. It just merges the NameNode’s edit logs and fsimage to prevent log bloat.
Time Complexity & Data Placement Strategy
HDFS is designed for write-once, read-many workloads
Writing replicas to multiple racks increases write time but improves fault tolerance and read efficiency
When reading, data is fetched from the nearest replica (ideally on the same node or rack) to reduce latency
Trade-offs are therefore made between write latency and read bandwidth (see the sketch below)
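A minimal sketch of the metadata lookup behind that read path: the client asks the NameNode where each block's replicas live (host names and rack paths), then reads from the closest one. The cluster address and file path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // placeholder address

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/data/events.log");              // placeholder file
        FileStatus status = fs.getFileStatus(path);

        // The NameNode answers this from metadata alone; no block data is read.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s racks=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()),
                    String.join(",", block.getTopologyPaths()));
        }
    }
}
```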
Real-World Analogy
Think of the NameNode as a librarian who does not store the books but keeps a detailed catalog of where every book (block) is located on different bookshelves (DataNodes).
If one bookshelf collapses (a DataNode fails), the librarian knows where the other copies of the book are. But if the librarian disappears (NameNode failure), nobody knows where anything is, unless there's a second librarian (Standby NameNode) ready to take over.
Summary
NameNode stores metadata; DataNodes store the actual data
Data is split into blocks and replicated
NameNode failure is critical, so use High Availability (HA)
HDFS prefers write-once, read-many pattern
Data is stored across racks to balance fault tolerance and performance