What is Cluster Maintenance in Hadoop | Namenode | DataNode | HDFS




What is Cluster Maintenance in Hadoop:

A Hadoop cluster requires a modest amount of day-to-day care and feeding in order to remain healthy and in optimal working condition.
Maintenance tasks are usually performed in response to events such as:

A. Expanding the cluster

B. Dealing with failures or errant jobs

C. Managing logs

D. Upgrading software in a production environment

Managing Hadoop Processes:

Starting and stopping daemons with init scripts:

Common reasons for doing so include applying approved configuration changes, performing upgrades, and commissioning or decommissioning worker nodes.

Starting a Namenode:

Starting a Namenode brings it into service after it loads the fsimage, replays the transaction (edit) log, and receives enough block reports from datanodes to confirm that blocks are minimally replicated; only then does it leave safe mode.

dfs.namenode.safemode.extension – Determines the extension of safe mode, in milliseconds, after the replication threshold level is reached.
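As an illustration, this property lives in hdfs-site.xml alongside dfs.namenode.safemode.threshold-pct, which controls the fraction of blocks that must be minimally replicated; the values below are the commonly cited defaults, so check your distribution before relying on them:

```xml
<!-- hdfs-site.xml (values shown are the usual defaults) -->
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <!-- fraction of blocks that must meet minimum replication -->
  <value>0.999</value>
</property>
<property>
  <name>dfs.namenode.safemode.extension</name>
  <!-- stay in safe mode 30 s (30000 ms) after the threshold is reached -->
  <value>30000</value>
</property>
```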

The datanode daemon connects to its configured Namenode on startup and immediately joins the cluster.

Once the Namenode has registered the datanode, subsequent read and write operations can use it right away.
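The Namenode a datanode connects to comes from its configuration; as a sketch, the relevant property is fs.defaultFS in core-site.xml, where the hostname below is a placeholder and 8020 is the conventional Namenode RPC port:

```xml
<!-- core-site.xml on each datanode; namenode.example.com is a placeholder -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```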

Stopping a Namenode:

Stopping or restarting a Namenode renders HDFS (Hadoop Distributed File System) inaccessible, unless the cluster is operating with a highly available Namenode pair.




Datanodes can be safely stopped without interrupting HDFS service, although re-replication of their block data will occur, which places additional load on the network.

Stopping a tasktracker results in any currently executing child tasks being killed.

Any affected jobs will appear to slow down but will not fail, unless the killed tasks were on their final attempt before the job itself would be failed.
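The number of attempts a task gets before the job is failed is configurable; in MRv1 the commonly cited properties and defaults are the following (verify against your mapred-site.xml before relying on them):

```xml
<!-- mapred-site.xml (MRv1); values shown are the usual defaults -->
<property>
  <name>mapred.map.max.attempts</name>
  <!-- attempts per map task before the job is failed -->
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <!-- attempts per reduce task before the job is failed -->
  <value>4</value>
</property>
```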

1. Become the root user (or use sudo).

2. Execute /etc/init.d/<script> <operation>, where <script> is one of the daemon init scripts and <operation> is one of start, stop, or restart.

3. Confirm that the process started or stopped by checking its log files or by looking for the process in the output of ps -ef | grep <process>.
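Step 3 can be sketched as a small check, assuming a POSIX ps is available; the is_process_running helper, the pattern, and the throwaway sleep process are purely illustrative, not part of Hadoop:

```python
import subprocess
import time

def is_process_running(pattern: str) -> bool:
    # Equivalent of `ps -ef | grep <process>` from step 3, minus the
    # problem of grep matching its own command line.
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    return any(pattern in line for line in out.splitlines())

# Spawn a throwaway process so there is something to find.
proc = subprocess.Popen(["sleep", "12345"])
time.sleep(0.5)  # give it a moment to appear in the process table
print(is_process_running("sleep 12345"))  # True while it runs
proc.terminate()
proc.wait()
```

In practice you would grep for the daemon name (for example "namenode" or "datanode") instead of the sleep placeholder.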

HDFS Maintenance:

  • Adding a datanode
  • Decommissioning a datanode
  • Balancing HDFS block data
  • Dealing with a failed disk
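Balancing HDFS block data is driven by a utilization threshold (the -threshold option of the balancer tool): a datanode is a candidate for balancing when its disk utilization deviates from the cluster-wide average by more than that many percentage points. A toy sketch of that classification logic, where classify_datanodes is a hypothetical helper and the byte counts are made up, not the real Balancer implementation:

```python
def classify_datanodes(nodes, threshold=10.0):
    """Classify datanodes as over- or under-utilized when their
    used/capacity percentage deviates from the cluster average by
    more than `threshold` percentage points.

    nodes: {name: (used_bytes, capacity_bytes)}
    """
    cluster_used = sum(used for used, _ in nodes.values())
    cluster_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * cluster_used / cluster_cap
    over = [n for n, (used, cap) in nodes.items()
            if 100.0 * used / cap > avg + threshold]
    under = [n for n, (used, cap) in nodes.items()
            if 100.0 * used / cap < avg - threshold]
    return over, under

# dn1 sits well above, dn2 well below the 50% cluster average.
print(classify_datanodes({"dn1": (90, 100), "dn2": (10, 100), "dn3": (50, 100)}))
# → (['dn1'], ['dn2'])
```

The real balancer then schedules block moves from over-utilized to under-utilized nodes until every node is within the threshold.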