Adding a Datanode Dynamically and Decommissioning a Datanode in Hadoop

This article explains how to add a data node dynamically and how to decommission a data node in a Hadoop cluster.



Adding a Datanode Dynamically to a Hadoop Cluster:

1. Add the IP address of the data node to the file specified by the dfs.hosts parameter. Each entry should be separated by a newline character.

2. Execute the command below as the HDFS superuser or a user with equivalent privileges.

hadoop dfsadmin -refreshNodes

3. If using rack awareness, update any rack information necessary for the new host.

4. Start the data node process.

5. Check the name node web UI or the output of "hadoop dfsadmin -report" to confirm that the new host is connected.

Steps 1 and 2 are required only if you are using the HDFS hosts include functionality.
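The include-file edit in steps 1 and 2 can be sketched as a small shell helper. A minimal sketch, with assumptions: the include-file path and IP address below are placeholders rather than values from this article, and the hadoop commands are shown as comments because they must run on a live cluster as the HDFS superuser.

```shell
#!/bin/sh
# add_node: append NODE to the include file (the file named by the
# dfs.hosts parameter), one entry per line, skipping it if an
# identical entry is already present.
add_node() {
  include_file=$1
  node=$2
  grep -qxF "$node" "$include_file" || echo "$node" >> "$include_file"
  # On a live cluster, the name node must then re-read the file:
  #   hadoop dfsadmin -refreshNodes        (as the HDFS superuser)
  # and the new host can be verified with:
  #   hadoop dfsadmin -report
}

# Demo against a temporary file standing in for the dfs.hosts file.
tmp=$(mktemp)
add_node "$tmp" 10.0.0.21
add_node "$tmp" 10.0.0.21   # second call is a no-op: no duplicate entry
cat "$tmp"                  # -> 10.0.0.21
```

The duplicate check keeps the include file clean if the script is re-run for a host that was already added.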

Decommissioning a Datanode:

A data node may be decommissioned to remove it from the cluster gracefully while maintaining the replication factor of all blocks stored on the machine.

The process can be lengthy, depending on the amount of data on the host, the activity of the cluster, and the speed of the network.

To Decommission:

1. Add the IP address of the data node to the file specified by the dfs.hosts.exclude parameter. Each entry should be separated by a newline character.

2. Execute the command below as the HDFS superuser or a user with equivalent privileges.

hadoop dfsadmin -refreshNodes

3. Monitor the name node web UI and confirm the decommission process is in progress. It can take a few seconds to update.

4. For data nodes with a lot of data, go home for the night: decommissioning can take hours or even days. When the process has completed, the name node UI will list the data node as decommissioned.

5. Stop the data node process.

6. If you do not plan to reintroduce the machine to the cluster, remove it from the HDFS include and exclude files as well as any rack topology database.

7. Execute the command "hadoop dfsadmin -refreshNodes" to have the name node pick up the removal.
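The monitoring in steps 3 and 4 amounts to polling "hadoop dfsadmin -report" until the node's status reads Decommissioned. A minimal sketch, with assumptions: the report excerpt below follows the classic Hadoop 1.x output format ("Decommission Status : ..."), which may differ in other releases, and the node address is a placeholder.

```shell
#!/bin/sh
# Extract the decommission status from a canned dfsadmin -report
# excerpt (classic Hadoop 1.x format; newer releases may differ).
report='Name: 10.0.0.21:50010
Decommission Status : Decommissioned
Configured Capacity: 0 (0 KB)'

status=$(printf '%s\n' "$report" | grep 'Decommission Status' \
           | awk -F' : ' '{print $2}')
echo "$status"    # -> Decommissioned

# Against a live cluster, the same check becomes a polling loop:
#   until hadoop dfsadmin -report | grep -A 1 "Name: $NODE" \
#         | grep -q "Decommission Status : Decommissioned"; do
#     sleep 60
#   done
```

Once the loop exits, it is safe to proceed with stopping the data node process (step 5).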