Replication Factor in Hadoop





How to Replication Factor comes into the picture:

The Backup mechanism in the traditional distribution system:

In Hadoop, Backup mechanism didn’t provide high availability. This system is followed by shaded architecture.

The first request from File to Master node then divided into blocksize. It is a continuous process but node 1(slave1) is failed to another node(Slave 2).

Replication Factor:

Replication factor is the process of duplicating the data on the different slave machines to achieve high availability processing.

Replication is a Backup mechanism or Failover mechanism or Fault tolerant mechanism.

In Hadoop, Replication factor default is 3 times. No need to configure.

Hadoop 1.x :
Replication Factor is 3
Hadoop 2.x:
Replication Factor is also 3.

In Hadoop, Minimum Replication factor is 1 time. It is possible for a single node Hadoop cluster.

In Hadoop, Maximum Replication factor is 512 times.

If 3 minimum replication factor then minimum 3 slave nodes are required.

If the replication factor is 10 then we need 10 slave nodes are required.

Here is simple for the replication factor:

'N' Replication Factor = 'N' Slave Nodes

Note: If the configured replication factor is 3 times but using 2 slave machines than actual replication factor is also 2 times.

How to configure Replication in Hadoop?

It is configured in the  hdfs-site.xml file.

/usr/local/hadoop/conf/hdfs-site.xml
<configuration>
<property>
<name> dfs.replication</name>
<value> 5 </value>
</property>
</value>

Design Rules Of Replication In Hadoop:

1. In Hadoop Replication is only applicable to Hadoop Distributed File System (HDFS) but not for Metadata.

2. Keep One Replication per slave node as per design.

3. Replication will only happen on Hadoop slave nodes alone but not on Hadoop Master node (because the master node is only for metadata management on its own. It will not maintain the data).

Storage only duplicates in Hadoop but not processing because processing us always unique.

Summary: In Hadoop, Replication factor is a major role for data backup mechanism in earlier days. Default replication factor always 3 except single node cluster environment.