What is Heartbeat in Hadoop? How to resolve Heartbeat lost in Cloudera and Hortonworks

Heartbeat in Hadoop:

In the Hadoop ecosystem, a heartbeat is the communication between the NameNode and the DataNodes: it is a signal that each DataNode sends to the NameNode at a regular interval. If a DataNode in HDFS does not send a heartbeat to the NameNode for about 10 minutes (the default), the NameNode considers that DataNode unavailable.

The default heartbeat interval is 3 seconds. It is set by the dfs.heartbeat.interval property in the hdfs-site.xml file in the Hadoop configuration directory.

What is Heartbeat lost:

In the Hadoop ecosystem, when a DataNode does not send a heartbeat to the NameNode for about 10 minutes (by default), the NameNode considers that DataNode unavailable. This is known as “Heartbeat lost”.
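Where does “about 10 minutes” come from? HDFS computes the dead-node timeout from two properties: 2 × dfs.namenode.heartbeat.recheck-interval (default 300000 ms) + 10 × dfs.heartbeat.interval (default 3 seconds), which is 630 seconds, or 10 minutes 30 seconds. The POSIX shell sketch below only does that arithmetic; dead_node_timeout_ms and describe_timeout are hypothetical helper names, not Hadoop commands.

```shell
#!/bin/sh
# Sketch: how long the NameNode waits before declaring a DataNode dead.
#   timeout_ms = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
# dead_node_timeout_ms and describe_timeout are hypothetical helpers, not Hadoop commands.

dead_node_timeout_ms() {
  local heartbeat_s="${1:-3}"      # dfs.heartbeat.interval in seconds (default 3)
  local recheck_ms="${2:-300000}"  # dfs.namenode.heartbeat.recheck-interval in ms (default 300000)
  echo $(( 2 * recheck_ms + 10 * 1000 * heartbeat_s ))
}

describe_timeout() {
  # Same arguments as dead_node_timeout_ms; also prints the value in minutes and seconds.
  local ms total_s
  ms=$(dead_node_timeout_ms "$@")
  total_s=$(( ms / 1000 ))
  echo "${ms} ms = $(( total_s / 60 )) min $(( total_s % 60 )) s"
}
```

For example, describe_timeout 3 300000 prints “630000 ms = 10 min 30 s”. On a running cluster, the actual values can be read with hdfs getconf -confKey dfs.heartbeat.interval and hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval.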

How to resolve Heartbeat lost:

In Ambari, a host is shown as “Heartbeat Lost” when its ambari-agent stops sending heartbeats to the Ambari server, and Cloudera Manager likewise depends on heartbeats from cloudera-scm-agent. That is why the steps below focus on these agents, starting with Hortonworks (HDP).

Hortonworks:
1. In HDP, check whether the Ambari agent is running with “ambari-agent status”.
2. If it is not running, check the Ambari server and Ambari agent log files in /var/log/ambari-server and /var/log/ambari-agent.

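3. Then restart Ambari in this order:

A) Stop the Ambari server: ambari-server stop
B) Stop the ambari-agent service on all nodes: ambari-agent stop
C) Start the ambari-agent service on all nodes: ambari-agent start
D) Start the Ambari server: ambari-server start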
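Steps A to D can also be run from the Ambari server host in one go. This is a minimal sketch, assuming root SSH access to every agent host and that the agent host names (including the Ambari server host, if it also runs an agent) are passed as arguments; restart_ambari, run and DRY_RUN are hypothetical names. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps A-D, run as root on the Ambari server host.
# Assumes root SSH access to every agent host; restart_ambari, run and DRY_RUN
# are hypothetical names. DRY_RUN=1 prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

restart_ambari() {
  local host
  run ambari-server stop                        # A) stop the server
  for host in "$@"; do
    run ssh "root@${host}" ambari-agent stop    # B) stop every agent
  done
  for host in "$@"; do
    run ssh "root@${host}" ambari-agent start   # C) start every agent
  done
  run ambari-server start                       # D) start the server
}
```

For example, DRY_RUN=1 restart_ambari node1.example.com node2.example.com shows the order of the commands without touching the cluster (the host names are placeholders).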

Cloudera:

1. First check whether the Cloudera Manager agent is running: “sudo service cloudera-scm-agent status”

2. Check the agent log files in /var/log/cloudera-scm-agent/

3. Then run the following commands as the root user:

sudo service cloudera-scm-agent status
sudo service cloudera-scm-agent stop
sudo service cloudera-scm-agent start
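The same restart can be done for several Cloudera hosts at once. A minimal sketch, assuming root SSH access from the machine where it runs; restart_cm_agent, run and DRY_RUN are hypothetical names, and the last command shows the end of the default agent log, /var/log/cloudera-scm-agent/cloudera-scm-agent.log.

```shell
#!/bin/sh
# Sketch: restart the Cloudera Manager agent on one or more hosts and show
# the end of each agent log. Assumes root SSH access; restart_cm_agent, run and
# DRY_RUN are hypothetical names. DRY_RUN=1 prints commands instead of running them.
AGENT_LOG=/var/log/cloudera-scm-agent/cloudera-scm-agent.log

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

restart_cm_agent() {
  local host
  for host in "$@"; do
    run ssh "root@${host}" service cloudera-scm-agent stop
    run ssh "root@${host}" service cloudera-scm-agent start
    run ssh "root@${host}" service cloudera-scm-agent status
    run ssh "root@${host}" tail -n 20 "$AGENT_LOG"   # recent agent log lines for review
  done
}
```

Run DRY_RUN=1 restart_cm_agent with your host names first to review the commands, then run it again without DRY_RUN to apply them.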

Summary: Hadoop follows a master/slave architecture. The master node (NameNode) stores the metadata, and the slave nodes (DataNodes) store the actual data. The periodic signal that a DataNode sends to the NameNode is called a “heartbeat”. If it stops arriving, the condition is called “Heartbeat lost”, which means that the DataNode is unavailable. The sections above give step-by-step resolution steps for this issue in the Hortonworks (HDP) and Cloudera (CDH) distributions.

Hadoop job (YARN staging) error while executing a simple job

In a Hadoop ecosystem, many jobs run at the same time. While executing a Hive job for data validation from the Hive command line on a production Hive server, I got this error:

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
22:33:33 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging//.staging/job_1562044010976_0003
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging//.staging/job_1562044010976_0003/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

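The error above means that HDFS could not store the job’s files (here job.jar in the YARN staging directory) on any DataNode: the block “could only be replicated to 0 nodes instead of minReplication (=1)”. In this case the DataNode was not running properly. Use the following steps to resolve it: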

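Restart all services:

stop-all.sh
start-all.sh

On Hadoop 2.x and later these two scripts are deprecated; the equivalent is stop-yarn.sh and stop-dfs.sh, followed by start-dfs.sh and start-yarn.sh.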

This restarts the Hadoop daemons, including the NameNode, Secondary NameNode and DataNodes. Restart the remaining services, such as Hive and Spark, as well.

If the error still appears, start the distributed file system:

start-dfs.sh

Check that all the Hadoop daemons (NameNode, Secondary NameNode, DataNode, ResourceManager, NodeManager, etc.) are running with the command below:

jps
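Reading the jps output by eye is easy on one machine, but a small check helps when you repeat it. This sketch assumes a pseudo-distributed Hadoop 2.x/3.x node where all five daemons run on the same host; missing_daemons and EXPECTED_DAEMONS are hypothetical names, so edit the list for your setup.

```shell
#!/bin/sh
# Sketch: list expected Hadoop daemons that are missing from `jps` output.
# The names assume a pseudo-distributed Hadoop 2.x/3.x node; edit the list for your setup.
EXPECTED_DAEMONS="NameNode SecondaryNameNode DataNode ResourceManager NodeManager"

missing_daemons() {
  # stdin: `jps` output, one "PID ClassName" pair per line.
  local running daemon missing=""
  running=$(awk '{print $2}')
  for daemon in $EXPECTED_DAEMONS; do
    # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
      missing="${missing:+$missing }$daemon"
    fi
  done
  echo "$missing"
}
```

jps | missing_daemons prints an empty line when every expected daemon is running, otherwise the names of the missing ones (for example DataNode).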

Then check the status of every DataNode with “hdfs dfsadmin -report” (“hadoop dfsadmin -report” on older releases) to confirm that the DataNodes are live and have free space.
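The same report can be checked from a script. This sketch assumes the Hadoop 2.x/3.x report format, which contains a line such as “Live datanodes (1):”; older releases print the summary differently. live_datanodes and check_datanodes are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: summarize DataNode health from `hdfs dfsadmin -report` output.
# Assumes the Hadoop 2.x/3.x format with a line like "Live datanodes (3):".

live_datanodes() {
  # stdin: dfsadmin report; prints N from the first "Live datanodes (N):" line.
  sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p' | head -n 1
}

check_datanodes() {
  # stdin: dfsadmin report. Exit status: 0 = OK, 1 = no live DataNodes, 2 = unknown format.
  local n
  n=$(live_datanodes)
  if [ -z "$n" ]; then
    echo "UNKNOWN: no 'Live datanodes' line found"
    return 2
  elif [ "$n" -eq 0 ]; then
    echo "CRITICAL: no live DataNodes"
    return 1
  fi
  echo "OK: ${n} live DataNode(s)"
}
```

For example, hdfs dfsadmin -report | check_datanodes (usually run as the HDFS superuser, e.g. with sudo -u hdfs) exits with a non-zero status when no DataNode is live, so it can also be used from cron or monitoring scripts.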

The steps above apply only to single-machine Hadoop setups, such as pseudo-distributed mode.

On the Cloudera, Hortonworks and MapR distributions, simply restart the DataNodes and services such as Hive and Spark.

Summary: In a Big Data environment we run many Hadoop, Spark and Hive jobs, and sometimes they fail with the error above. It can leave you stuck, but the solution is usually as simple as the steps shown here.

Hadoop Admin Interview Questions for 3 to 15 Years of Experience

Hadoop administration is one of the emerging skills today. Below are some mid-level interview questions:

1. Explain your projects as described in your resume, and the different distributions you used.

2. Explain NameNode High Availability.

3. Explain Kerberos, Ranger and Knox with scenario-based examples.

4. Questions about scripting languages such as Python or shell scripting.

5. What is the difference between the NameNode and the CLDB (Container Location Database in MapR)?

6. How many ZooKeeper servers are used in your project? Why should their number be odd?

7. How do you resolve a heartbeat issue? Explain the resolution process.

8. Describe an issue you recently resolved on a cluster (for example Hive or HBase Master) and how you resolved it.

9. What are the differences between Cloudera, MapR and Hortonworks? Give examples.

10. Why does Hadoop have a Secondary NameNode? Explain.

11. Explain the Hortonworks installation step by step (no need to cover the prerequisites).

Prerequisites for MapR Installation on CentOS

In the Hadoop ecosystem, three Big Data distributions are mostly used:

1. Cloudera Distribution including Apache Hadoop (CDH)

2. Hortonworks Data Platform (HDP)

3. MapR Distribution Platform

Cloudera offers a free edition (Cloudera Express) and an Enterprise edition with a 60-day trial.

The Hortonworks Data Platform is a completely open-source platform for production, development and test environments.

Finally, the MapR distribution is an enterprise platform, but a free edition (M3) is available with fewer features than the M5 and M7 editions.

How to install the MapR free version on a pseudo-distributed cluster:

Before installing MapR, configure the following prerequisites:

——-Prerequisites——–

1. Configure the hostname as an FQDN (for example mapr.hadoop.com) using the setup command, then check it with hostname -f

2. vi /etc/hosts

3. hostname <your Fully Qualified Domain Name>

4. vim /etc/selinux/config ===> SELINUX=disabled
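Before moving on, the hostname prerequisites can be verified. A rough sketch; is_fqdn and hosts_has_entry are hypothetical helpers, and the FQDN check only looks for dot-separated labels, not for a name that actually resolves in DNS.

```shell
#!/bin/sh
# Sketch: sanity-check the hostname prerequisites before installing MapR.
# is_fqdn and hosts_has_entry are hypothetical helpers for illustration.

is_fqdn() {
  # Rough check: two or more dot-separated labels of letters, digits and hyphens.
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$'
}

hosts_has_entry() {
  # Succeeds if FILE (normally /etc/hosts) maps some IP address to NAME.
  # Commented-out lines are ignored.
  local file="$1" name="$2"
  awk -v h="$name" '
    $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
    END { exit !found }
  ' "$file"
}
```

For example: fqdn=$(hostname -f); is_fqdn "$fqdn" && hosts_has_entry /etc/hosts "$fqdn" && echo "hostname OK". After the next reboot, getenforce should print Disabled.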

——-Disable Firewalls and IPTables——-

If the firewall (iptables) is enabled, it blocks some of the ports the cluster needs, so it must be disabled.

1. service iptables save

2. service iptables stop

3. chkconfig iptables off

4. service ip6tables save

5. service ip6tables stop

6. chkconfig ip6tables off

—– Enable NTP service for machines —–

NTP (Network Time Protocol) is a networking protocol for clock synchronization between computers over packet-switched networks.

1. yum -y install ntp ntpdate ntp-doc

2. chkconfig ntpd on

3. vi /etc/ntp.conf and add the NTP servers:

server 0.rhel.pool.ntp.org
server 1.rhel.pool.ntp.org
server 2.rhel.pool.ntp.org

4. service ntpd start

5. ntpq -p

6. date (all machines must show the same date and time; otherwise you will get errors)
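To confirm that NTP is working, compare the clocks of all nodes rather than running date on each one by hand. A minimal sketch, assuming root SSH access from the master; max_clock_skew and node_clocks are hypothetical helpers, and SSH latency can add a second or so of noise to the result.

```shell
#!/bin/sh
# Sketch: measure the clock spread across nodes after NTP is set up.
# Assumes root SSH access to each node; both functions are hypothetical helpers.

max_clock_skew() {
  # Args: epoch seconds reported by each node. Prints max - min.
  local t min max
  min="$1"
  max="$1"
  for t in "$@"; do
    if [ "$t" -lt "$min" ]; then min="$t"; fi
    if [ "$t" -gt "$max" ]; then max="$t"; fi
  done
  echo $(( max - min ))
}

node_clocks() {
  # Prints `date +%s` (epoch seconds) from each host given as an argument.
  local host
  for host in "$@"; do
    ssh "root@${host}" date +%s
  done
}
```

For example, max_clock_skew $(node_clocks node1.example.com node2.example.com) prints the difference in seconds between the fastest and the slowest clock (the host names are placeholders).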

—— Install some additional packages in Linux OS —-

Install Java 1.8 and Python:

1. yum -y install java-1.8.0-openjdk-devel

2. yum -y install python perl expect expectk
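After the install, make sure Java 1.8 is the JDK on the PATH. java -version prints its version string to stderr, for example openjdk version "1.8.0_292"; java_major_version is a hypothetical helper that extracts the major number from that string.

```shell
#!/bin/sh
# Sketch: confirm that Java 1.8 is the active JDK after the yum install.
# java_major_version is a hypothetical helper; it parses the quoted version
# string printed by `java -version`, e.g. openjdk version "1.8.0_292".

java_major_version() {
  local v
  v=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # old scheme: 1.8.0_292 -> 8
    *)   echo "${v%%[.+-]*}" ;;           # newer scheme: 11.0.2 -> 11
  esac
}
```

java -version 2>&1 | java_major_version should print 8. If several JDKs are installed, alternatives --config java lets you pick the default one on CentOS.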

—- Set up passwordless SSH on all nodes from the master node ——

This enables passwordless authentication between the master and slave nodes:

1. ssh-keygen -t rsa

2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

3. ssh-copy-id root@<FQDN> (repeat for each node: FQDN1, FQDN2)
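To confirm that the keys were copied, log in to each node non-interactively: with -o BatchMode=yes, ssh fails instead of asking for a password. A minimal sketch; check_passwordless_ssh is a hypothetical helper and SSH_CMD is a hypothetical override (default ssh) that lets you try the loop without a cluster.

```shell
#!/bin/sh
# Sketch: verify that passwordless SSH works from the master to every node.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD is a hypothetical override (defaults to ssh) for trying the loop offline.

check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true 2>/dev/null; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "$failed"
}
```

check_passwordless_ssh node1.example.com node2.example.com prints OK or FAIL for each host and returns a non-zero status if any host failed (the host names are placeholders).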

—–Additional Linux configuration or Transparent Huge Pages(THP)—-

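1. echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

2. echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

3. sysctl vm.swappiness=10

(The redhat_transparent_hugepage path is used by RHEL/CentOS 6 kernels; on other kernels use /sys/kernel/mm/transparent_hugepage instead.)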
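The kernel shows the active THP mode in square brackets, for example “always madvise [never]”, so the settings above can be verified by reading the same files. A minimal sketch; thp_mode and report_tuning are hypothetical helpers. Note that these echo and sysctl commands only last until the next reboot.

```shell
#!/bin/sh
# Sketch: check the THP and swappiness settings applied above.
# The kernel marks the active THP mode with brackets, e.g. "always madvise [never]".

thp_mode() {
  # Arg: path to a THP sysfs file; prints the bracketed (active) value.
  sed -n 's/.*\[\([a-z+]*\)].*/\1/p' "$1"
}

report_tuning() {
  local dir f
  # RHEL/CentOS 6 kernels use redhat_transparent_hugepage; most others use transparent_hugepage.
  for dir in /sys/kernel/mm/redhat_transparent_hugepage /sys/kernel/mm/transparent_hugepage; do
    [ -d "$dir" ] || continue
    for f in enabled defrag; do
      echo "${dir}/${f}: $(thp_mode "${dir}/${f}")"
    done
  done
  echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
}
```

report_tuning should show never for enabled and defrag, and 10 for vm.swappiness.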

Set up the EPEL repository for installing additional packages on the CentOS machine:

1. wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

2. rpm -Uvh epel-release-6-8.noarch.rpm

How to Set Up a Cloudera Multi-Node Cluster with Pictures

Cloudera Installation and Multi-Node Cluster Configuration

1. Open PuTTY.

2. Type your machine’s IP address and click Open.

3. Log in with your username and password.

4. Type: vi /etc/hosts and add the remaining hosts.

5. Edit: vi /etc/sysconfig/network

6. Type: vi /etc/selinux/config

Replace SELINUX=enforcing with SELINUX=disabled.

7. Type: setenforce 0

8. Type: yum install ntp ntpdate ntp-doc to install NTP (Network Time Protocol).

9. After installing NTP, enable it at boot: chkconfig ntpd on

10. Type: vi /etc/ntp.conf

11. Type: ntpq -p

12. Then start the service: service ntpd start

13. Then check again with ntpq -p

14. Generate an RSA key pair on the remaining machines: ssh-keygen -t rsa

15. Save the file as id_rsa.

16. cd /root/.ssh

17. ll (check whether id_rsa.pub is there)

18. cat id_rsa.pub >> authorized_keys

19. Type: scp authorized_keys root@<machine>.localdomain:/root/.ssh

20. Then type: yum install openssl python perl
21. yum clean all
22. yum repolist

23. Then download Cloudera Manager using the command below:

wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

24. chmod 700 cloudera-manager-installer.bin

25. Then run ./cloudera-manager-installer.bin and click Next.

26. Accept the license.

27. The installer automatically installs the JDK.

28. It automatically installs the embedded database.

29. Cloudera Manager Server is installed.

30. The installation completes successfully.

31. Click “OK”.

32. If you get any error, disable the firewall and iptables.
33. To disable the firewall, type: systemctl stop firewalld and then systemctl disable firewalld

34. To disable IPv6, type: vi /etc/sysctl.conf and add net.ipv6.conf.all.disable_ipv6 = 1 and net.ipv6.conf.default.disable_ipv6 = 1, then run sysctl -p

35. Browse to your machine’s IP address on port 7180: xxx.xxx.xx.xxx:7180

36. Login: Username: admin

Password: admin

37. Select “Yes, I accept the End User License Terms and Conditions”.

38. Select Cloudera Express (free).

39. Then search for the host machines by their domain names.

40. Select the repository.

41. If you need proxy settings, select them and fill them in; otherwise leave them blank.

42. Click Continue to install on the three-machine cluster. If the browser causes any issue, use Mozilla Firefox.

43. Click “Continue” and check the CDH version.

44. When it reaches 100%, click “Continue”.

45. After “Continue”, check the validations.

46. Usually two validations show warnings; run the commands below and then click Run Again:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
sysctl vm.swappiness=10

47. Click “Finish”.

48. It shows the version summary.

49. Assign the HDFS NameNode and the ResourceManager to different hosts.

50. Select “Core with Spark”, then click Continue.

51. Click “Test Connection” when using the embedded database.

52. The cluster is set up successfully.
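The echo and sysctl commands in step 46 only change the running kernel, so the validation warnings come back after a reboot. One common way to make them permanent, assumed here, is to add vm.swappiness=10 to /etc/sysctl.conf and put the two echo lines in /etc/rc.d/rc.local, which must be executable to run at boot on CentOS 7. ensure_line and persist_tuning are hypothetical helpers; ensure_line appends a line only if it is not already present, so the script can be re-run safely.

```shell
#!/bin/sh
# Sketch: make the THP and swappiness settings survive a reboot.
# ensure_line is a hypothetical helper: it appends LINE to FILE only if FILE
# does not already contain that exact line, so re-running is harmless.

ensure_line() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

persist_tuning() {
  # Run as root on every cluster node.
  local rc_local=/etc/rc.d/rc.local
  ensure_line /etc/sysctl.conf 'vm.swappiness=10'
  ensure_line "$rc_local" 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
  ensure_line "$rc_local" 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
  chmod +x "$rc_local"   # rc.local only runs at boot if it is executable
}
```

Run persist_tuning as root on every node of the cluster.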