Hadoop Admin Interview Questions for Experienced Candidates
1. What is the difference between missing and corrupt blocks in Hadoop 2.0, and how do you handle them?
Missing block: a block for which no replica can be found anywhere in the cluster.
Corrupt block: a block whose replicas all exist but are all corrupted, so HDFS cannot find a single healthy replica to read from.
How to handle:
Use the commands below to find the corrupted file and remove it:
A) Check the health of the whole filesystem: hdfs fsck /
B) List only the problem files (filter out the progress dots and the replica summary lines): hdfs fsck / | egrep -v '^\.+$' | grep -v eplica
C) Inspect a specific file's blocks and where they live: hdfs fsck /path/to/corrupt/file -locations -blocks -files
D) If the file cannot be recovered, remove it: hdfs dfs -rm /path/to/file
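The lookup-and-remove steps above can be scripted. A minimal sketch, assuming `hdfs fsck -list-corruptfileblocks` prints one `blockId<TAB>path` line per corrupt block (verify the exact output format on your cluster before relying on it):

```shell
#!/usr/bin/env bash
# Sketch: collect the unique paths of files with corrupt blocks.
set -euo pipefail

# Extract the file path (2nd tab-separated field) from each
# "blockId<TAB>/path" line; drop duplicates when a file has
# several corrupt blocks. The input format is an assumption.
extract_corrupt_paths() {
  awk -F'\t' 'NF == 2 && $2 ~ /^\// { print $2 }' | sort -u
}

# Dry run -- list candidate files first:
#   hdfs fsck / -list-corruptfileblocks | extract_corrupt_paths
# Actual removal (irreversible, the data is lost):
#   hdfs fsck / -list-corruptfileblocks | extract_corrupt_paths \
#     | xargs -r -n1 hdfs dfs -rm
```

Always run the dry-run form and review the list before piping it into `hdfs dfs -rm`.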
2. What is the reason for keeping the ZooKeeper node count an odd number?
ZooKeeper elects a leader only when more than half of the ensemble (a quorum) agrees. With an even number of nodes it is harder to reach a majority, and an even-sized ensemble tolerates no more failures than the odd-sized ensemble one node smaller, so the ZooKeeper count should be an odd number.
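The quorum arithmetic behind this can be checked with plain integer math in the shell (no ZooKeeper involved): a quorum is floor(n/2) + 1 votes, and the tolerated failures are whatever is left over.

```shell
# quorum = floor(n/2) + 1; tolerated failures = n - quorum
for n in 3 4 5 6; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "ensemble=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

The output shows a 4-node ensemble tolerates only 1 failure, the same as 3 nodes, while 5 nodes tolerate 2; the extra even node buys nothing.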
3. Why is ZooKeeper required for Kafka?
Apache Kafka (before KRaft mode) depends on ZooKeeper, so the ZooKeeper server must be started before the Kafka brokers. ZooKeeper is used to elect the Kafka controller and to store cluster metadata such as topic configuration.
4. What is the retention period of Kafka logs?
When a message is sent to a Kafka cluster, it is appended to the end of a partition log. The message then remains on the topic for a configurable period of time, after which the log segments containing it become eligible for deletion; this is called the retention period of the Kafka log.
It is defined by log.retention.hours (the default is 168 hours, i.e. 7 days).
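A sketch of the relevant broker-level settings in `server.properties` (the values shown are the stock Kafka defaults; confirm them against your broker version, and note that per-topic overrides such as `retention.ms` take precedence when set):

```properties
# Broker-wide log retention defaults (illustrative values)
log.retention.hours=168                  # keep log segments for 7 days
log.segment.bytes=1073741824             # roll a new segment at 1 GiB
log.retention.check.interval.ms=300000   # check for deletable segments every 5 min
```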
5. What is the block size in your cluster, and why is a block size like 54 MB not recommended?
It depends on your cluster: the Hadoop 1.x default is 64 MB and the Hadoop 2.x default is 128 MB. A small, non-standard block size such as 54 MB splits each file into more blocks, which means more NameNode metadata and more map tasks, so it is not recommended.
6. Suppose a file is 270 MB and the block size on your cluster is 128 MB, so it takes 3 blocks of 128 + 128 + 14 MB. Is the space left in the 3rd block wasted, or can other data be appended?
Three blocks are used. HDFS does not pre-allocate a full block on disk, so the third block occupies only 14 MB; the remaining 114 MB is not wasted and stays available to the DataNode. A block belongs to a single file, however, so other files' data is never appended into it.
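The block count is a ceiling division, which can be sketched with shell arithmetic (sizes in MB, matching the numbers in the question):

```shell
file_mb=270
block_mb=128
# ceiling division: number of blocks needed to hold the file
blocks=$(( (file_mb + block_mb - 1) / block_mb ))
# the last block only occupies the remainder on disk
last_block_mb=$(( file_mb - (blocks - 1) * block_mb ))
echo "blocks=$blocks last_block_mb=$last_block_mb"
```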
7. What are the FSImage and edit logs?
FSImage: in a Hadoop cluster, the entire file system namespace, the file system properties, and the file-to-block mapping are stored in one image file on the NameNode, called the FSImage (File System image).
Edit logs: every change made to the file system after the last FSImage was written is recorded in the edit logs; during a checkpoint they are merged into a new FSImage.
8. What is your action plan if PostgreSQL or MySQL is down on your cluster?
First check the log file, identify the error, and then work out the solution.
For example: a "connection bad" error in PostgreSQL.
Solution: first check the status of the PostgreSQL service
sudo systemctl status postgresql
If needed, stop the service
sudo systemctl stop postgresql
fix the underlying problem (for example, run pg_ctlcluster with the right user and permissions), then start the service again and make sure it comes up at boot
sudo systemctl start postgresql
sudo systemctl enable postgresql
9. If both NameNodes are in standby state, what happens to running jobs?
With no active NameNode, HDFS cannot serve read or write requests, so running jobs that touch HDFS hang on retries and eventually fail until one NameNode is transitioned to active (manually or by the failover controller).
10. What is the Ambari port number?
By default, the Ambari port number is 8080, which is used to access the Ambari Web UI and the REST API.
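If 8080 conflicts with another service, the port can be changed in `ambari.properties` and applied with `ambari-server restart` (path and property name as commonly documented for Ambari Server; verify for your version):

```properties
# /etc/ambari-server/conf/ambari.properties
# change the Ambari Web / REST API port from the default 8080
client.api.port=8082
```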
11. Does your Kerberized cluster use LDAP or Active Directory?
It depends on your project: explain whichever integration is in place, LDAP or Active Directory.