Most Frequently Asked Hadoop Admin Interview Questions for Experienced Candidates

1. What is the difference between missing and corrupt blocks in Hadoop 2.0, and how do you handle them?

Missing block: A block that has no replica on any live DataNode in the cluster.

Corrupt block: A block whose replicas are all corrupted, so HDFS cannot find any replica containing valid data.

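How to handle:
Use the commands below to find out which files are corrupted and to remove them.

A) Check the health of the whole file system:
hdfs fsck /
B) Hide the progress dots and the replica lines so that only the problem files remain:
hdfs fsck / | grep -Ev '^\.+$' | grep -v eplica
C) Show the files, blocks, and block locations for a corrupt file:
hdfs fsck /path/to/corrupt/file -locations -blocks -files
D) Remove the corrupt file:
hdfs dfs -rm /path/to/corrupt/file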
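
The result of the check can also be read from a script. The following is a minimal sketch, assuming the fsck summary contains a line of the form "The filesystem under path '/' is HEALTHY" (or "... is CORRUPT"). The function name fsck_status is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper: read `hdfs fsck` output on stdin and print a one-word status.
# Assumption: the summary line reads "The filesystem under path '<path>' is HEALTHY"
# or "... is CORRUPT"; anything else is reported as UNKNOWN.
fsck_status() {
    summary=$(grep "The filesystem under path" | tail -n 1)
    case "$summary" in
        *"is HEALTHY"*) echo "HEALTHY" ;;
        *"is CORRUPT"*) echo "CORRUPT" ;;
        *)              echo "UNKNOWN" ;;
    esac
}

# Usage on a node with an HDFS client:
#   hdfs fsck / | fsck_status
```

Reporting UNKNOWN when no summary line is found avoids treating a failed or truncated fsck run as a healthy file system.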

2. Why should the number of ZooKeeper servers be odd?

ZooKeeper elects a leader only when more than half of the servers in the ensemble agree (a quorum). An even number of servers adds no fault tolerance over the next lower odd number: a four-server ensemble tolerates one failure, exactly like a three-server ensemble. That is why the ZooKeeper count should be an odd number.
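
The majority rule can be checked with a little arithmetic. This is a small sketch, not ZooKeeper code; the function names zk_quorum and zk_tolerated_failures are made up for illustration.

```shell
# Majority (quorum) needed for an ensemble of N servers: floor(N/2) + 1.
zk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a quorum is still available.
zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
    echo "servers=$n quorum=$(zk_quorum "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

For 3, 4, 5 and 6 servers the quorum is 2, 3, 3 and 4, and the tolerated failures are 1, 1, 2 and 2, so the fourth and sixth servers add cost without adding fault tolerance.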

3. Why is ZooKeeper required for Kafka?

Apache Kafka uses ZooKeeper to coordinate the cluster, so the ZooKeeper server must be started before the Kafka brokers. ZooKeeper is used to elect the controller and to store topic configuration.

4. What is the retention period of Kafka logs?

When a message is sent to a Kafka cluster, it is appended to the end of the log. The message remains in the topic for a configurable period of time, whether or not it has been consumed. This period is called the retention period of the Kafka log.
It is defined by the log.retention.hours broker property, which defaults to 168 hours (7 days).
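
Kafka also accepts the retention period in milliseconds through log.retention.ms, which takes precedence over log.retention.hours when both are set. The sketch below only does the unit conversion; the helper name retention_ms is hypothetical, and 168 is used because it is Kafka's default.

```shell
# Convert a retention period in hours to milliseconds.
retention_ms() {
    echo $(( $1 * 60 * 60 * 1000 ))
}

hours=168   # Kafka's default value for log.retention.hours (7 days)

# Print the two equivalent broker settings for server.properties.
echo "log.retention.hours=$hours"
echo "log.retention.ms=$(retention_ms "$hours")"
```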

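5. What is the block size in your cluster, and why is a 54 MB block not recommended?

It depends on your cluster. The Hadoop default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x. A smaller, non-standard size such as 54 MB is not recommended because smaller blocks mean more blocks per file, which increases the metadata the NameNode has to keep in memory.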

6. Suppose a file is 270 MB and the block size on your cluster is 128 MB. How many blocks are created? If the three blocks are 128 MB + 128 MB + 14 MB, is the rest of the third block wasted, or can other data be appended to it?
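
The block arithmetic in this question can be sketched as below. The function names are hypothetical, sizes are whole megabytes, and the file size is assumed to be greater than zero. Note that an HDFS block that is not full only occupies as much disk space as the data it holds, so the 14 MB block does not waste the other 114 MB, and a block always belongs to a single file.

```shell
# Number of HDFS blocks for a file of $1 MB with a block size of $2 MB.
# Ceiling division: a partial final block still counts as one block.
hdfs_block_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Size in MB of the last block (assumes a file size greater than zero).
hdfs_last_block_mb() {
    remainder=$(( $1 % $2 ))
    if [ "$remainder" -eq 0 ]; then
        echo "$2"
    else
        echo "$remainder"
    fi
}

echo "blocks=$(hdfs_block_count 270 128) last_block_mb=$(hdfs_last_block_mb 270 128)"
```

For the 270 MB file this gives three blocks with a last block of 14 MB, matching the 128 + 128 + 14 split in the question.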

7. What are the FsImage and edit logs?

FsImage: In a Hadoop cluster, the entire file system namespace, the file system properties, and the mapping of files to blocks are stored in one image, called the FsImage (File System image). Edit logs: every change made to the namespace since the last FsImage checkpoint is recorded in the edit logs.

8. What is your action plan if PostgreSQL or MySQL goes down on your cluster?

First, check the log file, identify the error, and then find the solution.

For example: a bad connection (connectionBad) to PostgreSQL
Solution: First check the status of the PostgreSQL service

sudo systemctl status postgresql

Then stop the PostgreSQL service

sudo systemctl stop postgresql

Then make sure pg_ctlcluster runs with the right user and permissions, and enable and start the service again

sudo systemctl enable postgresql
sudo systemctl start postgresql

9. If both NameNodes are in standby state, do the jobs keep running or do they fail?

10. What is the Ambari port number?

By default, the Ambari port number is 8080. It is used for access to both the Ambari web UI and the REST API.
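
As an illustration of the default port, the sketch below builds the base URL of the Ambari REST API. The helper name ambari_api_url, the host name, and the credentials in the usage comment are placeholders, not values from a real cluster.

```shell
# Build the base URL of the Ambari REST API.
# $1 = Ambari server host (placeholder), $2 = port (defaults to 8080).
ambari_api_url() {
    host=$1
    port=${2:-8080}
    echo "http://$host:$port/api/v1"
}

# Example call against a hypothetical host and account (needs a running Ambari server):
#   curl -u admin:password "$(ambari_api_url ambari.example.com)/clusters"
```

If the server has been reconfigured to another port, pass it as the second argument.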

11. Does your Kerberos cluster use LDAP or Active Directory?

It depends on your project. State whether it uses LDAP integration or Active Directory, and explain it.

Latest: Hadoop Admin Interview Questions for 3 to 15 Years of Experience

Hadoop administration is one of today's emerging skills. The questions below are mid-level interview questions:

1. Explain your projects according to your resume, and which distributions you have used.

2. Explain High Availability for the NameNode.

3. Explain Kerberos, Ranger, and Knox with scenario-based examples.

4. Questions about any scripting language, such as Python or shell scripting.

5. What is the difference between the NameNode and CLDB (Container Location Database in MapR)?

6. How many ZooKeepers are used in your project? Why must the number be odd?

7. How do you resolve a heartbeat issue? Explain the process.

8. Describe an issue you recently resolved in the cluster, for example with Hive or the HBase Master, and how you resolved it.

9. What is the difference between Cloudera, MapR, and Hortonworks? Give examples.

10. Why did the Secondary NameNode concept come into the picture in Hadoop? Explain.

11. Explain the Hortonworks installation process step by step. There is no need to explain the prerequisites.

Latest Hadoop Admin Interview Questions with Answers

1. What is an edge node? Why choose two edge nodes in a cluster?

An edge node is the interface between the cluster and the client. End users connect to the cluster through it.

A single edge node is a single point of failure. With two edge nodes, users can still connect through the second one if the first goes down.

2. If you have four master nodes, which services are installed on each?

Master node 1: NameNode, Secondary NameNode, Hive server, ResourceManager, one ZooKeeper

Master node 2: HBase Master, Oozie server

Master node 3: Hue, Spark, three ZooKeeper

Master node 4: High availability

3. What are the default block sizes of Hadoop and Unix?

The default block size of HDFS is 128 MB.

The default block size of a Unix file system is 4 KB.

4. What security measures are implemented in the Hadoop cluster?

LDAP is the first level of authentication.

Kerberos is the second level of authentication.

Sentry provides role-based authorization to data and metadata stored on the Hadoop cluster.

Knox acts as a gateway that secures access to the cluster.

Ranger provides security across the Hadoop ecosystem, including folder access and data authorization.

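5. How do you secure data in transit, that is, data transmitted over the network?

Data in transit is secured by encrypting it on the wire. SSL certificates and HTTPS protect the web interfaces and REST endpoints, Hadoop RPC can be encrypted by setting hadoop.rpc.protection to privacy, and block transfers to and from DataNodes can be encrypted by setting dfs.encrypt.data.transfer to true.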

6. What types of accounts are used in the Hadoop cluster?

Service account: Created in Active Directory and used by jobs and applications within the Hadoop cluster.

Technical account: Used by outside clients for application access, for example a Java client accessing Hive.

Business user account: Used by business users who want to access the Hadoop cluster.

Admin account: A highly privileged account used to grant credentials to users from Active Directory.

Local account: A local Unix account for Active Directory principals.

Hadoop Admin Roles and Responsibilities

Hadoop administration is an excellent career with a lot of growth opportunities, because Hadoop is a technology in high demand and few people have the skills.

A Hadoop administrator is responsible for installing Hadoop, monitoring the cluster, and managing it.

Roles and Responsibilities:

  1. Capacity planning, hardware requirements of the nodes, and network architecture and planning.
  2. Hadoop software installation and configuration, whether on the Cloudera distribution, the Hortonworks distribution, etc.
  3. Configuring the NameNode and DataNodes to ensure high availability.
  4. Tuning the Hadoop cluster, creating new users in Hadoop, handling permissions, and performance upgrades.
  5. Hadoop backup and recovery tasks.
  6. Finding out every day which jobs are taking more time and, if users report that jobs are stuck, finding out the reason.
  7. Health checks and monitoring of the Hadoop cluster.
  8. Deployment in the Hadoop cluster and maintaining it.
  9. Support and maintenance of Hadoop storage (HDFS).
  10. Security administration during installation, and basic knowledge of Kerberos, Apache Knox, Apache Ranger, etc.
  11. Data migration between clusters if needed, e.g. using the Falcon tool.
  12. Managing Hadoop log files and analyzing failed jobs.
  13. Troubleshooting network and application issues.
  14. Knowledge of scripting in a Linux environment.
  15. Knowledge of Oozie, Hive, HCatalog, and the Hadoop ecosystem.


Day-to-Day Activities of a Hadoop Admin:

  1. Monitoring the console, whether Cloudera Manager or Hortonworks, and the JobTracker UI.
  2. HDFS maintenance and support.
  3. Health checks and monitoring of the Hadoop cluster.
  4. Managing Hadoop log files and finding errors.
  5. Managing users, permissions, etc.
  6. Troubleshooting network errors and application errors.

Skill sets required to become a Hadoop Administrator:

  1. Strong knowledge of Linux/Unix
  2. Knowledge of shell scripting/Python scripting
  3. Hands-on experience with cluster monitoring tools like Ambari, Ganglia, etc.
  4. Networking and memory management

Summary: Hadoop administration is one of the best careers in terms of growth and opportunities. The Hadoop market is currently on the rise. Knowledge of Linux and databases is an advantage for an admin.