Most Frequently Asked Hadoop Admin Interview Questions for Experienced

Hadoop Admin Interview Questions for Experienced

1. What is the difference between missing and corrupt blocks in Hadoop 2.0, and how do you handle them?

Missing block: A missing block is a block for which no replica is available on any live DataNode in the cluster.

Corrupt block: A corrupt block is a block whose replicas exist but are all damaged (for example, they fail checksum verification), so HDFS cannot read valid data from any of them.

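How to handle:
Use the commands below to find out which files are affected and, if they cannot be recovered, remove them.

A) hdfs fsck /
   (reports the overall health of the file system)
B) hdfs fsck / | grep -Ev '^\.+$' | grep -v eplica
   (hides the progress dots and the replication lines so that only the problem files remain)
C) hdfs fsck /path/to/corrupt/file -files -blocks -locations
   (shows the blocks of one file and the DataNodes that hold them)
D) hdfs dfs -rm /path/to/file
   (removes the corrupted file)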

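2. Why should a ZooKeeper ensemble have an odd number of nodes?

ZooKeeper elects a leader only when more than half of the nodes in the ensemble agree. An ensemble of 2n+1 nodes tolerates n failures, and adding one more node to make the count even does not raise that number: both 3 and 4 nodes tolerate only one failure. An even count adds cost without adding fault tolerance, so the number of ZooKeeper nodes should be odd.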
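The majority rule can be checked with a little shell arithmetic. This is only a minimal sketch: quorum_size and tolerated_failures are hypothetical helper names chosen for illustration, not ZooKeeper commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ZooKeeper) that illustrate majority quorums.

# Smallest number of nodes that forms a majority of an ensemble of size $1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of node failures an ensemble of size $1 can survive
# while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

The loop shows that 3 and 4 nodes both tolerate one failure, and 5 and 6 nodes both tolerate two.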

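3. Why is ZooKeeper required for Kafka?

Apache Kafka uses ZooKeeper, so the ZooKeeper server must be started before the Kafka brokers. Kafka relies on ZooKeeper to elect the controller broker and to store cluster metadata such as topic configuration. (Newer Kafka releases can also run in KRaft mode, without ZooKeeper.)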

4. What is the retention period of Kafka logs?

When a message is sent to a Kafka cluster, it is appended to the end of the topic's log. The message stays in the log for a configurable period of time, whether or not it has been consumed. This period is called the retention period of the Kafka log.
At the broker level it is defined by log.retention.hours (default 168 hours, that is, 7 days).
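Besides log.retention.hours, Kafka brokers also accept log.retention.minutes and log.retention.ms, and the finer-grained setting takes precedence when more than one is set. The sketch below applies that precedence to a properties file. The helper name effective_retention_ms is hypothetical, and the script assumes plain key=value lines with no spaces around the equals sign.

```shell
#!/bin/sh
# Hypothetical helper: print the effective broker-level retention in
# milliseconds for the properties file given as $1.
# Precedence: log.retention.ms > log.retention.minutes > log.retention.hours.
# Falls back to Kafka's default of 168 hours when none of them is set.
effective_retention_ms() {
  local file="$1" ms minutes hours
  ms=$(sed -n 's/^log\.retention\.ms=//p' "$file" | tail -n 1)
  minutes=$(sed -n 's/^log\.retention\.minutes=//p' "$file" | tail -n 1)
  hours=$(sed -n 's/^log\.retention\.hours=//p' "$file" | tail -n 1)
  if [ -n "$ms" ]; then
    echo "$ms"
  elif [ -n "$minutes" ]; then
    echo $(( $minutes * 60 * 1000 ))
  else
    echo $(( ${hours:-168} * 60 * 60 * 1000 ))
  fi
}

# Usage (the path is a placeholder for your broker's configuration file):
#   effective_retention_ms /path/to/server.properties
```

Reading the last matching line mirrors the usual behaviour of properties files, where a later definition of the same key overrides an earlier one.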

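5. What is the block size in your cluster, and why is a 54 MB block not recommended?

It depends on your cluster. The HDFS default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later, so a non-standard value such as 54 MB is not recommended.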

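6. Suppose a file is 270 MB and the block size on your cluster is 128 MB. How many blocks are created? If there are 3 blocks (128 + 128 + 14 MB), is the rest of the third block wasted, or can other data be appended to it?

The file is stored in 3 blocks. The last block occupies only 14 MB on disk, so no space is wasted. It cannot hold data from other files, because each HDFS block belongs to exactly one file.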
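The arithmetic in question 6 can be checked with a short shell sketch. It assumes whole-megabyte sizes, and block_layout is a hypothetical helper name, not an HDFS command.

```shell
#!/bin/sh
# Hypothetical helper: how a file of $1 MB splits into HDFS blocks of $2 MB.
# last_block_mb=0 means the file ends exactly on a block boundary.
block_layout() {
  local file_mb="$1" block_mb="$2" full rest total
  full=$(( file_mb / block_mb ))   # completely filled blocks
  rest=$(( file_mb % block_mb ))   # data left over for a final partial block
  total=$full
  if [ "$rest" -gt 0 ]; then
    total=$(( full + 1 ))
  fi
  echo "blocks=$total full_blocks=$full last_block_mb=$rest"
}

block_layout 270 128   # prints: blocks=3 full_blocks=2 last_block_mb=14
```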

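7. What are the FsImage and the edit logs?

FsImage: In a Hadoop cluster, the entire file system namespace, the file system properties, and the mapping of files to blocks are stored in a single image file called the FsImage (file system image).

Edit logs: Every change made to the file system after the most recent FsImage is recorded in the edit logs. The two are merged whenever a checkpoint is created.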

8. What is your action plan if PostgreSQL or MySQL goes down on your cluster?

First, check the log file to identify the error, and then work out the solution from it.

For example, if PostgreSQL reports a bad connection:
Solution: First check the status of the PostgreSQL service

sudo systemctl status postgresql

Then stop the PostgreSQL service

sudo systemctl stop postgresql

Then run pg_ctlcluster as the right user with the right permissions, and enable the service so that it starts at boot

sudo systemctl enable postgresql

9. If both NameNodes are in the standby state, do running jobs continue or fail?

10. What is the Ambari port number?

By default, Ambari listens on port 8080, which serves both Ambari Web and the REST API.
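A quick way to confirm the port is to call the REST API. The sketch below only builds the URL. The helper name ambari_clusters_url is hypothetical and ambari.example.com is a placeholder host; /api/v1/clusters is the Ambari endpoint that lists the managed clusters.

```shell
#!/bin/sh
# Hypothetical helper: build the Ambari REST URL that lists clusters.
# $1 = Ambari server host, $2 = port (defaults to 8080).
ambari_clusters_url() {
  local host="$1" port="${2:-8080}"
  echo "http://${host}:${port}/api/v1/clusters"
}

ambari_clusters_url ambari.example.com   # prints: http://ambari.example.com:8080/api/v1/clusters

# On a real server, pass the URL to curl with valid credentials, for example
# (AMBARI_USER and AMBARI_PASSWORD are placeholder variables):
#   curl -u "$AMBARI_USER:$AMBARI_PASSWORD" "$(ambari_clusters_url ambari.example.com)"
```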

11. Does your Kerberos cluster use LDAP or Active Directory?

It depends on your project. State whether the cluster is integrated with LDAP or with Active Directory, and explain the setup.

Most Frequently Asked Interview Questions for Experienced

Candidates with 2 to 8 years of experience are typically asked questions like these in interviews on Big Data and analytics, especially on the Hadoop ecosystem.
The questions focus mostly on hands-on Hadoop experience and on the candidate's project.

1. What properties did you change in the Hadoop configuration files for your project?
Explain this in terms of your own project.
2. Where do you find the NameNode and DataNode directory paths?
3. How do you handle incremental loads in your project?
By using Sqoop incremental imports.
4. Can you create dynamic Hive partitions through Sqoop?
Yes, dynamic Hive partitions can be loaded through Sqoop.
5. In which scenarios do you use Parquet and Avro?
It depends on the client's requirements; be ready to elaborate on them.
6. How do you handle authentication and authorization in your project?
Explain whether you use Kerberos with AD/LDAP. It depends entirely on your project.
7. How do you handle the situation when all Spark jobs fail?
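
Returning to question 3 (incremental load): a Sqoop incremental import is driven by three options, --incremental (mode append or lastmodified), --check-column, and --last-value. The sketch below only prints the command so that it can be reviewed before it is run. The JDBC URL, table, column, and helper name are placeholders for illustration, not values from any real project.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a Sqoop incremental-append import.
# Arguments: JDBC URL, table name, check column, last imported value.
sqoop_incremental_cmd() {
  printf 'sqoop import --connect %s --table %s --incremental append --check-column %s --last-value %s\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder values for illustration only.
sqoop_incremental_cmd "jdbc:mysql://db.example.com/sales" orders order_id 1000
```

With append mode, Sqoop imports only the rows whose check column is greater than the last value, so the value has to be carried forward from one run to the next.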

Top 10 Hadoop Interview Questions

1. What exactly does Hadoop mean?

Hadoop is an open-source software framework for storing and processing huge amounts of data in a distributed manner across a cluster of machines.

2. Why do we need Hadoop in IT?



C. Data Quality

D. High Availability

E. Commodity Hardware

3. What is the difference between Hadoop 2.x and Hadoop 3.x?

Hadoop 2 supports only one standby NameNode next to the active NameNode.

Hadoop 3 supports multiple standby NameNodes.

Hadoop 2 relies on 3x replication, so it has much more storage overhead than Hadoop 3, which adds erasure coding.

Hadoop 2 does not support GPUs, but Hadoop 3 does.
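
The storage overhead point can be made concrete with a rough calculation. The sketch assumes the Reed-Solomon layout with 6 data units and 3 parity units that Hadoop 3 ships as a built-in erasure coding policy; the helper names are hypothetical.

```shell
#!/bin/sh
# Extra storage, as a percentage of the raw data size, for $1-way replication.
replication_overhead_pct() {
  echo $(( ($1 - 1) * 100 ))
}

# Extra storage for erasure coding with $1 data units and $2 parity units
# (integer division, so the result is rounded down).
erasure_coding_overhead_pct() {
  echo $(( $2 * 100 / $1 ))
}

echo "3x replication: $(replication_overhead_pct 3)% overhead"             # 200%
echo "RS(6,3) erasure coding: $(erasure_coding_overhead_pct 6 3)% overhead"   # 50%
```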

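4. Define data locality in Hadoop.

Data locality means moving the computation (the processing logic) to the node where the data resides in HDFS, instead of moving the data to the computation.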

5. How is Security achieved in Hadoop?

Security in Hadoop is achieved mainly through Kerberos authentication.

6. What are the different modes in which Hadoop runs?

A. Standalone mode

B. Pseudo-distributed mode

C. Fully distributed mode

7. Explain safe mode in Hadoop.

Safe mode is a maintenance state of the NameNode during which the NameNode does not allow any modifications to the file system. While in safe mode, the HDFS cluster is read-only and the NameNode does not replicate or delete blocks.


The following commands check, enter, and leave safe mode:

hdfs dfsadmin -safemode get

hdfs dfsadmin -safemode enter

hdfs dfsadmin -safemode leave
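
In scripts it is often useful to check the safe mode state before submitting work. The sketch below assumes that the get command reports a line containing "Safe mode is ON" or "Safe mode is OFF". The parsing helper safemode_state is hypothetical and reads that text from standard input, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by `hdfs dfsadmin -safemode get`.
# Assumes the output contains "Safe mode is ON" or "Safe mode is OFF".
# If any line reports ON (for example, one NameNode of an HA pair), the result is "on".
safemode_state() {
  local report
  report=$(cat)
  case "$report" in
    *"Safe mode is ON"*)  echo "on" ;;
    *"Safe mode is OFF"*) echo "off" ;;
    *)                    echo "unknown" ;;
  esac
}

# Try the parser on sample text; on a real cluster, pipe the command instead:
#   hdfs dfsadmin -safemode get | safemode_state
echo "Safe mode is OFF" | safemode_state   # prints: off
```

HDFS also provides hdfs dfsadmin -safemode wait, which blocks until safe mode is off and is usually simpler when a script only needs to wait.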

8. What are the main components of the Hadoop ecosystem?

A) HDFS - Hadoop Distributed File System

B) MapReduce - Java-based programming paradigm for parallel processing

C) Pig - Processes and analyses structured and semi-structured data

D) Hive - Processes and analyses structured data

E) HBase - NoSQL database

F) Sqoop - Imports and exports structured data

G) Oozie - Workflow scheduler

H) ZooKeeper - Coordination and configuration service

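9. Explain the difference between the NameNode, the Checkpoint Node, and the Backup Node in the Hadoop ecosystem.

NameNode - The master of HDFS; it manages the file system metadata.

Checkpoint Node - Keeps the same directory structure as the NameNode and periodically creates checkpoints of the namespace by merging the FsImage and the edit logs.

Backup Node - Offers the same checkpointing function but also keeps an up-to-date copy of the namespace in memory, so it only needs to save its current in-memory state to an image file to create a new checkpoint.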

10. What are the benefits of Hadoop?

A) Ability to handle big data

B) Runs on commodity hardware and is open source

C) Ability to handle multiple data types

Summary: Top 10 Basic Hadoop interview questions for Freshers and Experienced