Capgemini Hadoop Developer and Hadoop Admin Interview Questions | Big Data | Hadoop

In this article, we list Capgemini Hadoop developer and Hadoop admin interview questions for experienced candidates working in a Big Data environment.




1. Tell me about yourself and your work experience.

2. Explain your current project and its workflow.

3. What is the difference between managed tables and external tables in Hive?

4. Why do we need the Impala service? What is the difference between Impala and Hive?

5. Do you have any experience with performance tuning in Hive? If yes, please explain.

6. How do you create an external table in Hive? Please explain the syntax.

7. What is Spark? Explain the Spark architecture.

8. Explain the Spark driver and executors with an example.

9. Do you have experience with Sqoop? If yes, please answer the following scenario question.

Scenario Question: I have around 100 tables. I want to import all the tables from the database except Table 99 and Table 50. How can I do this without importing the tables one by one?




10. What is the Sqoop metastore? Please explain with an example.

11. Explain the dynamic partition properties for a Hive query. Why do we need dynamic partitions?

12. How many mappers and reducers does a MapReduce program use by default?

13. For example: in your cluster, 10 TB of disk space is available per node (15 disks of 1 TB each, 3 disks for the OS). Assume the data size is 500 TB. How would you estimate the number of data nodes in the cluster?
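
One common way to reason about question 13 can be sketched as below. This is a rough estimate under stated assumptions, not a definitive sizing method: it assumes the 500 TB figure is the total data to be stored, that HDFS keeps its default replication factor of 3, and that the full 10 TB per node is usable for HDFS. The function name `estimate_data_nodes` and the `overhead` parameter are hypothetical, introduced only for illustration.

```python
import math


def estimate_data_nodes(data_tb, usable_tb_per_node, replication=3, overhead=0.0):
    """Estimate the number of data nodes needed to hold `data_tb` of data.

    data_tb            -- total data size in TB (assumed: 500 TB from the question)
    usable_tb_per_node -- disk space available to HDFS per node in TB (10 TB)
    replication        -- HDFS replication factor (assumed default of 3)
    overhead           -- extra fraction reserved for temporary/intermediate
                          data (hypothetical parameter, 0.0 means none)
    """
    # Raw storage needed once every block is replicated, plus any headroom.
    raw_tb = data_tb * replication * (1 + overhead)
    # Round up: a fraction of a node still requires a whole node.
    return math.ceil(raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    print(estimate_data_nodes(500, 10))                 # no extra headroom
    print(estimate_data_nodes(500, 10, overhead=0.25))  # 25% headroom
```

With these assumptions the estimate is 150 nodes, or 188 nodes if 25% headroom is reserved. In an interview it is worth stating the replication factor and headroom you assume, because the answer changes directly with them.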

Hadoop Admin Interview Questions:

1. Tell me about yourself. Explain your current project and your roles and responsibilities.

2. Give a brief overview of your current project's cluster size, daily data volume, and RAM size.




3. Explain YARN queues and the YARN architecture.

4. Which security mechanism have you used? Please explain.

5. What is Kerberos? Explain keytabs and realms in your project.

6. What is Knox? What purpose does Knox serve in a cluster?

7. What is a corrupted file system? How do you resolve the issue?

8. Describe a recent issue you faced. How did you resolve it, step by step?

9. What is NameNode HA (High Availability)? Explain how to set up NameNode HA.

10. Explain Quorum, Journal, and ZKFC (ZooKeeper Failover Controller) nodes.