Latest Hadoop and Spark Interview Questions
1. What is the difference between RDD, DataFrame, and Dataset? Where would you use each in your project?
2. Why shouldn't you use the groupBy transformation in Spark? Please explain.
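The usual answer is that groupByKey shuffles every record across the network, while reduceByKey pre-aggregates inside each partition before the shuffle. A toy simulation of that difference in plain Python (no Spark required; the partition data is invented for illustration):

```python
# Toy simulation of why reduceByKey beats groupByKey in Spark:
# reduceByKey combines values map-side, so far fewer records are shuffled.
from collections import Counter

partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],   # partition 0
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],   # partition 1
]

# groupByKey-style: every (key, value) pair crosses the network as-is.
shuffled_group_by = sum(len(p) for p in partitions)

# reduceByKey-style: combine within each partition first, then shuffle
# only one record per distinct key per partition.
combined = [Counter(k for k, _ in p) for p in partitions]
shuffled_reduce_by = sum(len(c) for c in combined)

print(shuffled_group_by, shuffled_reduce_by)   # 8 vs 5 records shuffled
```

The gap grows with the number of duplicate keys per partition, which is exactly the common word-count/aggregation case.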
3. What challenges did you face in your Spark project?
4. What are the uses of map, flatMap, mapPartitions, foreach, and foreachPartition?
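The per-element semantics of these operators can be sketched in plain Python (the names on the left mirror the Spark API; the data is invented):

```python
# map / flatMap / mapPartitions semantics, sketched without Spark.
data = [1, 2, 3]

mapped      = [x * 2 for x in data]                 # map: exactly one output per input
flat_mapped = [y for x in data for y in range(x)]   # flatMap: 0..n outputs, flattened

# mapPartitions: the function receives a whole partition at once,
# which is useful for per-partition setup (e.g. one DB connection
# per partition instead of one per record).
def per_partition(it):
    conn = object()                      # stand-in for an expensive connection
    return [x * 2 for x in it]

partitions = [[1, 2], [3]]
map_partitioned = [y for p in partitions for y in per_partition(p)]

# foreach / foreachPartition have the same shapes but return nothing:
# they exist purely for side effects such as writing to an external sink.
print(mapped, flat_mapped, map_partitioned)
```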
5. Can you explain performance-tuning techniques in Spark?
6. Briefly, what is the difference between cluster mode and client mode?
7. Explain the different file formats that Spark supports.
8. If a job fails, how do you debug Spark jobs? Explain the process step by step.
9. What file sizes do you work with for development in your recent project?
10. Which tools do you use to deploy your Spark project?
11. How long do your job scripts take to run in the production environment? Explain.
12. What are sink processors? Explain briefly.
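"Sink processors" in a Hadoop context most likely refers to Apache Flume, where a sink group's processor decides how events are routed among its sinks (default, failover, or load_balance). A minimal failover configuration sketch, assuming a Flume agent named a1 with two hypothetical sinks k1 and k2:

```
# Flume agent config sketch (agent "a1" and sinks "k1"/"k2" are placeholders)
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# Higher priority wins; k2 takes over only if k1 fails
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
```

With processor.type = load_balance, events would instead be spread across both sinks round-robin or randomly.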
13. How is the number of stages in a Spark job decided? Explain step by step.
14. What is the command to transfer data between two clusters?
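The standard tool for cluster-to-cluster HDFS copies is DistCp. A sketch, with hypothetical NameNode hostnames and paths:

```shell
# Copy a directory between two HDFS clusters with DistCp
# (hostnames and paths are placeholders)
hadoop distcp \
  hdfs://source-nn:8020/data/events \
  hdfs://dest-nn:8020/data/events

# -update copies only files that differ; -p preserves file attributes
hadoop distcp -update -p \
  hdfs://source-nn:8020/data/events \
  hdfs://dest-nn:8020/data/events
```

DistCp runs as a MapReduce job, so large copies are parallelized across the cluster rather than funneled through one machine.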
15. How do you run a Hive script in the Hive CLI?
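For reference, the common ways to do this (script path and query are placeholders):

```shell
# Run a script file through the Hive CLI
hive -f /path/to/script.hql

# Run a single statement inline
hive -e "SELECT COUNT(*) FROM my_db.my_table"

# From inside an interactive hive> session:
#   hive> source /path/to/script.hql;
```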
16. Suppose you insert data into a table with two columns, where the data for one column does not match its declared data type and the other column's data is valid. How will the data be loaded? Explain with a scenario.
17. Explain Hive UDFs with examples.
18. How do you do incremental imports to update tables in your project? Explain.
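Incremental imports from an RDBMS into Hadoop are typically done with Sqoop. A sketch, where the connection string, table, and column names are all hypothetical:

```shell
# Sqoop incremental import sketch (connection, table, and columns are placeholders).
# "lastmodified" mode re-imports rows whose check column is newer than
# --last-value; "append" mode is used for append-only numeric keys.
sqoop import \
  --connect jdbc:mysql://db-host/sales \
  --username etl_user -P \
  --table orders \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2024-01-01 00:00:00" \
  --merge-key order_id \
  --target-dir /warehouse/orders
```

With --merge-key, Sqoop merges updated rows into the existing dataset instead of just appending them; the new --last-value to use on the next run is printed at the end of the job.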
19. How do you receive large data files directly from your client? Explain your project's data flow.
20. What are broadcast variables in Spark?
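A broadcast variable ships a small read-only value to every executor once, instead of once per task, and is typically used for map-side lookups. The idea, sketched in plain Python with an invented lookup table (in Spark this would be `bc = sc.broadcast(country_names)` and tasks would read `bc.value`):

```python
# Plain-Python sketch of the broadcast-variable idea: a small read-only
# lookup table is shipped to the workers once, so each record of the big
# dataset can be enriched locally without shuffling it.
country_names = {"IN": "India", "US": "United States"}   # small side table

def make_task(lookup):
    # The closure captures the broadcast value; each task reads it locally.
    def task(records):
        return [(uid, lookup.get(cc, "unknown")) for uid, cc in records]
    return task

task = make_task(country_names)
big_partition = [(1, "IN"), (2, "US"), (3, "BR")]
print(task(big_partition))   # [(1, 'India'), (2, 'United States'), (3, 'unknown')]
```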
21. How do you have Spark perform a broadcast join automatically instead of doing it manually? How does Spark detect when a broadcast join is worthwhile, and how do you tune that behaviour?
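No custom driver logic is needed: Spark SQL broadcasts a join side automatically whenever its estimated size falls below spark.sql.autoBroadcastJoinThreshold (10 MB by default). A configuration sketch, with a hypothetical application script name:

```shell
# Raise the auto-broadcast threshold to ~100 MB at submit time
# (value is in bytes; my_job.py is a placeholder)
spark-submit \
  --conf spark.sql.autoBroadcastJoinThreshold=104857600 \
  my_job.py

# Setting the threshold to -1 disables automatic broadcast joins entirely.
```

Within code, the broadcast() hint (e.g. from pyspark.sql.functions) can force a broadcast join for a specific DataFrame regardless of the threshold.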
22. Suppose an RDD partition is lost during a Spark job. How can it be recovered, and what is the underlying mechanism for recovering RDDs?
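The answer interviewers look for is lineage: Spark does not back up RDD partitions; it records the chain of transformations from a stable source and replays it when a partition is lost. A toy sketch of that idea in plain Python (the ToyRDD class is invented for illustration):

```python
# Toy sketch of RDD fault recovery via lineage: each derived dataset
# remembers its parent and the function that produced it, so a lost
# result can always be recomputed from the original source.
class ToyRDD:
    def __init__(self, source, fn=None, parent=None):
        self.source, self.fn, self.parent = source, fn, parent

    def map(self, fn):
        # Record the transformation instead of eagerly materialising it
        return ToyRDD(None, fn, self)

    def compute(self):
        # Replay the recorded chain from the stable source
        if self.parent is None:
            return list(self.source)
        return [self.fn(x) for x in self.parent.compute()]

base = ToyRDD([1, 2, 3])
derived = base.map(lambda x: x + 1).map(lambda x: x * 10)

# "Losing" a materialised result is harmless: lineage rebuilds it on demand.
print(derived.compute())   # [20, 30, 40]
```

Checkpointing and persist/cache shortcut this recomputation for long lineages, but the lineage graph is the fundamental recovery mechanism.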
23. If you receive an "out of space" error on a DataNode, how do you resolve it?
24. How do you allocate buffer memory to a DataNode? Explain.