Cognizant Hadoop and Spark interview questions for experienced candidates.
1. What is the Future class in the Scala programming language?
2. What is the difference between fold, foldLeft, and foldRight in Scala?
3. How does DISTRIBUTE BY work in Hive? Given some sample data, explain how the data will be distributed.
4. Given df.filter("Id == 3000"), how do you pass such a filter condition to a DataFrame dynamically at runtime?
5. Have you worked with multithreading in Scala? Explain.
7. On what basis do you increase the number of mappers in Apache Sqoop?
8. What value do you specify for --last-value when importing for the first time in Sqoop?
9. How do you specify the date for an incremental lastmodified import in Sqoop?
10. Let’s say you created a partition for Bengaluru but loaded Hyderabad data into it. What validation should you perform in this case to make sure there won’t be any errors?
11. How many reducers will be launched for a DISTRIBUTE BY query in Spark?
12. How do you delete a Sqoop job with a single command?
13. In which location is a Sqoop job's last value stored?
14. What are the default input and output formats in Hive?
15. Can you briefly explain the distributed cache in Spark with an example?
16. Did you use Kafka or Flume in your project? Explain in detail.
17. What is the difference between the Parquet and ORC file formats?
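As a quick refresher for question 2, the plain-Scala sketch below shows how fold, foldLeft, and foldRight differ; the list and operator are illustrative only. With a non-commutative operator such as subtraction, foldLeft and foldRight give different results because they consume the elements in opposite orders.

```scala
object FoldDemo {
  val xs = List(1, 2, 3, 4)

  // foldLeft walks left-to-right: (((10 - 1) - 2) - 3) - 4 = 0
  val left = xs.foldLeft(10)(_ - _)

  // foldRight walks right-to-left: 1 - (2 - (3 - (4 - 10))) = 8
  val right = xs.foldRight(10)(_ - _)

  // fold requires an associative operator and does not guarantee an
  // evaluation order (which lets parallel collections use it safely): 10
  val folded = xs.fold(0)(_ + _)
}
```

Interviewers often follow up by asking which of the three is stack-safe on large lists (foldLeft, which is tail-recursive on List, whereas foldRight can overflow).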
1. Explain your previous project.
2. How do you handle incremental data in Apache Sqoop?
3. Which optimization techniques are used in Sqoop?
4. What are the different parameters you pass to your Spark job?
5. If one task is taking more time than the others, how will you handle it?
6. What are stages and tasks in Spark? Give a real-time scenario.
7. On what basis do you set the number of mappers in Sqoop?
8. How will you export the data to Oracle without putting much load on the table?
9. What is a column family in HBase?
10. Can you create an HBase table without specifying a column family?
11. What is the limit on the number of column families for one table?
12. How did you schedule Spark jobs in your previous project?
13. Explain Spark architecture with a real-time scenario.
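For question 4 above (parameters passed to a Spark job), a typical spark-submit invocation looks like the sketch below. The flags are standard spark-submit options; the resource values, main class, jar name, and application argument are hypothetical and would be tuned per workload.

```shell
# Representative spark-submit; class name, jar, and argument are hypothetical.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --driver-memory 4g \
  --conf spark.sql.shuffle.partitions=200 \
  --class com.example.SalesJob \
  sales-job.jar 2024-01-01
```

A good answer also explains why each value was chosen (data volume, cluster capacity, shuffle behavior), not just the flag names.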