1. Hadoop – Scenario:
Suppose you are working on a Hadoop cluster, you have already cached an RDD, and its output is stored in the cache. You now want to clear that memory and use the space to cache another RDD. How do you achieve this?
2. Spark – Scenario:
I) Suppose you are running 100 SQL jobs that generally take 50 minutes to complete, but on one run they took 5 hours.
Q 1) In this case, how do you report the error?
Q 2) How do you debug the code and provide a proper solution for this scenario?
Rare interview questions on the Hadoop ecosystem:
1. What do you know about type safety, and which framework in the Hadoop ecosystem provides type safety?
2. What serialization options are available in Hive? Which one do you choose, and why? Explain in detail.
3. Which modules have you worked on in Scala? Name each module and explain it briefly.
4. Which packages have you worked with in Scala? Name the packages you have imported in your current project.
5. What is the difference between map and mapPartitions? Explain clearly, with a real-time example in Scala.
6. How do you connect to your cluster: through data nodes or edge nodes?
7. How do you allocate buffer memory to your data node?
8. How much buffer space have you allocated to the map tasks and reduce tasks on your data node?
9. How do you get a broadcast join to happen automatically, without requesting it manually? How do you set up your driver program to detect where a broadcast join would be beneficial, and how do you automate the process?
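
As a starting point for question 9, here is a minimal sketch, assuming the job uses Spark SQL DataFrames. Spark's planner can choose a broadcast join on its own when its size estimate for one side of the join falls below the `spark.sql.autoBroadcastJoinThreshold` setting (10 MB by default). The application name, the 50 MB value, the local master, and the sample tables below are illustrative assumptions, not values from any particular project.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastThresholdSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("broadcast-threshold-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Raise the automatic broadcast threshold to 50 MB (assumed value).
    // Setting it to -1 disables automatic broadcast joins.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50L * 1024 * 1024)

    // Hypothetical sample data: a larger fact table and a small lookup table.
    val orders = Seq((1, 100.0), (2, 250.0), (1, 75.0)).toDF("customer_id", "amount")
    val customers = Seq((1, "Asha"), (2, "Ravi")).toDF("customer_id", "name")

    // No broadcast hint is given; the planner decides from its size estimates.
    val joined = orders.join(customers, Seq("customer_id"))

    // Print the physical plan to check which join strategy was selected.
    joined.explain()

    spark.stop()
  }
}
```

Inspecting the plan printed by `explain()` is one way to confirm whether the planner picked a broadcast join for a given query. Because the decision depends on size estimates, keeping table statistics up to date is usually part of making this automatic.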