Apache Spark Interview Questions List
- Why are RDDs called resilient?
- Difference between Persist and Cache?
- Difference between Lineage and DAG?
- What are narrow and wide transformations?
- What are shared variables and what are their uses?
- How to define custom accumulator?
- If we have 50 GB of memory and 100 GB of data, how will Spark process it?
- How to create UDFs in Spark?
- How to use Hive UDFs in Spark?
- What are accumulators and broadcast variables?
- How do you decide the various parameter values for spark-submit?
- Difference between Coalesce and Repartition?
- Difference between RDD, DataFrame, and Dataset? When should you use each?
- What is Data Skew and how to fix it?
- Why shouldn’t we use the groupByKey transformation in Spark?
- How to do Map side join in Spark?
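The last two questions (broadcast variables and map-side joins) are closely related, and a short sketch can illustrate both. This is a minimal, hypothetical Scala example, assuming a local SparkSession; the table names and data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object MapSideJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-side-join-sketch")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: a large fact table and a small dimension table.
    val orders = Seq((1, "apple", 3), (2, "pear", 5)).toDF("id", "fruit", "qty")
    val prices = Seq(("apple", 0.5), ("pear", 0.8)).toDF("fruit", "price")

    // broadcast() hints Spark to ship the small table to every executor,
    // turning the join into a map-side (broadcast hash) join with no shuffle
    // of the large table.
    val joined = orders.join(broadcast(prices), "fruit")
    joined.show()

    spark.stop()
  }
}
```

The same idea applies at the RDD level: broadcast the small dataset as a broadcast variable and look keys up inside a `map`, avoiding a shuffle-based join.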
1. What challenges have you faced in your Spark project?
2. Use of map, flatMap, mapPartitions, and foreachPartition?
3. What is a Pair RDD? When should you use one?
4. What are performance optimization techniques in Spark?
5. Difference between cluster and client mode?
6. How do you capture logs in client mode and cluster mode?
7. What happens if a worker node dies?
8. What file formats does Spark support? Which of them are most suitable for our organization's needs?
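Questions 2 and 3 above can be answered concisely with a sketch. This is a hedged, self-contained Scala example (assuming a local SparkSession; the input strings are made up) showing how map, flatMap, and mapPartitions differ, and where a Pair RDD comes in.

```scala
import org.apache.spark.sql.SparkSession

object TransformationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transformation-sketch")
      .master("local[*]") // assumption: local mode for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    val lines = sc.parallelize(Seq("a b", "c d e"))

    // map: exactly one output element per input element
    val lengths = lines.map(_.length) // RDD[Int]

    // flatMap: zero or more output elements per input element
    val words = lines.flatMap(_.split(" ")) // RDD[String]

    // mapPartitions: the function runs once per partition, which lets you
    // amortize expensive setup (e.g. a DB connection) across many records
    val pairs = words.mapPartitions { iter =>
      iter.map(w => (w, 1))
    }

    // pairs is a Pair RDD (key-value), which unlocks the *ByKey operations
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.collect().foreach(println)

    spark.stop()
  }
}
```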
Basic Spark Developer Interview Questions:
1. Difference between reduceByKey() and groupByKey()?
2. Difference between Spark 1 and Spark 2?
3. How do you debug Spark jobs?
4. Difference between var and val?
5. What size of file do you use for development?
6. How long will your script take to run in production?
7. How do you perform joins using RDDs?
8. How do you run your job in Spark?
9. What is the difference between the Spark data frame and the data set?
10. How are Datasets type-safe?
11. What are sink processors?
12. Lazy evaluation in Spark and its benefits?
13. After spark-submit, what processes run behind the application?
14. How is the number of stages in a Spark job decided?
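Question 1 of this list is a classic, and the answer is easiest to show in code. This is a minimal Scala sketch (assuming a local SparkContext; the data is made up): both expressions produce the same per-key sums, but reduceByKey combines values within each partition before the shuffle (a map-side combine), while groupByKey shuffles every (key, value) pair across the network.

```scala
import org.apache.spark.sql.SparkSession

object ReduceVsGroupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reduce-vs-group-sketch")
      .master("local[*]") // assumption: local mode for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Preferred: values are pre-aggregated per partition before shuffling,
    // so far less data crosses the network.
    val viaReduce = pairs.reduceByKey(_ + _)

    // Same result, but every record is shuffled; with skewed keys this can
    // also blow up a single executor's memory.
    val viaGroup = pairs.groupByKey().mapValues(_.sum)

    viaReduce.collect().foreach(println)

    spark.stop()
  }
}
```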
The questions above are relevant to both beginner and experienced Spark developers.