Hard Interview Questions for Spark, Kafka, and Hive:
1. How do you handle Kafka back pressure in Spark Streaming using configuration parameters?
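One common answer is Spark Streaming's built-in backpressure settings, which throttle the Kafka ingestion rate to match processing speed. A minimal sketch of a spark-submit invocation (class and jar names are hypothetical; the `spark.streaming.*` properties are the standard DStream configuration keys):

```shell
# Hypothetical job class and jar; the --conf keys are real Spark Streaming settings.
spark-submit \
  --class com.example.StreamingJob \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.backpressure.initialRate=1000 \
  --conf spark.streaming.kafka.maxRatePerPartition=5000 \
  streaming-job.jar
```

`backpressure.enabled` lets Spark adapt the receive rate dynamically, `initialRate` bounds the first batch before feedback is available, and `maxRatePerPartition` caps per-partition consumption as a hard safety limit.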
2. How do you tune Spark performance through executor configuration?
3. What is the ideal number of executors, and how much RAM should each executor be given?
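A widely quoted rule of thumb (not a universal answer) for the two questions above: about 5 cores per executor, reserve 1 core and 1 GB per node for the OS and Hadoop daemons, reserve one executor slot for the driver/Application Master, and budget roughly 7% of executor memory for off-heap overhead. A small self-contained sketch of that arithmetic (the function name and defaults are illustrative assumptions):

```python
def executor_plan(nodes, cores_per_node, mem_per_node_gb,
                  cores_per_executor=5, overhead_fraction=0.07):
    """Rule-of-thumb executor sizing: reserve 1 core and 1 GB per node
    for OS/daemons, ~5 cores per executor, ~7% memory overhead."""
    usable_cores = cores_per_node - 1
    executors_per_node = usable_cores // cores_per_executor
    # Subtract one executor cluster-wide for the YARN Application Master.
    total_executors = nodes * executors_per_node - 1
    mem_per_executor_gb = (mem_per_node_gb - 1) / executors_per_node
    heap_gb = int(mem_per_executor_gb * (1 - overhead_fraction))
    return total_executors, cores_per_executor, heap_gb

# Example: 10 nodes, each with 16 cores and 64 GB RAM
print(executor_plan(10, 16, 64))  # → (29, 5, 19)
```

So for this cluster you might submit with `--num-executors 29 --executor-cores 5 --executor-memory 19G`, then adjust from there based on observed GC and shuffle behavior.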
4. How do you scale Kafka brokers and integrate with Spark Streaming without stopping the cluster, and what commands or scripts are involved?
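One common approach: start the new broker with a unique `broker.id`, then rebalance existing partitions onto it with the partition reassignment tool while the cluster stays up; a direct-stream Spark Streaming consumer picks up the new partition leaders through normal metadata refresh. A hedged CLI sketch (hosts, file names, and broker ids are hypothetical; newer Kafka releases use `--bootstrap-server`, older ones used `--zookeeper`):

```shell
# 1. Start the new broker with a unique broker.id in its server.properties.
# 2. Generate a reassignment plan that includes the new broker (id 4 here):
kafka-reassign-partitions.sh --bootstrap-server kafka1:9092 \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3,4" --generate
# 3. Apply the generated plan, then verify completion:
kafka-reassign-partitions.sh --bootstrap-server kafka1:9092 \
  --reassignment-json-file reassignment.json --execute
kafka-reassign-partitions.sh --bootstrap-server kafka1:9092 \
  --reassignment-json-file reassignment.json --verify
```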
5. How do you delete records in Hive, and how do you remove duplicate records with a script?
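A key point for this question: plain `DELETE` works in Hive only on transactional (ACID) tables. For non-ACID tables, the usual deduplication pattern is `INSERT OVERWRITE` with a `ROW_NUMBER()` window. A sketch using a hypothetical `events` table keyed by `id` (table and column names are assumptions):

```sql
-- Keep one row per id on a non-ACID table by rewriting it in place.
INSERT OVERWRITE TABLE events
SELECT id, col1, col2
FROM (
  SELECT id, col1, col2,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY col1) AS rn
  FROM events
) t
WHERE rn = 1;

-- Plain DELETE is valid only on a transactional (ACID, ORC) table:
DELETE FROM events_acid WHERE id = 42;
```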
6. Can more than one replica exist in the same rack?
7. While importing a database of 10 tables from MySQL into HDFS using Sqoop, one table fails. What is the solution?
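The usual answer is that the other nine imports are unaffected, so you clean up any partial output directory and re-import only the failed table. A hedged sketch (connection string, credentials, table, and paths are hypothetical):

```shell
# Re-import just the one failed table, not the whole database.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table failed_table \
  --target-dir /data/sales/failed_table \
  -m 4
```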
8. You submit a Spark job, and partway through, after some RDDs have already been created, the cluster goes down. What happens to your RDDs, and how is the data recovered?
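The expected answer centers on lineage: lost RDD partitions are recomputed from their lineage graph when individual executors fail, but if the whole application goes down, only data persisted to durable storage via checkpointing survives. A brief sketch of enabling checkpointing in PySpark (assumes a running cluster with HDFS available; paths are hypothetical, and this is not runnable standalone):

```python
# Sketch only: requires a live Spark cluster and HDFS.
from pyspark import SparkContext

sc = SparkContext(appName="checkpoint-demo")
sc.setCheckpointDir("hdfs:///tmp/checkpoints")  # durable storage for checkpoints

rdd = sc.textFile("hdfs:///data/events").map(lambda line: line.split(","))
rdd.checkpoint()  # truncates lineage; materialized on the next action
rdd.count()
```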
Summary: Scenario-based interview questions like these are commonly asked nowadays in Big Data interviews covering Spark, Kafka, and Hive.