Capgemini Spark Developer Real-Time Interview Questions for Experienced Professionals | Big Data | Spark | Hadoop

In this article, we list real-time Capgemini Spark developer interview questions for experienced Big Data professionals.



Capgemini Spark Developer Real-time interview questions:

Common project-flow questions:

1. Explain your current project and your roles and responsibilities in it.

2. What considerations must be taken into account during data migration?

3. Explain the data migration and processing pipeline in your project.

4. Which ETL tools and technologies do you use, and why?

5. What is Spark? Explain the Spark architecture and how it applies to your project.

6. Write Spark code to create a new column by applying operations on existing columns in a DataFrame.

7. How do you write a DataFrame? Explain the different write modes.




8. How do you rename columns in a DataFrame?

9. Explain the difference between coalesce and repartition. Have you used them in your project, and how?

10. What is data skew (skewness)? How do you resolve it?

11. Explain Spark optimization techniques.

12. What is a broadcast variable? Explain with an example.

13. What are the different types of joins in Apache Spark? Explain the classical joins.

14. What is a broadcast join? Explain the sort-merge join in Spark.

15. How do you remove duplicates from a DataFrame?

16. If we create a new column with the same name as a column that already exists in the DataFrame, what happens?




17. Explain user-defined functions (UDFs) in Spark. Have you used them in your project? If yes, explain how.

18. What is the advantage of lazy evaluation in Spark?

19. What are the memory optimization techniques in Spark?

20. Scenario-based question: There are two DataFrames, emp and department. Write code to perform a simple join on them.

21. What is a SparkSession? How is it initialized?

22. What issues have you faced in your project, and how did you resolve them?

The Spark interview questions above are not specific to Capgemini; many companies ask similar questions in a similar logical order. Spark is currently one of the leading technologies in the Big Data environment. Typically, interviewers begin with project-related questions and then move on to real-time Spark questions for Big Data professionals.