Spark Core interview questions:
1) What is Spark? Explain briefly.
Spark is an in-memory cluster computing framework for processing and analyzing large amounts of data. It provides a simple programming interface that lets application developers easily use the memory, CPU, and storage resources of a cluster of servers to process large data sets.
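2) What is an RDD? Explain its properties.
A Resilient Distributed Dataset (RDD) represents a collection of partitioned data elements that can be operated on in parallel. The RDD is the primary data abstraction in Spark and is defined as an abstract class in the Spark library. It is similar to a Scala collection and supports lazy evaluation. Its main properties follow from its name and design: it is resilient (a lost partition can be recomputed from the lineage of transformations that produced it), distributed (its partitions are spread across the nodes of a cluster), and immutable (a transformation creates a new RDD instead of modifying an existing one).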
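The idea of a partitioned collection operated on in parallel can be sketched in plain Python, without Spark. This is a toy model, not the Spark API: the helper names `partition` and `parallel_map` are hypothetical, and a thread pool stands in for executors running on separate machines. Each element lands in exactly one partition, and each partition is processed independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into num_partitions contiguous chunks (toy stand-in for RDD partitions)."""
    size, extra = divmod(len(data), num_partitions)
    parts: List[List[T]] = []
    start = 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def parallel_map(parts: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    """Apply f element-wise, running one task per partition concurrently."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # pool.map returns results in the original partition order
        return list(pool.map(lambda p: [f(x) for x in p], parts))


parts = partition(list(range(10)), 3)
print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
squared = parallel_map(parts, lambda x: x * x)
print([x for p in squared for x in p])  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because no partition needs data from another, the work can be split into one task per partition, which is the same reason Spark can schedule RDD operations in parallel.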
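3) What is lazy evaluation? Why is Spark lazily evaluated?
Spark is a lazily evaluated system because of the way it computes RDDs: although you can define new RDDs at any time, Spark computes them only lazily, that is, the first time they are used in an action. Transformations such as map and filter only record how to build the new RDD. This approach might seem unusual at first, but it makes a lot of sense when you are working with big data: because Spark sees the whole chain of transformations before running anything, it can pipeline them and compute only the data the action needs. For example, if a file is filtered and then first() is called, Spark scans the file only until it finds the first matching line instead of loading the whole file.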
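The behavior can be illustrated with a toy Python model. `LazyDataset` is a hypothetical class, not Spark's RDD: its transformations (`map`, `filter`) only append to a recorded chain, and its actions (`collect`, `first`) run that chain. The `calls` list records which elements the map step actually touched, which makes the laziness visible. Like an RDD that has not been persisted, each action recomputes the chain from the source.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


class LazyDataset:
    """Toy model of lazy evaluation (hypothetical class, not Spark's RDD)."""

    def __init__(self, source: Iterable[Any], ops: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source  # must be re-iterable, e.g. a list
        self._ops = ops        # recorded chain of transformations

    # Transformations: return a new dataset and compute nothing.
    def map(self, f: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("map", f),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def _iterate(self) -> Iterator[Any]:
        # Pipelining: each element passes through the whole chain before the next is read.
        for x in self._source:
            keep = True
            for kind, f in self._ops:
                if kind == "map":
                    x = f(x)
                elif not f(x):  # kind == "filter"
                    keep = False
                    break
            if keep:
                yield x

    # Actions: trigger the computation.
    def collect(self) -> List[Any]:
        return list(self._iterate())

    def first(self) -> Any:
        return next(self._iterate())


calls: List[str] = []


def shout(s: str) -> str:
    calls.append(s)  # record every element the map step actually processes
    return s.upper()


lines = ["INFO start", "ERROR disk", "INFO ok", "ERROR net"]
errors = LazyDataset(lines).filter(lambda s: s.startswith("ERROR")).map(shout)
print(calls)             # [] (no action yet, so nothing was computed)
print(errors.first())    # ERROR DISK
print(calls)             # ['ERROR disk'] (first() stopped at the first match)
print(errors.collect())  # ['ERROR DISK', 'ERROR NET'] (recomputed from the source)
```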
4) What is the SparkContext?
SparkContext is a class defined in the Spark library and is the main entry point into Spark; an application uses it to connect to a cluster and create RDDs. The SparkContext runs in the driver program, which is the main program of a Spark application.
5) What are narrow and wide dependencies in RDDs?
Narrow Dependencies:
Each partition of the parent RDD contributes data to at most one partition of the child RDD, as with map and filter. A sequence of operations involving only narrow dependencies can be pipelined, without moving data between nodes.
Wide Dependencies:
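Each partition of the parent RDD contributes data to multiple partitions of the child RDD, as with groupByKey or reduceByKey. This requires a shuffle, which is an expensive operation in a distributed system because data has to be transferred across the network.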
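The difference can be made concrete with a toy Python model that also records, for each child partition, which parent partitions it read from. Both functions are hypothetical illustrations, not Spark APIs, and the CRC32-modulo partitioner is an assumption standing in for Spark's hash partitioning. The narrow case maps partition i to partition i; the wide case routes every record by its key, so a child partition can depend on many parents.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Record = Tuple[str, int]


def narrow_map_values(parts: List[List[Record]], f: Callable[[int], int]):
    """Narrow dependency: child partition i is built only from parent partition i."""
    children = [[(k, f(v)) for k, v in p] for p in parts]
    deps: List[Set[int]] = [{i} for i in range(len(parts))]
    return children, deps


def wide_group_by_key(parts: List[List[Record]], num_out: int):
    """Wide dependency: a toy shuffle routes each record by a hash of its key,
    so one child partition can read from many parent partitions."""
    children: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_out)]
    deps: List[Set[int]] = [set() for _ in range(num_out)]
    for i, p in enumerate(parts):
        for k, v in p:
            j = zlib.crc32(k.encode()) % num_out  # deterministic toy partitioner
            children[j][k].append(v)
            deps[j].add(i)
    return [dict(c) for c in children], deps


parts = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
doubled, narrow_deps = narrow_map_values(parts, lambda v: v * 2)
print(doubled)      # [[('a', 2), ('b', 4)], [('a', 6), ('c', 8)]]
print(narrow_deps)  # [{0}, {1}]
groups, wide_deps = wide_group_by_key(parts, num_out=2)
for j, g in enumerate(groups):
    print(f"child {j}: {g} reads from parent partitions {sorted(wide_deps[j])}")
```

Whichever child partition receives key "a" must read from both parent partitions, because "a" appears in both; that cross-partition data movement is the shuffle.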
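6) What are the components of the Spark compute engine?
The Spark compute engine runs data-parallel applications, and its runtime architecture is divided into three components:
1. Driver: the process that runs the application's main program, creates the SparkContext, and splits jobs into tasks.
2. Cluster manager: acquires and allocates cluster resources for the application (for example, Spark's standalone manager, YARN, Mesos, or Kubernetes).
3. Executor: a process launched on a worker node that runs tasks and can keep data in memory or on disk.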
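The division of labor among these components can be mimicked in a single Python process. Everything here is hypothetical: `ToyClusterManager`, `ToyDriver`, and `run_job` are illustrative names, not Spark classes, and a thread pool stands in for executor processes on worker nodes. The sketch only shows the flow described above: the driver asks the cluster manager for resources, splits the job into one task per partition, and collects the results the executors return.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class ToyClusterManager:
    """Hands out executor slots (hypothetical stand-in for standalone, YARN, etc.)."""

    def __init__(self, total_cores: int):
        self.total_cores = total_cores

    def allocate(self, requested: int) -> ThreadPoolExecutor:
        # Grant at most the cores the toy cluster has; the pool's threads play the executors.
        return ThreadPoolExecutor(max_workers=min(requested, self.total_cores))


class ToyDriver:
    """Plays the driver program: splits a job into tasks and collects the results."""

    def __init__(self, cluster_manager: ToyClusterManager, executors: int):
        self.pool = cluster_manager.allocate(executors)

    def run_job(self, partitions: List[List[int]], task: Callable[[List[int]], int]) -> List[int]:
        # One task per partition, mirroring how Spark schedules one task per partition in a stage.
        futures = [self.pool.submit(task, p) for p in partitions]
        return [f.result() for f in futures]

    def stop(self) -> None:
        self.pool.shutdown()


driver = ToyDriver(ToyClusterManager(total_cores=4), executors=2)
partial_sums = driver.run_job([[1, 2], [3, 4], [5, 6]], sum)
print(partial_sums, sum(partial_sums))  # [3, 7, 11] 21
driver.stop()
```

In real Spark, executors are separate JVM processes, and the driver also handles scheduling across stages and re-running failed tasks; the toy omits all of that.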