1. In Spark, a ________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.
A) Resilient Distributed Dataset (RDD)
B) Spark Streaming
C) Driver
D) Flat Map
Ans: Resilient Distributed Dataset (RDD)
2. Consider the following statements in the context of Apache Spark:
Statement 1: Spark allows you to choose whether you want to persist Resilient Distributed Dataset (RDD) onto the disk or not.
Statement 2: Spark also gives you control over how you can partition your Resilient Distributed Datasets (RDDs).
A) Only statement 1 is true
B) Only statement 2 is true
C) Both statements are true
D) Both statements are false
Ans: Both statements are true
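The two statements above can be illustrated with a minimal sketch, assuming Spark is on the classpath and a local master is acceptable. The object name PersistAndPartitionSketch, the sample data, the choice of 4 partitions, and the DISK_ONLY storage level are illustrative assumptions, not part of the question.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object, for illustration only.
object PersistAndPartitionSketch {
  // Returns (number of partitions, whether the chosen storage level uses disk).
  def run(): (Int, Boolean) = {
    val conf = new SparkConf()
      .setAppName("persist-partition-sketch")
      .setMaster("local[2]") // assumption: run locally with 2 threads
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("m", 55), ("e", 57), ("s", 59)))

      // Statement 2: the caller controls how a pair RDD is partitioned.
      val partitioned = pairs.partitionBy(new HashPartitioner(4))

      // Statement 1: the caller chooses whether (and where) the RDD is persisted.
      partitioned.persist(StorageLevel.DISK_ONLY)
      partitioned.count() // an action, so the RDD is actually materialized

      (partitioned.getNumPartitions, partitioned.getStorageLevel.useDisk)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Swapping StorageLevel.DISK_ONLY for StorageLevel.MEMORY_ONLY, or omitting persist altogether, keeps the RDD off disk, which is the choice Statement 1 refers to.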
3. Given the following definition of the join transformation in Apache Spark:
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
Here, join is used to join two datasets. When called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs containing all pairs of elements for each key.
What is the result of joinrdd when the following code is run?
val rdd1 = sc.parallelize(Seq(("m",55), ("m",56), ("e",57), ("e",58), ("s",59), ("s",54)))
val rdd2 = sc.parallelize(Seq(("m",60), ("m",65), ("s",61), ("s",62), ("h",63), ("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
A) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
B) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
D) None of the mentioned.
Ans: Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
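The reasoning behind this answer can be checked with a minimal sketch that uses plain Scala collections in place of RDDs, so it runs without Spark. The helper innerJoin and the names left, right, and joined are hypothetical and only mimic the inner-join semantics described in the question; a real Spark job may return the same pairs in a different order, since ordering depends on partitioning.

```scala
// Plain Scala stand-in for an inner join on key; this is not Spark's implementation.
def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k, v)  <- left
    (k2, w) <- right
    if k == k2 // keep only pairs whose keys match in both datasets
  } yield (k, (v, w))

val left  = Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
val right = Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))

// "e" appears only in left and "h" only in right, so neither key survives the join.
val joined = innerJoin(left, right)
```

Each of the keys "m" and "s" has two values on each side, giving 2 x 2 = 4 pairs per key and 8 pairs in total, which is why the options containing an "h" or "e" entry are ruled out.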
4. Consider the following statements and choose the correct option:
Statement 1: Scale up means incrementally grow your cluster capacity by adding more COTS (Commercial Off-The-Shelf) machines.
Statement 2: Scale out means grow your cluster capacity by replacing with more powerful machines
A) Only statement 1 is true
B) Only statement 2 is true
C) Both statements are true
D) Both statements are false
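Ans: Both statements are false (the definitions are swapped: scale out means adding more machines, while scale up means replacing machines with more powerful ones)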