CommandsTech

1. In Spark, a ________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

A) Resilient Distributed Dataset (RDD)
B) Spark Streaming
C) Driver
D) Flat Map

Ans: Resilient Distributed Dataset (RDD)

2. Consider the following statements in the context of Apache Spark:

Statement 1: Spark allows you to choose whether you want to persist Resilient Distributed Dataset (RDD) onto the disk or not.

Statement 2: Spark also gives you control over how you can partition your Resilient Distributed Datasets (RDDs).

A) Only statement 1 is true
B) Only statement 2 is true
C) Both statements are true
D) Both statements are false

Ans: Both statements are true
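
Both controls can be seen in a few lines of Scala. The following is a minimal sketch, assuming Spark is on the classpath and a local master is acceptable for illustration; the application name, the sample data, and the choice of 4 partitions are made up for the example. In spark-shell, sc already exists, so the conf and sc lines can be skipped.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Assumption: local mode, used only to keep the sketch self-contained.
val conf = new SparkConf().setMaster("local[2]").setAppName("persist-partition-sketch")
val sc = new SparkContext(conf)

// Illustrative key-value data (not from the quiz).
val pairs = sc.parallelize(Seq(("m", 55), ("e", 57), ("s", 59), ("h", 63)))

// Statement 2: the caller decides how the RDD is partitioned
// (here, 4 hash partitions by key).
// Statement 1: the caller decides where the RDD is persisted
// (here, on disk only; StorageLevel.MEMORY_ONLY would keep it off disk).
val persisted = pairs
  .partitionBy(new HashPartitioner(4))
  .persist(StorageLevel.DISK_ONLY)
```

persisted.getNumPartitions and persisted.getStorageLevel report the choices made above.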

3. Given the following definition of the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Here, join is used to join two datasets. When called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs containing all pairs of elements for each key.

What is the result of joinrdd when the following code is run?

val rdd1 = sc.parallelize(Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54)))
val rdd2 = sc.parallelize(Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
A) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
B) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
D) None of the mentioned.

Ans: Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
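
The reasoning behind this answer is that join is an inner join: keys present in only one dataset ("e" in rdd1, "h" in rdd2) are dropped, and each remaining key yields every left-right combination of its values. The following is a minimal sketch of that semantics using plain Scala collections rather than Spark; innerJoin is a hypothetical helper written for illustration, not Spark's implementation.

```scala
// Plain Scala collections standing in for the two pair RDDs (no Spark needed).
val left: Seq[(String, Int)] =
  Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
val right: Seq[(String, Int)] =
  Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))

// Hypothetical helper: pair every left value with every right value
// that shares the same key; keys found on only one side produce nothing.
def innerJoin[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k1, v) <- a
    (k2, w) <- b
    if k1 == k2
  } yield (k1, (v, w))

val joined = innerJoin(left, right)

// Only the keys common to both sides survive.
val keys = joined.map(_._1).toSet
```

Here joined holds 8 pairs (2 x 2 for "m" and 2 x 2 for "s") and keys contains only "m" and "s". In Spark itself, the order of the collected elements depends on partitioning, so only the set of pairs should be compared against the options.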

4. Consider the following statements:

Statement 1: Scale up means incrementally growing your cluster capacity by adding more COTS (commercial off-the-shelf) machines.

Statement 2: Scale out means growing your cluster capacity by replacing existing machines with more powerful ones.

A) Only statement 1 is true
B) Only statement 2 is true
C) Both statements are true
D) Both statements are false

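Ans: Both statements are false. The two definitions are swapped: scale up means replacing machines with more powerful ones, while scale out means incrementally adding more COTS machines.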


BigData and Spark Multiple Choice Questions – I