Spark & Scala Interview Questions and Answers

1. What is Scala, why is it important, and how does it differ from other programming languages (Java/Python)?

Scala is a powerful language for developing big data applications. It offers several benefits that translate into significant productivity and helps you write robust code with fewer bugs. Apache Spark itself is written in Scala, so Scala is a natural fit for developing Spark applications.

2. What is an RDD? Explain briefly.

A Spark RDD (Resilient Distributed Dataset) is the primary abstraction in the Spark API. An RDD is a collection of partitioned data elements that can be operated on in parallel. RDDs have several key properties: they are immutable, cacheable, partitioned, fault tolerant, and lazily evaluated.

Immutable: RDDs are immutable data structures. Once created, an RDD cannot be modified.

Partitioned: The data in an RDD is partitioned across the nodes of a distributed cluster. (When reading from Cassandra, multiple Cassandra partitions can be mapped to a single RDD partition.)

Fault tolerant: RDDs are designed to be fault tolerant. Because RDD data is stored across a large distributed cluster, a node failure is always possible, and it would mean losing the partitions stored on that node.

RDDs handle node failure automatically. Spark maintains metadata about each RDD and how it was derived, so the lost partitions can be recomputed from data on the remaining nodes.

Interface: RDDs provide a uniform interface for processing data from a variety of data sources such as HDFS, HBase, Cassandra, MongoDB, and others. The same interface can also be used to process data stored in memory across a cluster of nodes.

In-memory: The RDD class provides the API for in-memory cluster computing. Spark allows RDDs to be cached or persisted in memory.
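The laziness and immutability described above can be illustrated with plain Scala collections (an analogy only, not Spark code): a transformation on a view is recorded but not computed until a terminal operation forces it, and the source list is never modified.

```scala
// A Scala collection view is lazy: map is recorded, not executed,
// until a terminal operation (like sum) forces the computation.
val numbers = List(1, 2, 3, 4, 5)          // immutable: cannot be modified in place
val lazyDoubled = numbers.view.map(_ * 2)  // nothing is computed yet
val result = lazyDoubled.sum               // computation happens here: 30
// numbers itself is unchanged throughout
```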

3. How do you register a temporary table in Spark SQL?

When we create a DataFrame by loading data through the SQLContext, it can be registered as a temporary table. The table is temporary because the scope of the DataFrame is limited to the particular session in which it was created.

4. How do you count the number of lines in Scala?

In the Scala programming language, we can count lines with getLines.size:

Example: val countLines = source.getLines.size
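For instance, the same getLines call can be demonstrated on an in-memory string with scala.io.Source (for a real file you would use Source.fromFile("file.txt") instead):

```scala
import scala.io.Source

// Source.fromString lets the getLines API be shown without
// touching the file system.
val source = Source.fromString("line one\nline two\nline three")
val countLines = source.getLines().size   // 3
```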


java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver in Spark Scala

When writing an Apache Spark application in Scala or Python (PySpark) that reads data from an Oracle database, on Linux or Amazon Web Services, you may get the error below because the Oracle JDBC driver is missing from spark.driver.extraClassPath or spark.executor.extraClassPath.

Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
    at java.lang.ClassLoader.loadClass(...)
    at java.lang.ClassLoader.loadClass(...)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala)
    at <init>(<console>:46)
    at .<init>(<console>:52)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)


After getting this error, here is a simple solution for Spark with Scala; the same error also occurs in Python (PySpark).

Add the required jar to both the executor classpath and the driver classpath. First, edit the Spark defaults configuration file, spark-defaults.conf, adding the two jar file paths below.

spark.driver.extraClassPath /home/hadoop/ojdbc7.jar
spark.executor.extraClassPath /home/hadoop/ojdbc7.jar

The jar file version used in these two classpath entries must match your Oracle and Spark versions; otherwise you will get compatibility issues.

If these two classpath settings still produce an error, pass the driver and executor jar files via --conf options in your Scala or PySpark program invocation, as in the example at the bottom of the page.

If the same issue appears again, follow the solution below:
Step 1: Download the Oracle JDBC jar files from the official Maven repository.

Step 2: Copy the downloaded jar files into Spark's shared jars location.


For example, see the PySpark code snippet below for more information.


pyspark --driver-class-path /home/hadoop/ojdbc7.jar --jars /home/hadoop/ojdbc7.jar

from pyspark import SparkContext, SparkConf  # import Spark context and configuration
from pyspark.sql import SQLContext           # SQL context

sqlContext = SQLContext(sc)

# Database connection to read data into Spark via JDBC
dbconnection = sqlContext.read.format("jdbc").options(url="give your jdbc connection url").load()


Pattern Matching in Scala with Examples

A pattern match includes a sequence of alternatives, each starting with the keyword case. Each alternative includes a pattern and one or more expressions.

Example :

object PatternDemo {
  def main(args: Array[String]): Unit = {
    val months = List("Jan", "Feb", "Mar", "Apr", "May")
    for (month <- months) {
      print(month + " ")
      month match {
        case "Jan" => println("First Month of the Year")
        case "Feb" => println("Second Month of the Year")
        case "Mar" => println("Third Month of the Year")
        case "Apr" => println("Fourth Month of the Year")
        case "May" => println("Fifth Month of the Year")
      }
    }
  }
}

Output:

Jan First Month of the Year

Feb Second Month of the Year

Mar Third Month of the Year

Apr Fourth Month of the Year

May Fifth Month of the Year

Example 2:

object techPatternMatch {
  def main(args: Array[String]): Unit = {
    val technologies = List("Hadoop", "Spark", "Java", "Python", "Go", "Scala")
    for (tech <- technologies) {
      print(tech + " ")
      tech match {
        case "Hadoop" | "Spark" => println("Big data technologies")
        case "Java" | "C++"     => println("OOPS")
        case "Python" | "Go"    => println("Advanced Tech")
        case "Scala"            => println("Functional Programming")
      }
    }
  }
}

Output:

Hadoop Big data technologies

Spark Big data technologies

Java OOPS

Python Advanced Tech

Go Advanced Tech

Scala Functional Programming


object letterMatch_Case {
  def main(args: Array[String]): Unit = {
    for (x <- List(1, 2)) {
      print(x + " ")
      x match {
        case x if x % 2 == 0 => println("Number is Even")
        case x if x % 2 == 1 => println("Number is Odd")
      }
    }
  }
}

Output:

1 Number is Odd

2 Number is Even
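A match in Scala is also an expression that returns a value, so the result can be bound directly. A small sketch (not from the original examples above):

```scala
// match is an expression: each case yields a value.
def describe(n: Int): String = n match {
  case 0               => "zero"
  case x if x < 0      => "negative"
  case x if x % 2 == 0 => "even"
  case _               => "odd"
}

val labels = List(-3, 0, 4, 7).map(describe)
// labels: List(negative, zero, even, odd)
```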



List:

Scala Lists are similar to arrays in that all elements of a list have the same type, but there are two important differences: first, lists are immutable (they cannot be changed), and second, lists are represented internally as linked lists.

scala> val names: List[String] = List("Sreekanth", "Vijay", "Vinay")
names: List[String] = List(Sreekanth, Vijay, Vinay)

scala> val names: List[String] = List("Alien", "Bob", "Carey")
names: List[String] = List(Alien, Bob, Carey)

scala> println(names(0))
O/P : Alien

scala> println(names(1))
O/P : Bob

scala> val marks: List[Int] = List(10, 20, 30, 40)
marks: List[Int] = List(10, 20, 30, 40)

scala> println(marks.head)
O/P : 10

scala> println(marks.tail)
O/P : List(20, 30, 40)

To concatenate two lists, use :::

scala> val names = "Sreekanth" :: ("Vijay" :: ("Vinay" :: Nil))
names: List[String] = List(Sreekanth, Vijay, Vinay)

scala> val address = "Hyd" :: ("Banglore" :: ("Chennai" :: Nil))
address: List[String] = List(Hyd, Banglore, Chennai)

scala> val name_address = names ::: address
name_address: List[String] = List(Sreekanth, Vijay, Vinay, Hyd, Banglore, Chennai)
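The list operations above can be checked as one self-contained sketch (plain Scala, using the same sample values):

```scala
// Cons (::) builds a list ending in Nil; ::: concatenates two lists.
val names = "Sreekanth" :: ("Vijay" :: ("Vinay" :: Nil))
val address = List("Hyd", "Banglore", "Chennai")
val nameAddress = names ::: address      // six elements, names first

val marks = List(10, 20, 30, 40)
val first = marks.head                   // 10
val rest = marks.tail                    // List(20, 30, 40)
```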

Set:

A Set is a collection that contains no duplicate elements. There are two kinds of Sets: immutable and mutable. The difference is that an immutable object cannot itself be changed.

By default, Scala uses the immutable Set.

scala> val ranks = List(1, 2, 3, 4, 2, 2, 3, 2)
ranks: List[Int] = List(1, 2, 3, 4, 2, 2, 3, 2)

scala> val ranks = Set(1, 2, 3, 4, 2, 2, 3, 2)
ranks: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)

Set Operations in Scala:

Example:

object setOperations {
  def main(args: Array[String]): Unit = {
    val marks = Set(10, 20, 30, 40)
    val updateMarks = Set(15, 25, 35, 45)
    println("Max marks: " + marks.max)
    println("Min marks: " + marks.min)
    println("marks.intersect(updateMarks): " + marks.intersect(updateMarks))
  }
}
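Beyond intersect, Sets also support union and difference. A small sketch (the second set here is chosen to overlap with the first, unlike the example above, so intersect is non-empty):

```scala
val marks = Set(10, 20, 30, 40)
val updateMarks = Set(15, 25, 35, 40)

val common  = marks.intersect(updateMarks)  // elements in both: Set(40)
val all     = marks.union(updateMarks)      // every element from either set
val onlyOld = marks.diff(updateMarks)       // in marks but not updateMarks
```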





  1. A Scala Map is a collection of key/value pairs.
  2. Any value can be retrieved based on its key.
  3. Keys are unique in the Map, but values need not be unique.
  4. There are two kinds of Maps, the immutable and the mutable.

mapProgram.scala:

object mapPrg {
  def main(args: Array[String]): Unit = {
    val technology = Map("Java" -> "OOPS", "ML" -> "AI", "Hadoop" -> "Big Data")
    println("Keys in Technologies: " + technology.keys)
    println("Values of Technologies: " + technology.values.mkString(", "))
  }
}

$ scalac mapProgram.scala
$ scala mapPrg

Keys in Technologies: Set(Java, ML, Hadoop)

Values of Technologies: OOPS, AI, Big Data
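Individual values can be retrieved by key; getOrElse supplies a default when the key is absent. A brief sketch using the same sample map:

```scala
val technology = Map("Java" -> "OOPS", "ML" -> "AI", "Hadoop" -> "Big Data")

val hadoop = technology("Hadoop")              // direct lookup: "Big Data"
val go = technology.getOrElse("Go", "Unknown") // missing key falls back to default
val keys = technology.keys.toSet               // Set(Java, ML, Hadoop)
```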

SCALA Iterators & Arrays With Examples


A Scala iterator is not a collection, but rather a way to access the elements of a collection one by one.

There are two basic operations on iterators:

1. next
2. hasNext

A call to it.next() returns the next element of the iterator and advances its state.

A call to it.hasNext() returns true if the iterator has more elements.


scala> val itObject = Iterator("Hadoop", "Scala", "Spark")
itObject: Iterator[String] = non-empty iterator

scala> while (itObject.hasNext) {
     |   println(" " + itObject.next())
     | }
 Hadoop
 Scala
 Spark

scala> val itObject2 = Iterator("Hadoop", "Scala", "Spark")

scala> println("Size of Iterator : " + itObject2.size)
Size of Iterator : 3
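Note that an iterator is consumed by traversal, which is why the transcript above creates a fresh iterator before calling size. This can be verified directly:

```scala
import scala.collection.mutable.ListBuffer

val it = Iterator("Hadoop", "Scala", "Spark")
val collected = ListBuffer[String]()
while (it.hasNext) collected += it.next() // drains the iterator

val remaining = it.size // the iterator is now exhausted: 0
```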


A Scala array is a normal array. To create an array of integers:

scala> var arrayEle = Array(1, 2, 3, 4, 5, 6, 7, 8)
arrayEle: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8)

scala> for (x <- arrayEle) {
     |   println("Array element: " + x)
     | }

scala> println(arrayEle.length)
O/p: 8
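Unlike lists, arrays are mutable: an element can be updated in place while the length stays fixed. A small sketch:

```scala
val arrayEle = Array(1, 2, 3, 4, 5, 6, 7, 8)
arrayEle(0) = 99            // in-place update; a List would not allow this

val len = arrayEle.length   // still 8
val first = arrayEle(0)     // now 99
```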

SCALA LOOPS With Examples


For Loop:

A for loop is a repetition control structure that lets you efficiently write a loop that needs to execute a specific number of times.

Different types of for loop in SCALA:

  1. For Loop with Collections
  2. For Loop with Range
  3. For Loop with Filters
  4. For Loop with Yield.

For Loop with Collections:

for (x <- list) {
  // statements
}

Here the list variable is a collection holding a list of elements, and the for loop iterates through all of them, returning one element in the x variable at a time.


object ForLoopCollections {
  def main(args: Array[String]): Unit = {
    val numList = List(1, 2, 3, 4, 5)
    for (x <- numList) {
      println("Value of X: " + x)
    }
  }
}

For Loop with Range:

for (x <- range) {
  // statements
}

Here a range is a sequence of numbers, written as i to j, and "<-" is the generator operator: it generates individual values from the range.


object ForLoopRange {
  def main(args: Array[String]): Unit = {
    for (x <- 1 to 100) {
      println("Value of X: " + x)
    }
  }
}

For Loop with Filters:

for (x <- list if condition1; if condition2; if condition3) {
  // statements
}


object ForLoopFilterDemo {
  def main(args: Array[String]): Unit = {
    val numList = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
    for (x <- numList if x != 5; if x < 9) {
      println("Value of X: " + x)
    }
  }
}

For Loop with Yield:

val result = for { x <- list if condition1; if condition2 } yield x

Here the values returned by the for loop are stored in a variable, or they can be returned through a function.


object ForLoopYieldDemo {
  def main(args: Array[String]): Unit = {
    val numList = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
    val norVal = for { x <- numList if x != 3; if x < 9 } yield x
    for (x <- norVal) {
      println("Value of X: " + x)
    }
  }
}
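The yield result above can be checked directly: filtering out 3 and everything 9 or greater leaves the remaining seven values.

```scala
val numList = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
// yield collects each surviving x into a new List
val norVal = for { x <- numList if x != 3; if x < 9 } yield x
// norVal: List(1, 2, 4, 5, 6, 7, 8)
```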




While Loop

A while loop repeatedly executes a target statement as long as a given condition is true. Its syntax is:

while (condition) {
  // statements
}
object WhileLoopDemo {
  def main(args: Array[String]): Unit = {
    var x = 100
    while (x < 300) {
      println("Value of X: " + x)
      x = x + 1
    }
  }
}
Do While Loop

Unlike the while loop, which tests the loop condition at the top of the loop, the do-while loop checks its condition at the bottom of the loop.

A do-while loop is similar to a while loop, except that it is guaranteed to execute at least once. Its syntax is:

do {
  // statements
} while (condition);


object DoWhileLoopDemo {
  def main(args: Array[String]): Unit = {
    var x = 100
    do {
      println("Value of X: " + x)
      x = x + 1
    } while (x > 300)  // condition is false, but the body has already run once
  }
}

Functions in SCALA:

1. Higher Order Functions in Scala:

I) foreach ()

II) map ()

III) reduce ()

I) foreach():

scala> val technologies = List("Hadoop", "Java", "Salesforce")
technologies: List[String] = List(Hadoop, Java, Salesforce)

scala> technologies.foreach((t: String) => println(t))
Hadoop
Java
Salesforce

II) map():

scala> val technologies = List("Hadoop", "Java", "Salesforce")
technologies: List[String] = List(Hadoop, Java, Salesforce)

scala> val tSize = technologies.map((c) => c.size)
tSize: List[Int] = List(6, 4, 10)

scala> println(tSize)
List(6, 4, 10)

III) reduce():

scala> val score = List(183, 186, 190, 191)
score: List[Int] = List(183, 186, 190, 191)

scala> val totalScore = score.reduce((a: Int, b: Int) => a + b)
totalScore: Int = 750

scala> val totalScore = score.reduce((a, b) => a + b)
totalScore: Int = 750
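A closely related higher-order function is foldLeft, which takes a start value and therefore (unlike reduce) also works on an empty list. A short sketch using the same scores:

```scala
val score = List(183, 186, 190, 191)

val totalScore = score.reduce((a, b) => a + b) // fails on an empty list
val totalFold = score.foldLeft(0)(_ + _)       // 0 is the start value; safe when empty
```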

Anonymous Functions in SCALA:

An anonymous function combines a function literal with a function value.

Function literals: an anonymous function written in source code is called a function literal.

Function values: at runtime, function literals are instantiated into objects called function values.

Example of an anonymous function:

object Anonymous_Funct {
  def main(args: Array[String]): Unit = {
    val add = (x: Int) => x + 1
    val mul = (a: Int, b: Int) => a * b
    println("Addition value: " + add(10))
    println("Multiplied value: " + mul(9, 10))
    println("New Addition Value: " + (add(10) - 5))
    println("New Multiplied Value: " + (mul(10, 9) + 50))
  }
}

$ scalac Anonymous_Funct.scala
$ scala Anonymous_Funct

Addition value: 11

Multiplied value: 90

New Addition Value: 6

New Multiplied Value: 140