In this post, we will explain the Spark transformations map and flatMap with examples, along with the simple differences between map and flatMap in Spark.
What is map in Spark?
The map method is a higher-order method that takes a function as input, applies it to each element of the source RDD, and creates a new RDD in Spark. In the example below we read a log file from the local file system.
scala> var file = sc.textFile("/home/sreekanth/Desktop/input.log")
scala> var fileLength = file.map(l => l.length)
scala> fileLength.collect
Output: res1: Array[Int] = Array(10, 34, 15, 14)
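If you do not have a Spark shell handy, the same map semantics can be sketched with plain Scala collections, since map behaves the same way on a List as on an RDD. The sample lines below are hypothetical stand-ins for the log file contents:

```scala
// Hypothetical sample lines standing in for the contents of input.log
val lines = List(
  "hello spark",
  "map applies a function per element",
  "one in one out",
  "same count out"
)

// map: exactly one output element per input element,
// so the result has the same number of elements as `lines`
val lineLengths = lines.map(l => l.length)

println(lineLengths) // one Int length per input line
```

Because map is strictly one-in, one-out, `lineLengths.size` always equals `lines.size`.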
What is flatMap in Spark?
The flatMap method is a higher-order method and transformation operation that takes an input function; this function returns a sequence for each input element passed to it.
The flatMap method returns a new RDD formed by flattening this collection of sequences.
scala> var file = sc.textFile("hdfs://localhost:54310/SparkInputDirectory/gdt")
scala> var fileWords = file.flatMap(a => a.split(" "))
scala> fileWords.collect
Output: res1: Array[String] = Array(Hello, Bigdata, Spark, MongoDB)
Difference between map and flatMap:
Here we provide a simple comparison of the Spark transformations map and flatMap, with examples for Spark professionals.
Basically, map and flatMap are similar, but they differ slightly in how the function is applied to the input RDD and in what they return.
1. The map transformation returns exactly one output element for each input element, so the result has the same number of elements as the source RDD.
rdd.map —> one output per input; if the function itself returns an array (such as split), the result is an array of arrays. See the Scala example above.
2. The flatMap transformation can return a list of elements or none of the elements (0 or more) for each input element, produced as an iterator.
rdd.flatMap —> it flattens the per-element sequences and returns all elements in a single flat array, as in the Spark with Scala example above.
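To make the difference concrete, here is a small plain-Scala sketch; Scala collections behave like RDDs for map and flatMap, and the sample strings are made up for illustration:

```scala
val lines = List("Hello Bigdata", "Spark MongoDB")

// map keeps one output per input element:
// splitting each line gives a nested structure (a list of lists)
val nested = lines.map(l => l.split(" ").toList)
// nested == List(List("Hello", "Bigdata"), List("Spark", "MongoDB"))

// flatMap flattens the per-element sequences into one flat collection
val flat = lines.flatMap(l => l.split(" "))
// flat == List("Hello", "Bigdata", "Spark", "MongoDB")
```

The nested result is what rdd.map(l => l.split(" ")) produces in Spark, while the flat result matches rdd.flatMap(l => l.split(" ")).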
Summary: In this post we covered what map and flatMap are, with examples, and explained the major differences between them with simple examples for Spark developers. This is a common interview question for developers and admins in the Big Data environment.