Difference between map and flatMap in Spark | what is map and flatMap with examples




  • What is map in Spark?
    The map method is a higher-order method that takes a function as input and applies it to each element of the source RDD, producing a new RDD with exactly one output element per input element. In the example below, a log file is read from the local file system and each line is mapped to its length.
scala> val file = sc.textFile("/home/sreekanth/Desktop/input.log")
scala> val fileLength = file.map(l => l.length)
scala> fileLength.collect
Output:
res1: Array[Int] = Array(10, 34, 15, 14)
  • What is flatMap in Spark?

The flatMap method is a higher-order transformation. It takes a function as input, and that function returns a sequence (zero or more elements) for each input element passed to it.

The flatMap method returns a new RDD formed by flattening this collection of sequences.

Example:

scala> val file = sc.textFile("hdfs://localhost:54310/SparkInputDirectory/gdt")
scala> val fileWords = file.flatMap(a => a.split(" "))  // split each line on spaces (delimiter assumed)
scala> fileWords.collect
Output: res1: Array[String] = Array(Hello, Bigdata, Spark, MongoDB)
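
The flattening step can be illustrated without a cluster. The following is a minimal pure-Python sketch of the idea, not Spark's implementation; the helper name `flat_map` and the two sample lines are hypothetical, chosen only to resemble the output above.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item, then concatenate the resulting sequences."""
    return list(chain.from_iterable(func(item) for item in items))

# Hypothetical input lines, assumed for illustration only.
lines = ["Hello Bigdata", "Spark MongoDB"]

# map-style: one list of words per input line (nested result)
mapped = [line.split(" ") for line in lines]

# flatMap-style: the per-line lists are merged into one flat list
flattened = flat_map(lambda line: line.split(" "), lines)

print(mapped)     # [['Hello', 'Bigdata'], ['Spark', 'MongoDB']]
print(flattened)  # ['Hello', 'Bigdata', 'Spark', 'MongoDB']
```

In other words, flatMap behaves like map followed by a flatten of one nesting level.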

Difference between map and flatMap:

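map -> returns exactly one output element for each input element. If the function returns a sequence, that whole sequence becomes a single element of the result.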

Example:

sc.parallelize([10, 20, 30]).map(lambda a: range(1, a)).collect()

flatMap -> returns zero or more output elements for each input element. The sequences returned by the function are flattened into a single RDD.
Example:

sc.parallelize([10, 20, 30]).flatMap(lambda a: range(1, a)).collect()
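
To compare the two results side by side, here is a small sketch in plain Python that emulates the semantics locally with smaller inputs. It assumes nothing about a running SparkContext; `local_map` and `local_flat_map` are hypothetical stand-ins for the RDD methods, and the input `[1, 2, 3]` is chosen so that one element produces an empty range.

```python
from itertools import chain

def local_map(func, items):
    # One output element per input element, whatever func returns.
    return [func(x) for x in items]

def local_flat_map(func, items):
    # func returns a sequence per element; all sequences are concatenated.
    return list(chain.from_iterable(func(x) for x in items))

data = [1, 2, 3]
expand = lambda a: list(range(1, a))  # range(1, 1) is empty

mapped = local_map(expand, data)
flat = local_flat_map(expand, data)

print(mapped)  # [[], [1], [1, 2]]  -> always len(data) elements
print(flat)    # [1, 1, 2]          -> the empty list contributes nothing
```

The map-style result always has as many elements as the input, while the flatMap-style result can be shorter or longer, depending on how many elements the function returns for each input.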
