This blog post explains the NetworkWordCount program in Spark Streaming using Scala. You should first be comfortable with the basic WordCount example in Spark; once you are, this program is very simple to follow. Otherwise it can be a little difficult. The program below is explained point by point in its comment lines.
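For reference, here is a minimal sketch of the classic batch WordCount in Spark with Scala. The input file name "input.txt" is a hypothetical path used only for illustration; the point is that NetworkWordCount applies the same flatMap / map / reduceByKey pipeline to a live stream instead of a file.

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    val sc = new SparkContext(conf)

    // "input.txt" is a placeholder local file used only for this sketch
    val lines = sc.textFile("input.txt")
    val counts = lines.flatMap(_.split(" ")) // split each line into words
      .map(word => (word, 1))                // pair each word with a count of 1
      .reduceByKey(_ + _)                    // sum the counts per word

    counts.collect().foreach(println) // print each (word, count) pair
    sc.stop()
  }
}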
NetworkWordCount Program in Spark
package com.sreekanth.spark.streaming.usecase // package for this NetworkWordCount use case

import org.apache.spark.SparkConf // Spark configuration
import org.apache.spark.streaming.{Seconds, StreamingContext} // Spark Streaming context and batch interval
import org.apache.spark.storage.StorageLevel // storage levels for the socket receiver

object NewNetWordCount { // object holding the NewNetWordCount application
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: NewNetWordCount <hostname> <port>") // print usage and exit
      System.exit(1)
    }

    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("NewNetWordCount")

    // Create the streaming context with a 2-second batch size
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create a socket stream that connects to the given hostname and port;
    // on a single-node setup you can simply pass localhost and a free port.
    // The serialized, replicated storage level supports fault tolerance.
    val presentLines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)

    val words = presentLines.flatMap(_.split(" ")) // split each line into words
    val newPairs = words.map(word => (word, 1))    // pair each word with a count of 1
    val wordCounts = newPairs.reduceByKey(_ + _)   // sum the counts per word

    // Print the first ten elements of each RDD generated in the DStream to the console
    wordCounts.print()

    ssc.start()            // start the computation
    ssc.awaitTermination() // wait for the computation to terminate
  }
}
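To try the program locally, one common approach (an assumption here, not part of the original post) is to open a socket with netcat in one terminal and then submit the application with the same host and port; the jar name below is hypothetical and depends on your build setup:

nc -lk 9999
spark-submit --class com.sreekanth.spark.streaming.usecase.NewNetWordCount newnetwordcount.jar localhost 9999

Words typed into the netcat terminal should then appear as (word, count) pairs in the Spark console every two seconds.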
Summary: The above Spark with Scala code simplifies the NetworkWordCount program for Spark developers at the beginner level. Nowadays interview panels also ask about this program instead of the plain WordCount Spark program. Here we explained, step by step, how the NetworkWordCount program executes and which packages it uses. A Spark Streaming program can be a bit difficult at the beginner level, but if you already have the basic WordCount example in mind it is very simple. The PySpark version is almost the same, with only small differences in Python style.