NetworkWordCount Program in Spark with Scala [Explanation]





This blog post explains the NetworkWordCount program in Spark with Scala. If you already understand the basic WordCount program in Spark, this program is very simple to follow; otherwise it may be a little difficult at first. Each step of the program is explained in the comment lines.
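
For reference, here is a minimal batch WordCount sketch in Spark with Scala (the input path input.txt and the object name BatchWordCount are illustrative); the streaming program below applies the same flatMap, map, and reduceByKey steps to a live socket stream instead of a file.

import org.apache.spark.{SparkConf, SparkContext}

object BatchWordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[*]").setAppName("BatchWordCount")
    val sc = new SparkContext(conf)
    // input.txt is an illustrative path; point it at any local text file
    val counts = sc.textFile("input.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}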

NetworkWordCount Program in Spark

package com.sreekanth.spark.streaming.usecase

// Package declaration for this NetworkWordCount use case
import org.apache.spark.SparkConf

// Import the SparkConf class for Spark configuration

import org.apache.spark.streaming.StreamingContext

// Import the StreamingContext class for Spark Streaming

import org.apache.spark.streaming.Seconds

// Import Seconds to specify the batch interval

import org.apache.spark.storage.StorageLevel

// Import StorageLevel to choose how received data is stored

object NewNetWordCount {

// Create the NewNetWordCount object

def main(args: Array[String]) {

if (args.length < 2){

System.err.println("Usecase : NewNetworkWordCount")

//Println with Scala code level

System.exit(1)

}

val sparkConf = new SparkConf().setMaster("local[*]").setAppName("NewNetWordCount")

// Create the Spark configuration: run locally on all available cores with the application name NewNetWordCount

val ssc = new StreamingContext(sparkConf, Seconds(2))

// Create the streaming context with a 2-second batch interval

val presentLines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)

// Create a socket text stream that connects to the given hostname and port (on a single-node setup, simply localhost and a free port); the storage level keeps received blocks serialized in memory, spilling to disk when needed

val words = presentLines.flatMap(_.split(" "))
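
// Split each incoming line into words on spaces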

val newPairs = words.map(word => (word, 1))
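
// Pair each word with a count of 1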

val wordCounts = newPairs.reduceByKey(_+_)
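
// Sum the counts for each word within the 2-second batch. For example, if one batch receives the line "spark streaming spark", flatMap emits spark, streaming, spark; map produces (spark,1), (streaming,1), (spark,1); and reduceByKey yields (spark,2), (streaming,1).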

//Print the first ten elements of each RDD generated in the DStream to the console

wordCounts.print() //Print the word counts of each generated RDD

ssc.start() // Start the computation

ssc.awaitTermination() // Wait for the computation to terminate

}

}
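
To try the program on a single machine, start a data server with netcat in one terminal and submit the job in another, passing the same hostname and port (the jar name networdcount.jar is illustrative):

nc -lk 9999

spark-submit --class com.sreekanth.spark.streaming.usecase.NewNetWordCount networdcount.jar localhost 9999

Typing spark streaming spark into the netcat terminal then prints something like the following for that batch (the timestamp is illustrative):

-------------------------------------------
Time: 1590000000000 ms
-------------------------------------------
(spark,2)
(streaming,1)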





Summary: The above Spark with Scala code walks a beginner-level Spark developer through the NetworkWordCount program. Nowadays interview panels often ask about this program instead of the plain WordCount Spark program. Here we explained the NetworkWordCount program step by step, including the required packages. At the beginner level a Spark streaming program can be a bit difficult, but if you already have the basic WordCount example in mind it is very simple. The PySpark version is almost the same program, with only minor differences in Python style.