Word Count use case in Spark
First, how to initialize a SparkContext:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
val conf = new SparkConf().setMaster("local").setAppName("APP")
val sc = new SparkContext(conf)
Note: setAppName supplies an application name (APP in these examples) that identifies your application in the cluster manager's UI, and setMaster supplies a cluster URL (local in these examples) that tells Spark how to connect to a cluster.
Word Count Use Case Using SparkContext in Scala
// Create a Scala SparkContext.
val conf = new SparkConf().setAppName("Word Count")
val sc = new SparkContext(conf)
// Load our input data.
val input = sc.textFile(inputFile)
// Split each line into words.
val words = input.flatMap(line => line.split(" "))
// Transform into (word, 1) pairs and sum the counts per word.
val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
// Save the word counts back out to a text file.
counts.saveAsTextFile(outputFile)
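The same flatMap/map/reduce pipeline can be traced on a plain Scala collection, without a Spark cluster, to see what each stage produces. The sample `lines` sequence here is hypothetical, and `groupBy` plus a per-group sum stands in for `reduceByKey`:

```scala
// Stand-in for the RDD loaded from inputFile (hypothetical sample data).
val lines = Seq("to be or not to be", "to see or not to see")

// Split each line into words, as flatMap does on the RDD.
val words = lines.flatMap(line => line.split(" "))

// Pair each word with 1, then sum the counts per key,
// mirroring map + reduceByKey.
val counts = words
  .map(word => (word, 1))
  .groupBy(_._1)
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
```

Running this, `counts("to")` is 4 and `counts("be")` is 2, which is exactly the per-key aggregation Spark performs in a distributed fashion.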
OTHER EXAMPLES IN SCALA:
// Create an RDD based on "data".
val data = 1 to 1000
val distData = sc.parallelize(data)
// Select the values less than 10.
distData.filter(_ < 10).collect()
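Because the predicate passed to `filter` is ordinary Scala code, it can be sanity-checked on the local range before distributing it; a minimal sketch:

```scala
val data = 1 to 1000
// filter(_ < 10) keeps only 1 through 9; collect() on the RDD
// would return the same values as a local array.
val small = data.filter(_ < 10)
```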
// Base RDD.
val lines = sc.textFile("localhost:54280/EmployeeLogs.txt")
// Transformed RDDs.
val emp = lines.filter(_.startsWith("Emp"))
val messages = emp.map(_.split("\t")).map(r => r(1))
messages.cache()
messages.filter(_.contains("mysql")).count()
messages.filter(_.contains("Hadoop")).count()
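The log-filtering chain can likewise be checked on an in-memory sequence. The sample log lines below are hypothetical, but the split/filter logic matches the RDD version, with `count(predicate)` standing in for `filter(...).count()`:

```scala
// Hypothetical stand-in for the contents of EmployeeLogs.txt.
val logLines = Seq(
  "Emp\tmysql connection opened",
  "Emp\tHadoop job started",
  "Info\tunrelated line",
  "Emp\tmysql query failed"
)

// Keep only employee records, as filter(_.startsWith("Emp")) does.
val emp = logLines.filter(_.startsWith("Emp"))

// Split on tab and keep the second field (the message),
// mirroring map(_.split("\t")).map(r => r(1)).
val messages = emp.map(_.split("\t")).map(r => r(1))

val mysqlCount = messages.count(_.contains("mysql"))
val hadoopCount = messages.count(_.contains("Hadoop"))
```

Note that in Spark, caching `messages` before the two counts avoids recomputing the filter and split for each action; on a local collection the values are already materialized, so no cache is needed.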