Word Count Use Case in Spark

First, how to initialize a SparkContext:

import org.apache.spark.SparkConf

import org.apache.spark.SparkContext

import org.apache.spark.SparkContext._

val conf = new SparkConf().setMaster("local").setAppName("APP")

val sc = new SparkContext(conf)

 

Note: setAppName() sets an application name (APP in these examples), which identifies your application in the cluster manager's UI, and setMaster() sets a cluster URL (local in these examples), which tells Spark how to connect to a cluster.
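For context, here is a minimal sketch of the same initialization pointed at other master URLs. The local[4] and spark://... forms are standard Spark master URL syntax, but the host name and port below are placeholders for illustration, not from the original:

import org.apache.spark.SparkConf

import org.apache.spark.SparkContext

// Run locally with 4 worker threads instead of a single one.
val localConf = new SparkConf().setMaster("local[4]").setAppName("APP")

// Or point at a standalone Spark cluster (hypothetical host and port).
val clusterConf = new SparkConf().setMaster("spark://master-host:7077").setAppName("APP")

val sc = new SparkContext(localConf)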

 

Word Count Use Case Using SparkContext in Scala

//Create a Scala SparkContext.

val conf = new SparkConf().setAppName("Word Count")

val sc = new SparkContext(conf)

//Load our input data.

val input = sc.textFile(inputFile)

//Split each line into words.

val words = input.flatMap(line => line.split(" "))

//Transform into (word, 1) pairs and sum the counts for each word.

val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }

//Save the word counts back out to a text file.

counts.saveAsTextFile(outputFile)
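Put together, a minimal self-contained sketch of the same job looks like the following. The object name WordCount and the use of command-line arguments for the input and output paths are assumptions for illustration:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical: take the input and output paths from the command line.
    val inputFile = args(0)
    val outputFile = args(1)

    val conf = new SparkConf().setAppName("Word Count")
    val sc = new SparkContext(conf)

    val counts = sc.textFile(inputFile)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(outputFile)
    sc.stop()
  }
}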

OTHER EXAMPLES IN SCALA:

//Create an RDD based on "data".

val data = 1 to 1000

val distData = sc.parallelize(data)

//Select the values less than 10.

distData.filter(_ < 10).collect()
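Since the only values in 1 to 1000 that are less than 10 are 1 through 9, collect() returns Array(1, 2, 3, 4, 5, 6, 7, 8, 9) to the driver.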

//Base RDD

val lines = sc.textFile("localhost:54280/EmployeeLogs.txt")

//Transformed RDDs

val emp = lines.filter(_.startsWith("Emp"))

val messages = emp.map(_.split("\t")).map(r => r(1))

//Cache the messages RDD in memory so repeated actions can reuse it.

messages.cache()

messages.filter(_.contains("mysql")).count()

messages.filter(_.contains("Hadoop")).count()
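Note that transformations such as filter and map are lazy: nothing is read or computed until the first count() action runs. Because messages is cached, the second count() reuses the in-memory data instead of re-reading and re-parsing the file.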