Word Count use case in Spark
First, how to initialize a SparkContext:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
val conf = new SparkConf().setMaster("local").setAppName("APP")
val sc = new SparkContext(conf)
Note: setAppName supplies an application name (APP in these examples) that identifies your application in the cluster manager's UI, and setMaster supplies a cluster URL (local in these examples) that tells Spark how to connect to a cluster.
Word Count Use Case Using SparkContext in Scala
// Create a Scala SparkContext.
val conf = new SparkConf().setAppName("Word Count")
val sc = new SparkContext(conf)
// Load our input data.
val input = sc.textFile(inputFile)
// Split each line into words.
val words = input.flatMap(line => line.split(" "))
// Transform into (word, 1) pairs and sum the counts per word.
val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
// Save the word counts back out to a text file.
counts.saveAsTextFile(outputFile)
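The same flatMap/map/reduce pipeline can be traced on a plain Scala collection, without a Spark cluster, to see what each stage produces. The sample `lines` sequence here is hypothetical, and `groupBy` plus a per-group sum stands in for `reduceByKey`:

```scala
// Stand-in for the RDD loaded from inputFile (hypothetical sample data).
val lines = Seq("to be or not to be", "to see or not to see")

// Split each line into words, as flatMap does on the RDD.
val words = lines.flatMap(line => line.split(" "))

// Pair each word with 1, then sum the counts per key,
// mirroring map + reduceByKey.
val counts = words
  .map(word => (word, 1))
  .groupBy(_._1)
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
```

Running this, `counts("to")` is 4 and `counts("be")` is 2, which is exactly the per-key aggregation Spark performs in a distributed fashion.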
OTHER EXAMPLES IN SCALA:
// Create an RDD based on "data".
val data = 1 to 1000
val distData = sc.parallelize(data)
// Select the values less than 10.
distData.filter(_ < 10).collect()
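Because the predicate passed to `filter` is ordinary Scala code, it can be sanity-checked on the local range before distributing it; a minimal sketch:

```scala
val data = 1 to 1000
// filter(_ < 10) keeps only 1 through 9; collect() on the RDD
// would return the same values as a local array.
val small = data.filter(_ < 10)
```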
// Base RDD.
val lines = sc.textFile("localhost:54280/EmployeeLogs.txt")
// Transformed RDDs.
val emp = lines.filter(_.startsWith("Emp"))
val messages = emp.map(_.split("\t")).map(r => r(1))
messages.cache()
messages.filter(_.contains("mysql")).count()
messages.filter(_.contains("Hadoop")).count()
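The log-filtering chain can likewise be checked on an in-memory sequence. The sample log lines below are hypothetical, but the split/filter logic matches the RDD version, with `count(predicate)` standing in for `filter(...).count()`:

```scala
// Hypothetical stand-in for the contents of EmployeeLogs.txt.
val logLines = Seq(
  "Emp\tmysql connection opened",
  "Emp\tHadoop job started",
  "Info\tunrelated line",
  "Emp\tmysql query failed"
)

// Keep only employee records, as filter(_.startsWith("Emp")) does.
val emp = logLines.filter(_.startsWith("Emp"))

// Split on tab and keep the second field (the message),
// mirroring map(_.split("\t")).map(r => r(1)).
val messages = emp.map(_.split("\t")).map(r => r(1))

val mysqlCount = messages.count(_.contains("mysql"))
val hadoopCount = messages.count(_.contains("Hadoop"))
```

Note that in Spark, caching `messages` before the two counts avoids recomputing the filter and split for each action; on a local collection the values are already materialized, so no cache is needed.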