Spark Streaming Use Case with Explanation:
Using Scala streaming imports
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.Seconds
Spark Streaming Context:
Creating a StreamingContext also sets up the underlying SparkContext that it will use to process data. It takes as input a batch interval specifying how often to process new data.
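To make the batch interval concrete, here is a small sketch of the different ways a Duration can be written. The object name BatchIntervals and the specific values are illustrative assumptions, not part of the original example.

```scala
import org.apache.spark.streaming.{Duration, Milliseconds, Seconds}

// Hypothetical holder object, used only to group the example values
object BatchIntervals {
  // Seconds(n) is a helper that builds a Duration of n * 1000 milliseconds
  val oneSecond: Duration = Seconds(1)

  // Milliseconds(n) builds a Duration of n milliseconds
  val halfSecond: Duration = Milliseconds(500)

  // A Duration can also be built directly from a millisecond count
  val explicit: Duration = Duration(1000)
}
```

Any of these values can be passed as the second argument when constructing a StreamingContext. A shorter interval means smaller, more frequent batches.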
socketTextStream()
We use socketTextStream() to create a DStream based on text data received on a port of the local machine.
Then we transform the DStream with filter() to keep only the lines that contain "error", and apply the output operation print() to print some of the filtered lines.
// Create a StreamingContext with a 1-second batch size from a SparkConf
val ssc = new StreamingContext(conf, Seconds(1))
// Create a DStream using data received after connecting to port 9000 on the local machine
val lines = ssc.socketTextStream("localhost", 9000)
// Filter our DStream for lines with "error"
val errorLines = lines.filter(_.contains("error"))
// Print out the lines with errors
errorLines.print()
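The snippet above only declares the computation. Nothing is processed until the StreamingContext is started. Below is a minimal sketch of a complete program, assuming local mode. The object name ErrorLineStream, the app name, and the local[2] master are assumptions for illustration. In local mode the socket receiver occupies one thread, so at least two are needed.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical application object; the name is an assumption
object ErrorLineStream {
  // The filter predicate, kept as a plain function so it can be checked on its own
  def isErrorLine(line: String): Boolean = line.contains("error")

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads (one for the receiver, one for processing)
    val conf = new SparkConf().setMaster("local[2]").setAppName("ErrorLineStream")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Same pipeline as above: read lines from the socket, keep the error lines, print them
    val lines = ssc.socketTextStream("localhost", 9000)
    val errorLines = lines.filter(line => isErrorLine(line))
    errorLines.print()

    // Start receiving data and block until the job is stopped
    ssc.start()
    ssc.awaitTermination()
  }
}
```

One way to feed it test data, assuming netcat is installed, is to run nc -lk 9000 in another terminal and type lines there. Lines containing "error" should show up in the printed batches.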
Any operation applied on a DStream translates to operations on its underlying RDDs. For example, when converting a stream of lines to words, the flatMap operation is applied on each RDD in the lines DStream to generate the RDDs of the words DStream. This is shown in the figure below.
Input DStreams are DStreams representing the stream of input data received from streaming sources. In the example above, lines is an input DStream, as it represents the stream of data received from the server.
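The lines-to-words conversion mentioned above can be sketched as follows. The object LinesToWords and its helper names are hypothetical, and splitting on whitespace is an assumption about what counts as a word.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper object; names are assumptions for illustration
object LinesToWords {
  // Split one line into words on whitespace, dropping empty tokens
  def splitIntoWords(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  // flatMap applies splitIntoWords to every line of every RDD in the lines DStream
  // and flattens the results into the RDDs of the words DStream
  def toWords(lines: DStream[String]): DStream[String] =
    lines.flatMap(line => splitIntoWords(line))
}
```

With the lines DStream from the socket example, LinesToWords.toWords(lines) would give a DStream in which each element is a single word.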
Every input DStream (except file streams) is associated with a Receiver object, available in both the Scala and Java APIs, which receives the data from a source and stores it in Spark's memory for processing. Spark Streaming provides two categories of streaming sources:
- Basic Sources: Sources directly available in the StreamingContext API, for example file systems and socket connections.
- Advanced Sources: Sources such as Flume, Kafka, and Twitter, which are available through extra utility classes.
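As a rough illustration of the basic sources, the sketch below creates one DStream from a socket and one from a directory. The object name BasicSources, the directory /tmp/streaming-input, and the local[2] master are assumptions. Advanced sources are left out because they need extra dependencies beyond the core streaming library.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical application object; the name is an assumption
object BasicSources {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads
    val conf = new SparkConf().setMaster("local[2]").setAppName("BasicSources")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Basic source 1: a socket connection, as in the earlier example
    val socketLines = ssc.socketTextStream("localhost", 9000)

    // Basic source 2: a file system directory monitored for newly created text files
    // (the path is a placeholder)
    val fileLines = ssc.textFileStream("/tmp/streaming-input")

    // Both are DStream[String], so they can be combined and processed together
    socketLines.union(fileLines).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```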