MapReduce in Hadoop - CommandsTech

MapReduce:

MapReduce (MR) is the core processing component of Hadoop, designed to process huge volumes of data in parallel on clusters of commodity hardware. It is a programming model built around two tasks, Map and Reduce:

Map: takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key-value pairs).

Reduce: takes the output of a map as its input and combines those data tuples into a smaller set of tuples.
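To make the two tasks concrete, here is a minimal sketch in plain Python (not the Hadoop API) using word counting as the example. The function names map_words and reduce_counts are made up for illustration.

```python
def map_words(line):
    """Map: break one input record into (key, value) tuples, one per word."""
    return [(word, 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce: combine all values seen for one key into a single tuple."""
    return (word, sum(counts))


pairs = map_words("hadoop stores data hadoop processes data")
hadoop_total = reduce_counts("hadoop", [v for k, v in pairs if k == "hadoop"])
print(pairs)
print(hadoop_total)
```

The map step produces six tuples from one line, and the reduce step collapses the tuples that share a key into one, which is the "smaller set of tuples" described above.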

MapReduce Life Cycle:

A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
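The scheduling and re-execution behaviour can be imitated on a single machine. The sketch below is only a simulation under stated assumptions: run_task_with_retries, flaky_line_count and the max_attempts parameter are hypothetical names, and a thread pool stands in for the cluster. It is not how Hadoop itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task_with_retries(task, chunk, max_attempts=3):
    """Run task(chunk, attempt); re-execute it if it fails (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk, attempt)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def flaky_line_count(chunk, attempt):
    """Toy map task: counts records, but fails once on chunks containing 'bad'."""
    if attempt == 1 and "bad" in chunk:
        raise RuntimeError("simulated task failure")
    return len(chunk)


# Independent chunks of the input, processed in parallel by the pool.
chunks = [["a", "b"], ["bad", "c", "d"], ["e"]]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(
        pool.map(lambda c: run_task_with_retries(flaky_line_count, c), chunks)
    )
print(results)
```

The second chunk fails on its first attempt and succeeds on the second, so the job still produces a result for every chunk without the caller having to intervene.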

The MapReduce framework operates exclusively on <key, value> pairs: it views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job.
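A small Python sketch can show how <key, value> pairs flow through a job. The data and the names map_temperature and reduce_max are invented for illustration: each input pair is (byte offset, line), the map emits (year, temperature), and the reduce emits (year, maximum temperature).

```python
from collections import defaultdict


def map_temperature(offset, line):
    """Input pair (offset, line) -> intermediate pairs (year, temperature)."""
    year, temp = line.split(",")
    return [(year, int(temp))]


def reduce_max(year, temps):
    """Intermediate pair (year, [temperatures]) -> output pair (year, max)."""
    return [(year, max(temps))]


# Input to the job, viewed as a set of <key, value> pairs.
records = [(0, "1950,22"), (8, "1950,31"), (16, "1951,18")]

# Group intermediate values by key, as the framework would between map and reduce.
grouped = defaultdict(list)
for offset, line in records:
    for key, value in map_temperature(offset, line):
        grouped[key].append(value)

output = [pair for year in sorted(grouped) for pair in reduce_max(year, grouped[year])]
print(output)
```

Note that the key and value types change between stages: the input key (an offset) is ignored by the map, and the year becomes the key for everything that follows.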

MapReduce Programming Model:

The input data is split into independent chunks, which the map tasks process in parallel, emitting key-value pairs.
The output of the map tasks is sorted by key.
The sorted output is the input to the reduce tasks, which produce the final output of the job. That output is written to the file system, where the client can read it.
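The three steps above can be sketched end to end in plain Python. This is a single-machine illustration under assumptions, not Hadoop code: the helper names (split_input, map_chunk, reduce_key, run_job), the chunk_size parameter and the sample log lines are all hypothetical, and a thread pool stands in for parallel map tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def split_input(lines, chunk_size):
    """Step 1a: split the input into independent chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Step 1b: emit (log level, 1) for every line in the chunk."""
    return [(line.split()[0], 1) for line in chunk]


def reduce_key(key, values):
    """Step 3: combine all values for one key."""
    return (key, sum(values))


def run_job(lines, chunk_size=2):
    chunks = split_input(lines, chunk_size)
    with ThreadPoolExecutor() as pool:  # map tasks run in parallel
        mapped = [pair for pairs in pool.map(map_chunk, chunks) for pair in pairs]
    mapped.sort(key=itemgetter(0))  # Step 2: sort map output by key
    return [
        reduce_key(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


lines = ["INFO start", "ERROR disk full", "INFO ready", "WARN slow", "ERROR timeout"]
result = run_job(lines)
print(result)
```

Sorting before grouping matters here: itertools.groupby only merges adjacent items, so the sort step is what guarantees that each key reaches the reduce function exactly once with all of its values.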