MapReduce in Hadoop

MapReduce:

MapReduce (MR) is the core processing component of Hadoop, designed to process huge volumes of data in parallel on clusters of commodity hardware. It is a programming model built around two important tasks: Map and Reduce.

Map: Takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).

Reduce: Takes the output of a map task as its input and combines those data tuples into a smaller set of tuples.
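The two tasks can be pictured as plain functions. The sketch below is a minimal, self-contained illustration in plain Java (the class and method names are hypothetical, not the real Hadoop API): map() emits a (word, 1) tuple for every word in a line, and reduce() sums all values emitted for one key.

```java
import java.util.*;

// Conceptual sketch of Map and Reduce as plain functions (word count).
public class MapReduceSketch {

    // Map: break one input record into (key, value) tuples.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> tuples = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                tuples.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return tuples;
    }

    // Reduce: combine all values emitted for one key into a smaller result.
    static Map.Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new AbstractMap.SimpleEntry<>(key, sum);
    }

    public static void main(String[] args) {
        // "to" appears twice, so map() emits ("to", 1) twice...
        System.out.println(map("to be or not to be"));
        // ...and reduce() later collapses those two tuples into ("to", 2).
        System.out.println(reduce("to", Arrays.asList(1, 1)));
    }
}
```

In real Hadoop code the same logic lives in Mapper and Reducer classes, but the shape of the computation is exactly this: map produces many small tuples, reduce folds each key's tuples into one.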

MapReduce Life Cycle:



A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Both the input and output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.

The MapReduce framework operates on <key, value> pairs: it views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job.

 

MapReduce Programming Model:

  • The input data is split into independent chunks and transformed into key/value pairs. This is done by the Map tasks in parallel.
  • The output of the Map tasks is sorted by key.
  • The sorted output becomes the input to the Reduce tasks, which produce the final output and return it to the client.
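The steps above can be sketched end-to-end in plain Java as a word-count pipeline. This is a conceptual simulation, not the Hadoop framework itself (the class and method names are hypothetical): chunks are mapped in parallel, the emitted tuples are sorted and grouped by key (the shuffle), and each group is reduced to the final output.

```java
import java.util.*;
import java.util.stream.*;

// In-memory simulation of the MapReduce programming model (word count):
// split -> parallel map -> sort/group by key -> reduce.
public class WordCountPipeline {

    static Map<String, Integer> run(List<String> chunks) {
        // 1. Map phase: each chunk is processed independently, in parallel,
        //    emitting a (word, 1) tuple per word.
        List<Map.Entry<String, Integer>> mapped = chunks.parallelStream()
                .flatMap(chunk -> Arrays.stream(chunk.split("\\s+"))
                        .filter(w -> !w.isEmpty())
                        .map(w -> Map.entry(w, 1)))
                .collect(Collectors.toList());

        // 2. Shuffle/sort phase: group emitted values by key, in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        }

        // 3. Reduce phase: combine each key's values into the final output.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) ->
                result.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

In real Hadoop the chunks are HDFS input splits, the shuffle happens across machines, and failed tasks are re-executed by the framework, but the data flow is the same three phases shown here.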

 
