Brief History of MapReduce in Hadoop | BigData | MapReduce | Hadoop




A brief history of Hadoop 2:

Originally created and architected by the team at Yahoo, the search engine company

Arun Murthy created the original JIRA in 2008 and is now the Hadoop 2 release manager

The community has worked on Hadoop 2 for many years.

A Hadoop 2 based architecture runs at scale at Yahoo, where it has been deployed on 35K nodes for 1 year.

Next Generation Platform:

HDFS 2: In a nutshell

  • Removes the tight coupling between block storage and the namespace
  • High Availability (HA)
  • Scalability and isolation
  • Increased performance

HDFS 2: Federation:

  • The NameNodes do not talk to each other. Each NameNode manages only its own slice of the namespace
  • Block storage is offered as a generic storage service
  • DataNodes can store blocks managed by any NameNode
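
The federation points above can be illustrated with a minimal Python sketch. It is only a toy model, not the HDFS implementation: the class names, the `resolve` helper, the path prefixes, and the block pool ids ("BP-1", "BP-2") are assumptions made up for illustration. Each NameNode owns one slice of the namespace and never contacts the other, while a single DataNode stores blocks for both.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Hypothetical model of one federated NameNode owning a namespace slice."""
    name: str
    mount_prefix: str      # slice of the namespace this NameNode manages
    block_pool_id: str     # identifies the blocks that belong to this namespace
    files: dict = field(default_factory=dict)   # path -> list of block ids


@dataclass
class DataNode:
    """Generic block store: keeps blocks for any block pool, i.e. any NameNode."""
    name: str
    blocks: dict = field(default_factory=dict)  # (block_pool_id, block_id) -> bytes

    def store(self, block_pool_id, block_id, data):
        self.blocks[(block_pool_id, block_id)] = data


def resolve(namenodes, path):
    """Pick the NameNode whose prefix covers the path (longest prefix wins)."""
    matches = [
        nn for nn in namenodes
        if path == nn.mount_prefix or path.startswith(nn.mount_prefix + "/")
    ]
    if not matches:
        raise KeyError(f"no NameNode manages {path}")
    return max(matches, key=lambda nn: len(nn.mount_prefix))


# Two independent NameNodes, one shared DataNode.
nn_user = NameNode("nn1", "/user", "BP-1")
nn_logs = NameNode("nn2", "/logs", "BP-2")
dn = DataNode("dn1")

for path, data in [("/user/alice/a.txt", b"hello"), ("/logs/app.log", b"world")]:
    owner = resolve([nn_user, nn_logs], path)
    owner.files[path] = ["blk_0"]                 # metadata stays on the owner only
    dn.store(owner.block_pool_id, "blk_0", data)  # blocks land on the shared DataNode
```

Because the DataNode keys every block by its block pool, the same block id can exist in both namespaces without conflict, which is what lets block storage act as a generic service.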

HDFS 2: Architecture:





This section explains the HDFS 2 architecture within the cluster. HDFS follows a Master/Slave architecture in which the NameNode acts as the master and the DataNodes act as slaves. The diagram shows that:

1. Only the active NameNode writes edits.

2. Shared state is kept in a shared directory on NFS (Network File System).

3. DataNodes report to both NameNodes but take orders only from the active one.
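
The three rules above can be sketched in Python. This is a simplified model under stated assumptions, not Hadoop code: the classes `SharedEditsDir`, `NameNode`, and `DataNode`, their methods, and the "delete" command are hypothetical names chosen for illustration, and the shared NFS directory is represented by an in-memory list.

```python
class NameNode:
    """Hypothetical NameNode that is either active or standby."""

    def __init__(self, name, active=False):
        self.name = name
        self.active = active
        self.block_reports = {}   # DataNode name -> set of block ids

    def receive_block_report(self, datanode, blocks):
        self.block_reports[datanode] = set(blocks)  # copy of the reported state


class SharedEditsDir:
    """Stand-in for the shared NFS directory that holds the edit log."""

    def __init__(self):
        self.edits = []

    def append(self, writer, edit):
        # Rule 1: only the active NameNode may write edits.
        if not writer.active:
            raise PermissionError(f"{writer.name} is standby and may not write edits")
        self.edits.append(edit)


class DataNode:
    def __init__(self, name, namenodes):
        self.name = name
        self.namenodes = namenodes
        self.blocks = {"blk_1", "blk_2"}

    def send_block_report(self):
        # Rule 3a: report to both NameNodes so the standby stays up to date.
        for nn in self.namenodes:
            nn.receive_block_report(self.name, self.blocks)

    def handle_command(self, sender, command, block_id):
        # Rule 3b: obey only commands that come from the active NameNode.
        if not sender.active:
            return False
        if command == "delete":
            self.blocks.discard(block_id)
        return True


active = NameNode("nn1", active=True)
standby = NameNode("nn2")
shared = SharedEditsDir()
dn = DataNode("dn1", [active, standby])

shared.append(active, "mkdir /data")                     # accepted
dn.send_block_report()                                   # both NameNodes hear it
ignored = dn.handle_command(standby, "delete", "blk_1")  # standby is not obeyed
obeyed = dn.handle_command(active, "delete", "blk_2")    # active is obeyed
```

Since the standby already holds the latest block reports and can read the shared edits, it has what it needs to take over if the active NameNode fails.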

In Hadoop 1.x, the NameNode was a single point of failure, which became a barrier to adoption in the market. Clients need a highly available NameNode, and this is one of the reasons High Availability became a foundation of the HDFS architecture in Hadoop 2.x.

HDFS has two main layers:

1. HDFS Namespace (NS): This layer is responsible for managing directories, files, and blocks.

2. Storage Layer: It consists of two basic management systems:

A. Block management: The block management system monitors the heartbeats exchanged between the DataNodes and the NameNode in the cluster. It supports block operations such as creation, modification, and deletion.

B. Physical storage management: Physical storage is managed by the DataNodes, which store large volumes of data and provide read/write access to users in the Hadoop ecosystem.
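
The block management role described above can be sketched roughly in Python. This is a toy model with assumptions labeled: `BlockManager` and its methods are hypothetical names, and the 30 second timeout is an illustrative value, not an HDFS default. The sketch records heartbeats, tracks which DataNodes hold each block, and serves reads only from DataNodes that are still considered alive.

```python
class BlockManager:
    """Hypothetical block manager: tracks heartbeats and block locations."""

    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout  # seconds; illustrative value
        self.last_heartbeat = {}    # DataNode name -> time of last heartbeat
        self.block_locations = {}   # block id -> set of DataNodes holding it

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def live_datanodes(self, now):
        # A DataNode is alive if it sent a heartbeat within the timeout.
        return {
            dn for dn, ts in self.last_heartbeat.items()
            if now - ts <= self.heartbeat_timeout
        }

    def create_block(self, block_id, datanodes):
        self.block_locations[block_id] = set(datanodes)

    def delete_block(self, block_id):
        self.block_locations.pop(block_id, None)

    def readable_replicas(self, block_id, now):
        # Only replicas on live DataNodes can serve reads.
        return self.block_locations.get(block_id, set()) & self.live_datanodes(now)


bm = BlockManager(heartbeat_timeout=30.0)
bm.heartbeat("dn1", now=0.0)
bm.heartbeat("dn2", now=0.0)
bm.create_block("blk_1", ["dn1", "dn2"])

bm.heartbeat("dn1", now=40.0)                        # dn2 stays silent
replicas = bm.readable_replicas("blk_1", now=45.0)   # only dn1 is still alive
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the example deterministic and easy to follow.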