Resource Management in Hadoop and Big Data | HDFS | FIFO | Fair Scheduler




What is Resource Management in Hadoop?

Resource management is about controlling how much of a finite resource is allocated to a given user, group, or job.

In the context of Hadoop, the resources we are primarily concerned with are disk space consumption and the number of files in HDFS and, in the case of MapReduce, map and reduce slot usage.

HDFS Quotas:

HDFS, like many filesystems, supports the notion of quotas on disk space consumption.

Administrators can specify a limit on the physical size, that is, the size after replication, that a directory in HDFS may consume.

Setting a space quota:

hadoop dfsadmin -setSpaceQuota size path

Removing a quota from a directory:

hadoop dfsadmin -clrSpaceQuota path

Example:

Set a 10 GB quota (10737418240 bytes) on the path /user/hdfsuser:
hadoop dfsadmin -setSpaceQuota 10737418240 /user/hdfsuser
View the quota on the path /user/hdfsuser:
hadoop fs -count -q /user/hdfsuser
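
The byte value passed to -setSpaceQuota and the effect of replication on quota usage can be checked with a little arithmetic. The following is a minimal Python sketch, not part of Hadoop: the helper names quota_bytes and quota_consumed are hypothetical, and the replication factor of 3 is assumed to be the cluster default.

```python
# Hypothetical helpers for reasoning about HDFS space quotas.
GIB = 1024 ** 3  # bytes in one binary gigabyte


def quota_bytes(gigabytes):
    """Return the byte value to pass to -setSpaceQuota for a quota in GB."""
    return gigabytes * GIB


def quota_consumed(file_bytes, replication=3):
    """Space quotas count every replica, so usage is size times replication."""
    return file_bytes * replication


quota = quota_bytes(10)
print(quota)  # 10737418240

# A 4 GB file at replication 3 needs 12 GB of quota, so it does not fit.
print(quota_consumed(4 * GIB) <= quota)  # False
```

Because every replica counts against the quota, a 10 GB space quota holds far less than 10 GB of user data when files are replicated.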

FIFO Scheduler:

  • FIFO (first in, first out) is a first-come, first-served algorithm for scheduling tasks
  • For example, given two jobs, A and B, submitted in that order, all map tasks in job A execute before any tasks from job B. As job A's map tasks complete, job B's map tasks are scheduled.
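
The two-job example can be sketched in a few lines of Python. This is only an illustration of first-come, first-served ordering, not the JobTracker's actual code; the function fifo_order and the task naming scheme are made up for the example.

```python
from collections import deque


def fifo_order(jobs):
    """Return map tasks in the order a plain FIFO scheduler would run them.

    jobs is a list of (job_name, number_of_map_tasks) in submission order.
    """
    queue = deque()
    for name, task_count in jobs:
        for i in range(task_count):
            queue.append(f"{name}-map-{i}")

    order = []
    while queue:
        # Always take the oldest queued task first.
        order.append(queue.popleft())
    return order


print(fifo_order([("A", 3), ("B", 2)]))
# ['A-map-0', 'A-map-1', 'A-map-2', 'B-map-0', 'B-map-1']
```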

FIFO Queue:




  • Job prioritization (lowest to highest):
  • very low, low, normal, high, very high
  • Each priority is actually implemented as a separate FIFO queue
  • All tasks from higher-priority queues are processed before tasks from lower-priority queues
  • The easiest way to visualize prioritized FIFO scheduling is to think of it as five FIFO queues ordered top to bottom by priority. Tasks are then scheduled left to right, top to bottom
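
The five-queue picture above can be modeled as one FIFO queue per priority, scanned from highest to lowest. The sketch below assumes that simplified model; the PriorityFifo class and its method names are hypothetical and do not correspond to Hadoop classes.

```python
from collections import deque

# Priorities from highest to lowest, matching the five levels listed above.
PRIORITIES = ["very high", "high", "normal", "low", "very low"]


class PriorityFifo:
    """Toy prioritized FIFO: one queue per priority level."""

    def __init__(self):
        self.queues = {p: deque() for p in PRIORITIES}

    def submit(self, task, priority="normal"):
        self.queues[priority].append(task)

    def next_task(self):
        # Scan top to bottom by priority, then left to right within a queue.
        for p in PRIORITIES:
            if self.queues[p]:
                return self.queues[p].popleft()
        return None


sched = PriorityFifo()
sched.submit("A1", "low")
sched.submit("B1", "normal")
sched.submit("C1", "very high")
sched.submit("B2", "normal")
print([sched.next_task() for _ in range(4)])
# ['C1', 'B1', 'B2', 'A1']
```

Within a single priority the original submission order is preserved, which is why B1 still runs before B2.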

Configuration of FIFO:

  • Configured in mapred-site.xml
  • The parameter to be set is mapred.jobtracker.taskScheduler
  • The name of the implementation class of the FIFO scheduler is org.apache.hadoop.mapred.JobQueueTaskScheduler
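
As a rough illustration, the setting might appear in mapred-site.xml as the property embedded in the string below. The Python around it is only a hypothetical check (configured_scheduler is not a Hadoop utility) that reads the file content and reports which scheduler class is configured.

```python
import xml.etree.ElementTree as ET

# Example mapred-site.xml content; only the scheduler property is shown.
MAPRED_SITE = """
<configuration>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  </property>
</configuration>
"""


def configured_scheduler(xml_text):
    """Return the task scheduler class named in the config, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "mapred.jobtracker.taskScheduler":
            return prop.findtext("value")
    return None


print(configured_scheduler(MAPRED_SITE))
# org.apache.hadoop.mapred.JobQueueTaskScheduler
```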

Fair Scheduler:

Sometimes called the “Fair Share Scheduler” or the “FS scheduler”.

  • Alternative to default FIFO scheduler
  • Developed to solve some of the problems that arise when using the FIFO scheduler in high-traffic, multitenant environments
  • It is built around the concept of slots, the units of map or reduce task capacity that each task tracker offers
  • Each time a task tracker heartbeats to the job tracker and reports available slots, the rules are evaluated and queued tasks are assigned for execution.
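
The heartbeat-driven assignment can be sketched as follows. This is a deliberately simplified toy, not the real Fair Scheduler: it assumes tasks are grouped into named pools, that the only rule is "give each free slot to the waiting pool with the fewest running tasks", and that tasks never finish. The class FairShareSketch and its methods are hypothetical.

```python
from collections import deque


class FairShareSketch:
    """Toy scheduler: free slots go to the pool with the fewest running tasks."""

    def __init__(self):
        self.pending = {}  # pool name -> deque of queued tasks
        self.running = {}  # pool name -> count of running tasks

    def submit(self, pool, task):
        self.pending.setdefault(pool, deque()).append(task)
        self.running.setdefault(pool, 0)

    def heartbeat(self, free_slots):
        """Called when a task tracker reports free slots; returns assignments."""
        assigned = []
        for _ in range(free_slots):
            waiting = [p for p, q in self.pending.items() if q]
            if not waiting:
                break
            # Assumed rule: pick the waiting pool that runs the fewest tasks.
            pool = min(waiting, key=lambda p: self.running[p])
            assigned.append(self.pending[pool].popleft())
            self.running[pool] += 1
        return assigned


sched = FairShareSketch()
for t in ["a1", "a2", "a3", "a4"]:
    sched.submit("alice", t)
for t in ["b1", "b2"]:
    sched.submit("bob", t)

print(sched.heartbeat(4))
# ['a1', 'b1', 'a2', 'b2']
```

Under plain FIFO the same heartbeat would have handed all four slots to the first job submitted; here the slots alternate between the two pools.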