What is Resource Management in Hadoop?
Resource management is about controlling how much of a finite resource is allocated to a given user, group, or job.
In the context of Hadoop, the resources we are primarily concerned with are disk space consumption and the number of files in HDFS and, in the case of MapReduce, map and reduce slot usage.
HDFS, like many filesystems, supports the notion of quotas on disk space consumption.
Administrators can specify a limit on the physical disk space a directory in HDFS may consume. The limit applies to the whole directory tree and counts every replica, so a 1 GB file stored with a replication factor of 3 uses 3 GB of quota.
Setting a space quota:
    hadoop dfsadmin -setSpaceQuota size path
Removing a quota from a directory:
    hadoop dfsadmin -clrSpaceQuota path
Setting a 10 GB quota on the path /user/hdfsuser (the size is given in bytes):
    hadoop dfsadmin -setSpaceQuota 10737418240 /user/hdfsuser
Viewing the quota on the path /user/hdfsuser:
    hadoop fs -count -q /user/hdfsuser
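The byte value for a 10 GB quota, and the way replication eats into it, can be checked with a little arithmetic. HDFS space quotas count replicated bytes, so every replica of a block is charged against the limit. The following is a minimal Python sketch and not part of Hadoop: the helper names gib_to_bytes and quota_consumed are hypothetical, and the replication factor of 3 is an assumed value.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to the byte count passed to -setSpaceQuota."""
    return gib * GIB


def quota_consumed(file_bytes: int, replication: int = 3) -> int:
    """Bytes charged against a space quota: every replica counts.

    The default replication of 3 is an assumption for this example.
    """
    return file_bytes * replication


quota = gib_to_bytes(10)
print(quota)  # 10737418240

# A 4 GiB file at the assumed replication factor of 3 is charged 12 GiB,
# which would exceed the 10 GiB limit.
needed = quota_consumed(gib_to_bytes(4))
print(needed > quota)  # True
```

In practice this means a quota should be sized for the replicated footprint of the data, not its logical size.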
- FIFO (first in, first out): a first-come, first-served algorithm for scheduling tasks
- For example, given two jobs, A and B, submitted in that order, all map tasks in job A will execute before any tasks from job B. As job A's map tasks complete, job B's map tasks are scheduled.
- Job prioritization (lowest to highest):
- very low, low, normal, high, very high
- Each priority is actually implemented as a separate FIFO queue
- All tasks from higher-priority queues are processed before tasks from lower-priority queues
- The easiest way to visualize prioritized FIFO scheduling is to think of it as five FIFO queues ordered top to bottom by priority. Tasks are then scheduled left to right, top to bottom
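The "five FIFO queues" picture above can be sketched in a few lines of Python. This is an illustrative model only, not the JobTracker's actual implementation: the class PrioritizedFifo, its methods, and the task names are hypothetical, and the sketch ignores slots and task trackers entirely.

```python
from collections import deque

# Priority names, ordered from highest to lowest (top to bottom).
PRIORITIES = ["VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"]


class PrioritizedFifo:
    """Five FIFO queues, one per priority, drained top to bottom."""

    def __init__(self):
        self.queues = {p: deque() for p in PRIORITIES}

    def submit(self, task, priority="NORMAL"):
        # Within one priority, tasks keep their submission order.
        self.queues[priority].append(task)

    def next_task(self):
        """Return the oldest task from the highest non-empty queue, or None."""
        for p in PRIORITIES:
            if self.queues[p]:
                return self.queues[p].popleft()
        return None


sched = PrioritizedFifo()
sched.submit("A-map-1")
sched.submit("A-map-2")
sched.submit("B-map-1")
sched.submit("C-map-1", priority="HIGH")

order = []
while (task := sched.next_task()) is not None:
    order.append(task)
print(order)
# ['C-map-1', 'A-map-1', 'A-map-2', 'B-map-1']
```

Job C was submitted last but runs first because it sits in a higher-priority queue; jobs A and B share a queue and so run in submission order.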
Configuration of FIFO:
- Configured in mapred-site.xml
- The parameter to be set is mapred.jobtracker.taskScheduler
- The name of the implementation class of the FIFO scheduler is org.apache.hadoop.mapred.JobQueueTaskScheduler
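Properties in mapred-site.xml are written as property elements, each with a name and a value, inside the top-level configuration element. As a small sketch, the Python below renders the entry for the FIFO scheduler (property name spelled mapred.jobtracker.taskScheduler) and checks that it is well-formed XML. The helper property_xml is hypothetical and exists only for this example; the output would still need to be placed in the file by hand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def property_xml(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as indented XML text."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(value)}</value>\n"
        "</property>"
    )


fragment = property_xml(
    "mapred.jobtracker.taskScheduler",
    "org.apache.hadoop.mapred.JobQueueTaskScheduler",
)
print(fragment)

# Sanity check: the fragment parses as well-formed XML.
parsed = ET.fromstring(fragment)
```

The printed fragment goes between the opening and closing configuration tags of mapred-site.xml on the job tracker.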
Fair Scheduler: sometimes called the “Fair Share Scheduler” or the “FS scheduler”.
- Alternative to default FIFO scheduler
- Developed to solve some of the problems that arise when using the FIFO scheduler in high-traffic, multitenant environments
- It uses a concept of slots
- Each time a task tracker heartbeats to the job tracker and reports available slots, the rules are evaluated and queued tasks are assigned for execution.
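The heartbeat-driven assignment described above can be illustrated with a deliberately simplified model. The sketch below assumes one rule only: each free slot goes to the job that currently has the fewest running tasks. That rule is an assumption made for illustration; the real Fair Scheduler evaluates richer rules that are not covered here. The function assign_slots and the job names are hypothetical.

```python
def assign_slots(running, pending, free_slots):
    """Hand out free slots one at a time to the job running the fewest tasks.

    running: dict mapping job -> number of tasks currently running
    pending: dict mapping job -> number of queued tasks
    Returns the jobs that received a slot, in assignment order.
    Both dicts are updated in place.
    """
    assigned = []
    for _ in range(free_slots):
        candidates = [j for j in pending if pending[j] > 0]
        if not candidates:
            break  # nothing left to schedule
        # Fewest running tasks wins; ties go to the job name that sorts first.
        job = min(candidates, key=lambda j: (running.get(j, 0), j))
        running[job] = running.get(job, 0) + 1
        pending[job] -= 1
        assigned.append(job)
    return assigned


running = {"A": 3, "B": 0}
pending = {"A": 5, "B": 5}
# A task tracker heartbeats and reports 4 free slots.
print(assign_slots(running, pending, 4))
# ['B', 'B', 'B', 'A']
```

Under FIFO, job A would have received all four slots because it was submitted first. Here job B catches up until both jobs hold an equal number of slots, which is the behavior the bullets above describe at a high level.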