What is Fair Scheduler in Hadoop?

The Fair Scheduler is also known as the Fair Share Scheduler or the FS scheduler. It is an alternative to the default FIFO scheduler. It allocates a share of cluster capacity to each user over time. The Fair Scheduler enforces fair sharing within each queue, so running jobs share the queue’s resources.

It is built on the concept of slots: each time a task tracker sends a heartbeat to the job tracker, it reports its available slots.

Terms and Invariants of FS:

Total Capacity:

In the context of scheduling, total capacity (or total cluster capacity) is the sum of all slots of each type, counted separately for map slots and reduce slots.

A cluster with 10 task trackers, each with 8 map slots and 5 reduce slots, is said to have a total map slot capacity of 80 and a total reduce slot capacity of 50.

If the slot configuration or the number of task trackers changes, the total cluster capacity changes as well.
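
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the definition, not Hadoop code: the `TaskTracker` class and `total_capacity` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hypothetical stand-in for one task tracker's slot configuration."""
    map_slots: int
    reduce_slots: int


def total_capacity(trackers):
    """Return (total map slots, total reduce slots) for the cluster."""
    total_map = sum(t.map_slots for t in trackers)
    total_reduce = sum(t.reduce_slots for t in trackers)
    return total_map, total_reduce


# The cluster from the text: 10 task trackers, 8 map and 5 reduce slots each.
cluster = [TaskTracker(map_slots=8, reduce_slots=5) for _ in range(10)]
print(total_capacity(cluster))  # (80, 50)

# Adding one more task tracker changes the total cluster capacity.
cluster.append(TaskTracker(map_slots=8, reduce_slots=5))
print(total_capacity(cluster))  # (88, 55)
```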

Total available capacity:

The total available capacity is the number of open slots in a cluster. An open slot is a slot that currently has no task assigned to it.

The total available capacity is divided into map capacity and reduce capacity.

The available capacity can never exceed the total capacity.
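
As a rough sketch, available capacity can be computed per slot type as total slots minus occupied slots. The function name `available_capacity` is hypothetical, and the sketch assumes that every running task occupies exactly one slot. The totals of 80 map slots and 50 reduce slots come from the example cluster above; the running-task counts are made up for illustration.

```python
def available_capacity(total_slots, running_tasks):
    """Open slots of one type, assuming one slot per running task."""
    if not 0 <= running_tasks <= total_slots:
        raise ValueError("running tasks must be between 0 and total slots")
    return total_slots - running_tasks


# Map and reduce capacity are tracked separately.
available_map = available_capacity(total_slots=80, running_tasks=65)
available_reduce = available_capacity(total_slots=50, running_tasks=50)
print(available_map, available_reduce)  # 15 0

# The invariant from the text: available capacity never exceeds total capacity.
assert available_map <= 80 and available_reduce <= 50
```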

What is a Pool?

A pool is a container for a group of jobs and the recipient of resource allocations.

Rather than configuring which resources should be assigned to each job, we assign resources to a pool and then put jobs into pools.

Demand:

A pool is said to have demand if and only if there are queued tasks that should be assigned to it.

Fair Share:

The “fair” number of slots a pool should receive.

Minimum share:

An administrator-configured number of slots that a pool is guaranteed to receive.
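
To see how demand, minimum share, and fair share interact, here is a deliberately simplified sketch. It is not the algorithm Hadoop implements (the real scheduler also takes things like pool weights into account); it only illustrates the idea that minimum shares are honored first and the remaining slots are then spread as evenly as possible over pools that still have demand. The function `fair_shares`, the pool names, and all the numbers except the 80-slot total are hypothetical.

```python
def fair_shares(total_slots, pools):
    """Simplified fair-share sketch for one slot type.

    pools maps a pool name to {"demand": int, "min_share": int}.
    """
    # Step 1: give each pool its minimum share, capped by its demand.
    shares = {
        name: min(p["min_share"], p["demand"]) for name, p in pools.items()
    }
    remaining = total_slots - sum(shares.values())
    if remaining < 0:
        # This sketch does not handle over-committed minimum shares.
        raise ValueError("minimum shares exceed total capacity")

    # Step 2: hand out the remaining slots one at a time, always to the
    # pool with unmet demand that currently holds the fewest slots.
    while remaining > 0:
        hungry = [n for n in pools if shares[n] < pools[n]["demand"]]
        if not hungry:
            break  # all demand is satisfied; leftover slots stay open
        neediest = min(hungry, key=lambda n: (shares[n], n))
        shares[neediest] += 1
        remaining -= 1
    return shares


pools = {
    "etl": {"demand": 100, "min_share": 20},
    "adhoc": {"demand": 10, "min_share": 0},
    "reports": {"demand": 50, "min_share": 0},
}
shares = fair_shares(80, pools)
print(shares)  # {'etl': 35, 'adhoc': 10, 'reports': 35}
```

In this run the "adhoc" pool only demands 10 slots, so it never receives more than that, and the slots it does not need are split between the two pools that still have queued tasks.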

What is Capacity Scheduler in Hadoop?

The Capacity Scheduler can interpret information about system resource utilization, such as RAM, reported by each TaskTracker, and make scheduling decisions based on general server utilization rather than only on the cluster slots available.

This scheduler uses an approach similar to the Fair Scheduler, but with some differences.

To enable the Capacity Scheduler, set the property mapred.jobtracker.taskScheduler to its class name, org.apache.hadoop.mapred.CapacityTaskScheduler.

The Capacity Scheduler uses a separate configuration file called capacity-scheduler.xml, which also lives in the Hadoop configuration directory.

capacity-scheduler.xml

<property>
<name>mapred.capacity-scheduler.queue.default.guaranteed-capacity</name>
<value>100</value>
<description>Percentage of the number of slots in the cluster that are guaranteed to be available for jobs in this queue.
</description>
</property>
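
Because the value is a percentage, it can help to translate it into a slot count. The following sketch reads the property with Python's standard `xml.etree.ElementTree` module and applies it to a total of 80 map slots, the figure from the example cluster earlier. The helper `guaranteed_slots` is a hypothetical name, the `<configuration>` wrapper is the usual root element of Hadoop configuration files, and rounding down with `int` is an assumption made here, not a statement about how the scheduler rounds.

```python
import xml.etree.ElementTree as ET

# A minimal capacity-scheduler.xml, embedded as a string for the example.
CONFIG = """
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.default.guaranteed-capacity</name>
    <value>100</value>
  </property>
</configuration>
"""


def guaranteed_slots(config_xml, queue, total_slots):
    """Convert a queue's guaranteed-capacity percentage into a slot count."""
    wanted = f"mapred.capacity-scheduler.queue.{queue}.guaranteed-capacity"
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == wanted:
            percent = float(prop.findtext("value"))
            # Assumption: round down to a whole number of slots.
            return int(total_slots * percent / 100)
    raise KeyError(wanted)


default_map_slots = guaranteed_slots(CONFIG, "default", total_slots=80)
print(default_map_slots)  # 80
```

With a single queue set to 100, every slot in the cluster is guaranteed to that queue; a value of 50 would guarantee it 40 of the 80 map slots under the same assumptions.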