Planning a Cluster in Hadoop Distributions | Cloudera | Hortonworks | MapR

This article covers what to consider when planning a cluster on the Cloudera, Hortonworks, and MapR Hadoop distributions, and outlines the differences between the three in simple points.

Hadoop distributions:

Cloudera CDH:

  • Cloudera CDH is free and open source, released under the Apache 2.0 license.
  • It appeals to end users because component version numbers are aligned and critical bug fixes are backported to older versions.
  • It includes NameNode High Availability and Federation, and supports both MapReduce version 1 and MapReduce version 2.
  • It provides core Hadoop and most of the popular ecosystem components, tested for compatibility with each other, in one distribution.
  • Cloudera also distributes Cloudera Manager, a web-based management tool to provision, configure, and monitor your Hadoop cluster. Cloudera Manager comes in both free and paid enterprise versions.

Hortonworks Data Platform (HDP):

  • Like Cloudera, Hortonworks provides a pre-packaged distribution of core Hadoop and related ecosystem projects, as well as enterprise support.
  • Hortonworks Data Platform (HDP) includes HCatalog (Hadoop Catalog), a service that provides an integration point for projects like Pig (scripting) and Hive (SQL).
  • HDP includes an ODBC (Open Database Connectivity) driver for Hive, which is claimed to be compatible with most existing BI (Business Intelligence) tools.
  • A unique characteristic of HDP is its availability on the Windows platform. Bringing Hadoop to the Windows world could have a big impact on the platform’s adoption rate and could make HDP the leading distribution for the Windows operating system.
  • HDP includes Apache Ambari, a web-based tool similar to Cloudera Manager, but 100 percent free and open source, with no distinction between free and enterprise versions.

Note: Cloudera and Hortonworks have now merged, and future products will ship as Cloudera distributions only.

MapR:

  • MapR is a company that provides a Hadoop-based platform. There are several editions of its product: M3 is a free edition with basic features, while M5 and M7 are enterprise-level commercial editions.
  • The major difference between the MapR platform and Apache Hadoop is that, instead of HDFS (Hadoop Distributed File System), MapR uses a different filesystem called MapR File System (MapR FS).
  • MapR FS is implemented in C++ and is said to provide lower-latency and higher-concurrency access than the Java-based HDFS. It is compatible with Hadoop at the API level, but it is a completely different implementation.
  • MapR FS features include the ability to mount the Hadoop cluster as an NFS (Network File System) volume, cluster-wide snapshots, and cluster mirroring.
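
Because the cluster can be mounted as an NFS volume, its data can in principle be read with ordinary file operations rather than a Hadoop client. The following is a minimal Python sketch under that assumption. The mount path /mapr/my.cluster.com/user/demo and the helper name summarize_directory are hypothetical and used only for illustration; the real mount point depends on how the cluster and the NFS client are configured.

```python
from pathlib import Path


def summarize_directory(root):
    """Return {relative file path: size in bytes} for every file under root.

    Only ordinary filesystem calls are used, so this works on any mounted
    directory, including (by assumption) an NFS-mounted cluster volume.
    """
    root = Path(root)
    sizes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            sizes[path.relative_to(root).as_posix()] = path.stat().st_size
    return sizes


if __name__ == "__main__":
    # Hypothetical mount point; replace it with the path used on your system.
    mount_point = Path("/mapr/my.cluster.com/user/demo")
    if mount_point.is_dir():
        for name, size in summarize_directory(mount_point).items():
            print(f"{size:>12}  {name}")
    else:
        print(f"{mount_point} is not mounted on this machine")
```

With plain HDFS, the same listing would normally go through a Hadoop client such as the hdfs dfs command instead of the local file API.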

Note: MapR was recently acquired by Hewlett Packard Enterprise (HPE).