This article covers planning a Hadoop cluster with the Cloudera, Hortonworks, and MapR distributions, and summarizes the differences between the three distributions in simple points.
Cloudera CDH:
- CDH is free and open source under the Apache 2.0 license.
- It appeals to end users because version numbers are aligned across components and critical bug fixes are backported to older versions.
- It includes NameNode High Availability and Federation, and supports both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2, which runs on YARN).
- It provides core Hadoop and most of the popular ecosystem components in one distribution, tested for compatibility with each other.
- Cloudera also distributes Cloudera Manager, a web-based management tool to provision, configure, and monitor your Hadoop cluster. Cloudera Manager comes in both free and paid enterprise versions.
Hortonworks Data Platform (HDP):
- Similar to Cloudera, Hortonworks provides a pre-packaged distribution of core Hadoop and ecosystem projects, as well as enterprise support.
- HDP includes HCatalog, a service that provides an integration point for projects such as Pig and Hive.
- HDP includes an ODBC (Open Database Connectivity) driver for Hive, which is claimed to be compatible with most existing BI (Business Intelligence) tools.
- A unique characteristic of HDP is its availability on the Windows platform. Bringing Hadoop to the Windows world could have a big impact on the platform's adoption rate and could make HDP the leading Hadoop distribution for the Windows operating system.
- HDP includes Apache Ambari, a web-based tool similar to Cloudera Manager, but it is 100 percent free and open source, with no distinction between free and enterprise versions.
Note: Cloudera and Hortonworks have since merged, so new products are released only as Cloudera distributions.
MapR:
- MapR is a company that provides a Hadoop-based platform. There are several editions of its product: M3 is a free edition with basic features, while M5 and M7 are enterprise-level commercial editions.
- The major difference between the MapR platform and Apache Hadoop is that, instead of HDFS (Hadoop Distributed File System), MapR uses a different file system called the MapR File System (MFS).
- MapR FS is implemented in C++ and provides lower-latency, higher-consistency access than the Java-based HDFS. It is compatible with Hadoop at the API level, but it is a completely different implementation.
- MapR FS features include the ability to mount the Hadoop cluster as an NFS (Network File System) volume, cluster-wide snapshots, and cluster mirroring.
Note: The MapR business was recently acquired by HPE (Hewlett Packard Enterprise).