Basics for Cloudera (Hortonworks) Hadoop Administration

1. Introduction to Hadoop

Hadoop is a solution for big data: an open-source framework for storing and processing large volumes of structured, semi-structured, and unstructured data. Storage is handled by HDFS (the Hadoop Distributed File System), and processing by MapReduce, with jobs commonly written in Java, Python, or R.

2. Core Components of the Hadoop Ecosystem:
The Hadoop ecosystem basically includes the following components:

A. HDFS (Hadoop Distributed File System) - Storage
B. MapReduce                              - Processing
C. Hive                                   - Data summarization and SQL-like querying
D. Sqoop                                  - Import/export of data between RDBMS and Hadoop
E. ZooKeeper                              - Coordination of distributed Hadoop services
F. HBase                                  - Distributed NoSQL database on top of HDFS
G. Oozie                                  - Scheduling of Hadoop jobs and workflows
H. Kafka                                  - Distributed messaging system
I. Pig                                    - Scripting language (Pig Latin) for data processing
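To make the Sqoop entry above concrete, here is a minimal sketch of an import and an export; the host, database, table, and directory names are illustrative assumptions, not taken from the article:

```shell
# Illustrative Sqoop import from an RDBMS into HDFS
# (dbhost, sales, orders, and the HDFS paths are made-up example names)
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/hadoop/orders

# The reverse direction: export HDFS data back into the RDBMS
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /user/hadoop/orders_summary
```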


3. Hadoop Distributed File System Concepts:
A. Name Node (master)

B. Data Node

C. Secondary Name Node

4. MapReduce & YARN Concepts

In Hadoop 1.x, MapReduce used a Job Tracker and Task Trackers for resource management and job scheduling.

In Hadoop 2.x these were replaced by the YARN concepts: the Resource Manager, the Application Master (one per application), and the Node Managers.
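On a running Hadoop 2.x cluster, the YARN components above can be inspected with a few standard commands; this is a cluster-dependent sketch, and `<app_id>` is a placeholder:

```shell
# Inspecting a YARN cluster (requires a running Hadoop 2.x cluster)
yarn node -list                     # NodeManagers registered with the ResourceManager
yarn application -list              # running applications and their state
yarn logs -applicationId <app_id>   # aggregated logs for a finished application
```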

5. Hadoop Cluster Capacity Planning:
Capacity planning depends on the project: estimate the daily data volume, the retention period, and the replication factor, and only then proceed with sizing the Hadoop cluster.
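A back-of-the-envelope estimate can be sketched from those inputs. All the numbers below (daily ingest, retention, node count, overhead) are illustrative assumptions, not figures from the article:

```python
# Rough HDFS capacity estimate (all input numbers are illustrative assumptions)
daily_ingest_gb = 100    # raw data landed per day (assumption)
replication = 3          # HDFS default replication factor
retention_days = 365     # how long data is kept (assumption)
overhead = 0.25          # head-room for temp/intermediate data (assumption)

raw_storage_gb = daily_ingest_gb * retention_days * replication
total_gb = raw_storage_gb * (1 + overhead)

nodes = 10               # planned DataNode count (assumption)
per_node_tb = total_gb / nodes / 1024

print(f"Total storage needed: {total_gb / 1024:.1f} TB")
print(f"Per DataNode (with {nodes} nodes): {per_node_tb:.1f} TB")
```

The replication factor dominates the result: every gigabyte ingested consumes three gigabytes of raw disk at the default replication of 3.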

6. Hadoop Installation & Prerequisites:

Hadoop Installation & Prerequisites on Ubuntu
Hadoop Installation & Prerequisites on Windows
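For the Ubuntu case, the usual prerequisites are a JDK, SSH, and passwordless login to localhost for the Hadoop start scripts. A minimal sketch (package names can vary by Ubuntu release):

```shell
# Typical single-node prerequisites on Ubuntu (package names may vary by release)
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk ssh rsync

# Passwordless SSH to localhost, needed by the Hadoop start/stop scripts
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

java -version   # verify the JDK before pointing JAVA_HOME at it
```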

7. Configuring Different Types of Schedulers: Capacity & Fair Schedulers

The Capacity Scheduler divides the cluster into queues, each with a guaranteed share of capacity; within a single queue, jobs are scheduled FIFO by default.

The Fair Scheduler is similar, but it shares resources so that running jobs receive, on average, an equal share, which lets several jobs make progress in parallel.
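A sketch of the corresponding configuration: the scheduler is selected in yarn-site.xml, and queues are defined in capacity-scheduler.xml. The queue names and percentages here are illustrative assumptions:

```xml
<!-- yarn-site.xml: select the Capacity Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- capacity-scheduler.xml: two queues ("prod"/"dev" and the 70/30 split are illustrative) -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
```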

8. Cloudera Installation on a Single-Node Cluster Using Cloudera Manager

A Cloudera installation resembles a single-node Hadoop setup, but it is driven by Cloudera Manager. By default, the Cloudera Manager installer bundles a JDK and an embedded database.

9. Cloudera Manager Upgrade Process:

Upgrading Cloudera Manager is one of the simpler maintenance processes across CDH versions, although features differ from one version to the next.

A. Collect the upgrade information, review the prerequisites, and upgrade the services.

B. After the upgrade, test the services and verify that their versions are compatible.

10. Commissioning and Decommissioning:

A. To commission, add the new DataNode hosts to the slaves file and list them in an "includes" file (referenced by dfs.hosts).

B. Refresh the nodes, then go to the $HADOOP_HOME/sbin directory and start the services on the new hosts. This is the commissioning process.

C. To decommission, remove the DataNode from the slaves file and list it in an "excludes" file (referenced by dfs.hosts.exclude).

D. Refresh the nodes and then run the balancer. This is the decommissioning process.
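The steps above map onto a handful of admin commands. A cluster-dependent sketch (the include/exclude files themselves must already be configured in hdfs-site.xml):

```shell
# Commissioning: after listing the new host in the "includes" file
hdfs dfsadmin -refreshNodes            # NameNode re-reads its include list
yarn rmadmin -refreshNodes             # ResourceManager re-reads its include list
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode   # run on the new host

# Decommissioning: after listing the host in the "excludes" file
hdfs dfsadmin -refreshNodes            # blocks are re-replicated off the node
hdfs dfsadmin -report                  # watch until the node shows "Decommissioned"
hdfs balancer                          # even out block distribution afterwards
```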

11. Edit Logs and Name Node Image File Details:
The edit log records every change to the HDFS namespace, while the fsimage file is a checkpoint of the full namespace. The Secondary Name Node periodically merges the edit log into a new fsimage; this checkpointing is how the files are updated.
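Both files can be examined offline with Hadoop's viewer tools. A cluster-dependent sketch; the transaction-ID file names are illustrative:

```shell
# Offline viewers for the NameNode metadata files (file names are illustrative)
hdfs oiv -p XML -i fsimage_0000000000000042 -o fsimage.xml                 # image viewer
hdfs oev -p XML -i edits_0000000000000001-0000000000000042 -o edits.xml    # edits viewer

# Force a checkpoint (merge the edit log into a new fsimage) while in safe mode
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
```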

12. Fundamentals for High Availability

13.Configuring High Availability
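The core of an HDFS NameNode HA configuration lives in hdfs-site.xml. A minimal sketch, assuming a nameservice called "mycluster" and two illustrative master hosts (fencing and shared-edits settings are omitted for brevity):

```xml
<!-- hdfs-site.xml: NameNode HA sketch ("mycluster" and the hostnames are illustrative) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2.example.com:8020</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```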

14. Hadoop Security – Securing Authentication with Kerberos

15. Hadoop Security – HDFS encryption

16. Cloudera Backup and Disaster Recovery

17. Monitor & Manage the Cloudera Hadoop Cluster.
