How to Install Hadoop on Ubuntu/Linux in a Single-Node Cluster

Hadoop is one of today's most prominent emerging technologies. It is a Big Data solution for storing and processing large amounts of data: HDFS handles storage and MapReduce handles processing. MapReduce is used less and less these days, however, as workloads move to Apache Spark, which is substantially faster because it processes data in memory.

Step 1: First, we need to update the system software repositories using the command below:
sudo apt-get update

Step 2: Next, install Java 1.8 (OpenJDK 8) using the command below:

sudo apt-get install openjdk-8-jdk

Step 3: After that, check the Java version using the command below:

java -version

Step 4: Install SSH using the command below:

sudo apt-get install ssh

To enable passwordless SSH communication, enter the commands below in any terminal:

ssh localhost

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

 

STEP 5: Download the Hadoop-2.6.0 tarball from the Apache mirrors on the official Apache website.

STEP 6: Extract the downloaded tarball using the command below:

tar -xzvf hadoop-2.6.0.tar.gz

The configuration files live in the Hadoop configuration directory.

STEP 7: As part of the Hadoop installation, edit the following eight configuration files:
1. core-site.xml

2. mapred-site.xml

3. mapred-env.sh

4. yarn-site.xml

5. hdfs-site.xml

6. hadoop-env.sh

7. yarn-env.sh

8. slaves

 

STEP 8: Open the core-site.xml file and add the properties below.
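The properties themselves appeared as a screenshot in the original; a minimal single-node sketch, assuming the conventional HDFS port 9000 (adjust host and port to your setup):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>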

STEP 9: Open the hadoop-env.sh file and update the JAVA_HOME path, as below.
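The screenshot with the exact line is missing here; the usual change is a single export pointing at the JDK installed in Step 2 (the same line also goes into mapred-env.sh in Step 10 and yarn-env.sh in Step 13):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64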

 

STEP 10: Open mapred-env.sh and update JAVA_HOME in the same way.

STEP 11: Open the hdfs-site.xml file and add the properties below:
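The exact properties appeared as a screenshot; a typical single-node sketch with the replication factor set to 1 (the storage paths are illustrative assumptions, so adjust them):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/<user>/hadoopdata/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/<user>/hadoopdata/datanode</value>
  </property>
</configuration>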

STEP 12: Open mapred-site.xml and set the framework architecture to yarn, as below.
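This property is standard for running MapReduce on YARN:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>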

STEP 13: Open yarn-env.sh and update the JAVA_HOME path in that file (the same export line as in Step 9).

STEP 14: Open yarn-site.xml and add the properties below to configure the ResourceManager.
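A minimal sketch for a single-node ResourceManager, assuming the hostname is localhost:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
</configuration>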

STEP 15: Open the slaves file and check whether the hostname is localhost.

STEP 16: Set the JAVA_HOME, HADOOP_HOME, and PATH variables in the ~/.bashrc file:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_HOME=/home/gopalkrishna/INSTALL/hadoop-2.6.0

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

STEP 17: To check the bashrc changes, open a new terminal (or run source ~/.bashrc) and type the command below:

echo $HADOOP_HOME

STEP 18: Before starting the NameNode, we must format it using the command below:

hadoop namenode -format

STEP 19: To start all the Hadoop daemons in 2.x.x, use the start-all.sh command.

Step 20: To check whether the NameNode, NodeManager, and DataNode are running, use the command below:

jps

STEP 21: To access the NameNode information in the web GUI, use the link below on your system:

http://localhost:50070

STEP 22: To start the Job History Server in the Hadoop cluster, use the command below:

mr-jobhistory-daemon.sh start historyserver

STEP 23: To access the ResourceManager in the Hadoop cluster:

localhost:8088

STEP 24: To access the Job History Server in the Hadoop cluster:

localhost:19888

STEP 25: To stop all the Hadoop daemons in 2.x.x, use the stop-all.sh command.

STEP 26: To stop the Job History Server in 2.x.x:

mr-jobhistory-daemon.sh stop historyserver

How to Install Scala on Hadoop in Linux

Scala is one of today's most popular functional programming languages. It is similar to Java but differs in some ways, and it became especially important when Apache Spark entered the picture, since Spark is written in Scala. Here are the steps for a Scala installation.

Step 1: Download the Scala tarball from the official Scala website onto your machine.

After downloading the tarball, put it into your Hadoop-related path, then follow the step below.
Step 2: Extract the tarball using the command below:

tar -xzvf scala-2.11.8.tgz

Step 3: Check that the extracted Scala directory and its files are present, then move to the next step.

Step 4: Update the SCALA_HOME and PATH variables in the bashrc file, as sketched below.

Once .bashrc is updated, the SCALA_HOME and PATH environment variables are picked up automatically by any new shell.
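The exact lines appeared as a screenshot; a minimal sketch, assuming the tarball was extracted under an INSTALL directory in your home folder (adjust the path to your machine):

export SCALA_HOME=/home/<user>/INSTALL/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin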

Step 5: After the bashrc changes, open a new terminal and check them using the echo $SCALA_HOME command.

If the variable prints, the Scala home was updated correctly.

Step 6: After that, check the Scala version.
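The command itself was shown as a screenshot; the standard invocation is:

scala -version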

How to Install Kafka on Ubuntu/Linux in Hadoop

Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation. Here are simple steps for installing it on an Ubuntu/Linux operating system in a Hadoop ecosystem.

Step 1: First, download the Kafka tarball from an Apache mirror on the official Apache website:

http://apache.mesi.com.ar/kafka/0.10.1.1/

If you want Kafka under a specific Hadoop-related directory, create the directory and copy the Kafka tarball into it.

Step 2: After downloading the Kafka tarball, extract it using the command below:

tar -xzvf kafka_2.10-0.10.2.1.tgz

Step 3: After extraction you have a Kafka directory containing the lib files and configuration files.

To use Kafka from anywhere, update the Kafka home and path variables in the .bashrc file, as in the step below.

Step 4: Update the KAFKA_HOME and PATH variables in the bashrc file:
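The exact lines appeared as a screenshot; a minimal sketch, assuming the extraction path from Step 2 (adjust to your machine):

export KAFKA_HOME=/home/<user>/INSTALL/kafka_2.10-0.10.2.1
export PATH=$PATH:$KAFKA_HOME/bin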

Step 5: After the bashrc changes, open a new terminal and check them using the echo $KAFKA_HOME command.

At its core, Apache Kafka is a messaging system: producers are processes that publish data to Kafka topics through brokers, and consumers pull messages off those topics.
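To see that flow in action once the broker is running (started in the next step), a quick sketch using the scripts bundled with Kafka 0.10.x; the topic name test is just an example:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning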

After Kafka is installed successfully, start and stop the Kafka broker with the simple commands below.

Go to the Apache Kafka home directory and execute the command:

./bin/kafka-server-start.sh config/server.properties
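Note that Kafka releases of this vintage depend on ZooKeeper, which must be running before the broker starts; Kafka bundles a convenience script and config for it:

./bin/zookeeper-server-start.sh config/zookeeper.properties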

To stop the Kafka broker, use the command below:

./bin/kafka-server-stop.sh

How to Setup Cloudera Multi Node Cluster Setup with Pictures

Cloudera Installation and Multi-Node Cluster Configuration



1. Open PuTTY.

2. Type your machine's IP address, then click Open.

3. Log in with your username and password.

4. Type: vi /etc/hosts and add the remaining hosts.

5. Edit: vi /etc/sysconfig/network

6. Type: vi /etc/selinux/config and replace SELINUX=enforcing with SELINUX=disabled.

7. Type: setenforce 0

8. Install NTP (Network Time Protocol). Type: yum install ntp ntpdate ntp-doc

9. After installing NTP, enable the service at boot. Type: chkconfig ntpd on

10. Type: vi /etc/ntp.conf to review the NTP configuration.

11. Type: ntpq -p

12. Then start the service: service ntpd start

13. Then run ntpq again to verify.

14. Generate an RSA key pair (ssh-keygen -t rsa) on the remaining machines as well.

15. Save the file under the default name, id_rsa.

16. cd /root/.ssh

17. Run ll to check whether id_rsa.pub is there.

18. cat id_rsa.pub > authorized_keys

19. Type (substituting each machine's hostname): scp authorized_keys root@<machine>.localdomain:/root/.ssh

20. Then type: yum install openssl python perl

21. yum clean all

22. yum repolist

23. Then download Cloudera Manager using the command below:

wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

24. chmod 700 cloudera-manager-installer.bin

25. Then run ./cloudera-manager-installer.bin and click Next.

 

26. Accept the license.

27. The installer automatically installs the JDK.

28. It then automatically installs the embedded database.

29. The Cloudera Manager server is installed.

30. The installation completes successfully.

31. Click OK.

32. If you get any errors, disable the firewall and iptables.

33. To disable the firewall, type: systemctl disable firewalld

34. To disable IPv6, type: vi /etc/sysctl.conf
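The file contents are not shown in the original; the usual sysctl lines to disable IPv6 are:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1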

35. Browse to your machine's IP on port 7180: http://xxx.xxx.xx.xxx:7180

36. Log in with username admin and password admin.

37. Accept the end-user license terms and conditions.

38. Select the free Cloudera Express edition.

39. Then search for the host machines by their domain names.

40. Select Repository

41. If you need any proxy settings, select and fill them in; otherwise, leave them blank.

42. Click Continue to run the cluster installation on the three machines. If you hit any issue, try Mozilla Firefox.

43. Click Continue and check the CDH version.

44. When it reaches 100% complete, click Continue.

45. After clicking Continue, check the host validations.

46. Two validations commonly show warnings here; run the commands below, then click Run Again:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
sysctl vm.swappiness=10

47. Click on “Finish”

48. A version summary is shown.

49. The HDFS NameNode and the ResourceManager must be on different hosts.


50. Select "Core with Spark", then click Continue.

51. Click Test Connection when using the embedded database.

52. The cluster setup completes successfully.

How to Install SQOOP on Ubuntu

Apache Sqoop Installation on Ubuntu

Apache Sqoop is one of the Hadoop ecosystem components. It is mainly used to move data between HDFS and an RDBMS in either direction, transferring bulk data between Hadoop and data stores such as relational databases.

Prerequisites:

Before installing Sqoop, you need Hadoop 2.x.x, which is compatible with Sqoop 1.x.x.

Step 1: Download the Sqoop 1.x.x tarball from the website below:

http://redrockdigimark.com/apachemirror/sqoop/1.4.6/

Step 2: After downloading, extract the Sqoop tarball using the command below:

tar -xzvf sqoop-1.x.x.bin-hadoop-2.x.x-alpha.tar.gz

Step 3: Update the bashrc file with the SQOOP_HOME and PATH variables:

export SQOOP_HOME=/home/slthupili/INSTALL/sqoop-1.x.x.bin-hadoop-2.x.x

export PATH=$PATH:$SQOOP_HOME/bin

Step 4: To check the bashrc changes, open a new terminal and type echo $SQOOP_HOME.

 

Step 5: To integrate Hadoop with a MySQL database using Sqoop, you must place the MySQL connector JAR file (mysql-connector-java-5.1.38.jar) in the $SQOOP_HOME/lib path.

Step 6: Check the Sqoop version using the command below:

sqoop version

The steps above give a simple installation of Sqoop on top of Hadoop on Ubuntu.

Sqoop imports data from a relational database management system (RDBMS) such as MySQL into the Hadoop Distributed File System. It automates most of this process, relying on the database to describe the schema of the data to be imported, and it uses MapReduce to perform the import and export.
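As a concrete sketch of that import flow (the database, table, and user names here are hypothetical placeholders):

sqoop import \
  --connect jdbc:mysql://localhost/demo_db \
  --username demo_user -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  -m 1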

How to Install Hadoop Single Node Cluster

This section walks through installing a Hadoop single-node cluster on Ubuntu, with short notes on each step.

Step 1: Update the system software repositories using sudo apt-get update

The first step updates Ubuntu's package lists.

Step 2: Install the Java 1.8 JRE using the command below.

Java is a prerequisite for the installation, so install the JRE first and then the JDK.

Step 3: Install the Java 1.8 JDK using the command below.
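The commands appeared as screenshots in the original; on Ubuntu they are the standard OpenJDK packages:

sudo apt-get install openjdk-8-jre
sudo apt-get install openjdk-8-jdk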

 

Step 4: Check the Java version on Linux with the java -version command.

Step 5: Then install SSH (Secure Shell) using the command below:

SSH provides passwordless communication between the NameNode and the Secondary NameNode, which talk to each other frequently.

Step 6: Check the SSH installation using the command below.

After installing SSH, run ssh localhost to check whether the communication works.

Step 7: Download the Hadoop-2.6.0 tarball from the Apache mirrors.

Once the Hadoop prerequisites are in place, download the Hadoop tarball.

Step 8: Extract the tarball using the command below:

tar -xzvf hadoop-2.6.0.tar.gz

Step 9: Update the environment variables and PATH for HADOOP_HOME and JAVA_HOME in ~/.bashrc (the export lines are shown in Step 16 of the first section).

Step 10: Check that the PATH variable is set, then edit the configuration files that are part of the Hadoop installation.

Step 11: First open the core-site.xml file and add the properties (see the sketch under Step 8 of the first section).

The core-site file holds the NameNode information.

Step 12: Open the hdfs-site.xml file and add the properties (see the sketch under Step 11 of the first section).

The hdfs-site.xml file covers the replication factor and the DataNode information.

Step 13: Open the yarn-site.xml file and add the properties that configure the ResourceManager and NodeManager details (see the sketch under Step 14 of the first section).

Step 14: Update the JAVA_HOME path in the hadoop-env.sh file.

Step 15: Update the JAVA_HOME path in the mapred-env.sh file.

Step 16: Open mapred-site.xml and set the framework to yarn in that file.

Step 17: Open the slaves file and check whether the hostname is localhost.

Step 18: Before starting the NameNode, format it using the command below: hadoop namenode -format

Step 19: Start all the Hadoop daemons using the command below:

start-all.sh

Step 20: Check whether the daemons are running using the jps command.


Step 21: After that, access the NameNode information in the web GUI:

http://localhost:50070

How to Install Cassandra on Ubuntu/Linux in Hadoop Eco-System

Apache Cassandra Installation on Ubuntu in the Hadoop Ecosystem

Apache Cassandra is an open-source, distributed NoSQL database management system designed to handle huge amounts of data across inexpensive commodity servers. It is written in Java and provides high availability for data processing.

Why should we use Apache Cassandra?

Cassandra is a complete, robust NoSQL database, deployed by social networks such as Facebook and Twitter and by e-commerce companies.

1. Cassandra supports a wide set of data structures.

2. It has a scalable architecture.

3. It is highly reliable.

4. It offers tunable consistency, with atomic and durable operations at the row level (it does not provide full multi-row ACID transactions).

Prerequisites for the Apache Cassandra installation:

Cassandra has a couple of prerequisites: Java is required, and Python 2.7 is also mandatory (it is needed by the cqlsh shell).

Below is the step-by-step process for installing Cassandra on Ubuntu.

Step 1: Download cassandra-2.x.x-bin.tar.gz from this website: http://cassandra.apache.org/download

First, download the tarball from the official Cassandra website.

Step 2: Extract the downloaded tarball using the command below:

tar -xzvf cassandra-2.x.x-bin.tar.gz

Step 3: After that, update the CASSANDRA_HOME and PATH variables in the bashrc file:

export CASSANDRA_HOME=/home/slthupili/INSTALL/dsc-cassandra-2.2.8
export PATH=$PATH:$CASSANDRA_HOME/bin

Step 4: To check whether bashrc changed, open a new terminal and type the echo $CASSANDRA_HOME command.

 

Step 5: Once the settings above are complete, start the Cassandra server using the command below:

cassandra

Step 6: To start the Cassandra Query Language shell, use the command below:

cqlsh
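Once the shell is up, a quick sanity check (the keyspace name demo is just an illustrative placeholder):

cqlsh> DESCRIBE keyspaces;
cqlsh> CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};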

Characteristics of Cassandra:

1. Cassandra is a column-oriented, fault-tolerant database.

2. It is scalable, with tunable consistency.

3. Cassandra was created at Facebook, and its data model is based on Google Bigtable.

 

Summary:

This Apache Cassandra installation tutorial covered simple, illustrated installation steps and basic background knowledge on Cassandra.

How to Install Apache Spark on Ubuntu/Linux

Apache Spark Installation

Spark is an in-memory data processing framework and engine. Compared with Hadoop MapReduce, it can be up to 100 times faster at data processing. It is developed in the Java, Scala, Python, and R languages, and nowadays it is mostly used for streaming and machine learning workloads.

Prerequisites for the Spark installation:

1. Update the packages on Ubuntu using:

sudo apt-get update

After you enter your password, it updates the package lists.

2. Now install the JDK for the Java installation:

sudo apt-get install default-jdk

The Java version must be newer than 1.6.

Step 1: Download the Spark tarball from the official Apache Spark website.

Step 2: Put the tarball into your Hadoop-related directory.

Step 3: Then extract the downloaded tarball using the command below:

tar -xzvf spark-2.x.x-bin-hadoop2.x.tgz

Step 4: After the tarball extraction, you get a Spark directory. Update the SPARK_HOME and PATH variables in the bashrc file using the lines below:

export SPARK_HOME=/home/slthupili/INSTALL/spark-2.x.x-bin-hadoop2.x

export PATH=$PATH:$SPARK_HOME/bin

Step 5: To check the bashrc changes, open a new terminal and type the echo $SPARK_HOME command.

Step 6: After Spark installs successfully, check the Spark shell in a terminal using the command below (note the lowercase name; Linux commands are case-sensitive):

spark-shell
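Once the shell is up, a one-line sanity check of the SparkContext (any small computation will do; this should print 5050.0):

scala> sc.parallelize(1 to 100).sum()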

 

Step 7: Check the Spark and Scala versions from the shell using the commands below:

scala> spark.version

scala> sc.version

 


 

The steps above give a very simple installation of Apache Spark on top of Hadoop in a single-node cluster.