How to Install HIVE with MySQL on Ubuntu/Linux in Hadoop

Apache Hive is a data warehouse system used mostly for summarizing and querying structured data. It is one of the components of the Hadoop ecosystem and is built on top of HDFS. It is intended for tabular (structured) data, not for unstructured flat files.

Step 1: Download the hive-1.2.2 tarball from the official Apache mirrors website

Step 2: Extract the tarball in your path using the command below:

tar -xzvf apache-hive-1.2.2-bin.tar.gz

Step 3: Update the HIVE_HOME and PATH variables in the .bashrc file

export HIVE_HOME=/home/sreekanth/Big_Data/apache-hive-1.2.2-bin

export PATH=$PATH:$HIVE_HOME/bin

After saving the changes to the .bashrc file, go to the next step

Step 5: To check the .bashrc changes, open a new terminal and type the command
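
One way to perform this check is sketched below. The simplest form is to echo the variable, for example `echo $HIVE_HOME`; the function wraps that idea and also confirms the directory exists. The name `check_home_var` is a hypothetical helper for illustration, not part of Hive or Hadoop.

```shell
# check_home_var NAME: report whether the environment variable NAME is set
# and points at an existing directory. Hypothetical helper for illustration.
check_home_var() {
    name="$1"
    # Indirect lookup of the variable called $name (POSIX-compatible)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name=$value (directory not found)"
        return 1
    fi
    echo "$name=$value"
}

# Example: run in a new terminal after editing .bashrc
check_home_var HIVE_HOME || true
```

The same function can be reused for the other variables set in these tutorials by passing a different variable name.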


Step 6: Remove the jline-0.9.94.jar file from the path below to avoid incompatibility issues between this Hive version and hadoop-2.6.0
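
Since the exact path is not shown here, the jar can be located first. The sketch below assumes HADOOP_HOME points at the Hadoop installation; `find_old_jline` is a hypothetical helper name. It only lists matching files, so each path can be reviewed before anything is removed.

```shell
# find_old_jline [DIR]: list copies of the old jline jar under DIR
# (defaults to $HADOOP_HOME, assumed to be set). Nothing is deleted.
find_old_jline() {
    find "${1:-$HADOOP_HOME}" -type f -name 'jline-0.9.94.jar' 2>/dev/null
}

# Example: find_old_jline
```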

Step 7: There are 2 types of metastore we can configure in Hive to store metadata.

Internal: the Derby database embedded in Hive. It supports only one user.

External: a MySQL database, which supports multiple users.

If your conf directory does not contain a hive-site.xml file, create one.

Step 8: Configure the hive-site.xml file with the MySQL configuration and add the content below:
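
The original content for this step is not shown here, so the following is only a minimal sketch of what a MySQL-backed hive-site.xml commonly looks like. The four `javax.jdo.option.*` property names are standard Hive metastore settings; the database name `metastore`, the user `hiveuser`, the password `hivepassword`, and the helper name `write_hive_site` are placeholders assumed for illustration.

```shell
# write_hive_site DIR: write a minimal hive-site.xml for a MySQL metastore.
# Database name, user, and password below are placeholders, not real values.
write_hive_site() {
    conf_dir="$1"
    cat > "$conf_dir/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
EOF
}

# Example: write_hive_site "$HIVE_HOME/conf"
```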

Step 9: For the external MySQL metastore, we need the MySQL connector JAR file

Step 10: Copy the MySQL connector JAR file into the $HIVE_HOME/lib path

Step 11: Run the hive command in a terminal. At this point it will show a connection refused error

This happens because the Hadoop daemons are not running. All daemons must be started first, otherwise Hive will not work

Step 11: First start all the daemons using the command

Step 12: Hive now runs successfully on your machine

Step 13: Check the Hive version using the command below:

hive --version

Why do we use Hive?

Hive is used for summarizing and querying tabular data in a Hadoop system. The default Hive metastore database, Derby, supports only one user. MySQL is mostly used when there is a large amount of data and multiple users.

How to Install Hadoop in Ubuntu/Linux in Single Node Cluster

Hadoop is one of today's most prominent emerging technologies. It is a Big Data solution for storing and processing large amounts of data. HDFS handles storage and MapReduce handles processing, although MapReduce is now used less often. Many workloads are moving to Apache Spark for processing, which is often much faster than MapReduce.

Step 1: First, update the system software repositories using the command below:

sudo apt-get update

Step 2: Next, install Java 1.8 using the command below:

sudo apt-get install openjdk-8-jdk

Step 3: After that, check the Java version using the command below:

java -version

Step 4: Install SSH using the command below:

sudo apt-get install ssh

For passwordless SSH communication, enter the commands below in any terminal:

ssh localhost

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
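
If ssh localhost still asks for a password after these commands, overly permissive file modes are a common cause, because OpenSSH by default refuses keys stored in files that other users can write to. The sketch below applies the usual restrictive modes; `secure_ssh_dir` is a hypothetical helper name, and it assumes the authorized_keys file already exists.

```shell
# secure_ssh_dir [DIR]: apply the permissions sshd normally expects for
# key-based login. DIR defaults to ~/.ssh.
secure_ssh_dir() {
    ssh_dir="${1:-$HOME/.ssh}"
    chmod 700 "$ssh_dir"                    # only the owner may enter the directory
    chmod 600 "$ssh_dir/authorized_keys"    # only the owner may read or write the keys
}

# Example: secure_ssh_dir
```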


STEP 5: Download the Hadoop-2.6.0 tarball from the official Apache mirrors website

STEP 6: Extract the downloaded tarball using the command below:

tar -xzvf hadoop-2.6.0.tar.gz

The configuration files in the Hadoop directory are shown below

STEP 7: Edit the following 8 configuration files as part of the Hadoop installation:
1. core-site.xml

2. mapred-site.xml


4. yarn-site.xml

5. hdfs-site.xml



8. slaves


STEP 8: Open the core-site.xml file and add the properties below
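
The properties for this step are not shown here. As a minimal sketch, a single-node core-site.xml usually needs only `fs.defaultFS`, which tells clients where the NameNode listens. The address `hdfs://localhost:9000` is a common convention assumed for illustration, and `write_core_site` is a hypothetical helper name.

```shell
# write_core_site DIR: write a minimal single-node core-site.xml into DIR.
# hdfs://localhost:9000 is an assumed address; use your own host and port.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Example (Hadoop 2.x keeps its configuration under etc/hadoop):
# write_core_site "$HADOOP_HOME/etc/hadoop"
```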

STEP 9: Open “” file and update JAVA_HOME path


STEP 10: Open and update JAVA_HOME

STEP 11: Open hdfs-site.xml  file and add the below properties:

STEP 12: Open mapred-site.xml and update the framework architecture details as “yarn”
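
A minimal sketch of this file follows. `mapreduce.framework.name` is the standard property that selects YARN as the execution framework. Hadoop 2.x typically ships only a mapred-site.xml.template, so the file may need to be created; `write_mapred_site` is a hypothetical helper name.

```shell
# write_mapred_site DIR: write a mapred-site.xml that runs MapReduce on YARN.
write_mapred_site() {
    cat > "$1/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# Example: write_mapred_site "$HADOOP_HOME/etc/hadoop"
```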

STEP 13: Open and update JAVA_HOME path in that file

STEP 14: Open yarn-site.xml and add the below properties to configure “Resource Manager”.

STEP 15: Open the slaves file and check whether the hostname is localhost

STEP 16: Update and Set JAVA_HOME, HADOOP_HOME & PATH variables:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_HOME=/home/gopalkrishna/INSTALL/hadoop-2.6.0
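
The JAVA_HOME path depends on the Java package and architecture installed, so it is worth deriving it from the java binary rather than typing it from memory. The sketch below is one way to do that under the usual JDK layout (`<home>/bin/java` or `<home>/jre/bin/java`); `java_home_from_path` is a hypothetical helper name.

```shell
# java_home_from_path PATH: turn the resolved path of the java binary into a
# JAVA_HOME value by stripping /bin/java and an optional /jre component.
java_home_from_path() {
    printf '%s\n' "$1" | sed -e 's|/bin/java$||' -e 's|/jre$||'
}

# Example: resolve symlinks to the real java binary, then derive JAVA_HOME
if command -v java >/dev/null 2>&1; then
    java_home_from_path "$(readlink -f "$(command -v java)")"
fi
```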


STEP 17: To check the bashrc changes, open a new terminal and type the below command:


STEP 18: Before starting the NameNode, format it using the command below:

hadoop namenode -format

STEP 19: To start all the daemons of hadoop in 2.X.X use “” command

Step 20: To check whether the NameNode, NodeManager, and DataNode are running, use the command below:

jps
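
Reading the jps output by eye is easy to get wrong, so the check can be scripted. The sketch below assumes the five daemons of a typical single-node Hadoop 2.x setup; `missing_daemons` is a hypothetical helper name. It reads jps output on standard input and prints the expected daemons that are not listed.

```shell
# missing_daemons: read jps-style output on stdin and print each expected
# Hadoop daemon that does not appear in it.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <ClassName>"; compare the class name as a whole word
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Example: jps | missing_daemons
```

An empty result means all five daemons were found.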

STEP 21: To access the NameNode information in the GUI, open the link below on your system


STEP 22: To start the Job History Server in the Hadoop cluster, use the command below: start historyserver

STEP 23: To Access Resource Manager in Hadoop cluster:


STEP 24: To Access Job History Server in Hadoop Cluster


STEP 25: To stop all the daemons of hadoop in 2.X.X use “” command

STEP 26: To stop the Job History Server in 2.x.x: stop historyserver

How to Install Scala on Hadoop in Linux

Scala is one of the most familiar functional programming languages today. It is similar to Java, with some differences. Scala is widely used alongside Apache Spark. Here are the steps for installing Scala

Step 1: Download the Scala tarball from the official Scala website to your machine.

After downloading, place the tarball in your Hadoop-related path, then follow the step below
Step 2: Extract the tarball using the command below:

tar -xzvf scala-2.11.8.tgz

Check that the extracted Scala files are present, then go to the next step

Step 4: Update the SCALA_HOME and PATH variables in the .bashrc file

After the update, the SCALA_HOME and PATH environment variables are picked up automatically from the .bashrc file in every new terminal
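
Editing .bashrc by hand more than once can leave duplicate export lines. The sketch below appends a line only if an identical one is not already present. `append_once` is a hypothetical helper name, and the install location in the example is an assumption.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line exists.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x matches whole lines, -F treats the text literally (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example with a hypothetical install location:
# append_once 'export SCALA_HOME=$HOME/scala-2.11.8' ~/.bashrc
# append_once 'export PATH=$PATH:$SCALA_HOME/bin' ~/.bashrc
```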

Step 5: After the .bashrc changes, open a new terminal and check them using the 'echo $SCALA_HOME' command

The output shows whether SCALA_HOME has been updated

Step 6: After that, check the Scala version

How to Install Kafka on Ubuntu/Linux in Hadoop

Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation. Here are simple steps for installing it on an Ubuntu/Linux operating system in the Hadoop ecosystem

Step 1: Download the Kafka tarball from the official Apache mirrors website

If you need a specific Hadoop directory, create it and copy the Kafka tarball into that directory.

Step 2: After downloading the Kafka tarball, extract it using the command below:

tar -xzvf kafka_2.10-

After extracting Apache Kafka, you get a Kafka folder (directory) that includes the library files and configuration files.

Step 3: After extracting the Kafka tarball, we see the Kafka directory

Next, update the Kafka home and PATH variables in the .bashrc file, as described in the step below.

Step 4: Update the KAFKA_HOME and PATH variables in the .bashrc file:

Step 5: After the .bashrc changes, open a new terminal and check them using the 'echo $KAFKA_HOME' command

Apache Kafka is primarily a messaging system. Producers are processes that publish data to Kafka topics, brokers store that data, and consumers pull messages off the topics.

After Kafka has been installed successfully, start and stop the Kafka broker using the simple commands below


To start the broker, go to the Apache Kafka home directory and execute the command:


To stop the Kafka broker, use the command below:
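
The exact commands are not shown here. As a sketch, the standard Kafka distribution ships `bin/kafka-server-start.sh`, `bin/kafka-server-stop.sh`, and `config/server.properties`; the wrapper below assumes that layout and that KAFKA_HOME is set. `kafka_broker` is a hypothetical helper name. Kafka builds of this generation also expect a ZooKeeper instance to be running before the broker starts.

```shell
# kafka_broker start|stop: start or stop the broker using the scripts that
# ship in the Kafka distribution. Assumes KAFKA_HOME points at the Kafka directory.
kafka_broker() {
    action="$1"
    home="${KAFKA_HOME:?KAFKA_HOME is not set}"
    case "$action" in
        start) "$home/bin/kafka-server-start.sh" "$home/config/server.properties" ;;
        stop)  "$home/bin/kafka-server-stop.sh" ;;
        *)     echo "usage: kafka_broker start|stop" >&2; return 2 ;;
    esac
}

# Example: kafka_broker start   (runs in the foreground until interrupted)
```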



How to Set Up a Cloudera Multi Node Cluster with Pictures

Cloudera Installation and Multi Node Cluster Configuration

1. Open PuTTY:

2. Type your machine's IP address and then click on Open

3. Then log in with your username and password:

4. Type: vi /etc/hosts then add the remaining hosts

5. Edit: vi /etc/sysconfig/network

6. Type: vi /etc/selinux/config

Replace SELINUX=enforcing with SELINUX=disabled

7. Type: setenforce 0

8. Type: yum install ntp ntpdate ntp-doc to install NTP (Network Time Protocol)

9. After installing NTP, enable it at boot. Type: chkconfig ntpd on

10. Type: vi /etc/ntp.conf

11. Type: ntpq -p

12. Then start the NTP daemon: service ntpd start

13. Then run ntpq

14. Then generate an RSA key pair with ssh-keygen -t rsa on the remaining machines

15. Save the key file as id_rsa in /root/.ssh

17. Type: ll to check whether authorized_keys is there or not

19. Type: scp authorized_keys root@machine@.localdomain:/root/.ssh

20. Then type: yum install openssl python perl
21. yum clean all
22. yum repolist

23. Then download Cloudera Manager using the command below


24. chmod 700 cloudera-manager-installer.bin

25. Then type ./cloudera-manager-installer.bin and click on Next


26. After that, accept the license

27. The JDK is installed automatically

28. The embedded database is installed automatically

29. Cloudera Manager Server is installed

30. The installation completes successfully

31. Click on “OK”

32. If you get any error, disable the firewall and iptables
33. To disable the firewall, type: systemctl disable firewalld

34. To disable IPv6, type: vi /etc/sysctl.conf

35. Browse to your machine in a web browser

36. Log in with username: admin


37. Select “Yes, I accept the User License Terms and Conditions”

38. Select Cloudera Express “Free”

39. Then search for the host machines by their domain names

40. Select Repository

41. If you need any proxy settings, select and fill them in. If you don't need them, leave them blank.

42. Click on Continue for the three-machine cluster installation. If there is any issue, try Mozilla Firefox.

43. Click on “Continue” and check the CDH version

44. When it reaches 100% completion, click on “Continue”

45. After “Continue”, check the validations

46. Two validations mainly show warnings here. Type the commands below, then click Run Again

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
sysctl vm.swappiness=10

47. Click on “Finish”

48. It shows the version summary

49. The HDFS NameNode and ResourceManager must be on different hosts

50. Select “Core with Spark”, then Continue


51. Click on “Test Connection” when using the embedded database

52. The cluster is set up successfully.
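
The kernel settings from step 46 are applied at runtime only, so they revert after a reboot unless they are also made persistent (for example, vm.swappiness can be added to /etc/sysctl.conf). The sketch below reports the current values so they can be rechecked at any time. The transparent hugepage files mark the active choice with square brackets, as in `always madvise [never]`; `thp_active_mode` is a hypothetical helper name.

```shell
# thp_active_mode TEXT: extract the bracketed (active) value from a line such
# as "always madvise [never]", the format of the transparent_hugepage files.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z+]*\)\].*/\1/p'
}

# Report the current settings if the kernel exposes them on this machine
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP enabled: $(thp_active_mode "$(cat "$thp_file")")"
fi
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi
```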

How to Install SQOOP on Ubuntu

Apache Sqoop Installation on Ubuntu

Apache Sqoop is one of the Hadoop components. It is mainly used for bulk data transfer between Hadoop (HDFS) and data stores such as relational databases (RDBMS), in both directions.

Prerequisites:

Before installing Sqoop, you need Hadoop 2.x.x and a Sqoop 1.x.x release that is compatible with it

Step 1: Download the Sqoop 1.x.x tarball from the website below: p/1.4.6/

Step 2: After downloading, extract the Sqoop tarball using the command below:

tar -xzvf sqoop-1.x.x.bin-hadoop-2.x.x-alpha.tar.gz

Step 3: Update the .bashrc file with the SQOOP_HOME and PATH variables

export SQOOP_HOME=/home/slthupili/INSTALL/sqoop-1.x.x.bin-hadoop-2.x.x


Step 4: To check the .bashrc changes, open a new terminal and type 'echo $SQOOP_HOME'


Step 5: To integrate with a MySQL database from Hadoop using Sqoop, we must place the respective JAR file (mysql-connector-java-5.1.38.jar) in the $SQOOP_HOME/lib path

Step 6: Check the Sqoop version using the command below:

sqoop version

The steps above are a simple way to install Sqoop on top of Hadoop in Ubuntu


Sqoop imports data from a relational database management system (RDBMS) such as MySQL into the Hadoop Distributed File System. Sqoop automates most of this process, relying on the database to describe the schema of the data to be imported. Sqoop uses MapReduce to import and export the data.
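
A minimal sketch of such an import follows. The flags are standard `sqoop import` options, but the host, database `retail_db`, user `sqoop_user`, and table `customers` are hypothetical placeholders, and the command only runs if sqoop is found on the PATH.

```shell
# Hypothetical connection details; replace with your own.
DB_HOST="localhost"
DB_NAME="retail_db"
DB_USER="sqoop_user"
TABLE="customers"
JDBC_URL="jdbc:mysql://${DB_HOST}:3306/${DB_NAME}"

if command -v sqoop >/dev/null 2>&1; then
    # -P prompts for the database password; -m 1 runs a single map task
    sqoop import \
        --connect "$JDBC_URL" \
        --username "$DB_USER" -P \
        --table "$TABLE" \
        --target-dir "/user/${USER:-hadoop}/${TABLE}" \
        -m 1
else
    echo "sqoop not found on PATH; skipping import from $JDBC_URL"
fi
```

The imported rows land as files under the target directory in HDFS, one file per map task.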



How to Install Hadoop Single Node Cluster

How to Install Hadoop Single Node Cluster on Ubuntu.

Step 1: Update the system software repositories using sudo apt-get update

This first step updates the Ubuntu package lists

Step 2: Install the Java 1.8 JRE using the command below.

Java is a prerequisite for the installation, so first install the JRE and then the JDK


Step 3: Install the Java 1.8 JDK using the command below


Step 4: Check the Java version on Linux using the command below

Step 5: After that, install SSH (Secure Shell) using the command below:

SSH provides secure, passwordless communication between the NameNode and the Secondary NameNode, which communicate frequently

Step 6: Check the SSH installation using the command below

After installing SSH, use the ssh localhost command to check whether communication is working.

Step 7: Download the Hadoop-2.6.0 tarball from the Apache mirrors.

Once the Hadoop prerequisites are in place, download the Hadoop tarball

Step 8: Extract the tarball using the command below


Step 9: Update the environment variables and PATH for HADOOP_HOME and JAVA_HOME:



Step 10: Check that the PATH variable is set, then edit the configuration files as part of the Hadoop installation.



Step 11: First open the “core-site.xml” file and add the properties

The core-site.xml file holds the NameNode information

Step 12: Open the “hdfs-site.xml” file and add the properties

The hdfs-site.xml file holds the replication factor and DataNode information.
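
The properties themselves are not shown here, so this is a minimal sketch for a single-node cluster. `dfs.replication` and `dfs.datanode.data.dir` are standard HDFS property names; the replication factor of 1 suits a single DataNode, while the storage path and the helper name `write_hdfs_site` are assumptions for illustration.

```shell
# write_hdfs_site DIR: write a minimal single-node hdfs-site.xml into DIR.
# The data directory below is a placeholder path, not a required location.
write_hdfs_site() {
    cat > "$1/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
EOF
}

# Example: write_hdfs_site "$HADOOP_HOME/etc/hadoop"
```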

Step 13: Open the “yarn-site.xml” file and add the properties to configure the ‘Resource Manager’ & ‘Node Manager’ details:
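
As a minimal sketch, the two properties below are the ones a single-node YARN setup commonly needs: `yarn.resourcemanager.hostname` tells the NodeManager where the ResourceManager runs, and `yarn.nodemanager.aux-services` enables the shuffle service that MapReduce jobs rely on. The hostname `localhost` and the helper name `write_yarn_site` are assumptions.

```shell
# write_yarn_site DIR: write a minimal single-node yarn-site.xml into DIR.
write_yarn_site() {
    cat > "$1/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

# Example: write_yarn_site "$HADOOP_HOME/etc/hadoop"
```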

Step 14: Update JAVA_HOME path in ‘’ file

Step 15:Update JAVA_HOME path in ‘‘ file

Step 16: Open ‘mapred-site.xml’ and set the framework to yarn in that file

Step 17: Open the slaves file and check whether the hostname is localhost

Step 18: Before starting the NameNode, format it using the command below: hadoop namenode -format

Step 19: To start all the daemons of hadoop using below command:

Step 20: Check whether the daemons are running using the jps command

Step 21: After that, access the NameNode information in the GUI:




How to Install Cassandra on Ubuntu/Linux in Hadoop Eco-System

How to Install Apache Cassandra on Ubuntu in the Hadoop Ecosystem

Apache Cassandra is an open-source, distributed NoSQL database management system designed to handle huge amounts of data across inexpensive commodity servers. It provides high availability and is written in Java.

Why should we use Apache Cassandra?

Cassandra is a complete and robust NoSQL database that has been deployed by social networks such as Facebook and Twitter, and by e-commerce companies.

1. Cassandra supports a wide set of data structures.

2. Scalable architecture.

3. Cassandra is highly reliable.

4. Cassandra provides atomicity, isolation, and durability at the row level with tunable consistency, rather than full ACID transactions.

Prerequisites for Apache Cassandra Installation:

Cassandra requires a number of prerequisite installations. Java is needed first, and Python 2.7 is also mandatory

Below is the step-by-step process for installing Cassandra on Ubuntu

Step 1: Download Cassandra-2.x.x-bin.tar.gz from the website below:

First, download the tarball from the official Cassandra website

Step 2: Extract the downloaded tarball using the command below

tar -xzvf cassandra-2.x.x-bin.tar.gz

Step 3: After that, update the CASSANDRA_HOME and PATH variables in the .bashrc file


Step 4: To check the .bashrc changes, open a new terminal and type the 'echo $CASSANDRA_HOME' command


Step 5: After completing the above settings, start the Cassandra server using the command below:


Step 6: Start the Cassandra Query Language shell (cqlsh) using the command below:


Characteristics of Cassandra:

1. Cassandra is a column-oriented, fault-tolerant database.

2. It is highly scalable and offers tunable consistency.

3. Cassandra was created at Facebook, and its data model is based on Google Bigtable.



This Apache Cassandra installation tutorial covers simple installation steps and basic knowledge of Cassandra






How to Install Apache Spark on Ubuntu/Linux

Apache Spark Installation

Spark is a framework and in-memory data processing engine. It can process data up to 100 times faster than Hadoop MapReduce. It offers APIs in Java, Scala, Python, and R. Today it is mostly used for streaming and machine learning workloads.

Prerequisites for Spark Installation:

1. Update the packages on Ubuntu using

sudo apt-get update

After you enter your password, it updates the package lists

2. Now install the JDK

sudo apt-get install default-jdk

The Java version must be greater than 1.6

Step 1: Download the Spark tarball from the official Apache Spark website

Step 2: Copy the tarball into your Hadoop directory

Step 3: After that, extract the downloaded tarball using the command below:

tar -xzvf spark-2.x.x-bin-hadoop2.x.tgz

Step 4: After extracting the tarball, we get the Spark directory. Update the SPARK_HOME and PATH variables in the .bashrc file using the commands below:

export SPARK_HOME=/home/slthupili/INSTALL/spark-2.x.x-bin-hadoop2.x


Step 5: To check the .bashrc changes, open a new terminal and type the 'echo $SPARK_HOME' command

Step 6: After successfully installing Spark, check it with the Spark shell in a terminal using the command below:



Step 7: Check the Spark version and Scala version using the commands below:
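
The commands for this step are not shown here. As a sketch, `spark-submit --version` prints the Spark version banner (which also names the Scala version Spark was built with), and `scala -version` reports a separately installed Scala. The wrapper `report_version` is a hypothetical helper that avoids a confusing error when a tool is not yet on the PATH.

```shell
# report_version TOOL [ARGS...]: run TOOL with ARGS if it is on the PATH,
# otherwise print a short message and return 127.
report_version() {
    tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        # Version banners are often written to stderr, so merge the streams
        "$tool" "$@" 2>&1
    else
        echo "$tool: not found on PATH"
        return 127
    fi
}

report_version spark-submit --version || true
report_version scala -version || true
```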






The steps above are a simple way to install Apache Spark on top of Hadoop in a single-node cluster.