Apache Hive is a data warehouse system used mostly for summarizing and querying structured data. It is one of the components of the Hadoop ecosystem and is built on top of HDFS. Hive is meant for data in tabular (structured) form rather than for flat files.
Step 1: Download the hive-1.2.2 tarball from the Apache mirrors website:
http://apache.mirrors.tds.net/hive/hive-1.2.2
Step 2: Extract the tarball in your chosen path using the command below:
tar -xzvf apache-hive-1.2.2-bin.tar.gz
Step 3: Update the HIVE_HOME and PATH variables in the .bashrc file:
export HIVE_HOME=/home/sreekanth/Big_Data/apache-hive-1.2.2-bin
export PATH=$PATH:$HIVE_HOME/bin
Step 4: Save the .bashrc file after the update, then go to the next step.
Step 5: To check the .bashrc changes, open a new terminal and run the command:
echo $HIVE_HOME
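The .bashrc update above can also be scripted. The following is a minimal sketch, not part of the original steps: `append_once` and `add_hive_env` are hypothetical helper names, and the Hive directory is passed in as an argument rather than hard-coded. It adds each export line only if it is not already in the file, so running it twice does not duplicate entries.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: add the HIVE_HOME and PATH exports to an rc file.
# $1 = Hive install directory, $2 = rc file (defaults to ~/.bashrc).
add_hive_env() {
  local hive_home="$1" rc_file="${2:-$HOME/.bashrc}"
  append_once "export HIVE_HOME=$hive_home" "$rc_file"
  # Single quotes keep $PATH and $HIVE_HOME unexpanded in the rc file.
  append_once 'export PATH=$PATH:$HIVE_HOME/bin' "$rc_file"
}
```

Call it as `add_hive_env /path/to/your/hive/directory`, then open a new terminal and run `echo $HIVE_HOME` as in the step above.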
Step 6: Remove the jline-0.9.94.jar file from the path below to avoid incompatibility between this Hive version and hadoop-2.6.0
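If you are not sure where the old jar sits on your machine, a small search can locate it before you remove anything. This is only a sketch: `find_old_jline` is a hypothetical helper name, and it assumes you pass it the root of your Hadoop installation (for example the directory that `$HADOOP_HOME` points to, if you have set that variable).

```shell
# List every copy of the old jline jar under a Hadoop installation.
# $1 = Hadoop install directory (assumption: you know where Hadoop lives).
find_old_jline() {
  find "$1" -type f -name 'jline-0.9.94.jar'
}
```

Review the paths it prints, then remove the jar with `rm` once you are sure it is the one Hadoop ships.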
Step 7: There are 2 types of metastore we can configure in Hive to store metadata:
Internal, using Derby embedded in Hive. It is only for one user.
External, using MySQL. It is used for multiple users.
If your conf directory does not contain a hive-site.xml file, create one.
Step 8: Configure the hive-site.xml file with the MySQL configuration by adding the content below:
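As an illustration of what such a MySQL configuration can look like, here is a minimal sketch that writes a hive-site.xml with the four standard JDBC connection properties of the Hive metastore. The helper name `write_hive_site` is hypothetical, and the host, port, database name, user name and password are placeholders, not the author's values; replace them with your own.

```shell
# Hypothetical helper: write a minimal hive-site.xml into a conf directory.
# $1 = Hive conf directory. All connection values below are placeholders.
write_hive_site() {
  local conf_dir="$1"
  local target="$conf_dir/hive-site.xml"
  # Never overwrite an existing configuration.
  if [ -e "$target" ]; then
    echo "$target already exists, leaving it untouched" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
EOF
}
```

Run it as `write_hive_site "$HIVE_HOME/conf"`. The driver class shown is the one used by the 5.x MySQL connector; newer connector versions may expect a different class name.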
Step 9: For the external metastore (MySQL), we need the MySQL connector jar file
Step 10: Copy the MySQL connector jar file into the $HIVE_HOME/lib path
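The copy can be sketched as a small function that checks both paths first. `install_connector` is a hypothetical helper name, and the jar file name depends on the connector version you downloaded, so it is passed in as an argument.

```shell
# Hypothetical helper: copy a MySQL connector jar into Hive's lib directory.
# $1 = path to the connector jar, $2 = Hive install directory.
install_connector() {
  local jar="$1" hive_home="$2"
  if [ ! -f "$jar" ]; then
    echo "connector jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$hive_home/lib" ]; then
    echo "no lib directory under: $hive_home" >&2
    return 1
  fi
  cp -- "$jar" "$hive_home/lib/"
}
```

For example: `install_connector /path/to/mysql-connector.jar "$HIVE_HOME"`.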
Step 11: Run the hive command in a terminal. If it shows connection refused, the Hadoop daemons are not running, and Hive will not work without them. In that case, first start all daemons using the start-all.sh command.
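To confirm that the daemons are actually up before retrying Hive, you can inspect the output of `jps`, the JDK tool that lists running Java processes. The sketch below assumes a single-node Hadoop 2.x setup where the five usual daemons are expected; `missing_daemons` is a hypothetical helper name, and the daemon list is an assumption you may need to adjust for your cluster.

```shell
# Read jps output on stdin and print the name of each expected daemon
# that is not running. Prints nothing when all of them are present.
missing_daemons() {
  local jps_output daemon
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the whole second field so that
    # "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}
```

Use it as `jps | missing_daemons`. An empty result means all five daemons were found.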
Step 12: Now run hive again; it should start successfully on your machine
Step 13: Check the Hive version using the command below:
hive --version
Why do we use Hive?
We use Hive for data summarization and for querying tabular data in the Hadoop system. The default Hive metastore database is Derby, which is only for one user. MySQL is mostly used for large data and multiple users.