In this article, we will explain what Apache Hive is and describe its architecture, with examples for the Big Data environment in a Hadoop cluster.
What is Hive?
Apache Hive is a data warehousing infrastructure built on Hadoop. Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware.
Hive is designed to enable data summarization, ad-hoc querying, and analysis of large volumes of data. Apache Hive uses the Hive Query Language (HiveQL), which is similar to SQL, and translates Hive queries into MapReduce programs. At the same time, Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as user-defined functions (UDFs).
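As a minimal sketch (the table name and columns here are hypothetical), a HiveQL query looks like ordinary SQL, and Hive compiles it into one or more MapReduce jobs behind the scenes:

```sql
-- Hypothetical table of web-server access logs.
CREATE TABLE IF NOT EXISTS access_logs (
  ip     STRING,
  url    STRING,
  status INT
);

-- Hive translates this aggregation into a MapReduce job:
-- the map phase emits (url, 1) pairs and the reduce phase sums them.
SELECT url, COUNT(*) AS hits
FROM access_logs
GROUP BY url;
```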
The Architecture of Hive:
Here CLI stands for Command-Line Interface, JDBC for Java Database Connectivity, and Web GUI for the web-based Graphical User Interface. A user who comes in through the CLI connects directly to the driver, while a user who comes in through JDBC connects to the Hive driver via its API. When the Hive driver receives queries from the user, it submits them to the Hadoop architecture, which uses the NameNode, DataNodes, JobTracker, and TaskTrackers to read and process the data.
The basic components of Hive are:
1. Hive Client: The Hive clients are the JDBC (Java Database Connectivity) and ODBC (Open Database Connectivity) drivers used to submit queries.
2. Hive Services: The main Hive services are Beeline, HiveServer2, and others. Beeline is a command shell that connects to HiveServer2; it is supported by HiveServer2 only.
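Assuming HiveServer2 is running on its default port 10000 on the local machine, a typical Beeline session connects over JDBC and then issues ordinary HiveQL (a sketch; the host and database are assumptions):

```sql
-- Started from the shell with: beeline -u jdbc:hive2://localhost:10000
-- Once connected, HiveQL statements are sent to HiveServer2:
SHOW DATABASES;
USE default;
SHOW TABLES;
```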
Coming to the metastore, it is also one of the Hive services: it stores metadata about the structure of tables, columns, and partitions for large volumes of data. A web UI is available for monitoring.
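The metadata held by the metastore can be inspected directly from HiveQL. For example (the table name here is hypothetical), the following statements return the column definitions, storage location, and partition list that the metastore tracks:

```sql
-- Column types, storage format, HDFS location, table parameters:
DESCRIBE FORMATTED access_logs;

-- Partitions registered in the metastore for a partitioned table:
SHOW PARTITIONS access_logs;
```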
3. Processing and Resource Management (RM):
For processing and resource management, Hive internally uses the MapReduce framework to execute queries.
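You can see the MapReduce plan Hive generates for a query with the `EXPLAIN` statement. For a query like the one below (table and columns hypothetical), Hive prints the stage graph, including the map-phase and reduce-phase operators it will run:

```sql
-- Prints the execution plan instead of running the query:
-- the output lists the stages (e.g. a map-reduce stage) and the
-- operator tree for the map and reduce sides of the aggregation.
EXPLAIN
SELECT url, COUNT(*) AS hits
FROM access_logs
GROUP BY url;
```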
4. Distributed Storage: Hive is installed on top of the Hadoop ecosystem, so HDFS provides the distributed storage for its big data.