Permission Denied error in Hive while creating Database in Hadoop eco-system

I installed the Hive service on top of the Hadoop ecosystem and then tried to create a database, but I hit the error below. Here is the error and the solution I found.

Permission Denied Error in Hive:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> set task = false;
hive> create database myhive;
FAILED: Error in metadata: MetaException(message:Got exception: Permission denied: user=hadoop, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

The above error is a permission issue: the hadoop user does not have write access to the /user directory in HDFS.

To resolve the permission issue for the hadoop user, follow the steps below and change the permissions using the chmod and chown commands:

Step 1: Log in as hduser, then execute the commands below one by one.
Step 2: sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
Step 3: sudo -u hdfs hadoop fs -chmod g+w /tmp
Step 4: sudo -u hdfs hadoop fs -chmod g+w /user/hive/warehouse
Step 5: sudo -u hdfs hadoop fs -chown -R hadoop /user/hive/warehouse
Step 6: sudo chmod 777 /var/lib/hive/metastore
Step 7: cd /var/lib/hive/metastore/metastore_db/
Step 8: sudo rm *.lck

Summary: after applying the steps above, the error was resolved and Hive is working fine now.

Hive: SortBy Vs OrderBy Vs DistributeBy Vs ClusterBy


Hive uses the columns in Sort By to sort the rows before feeding them to a reducer. The sort order depends on the column type: if the column is numeric, the sort order is numeric; if the column is a string, the sort order is lexicographical. Sort By orders the data at each of the N reducers, but each reducer can receive overlapping ranges of data.

Output: N or more sorted files with overlapping ranges.

Example Query for SortBy

SELECT key, value FROM source SORT BY key ASC, value DESC

Order By:

This is similar to ORDER BY in SQL. In Hive, ORDER BY guarantees total ordering of the data, but to achieve that all rows have to pass through a single reducer, which is normally unacceptable for large datasets. For that reason, in strict mode Hive makes it compulsory to use LIMIT with ORDER BY so that the single reducer doesn't get overloaded.

Ordering: total ordering of the data.

Output: a single output, i.e. fully ordered.

Example Query for OrderBy

SELECT key, value FROM source ORDER BY key ASC, value DESC

Distribute By:

Apache Hive uses the columns in Distribute By to distribute the rows among reducers. All rows with the same Distribute By column values go to the same reducer.
Distribute By guarantees that each of the N reducers gets non-overlapping ranges of the column, but it doesn't sort the output of each reducer.
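Example Query for DistributeBy

An example Distribute By query, in the same shape as the Sort By example above (the source table is the same assumed table):

SELECT key, value FROM source DISTRIBUTE BY key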


For example, Distribute By x on the following 5 rows sent to 2 reducers (the values are illustrative):

x1
x2
x4
x3
x1

Reducer 1 got

x1
x2
x1

Reducer 2 got

x4
x3


Cluster By:

Cluster By is a combination of both Distribute By and Sort By. CLUSTER BY x guarantees that each of the N reducers gets non-overlapping ranges of x, and then sorts by those ranges at each reducer.

Ordering: Global ordering between multiple reducers.

Output: N or more sorted files with non-overlapping ranges.


Referring to the same example as above, if we use Cluster By x, the two reducers will further sort the rows on x:

Reducer 1 got

x1
x1
x2

Reducer 2 got

x3
x4
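Example Query for ClusterBy

An example Cluster By query (assuming the same source table as before); CLUSTER BY key is equivalent to DISTRIBUTE BY key followed by SORT BY key:

SELECT key, value FROM source CLUSTER BY key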


Connection refused error while running Hive in Hadoop

When running Hive on a single-node Hadoop cluster, you may sometimes see an error like the one below:

Connection refused error in Hive

Exception in thread "main" java.lang.RuntimeException: Call From your domain/ to localhost:8020 failed on connection exception: Connection refused;

For more details see:

at org.apache.hadoop.hive.ql.session.SessionState.start(
at org.apache.hadoop.hive.cli.CliDriver.main(
at sun.reflect.NativeMethodAccessorImpl.invoke(Native Method)

Caused by: Call From slthupili/ to localhost:8020 failed on connection exception: Connection refused;

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
at java.lang.reflect.Constructor.newInstance(

Caused by: Connection refused

at Method)





First, stop all the Hadoop services using the command below:

$ stop-all.sh

This command stops all the services: NameNode, DataNode, SecondaryNameNode, YARN, etc.

Second, back up your data, then format the NameNode using the command below:

$ hadoop namenode -format

Note that this command erases the existing HDFS metadata (hence the backup). Then start the services again using start-all.sh and enter the hive command:

$ hive


Hive Vs Impala

Hive:

Hive is a component of the Hadoop ecosystem, built on top of the Hadoop Distributed File System (HDFS). It is a data-warehouse-like system, mainly used for data summarization and advanced querying of tabular data.


Impala:

Existing query engines such as Apache Hive have high runtime overhead and latency, with low throughput. To avoid this latency, Impala bypasses MapReduce and accesses the data directly through a specialized distributed query engine, similar to those found in parallel RDBMSs.

Hive Vs Impala:

1. UDFs: Hive supports User Defined Functions, but Impala does not.

2. Speed: Compared with Impala, Hive is slower in query execution.

3. Syntax: Hive uses an SQL-like syntax (HQL), and Impala follows the same syntax.

4. Complex Types: Hive supports complex types in queries, but Impala does not.

And some more important differences:

I) Impala can query data stored in HDFS or Apache HBase in real time, unlike Hive.

II) Impala uses the same SQL-like syntax (Hive SQL), user interface, and metadata as Hive.

III) Queries that require more than one MapReduce phase in Hive run considerably faster in Impala.
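As a sketch of point 1 above, this is how a custom UDF is registered and used in Hive (the jar path, class name, and function name are hypothetical):

ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.MyLower';
SELECT my_lower(name) FROM source;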

Difference between Managed and External Tables with Syntax in HIVE


Apache HIVE is mainly used for data summarization and querying. Hive's SQL (HQL) is much like standard SQL, with some differences in how the data is summarized and processed through the query language.

This section explains the difference between the two kinds of Hive tables, with syntax.

A Hive table is logically made up of the data itself plus metadata describing the layout of that data in the table.

There are two types of Hive tables:

1.Managed or Default or Internal Table

2.External Table

1.Managed Table:

When you create a table in Hive, by default Hive manages the data, which means that Hive moves the data into its warehouse directory.

The default warehouse location in HDFS:

“/user/hive/warehouse/<table name>/<table data>”


Syntax to create a managed table in Hive
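A minimal sketch of a managed table definition (the table and column names are illustrative):

CREATE TABLE employee (
id INT,
name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Because no LOCATION clause is given, Hive stores the data under its warehouse directory.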






2.External Table:

External tables are external to the Hive warehouse path. The data of an external table is stored at an HDFS location that we specify in the table schema, instead of under the default warehouse location:

“/user/hive/warehouse/<table name>/<table data>”
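A minimal sketch of an external table definition (the table name and LOCATION path are illustrative):

CREATE EXTERNAL TABLE employee_ext (
id INT,
name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/employee_ext';

The EXTERNAL keyword plus the LOCATION clause keeps the data at the given HDFS path instead of the warehouse directory.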







Difference between Managed table and External Table:

Managed (default) tables live inside the Hive warehouse within the database, while external tables live outside the Hive warehouse, at an HDFS path of your choosing. For managed tables, no extra keyword is needed at creation time, but external tables need the EXTERNAL keyword. Most importantly, if a managed table is dropped, the entire data and metadata are lost, but if an external table is dropped, only the metadata is lost; the data stays in HDFS.
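The drop behavior can be sketched as follows (the table names are hypothetical):

DROP TABLE employee;      -- managed table: deletes the metadata AND the data in the warehouse
DROP TABLE employee_ext;  -- external table: deletes only the metadata; the files at LOCATION remain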


Meta Store in APACHE HIVE

In the Hadoop ecosystem, the Hive metastore holds all the structure information of the various tables and partitions in the warehouse, including the column and column-type information needed to read and write data, and the corresponding HDFS files where the data is stored. It is the central repository of Hive metadata.

Here, the metastore is essentially the internal database of Hive, responsible for managing the metadata information.

The metadata which the metastore stores includes things like:

1. IDs of databases

2. IDs of tables and indexes

3. Time of creation of the indexes

4. Time of creation of the tables

5. Input format used for the tables

6. Output format used for the tables

Hive has three major modes of metastore:

1.Embedded Metastore

2.Local Metastore

3.Remote Metastore

1.Embedded Metastore:

The embedded metastore runs in the same JVM as the Hive service and contains an embedded Derby database instance backed by the local disk; hence it is called the embedded metastore.

Default configuration for Embedded Metastore

Below are the metastore configuration details in "hive-site.xml" (the property names are the standard javax.jdo.option settings):

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
<description>JDBC connect string for an embedded metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for an embedded metastore</description>
</property>



2. Local Metastore:

The local metastore supports multiple users by using a standalone database. The configuration is the same as the embedded metastore, except that Hive connects to a database running in a separate process.

If we use a MySQL database as the local metastore, then we need the following configuration in "hive-site.xml":



<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host/dbname?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>



3. Remote Metastore:

In a remote metastore setup, one or more metastore servers run in separate processes from the Hive service.

This metastore mode gives better manageability and security.
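For reference, a remote metastore is typically pointed at via the hive.metastore.uris property in "hive-site.xml" (the host name below is illustrative; 9083 is the conventional metastore port):

<property>
<name>hive.metastore.uris</name>
<value>thrift://metastore-host:9083</value>
<description>Thrift URI of the remote metastore</description>
</property>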


What is Hive and Architecture of Hive

What is HIVE?

Apache Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware.

Hive is designed to enable data summarization, ad-hoc querying, and analysis of large volumes of data. At the same time, Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as UDFs.

Architecture of HIVE

Here CLI is the Command Line Interface, JDBC the Java DataBase Connectivity driver, and Web GUI the Graphical User Interface. A user coming in through the CLI connects directly to the Hive driver, while a user coming in through JDBC connects to the Hive driver via its API. When the Hive driver receives query tasks from the user, it sends them to the Hadoop architecture, which uses the NameNode, DataNodes, JobTracker, and TaskTrackers to process the data.