Hive is a component of the Hadoop ecosystem built on top of the Hadoop Distributed File System (HDFS), and it serves as a data warehouse system for Hadoop. It is mainly used for data summarization and for querying and analyzing tabular data.
Impala is a query engine designed to address the limits of existing engines such as Apache Hive, which suffer from high run-time overhead, high latency, and low throughput. To avoid this latency, Impala bypasses MapReduce and accesses the data directly through a specialized distributed query engine, similar to those found in parallel RDBMSs.
Hive Vs Impala:
1. UDFs : Hive supports User Defined Functions. Early Impala releases did not, but Impala 1.2 and later support UDFs written in C++ as well as Java-based Hive UDFs.
2. Speed : Hive executes queries more slowly than Impala.
3. Syntax : Hive uses an SQL-like syntax (HQL), and Impala follows the same syntax.
4. Complex Types : Hive supports complex data types (ARRAY, MAP, STRUCT). Early Impala releases did not; Impala 2.3 and later support them with restrictions, for example in Parquet tables.
Some more important points of comparison:
I) Like Hive, Impala can query data stored in HDFS or Apache HBase, but it does so in real time.
II) Impala uses the same SQL syntax (Hive SQL), user interface, and metadata as Hive.
III) Queries that require more than one MapReduce phase in Hive run faster in Impala.
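The point about multi-phase queries can be pictured with a toy sketch. The Python below is not Hive or Impala code; it is a minimal in-memory simulation, assuming a query like "SELECT dept, SUM(salary) ... GROUP BY dept ORDER BY total DESC", which classic Hive on MapReduce would typically run as two chained jobs (one for the aggregation, one for the sort). The sample rows and all function names are made up for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run one toy MapReduce phase in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    output = []
    for key in sorted(groups):             # reducers see keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Hypothetical sample rows: (department, salary)
rows = [("sales", 100), ("eng", 300), ("sales", 200), ("eng", 100), ("ops", 50)]


# Phase 1: GROUP BY dept, SUM(salary)
def map_by_dept(row):
    dept, salary = row
    yield dept, salary


def reduce_sum(dept, salaries):
    yield dept, sum(salaries)


# Phase 2: ORDER BY total DESC (negated total as key gives descending order)
def map_by_total(pair):
    dept, total = pair
    yield -total, dept


def reduce_emit(neg_total, depts):
    for dept in sorted(depts):
        yield dept, -neg_total


# In real MapReduce, the output of phase 1 is written to disk before
# phase 2 starts; here it is just a list handed from one call to the next.
phase1 = run_mapreduce(rows, map_by_dept, reduce_sum)
phase2 = run_mapreduce(phase1, map_by_total, reduce_emit)
print(phase2)
```

Each extra phase in a real cluster adds job start-up cost and a round trip through storage for the intermediate result. An engine such as Impala, which does not go through MapReduce, can pass intermediate rows between operators without that step, which is one reason such queries tend to finish sooner there.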