Unable to Integrate Hive with Spark: Error and Resolutions




How to integrate (connect) Hive and Spark:

This post provides solutions for integrating (connecting) the Hive database with Spark in Hadoop development.
The first time we tried to connect Hive and Spark, we got the error below. We then found several resolutions, each suited to a different mode of connecting.

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke
the "BONECP" plugin to create a ConnectionPool gave an error: The specified
datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please
check your CLASSPATH specification, and the name of the driver.

Possible resolutions for the above error:

Resolution 1:

1. Download the MySQL Connector/J jar file from the Maven repository, for example from the link below:
https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.21
2. Paste the jar file into the jars folder in the Spark installation directory.
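
After copying the jar, it can help to confirm that the driver class named in the error is now visible to the JVM. The following is a minimal sketch, not part of the original fix: the object name DriverCheck and the helper isOnClasspath are hypothetical names chosen for illustration, and the check only uses the standard Class.forName lookup.

```scala
import scala.util.Try

// Hypothetical helper (not from the original post): checks whether a class
// can be loaded from the classpath of the JVM that runs this code.
object DriverCheck {
  // Returns true if the named class can be loaded, false if it is missing.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  def main(args: Array[String]): Unit = {
    // Driver class taken from the error message above.
    val driver = "com.mysql.jdbc.Driver"
    if (isOnClasspath(driver))
      println(s"$driver was found on the classpath")
    else
      println(s"$driver was NOT found; check that the connector jar is in Spark's jars folder")
  }
}
```

If this is pasted into spark-shell, calling DriverCheck.isOnClasspath("com.mysql.jdbc.Driver") should return true once the jar has been picked up after a restart of the shell.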

Resolution 2:

Without the JDBC driver:

1. Go to hive-site.xml and set hive.metastore.uris in that file.
2. Import org.apache.spark.sql.hive.HiveContext, which can run SQL queries over Hive tables, then define sqlContext as in the code below:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
3. Finally, verify the tables in Spark SQL.
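
The three steps above can be sketched as a small standalone program. This is only an illustration under stated assumptions: the object name VerifyHiveTables, the application name, and the local[*] master are hypothetical choices, the spark-hive module is assumed to be on the classpath, and hive-site.xml (with hive.metastore.uris set) is assumed to be visible to Spark, for example in Spark's conf directory. Note that HiveContext has been deprecated since Spark 2.0 in favor of SparkSession with enableHiveSupport, so this sketch assumes a Spark version that still ships it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical example object; names and master URL are assumptions.
object VerifyHiveTables {
  def main(args: Array[String]): Unit = {
    // In spark-shell the context `sc` already exists; here we create one.
    val conf = new SparkConf()
      .setAppName("VerifyHiveTables")
      .setMaster("local[*]") // assumption: local mode for a quick check
    val sc = new SparkContext(conf)

    try {
      // Step 2: a HiveContext that reads the metastore settings from hive-site.xml.
      val sqlContext = new HiveContext(sc)

      // Step 3: list the tables that Spark SQL can see through the Hive metastore.
      sqlContext.sql("SHOW TABLES").show()
    } finally {
      sc.stop()
    }
  }
}
```

If the Hive tables appear in the output, Spark is reading from the Hive metastore rather than from a fresh local one.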

Resolution 3:

Use Beeline for the Hive and Spark connection. Beeline provides higher security and connects directly to a remote server. Check the connection with the two Beeline commands below, using your HiveServer2 configuration:

Step 1: ./bin/beeline
Step 2: !connect jdbc:hive2://remote_hive:10000
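
The same JDBC URL that Beeline uses can also be opened from Scala code. The following is a rough sketch rather than a recommended setup: the object name HiveJdbcSketch, the jdbcUrl helper, the default database name, and the empty user name and password are all assumptions for illustration, and the Hive JDBC driver jar is assumed to be on the classpath. The host remote_hive and port 10000 come from the Beeline command above.

```scala
import java.sql.{Connection, DriverManager}

// Hypothetical example object (not from the original post).
object HiveJdbcSketch {
  // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
  def jdbcUrl(host: String, port: Int = 10000, database: String = "default"): String =
    s"jdbc:hive2://$host:$port/$database"

  def main(args: Array[String]): Unit = {
    val url = jdbcUrl("remote_hive")

    // Assumption: empty credentials; replace with real ones if the server requires them.
    val conn: Connection = DriverManager.getConnection(url, "", "")
    try {
      // List the tables in the connected database, one name per line.
      val rs = conn.createStatement().executeQuery("SHOW TABLES")
      while (rs.next()) println(rs.getString(1))
    } finally {
      conn.close()
    }
  }
}
```

Keeping the URL construction in its own function makes it easy to compare the string the program uses with the one typed into Beeline.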
