How to integrate (connect) Hive and Spark:
This article provides solutions for integrating (connecting) the Hive metastore with Spark in a Hadoop development environment.
The first time we tried to connect Hive and Spark, we got the error below; the sections that follow describe several ways to resolve it, depending on the mode you use.
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Solutions for the above error:
With the JDBC driver:
1. Download the MySQL Connector/J jar from the official Maven repository, e.g. https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.21
2. Copy the jar into the jars folder of the Spark installation directory.
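The two steps above can be sketched as shell commands; the SPARK_HOME path and the connector version are assumptions for illustration, so adjust them to your own installation:

```shell
# Assumed install location -- change this to match your setup.
SPARK_HOME=/opt/spark

# Download MySQL Connector/J 5.1.21 from Maven Central.
wget -P /tmp https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.21/mysql-connector-java-5.1.21.jar

# Place the jar where Spark adds extra jars to its classpath.
cp /tmp/mysql-connector-java-5.1.21.jar "$SPARK_HOME/jars/"
```

After restarting spark-shell, the driver class com.mysql.jdbc.Driver is on the classpath and the NucleusException no longer occurs.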
Without the JDBC driver:
1. Open hive-site.xml and set the hive.metastore.uris property so that Spark can reach the Hive metastore.
2. Import org.apache.spark.sql.hive.HiveContext, which can run SQL queries over Hive tables, then define the sqlContext as in the code below:
   val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
3. Finally, verify the tables in Spark SQL.
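The steps above can be sketched as a short spark-shell session; it assumes hive-site.xml (with hive.metastore.uris pointing at your metastore) is on Spark's classpath, and that sc is the SparkContext that spark-shell creates for you. The table name is a placeholder:

```scala
// spark-shell provides sc; HiveContext reads hive-site.xml automatically.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// Verify that Hive tables are visible from Spark SQL.
sqlContext.sql("SHOW TABLES").collect().foreach(println)

// Query a Hive table (replace my_table with one of your tables).
sqlContext.sql("SELECT * FROM my_table LIMIT 10").show()
```

If SHOW TABLES lists your Hive tables, the metastore connection is working.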
You can also use Beeline to connect Hive and Spark through HiveServer2. Beeline offers better security and connects directly to a remote server. Check the connection with the two commands below, using a HiveServer2 configuration:
Step 1: ./bin/beeline
Step 2: !connect jdbc:hive2://remote_hive:10000
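A non-interactive equivalent of the two steps passes the JDBC URL, user, and query on one command line; the hostname, port, and credentials here are placeholders for your own HiveServer2 setup:

```shell
# Connect to HiveServer2 and run a quick sanity-check query.
# -u: JDBC URL, -n: username, -p: password, -e: query to execute.
./bin/beeline -u jdbc:hive2://remote_hive:10000 \
  -n hive_user -p hive_pass \
  -e "SHOW DATABASES;"
```

If the command prints your Hive databases, the Beeline connection to HiveServer2 is configured correctly.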