Spark Connection Error Through JDBC Using MySQL

In this article, we provide a resolution for a Spark connection error that occurs when connecting to an external MySQL database via JDBC or ODBC.

While connecting Spark to MySQL through the JDBC driver, we get this error: "ClassNotFoundException: com.mysql.jdbc.Driver". Here is the full error and stack trace.

Spark Connection Error:


scala> val jdbcDF = spark.read.jdbc("jdbc:mysql://localhost:3306/new_training", "emp", prop)
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:78)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:78)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:34)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:166)
... 52 elided
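The stack trace above means the JVM cannot find the MySQL driver class anywhere on Spark's classpath. Before changing any configuration, you can confirm this from the scala prompt with plain JDK calls (a minimal sketch; the helper name driverAvailable is ours, not part of Spark):

```scala
import scala.util.Try

// Returns true when the given class can be loaded from the current classpath.
def driverAvailable(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// A JDK class is always visible:
println(driverAvailable("java.lang.String"))   // true

// If this prints false, the MySQL connector jar is missing from the
// classpath, which is exactly what causes the ClassNotFoundException.
println(driverAvailable("com.mysql.jdbc.Driver"))
```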

Solution:
Here is a simple solution for this Spark JDBC connection issue.

Step 1: Copy the "Spark_Custom.jars" folder into the "$SPARK_HOME" directory

Step 2: Rename the "$SPARK_HOME/conf/spark-defaults.conf.template" file to "$SPARK_HOME/conf/spark-defaults.conf"

Step 3: Add the jars below to the "$SPARK_HOME/conf/spark-defaults.conf" file:

spark.driver.memory 5g
spark.jars /home/sreekanth/hadoop/SPARK/spark-2.0.1-bin-hadoop2.6/Spark_Custom.jars/guava-16.0.1.jar,/home/sreekanth/hadoop/SPARK/spark-2.0.1-bin-hadoop2.6/Spark_Custom.jars/cassandra-driver-core-3.0.2.jar,/home/sreekanth/hadoop/SPARK/spark-2.0.1-bin-hadoop2.6/Spark_Custom.jars/jackson-core-2.5.2.jar,/home/sreekanth/hadoop/SPARK/spark-2.0.1-bin-hadoop2.6/Spark_Custom.jars/jackson-module-scala-2.1.1-bin.jar

Note that spark.jars takes a comma-separated list of full jar paths. The MySQL client jar (mysql-connector-java) must also appear in this list for the JDBC connection to work.

After adding the Spark and JDBC connection jar files, try to connect to the RDBMS. The following four items are required:

1. Connection URL

2. Username & password

3. Driver class name

4. Client jar
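The four items above map directly onto the URL string and java.util.Properties object that spark.read.jdbc expects. Here is a minimal sketch, using the database and table from this article's example; your_user and your_password are placeholders, not real credentials:

```scala
import java.util.Properties

// 1. Connection URL (database "new_training" from the example above)
val url = "jdbc:mysql://localhost:3306/new_training"

val prop = new Properties()
// 2. Username & password (placeholders; substitute your own)
prop.setProperty("user", "your_user")
prop.setProperty("password", "your_password")
// 3. Driver class name
prop.setProperty("driver", "com.mysql.jdbc.Driver")

// 4. Client jar: mysql-connector-java must be on the classpath,
//    e.g. via the spark.jars list configured in Step 3.

// With a running SparkSession the read would then be:
// val jdbcDF = spark.read.jdbc(url, "emp", prop)
```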

After that, check in the scala prompt. Note that the third argument to spark.read.jdbc must be a java.util.Properties object, not the string "prop":

scala> val prop = new java.util.Properties()

scala> prop.setProperty("driver", "com.mysql.jdbc.Driver")

scala> val jdbcDF = spark.read.jdbc("jdbc:mysql://localhost:3306/new_training", "emp", prop)

In Spark SQL, this type of error stems from the Spark JDBC driver connection to the MySQL server, and it may also be caused by incompatible versions of the connection jar files. Here we placed the custom jar files under the $SPARK_HOME path, then went to the Spark configuration directory and added all of the connection jar files there.

Summary: In a Hadoop cluster, connection errors are common, and the Spark JDBC connection error is among the most frequent. We need to list the Spark connection jar files in the configuration path and also check their versions, since version compatibility issues come up from time to time. In this cluster, MySQL is the external database, so it must be set up to work with Spark through a JDBC or ODBC driver; most of the time, Spark SQL uses only the JDBC driver for external database connections. In the Big Data environment, Spark is one of the fastest data processing frameworks for large data sets.