ClassNotFoundException (MySQL) in Jupyter Notebook on AWS

After setting up a Hadoop cluster in a Big Data environment, I tried to connect to MySQL from a Jupyter Notebook on AWS EMR. While connecting to a MySQL database in Amazon Relational Database Service (RDS) from an Amazon Elastic MapReduce (EMR) Jupyter Notebook, I got a ClassNotFoundException for the driver. Here is the full error:

ClassNotFoundException: com.mysql.jdbc.Driver Error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, compute.internal, executor 2): java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

Solution:

The error above means that the MySQL JDBC driver is not available to Spark when it runs from a Jupyter Notebook on Amazon Elastic MapReduce, and the fix is simple. I first checked for the driver class and could not find it when running from the Jupyter Notebook.
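One way to run that check is to look for a connector JAR in Spark's JAR directory. The snippet below is a minimal sketch, not the exact check used here: the helper name find_connector_jars is hypothetical, the fallback path /usr/lib/spark is an assumption about where SPARK_HOME points on EMR, and the file name pattern assumes the JAR keeps its usual mysql-connector-* name.

```python
import os
from pathlib import Path


def find_connector_jars(spark_home=None):
    """Return the names of MySQL connector JARs found under <spark_home>/jars.

    Hypothetical helper: an empty list suggests the driver class
    com.mysql.jdbc.Driver is not on Spark's classpath.
    """
    if spark_home is None:
        # Assumption: fall back to /usr/lib/spark when SPARK_HOME is not set.
        spark_home = os.environ.get("SPARK_HOME", "/usr/lib/spark")
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        return []
    # Assumption: the JAR keeps its conventional mysql-connector-* file name.
    return sorted(p.name for p in jars_dir.glob("mysql-connector-*.jar"))
```

Calling find_connector_jars() on the node that runs the notebook and getting an empty list back is consistent with the ClassNotFoundException above.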

Step 1: Download the MySQL Connector/J (mysql-connector-java) JAR file from the Maven repository.


Step 2: Copy the JAR file into the $SPARK_HOME/jars directory on Amazon EMR.
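The two steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the exact commands used here: the helper names are hypothetical, the version passed in (for example "5.1.49") is a placeholder for whichever connector version matches your MySQL server, and the URL layout assumes the mysql/mysql-connector-java coordinates on Maven Central.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumption: Maven Central location of the mysql-connector-java artifact.
MAVEN_BASE = "https://repo1.maven.org/maven2/mysql/mysql-connector-java"


def connector_jar_name(version):
    """File name of the connector JAR for a given version."""
    return f"mysql-connector-java-{version}.jar"


def connector_jar_url(version):
    """Download URL of the connector JAR for a given version."""
    return f"{MAVEN_BASE}/{version}/{connector_jar_name(version)}"


def download_connector(version, dest_dir):
    """Step 1: download the JAR into dest_dir and return its path."""
    dest = Path(dest_dir) / connector_jar_name(version)
    urllib.request.urlretrieve(connector_jar_url(version), str(dest))
    return dest


def install_connector(jar_path, spark_home):
    """Step 2: copy the JAR into <spark_home>/jars and return the new path."""
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        # A missing directory usually means spark_home is wrong.
        raise FileNotFoundError(f"No jars directory under {spark_home}")
    return Path(shutil.copy(str(jar_path), str(jars_dir)))
```

Writing into the Spark JAR directory may require elevated permissions. Because the failing task in the error above ran on an executor, the JAR probably needs to be present on the worker nodes as well, not only on the node that hosts the notebook.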

Summary: In a Big Data environment, connecting to MySQL from a Jupyter Notebook on Amazon Elastic MapReduce can fail with a MySQL driver not found error. This article explained how Spark/Hadoop developers on Amazon Web Services can resolve it with two simple steps: download the MySQL connector JAR file in the exact version you need and add it to the JAR directory under SPARK_HOME. After that, restart the Spark session (for example, by restarting the notebook kernel) so the new JAR is picked up. If you still get the same error, try stopping all services and starting them again. Here, all services means the Hadoop-related services such as Spark, Hive, Sqoop, and HBase. If the issue still occurs, check the JAR file version and the Jupyter version, since a mismatch can sometimes cause compatibility issues in the Hadoop cluster on Amazon services.
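To confirm the fix, the read can be retried from the notebook. The sketch below only builds the option dictionary for Spark's JDBC data source. The helper name mysql_jdbc_options is hypothetical, and the host, database, table, and credentials are placeholders you would replace with your own RDS values.

```python
def mysql_jdbc_options(host, database, table, user, password, port=3306):
    """Build the options for a Spark JDBC read against MySQL.

    Hypothetical helper; every argument is a placeholder for your RDS setup.
    """
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        # Driver class named in the error; it must be on Spark's classpath.
        "driver": "com.mysql.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# In the notebook, where `spark` is the existing SparkSession:
# opts = mysql_jdbc_options("<rds-endpoint>", "<database>", "<table>",
#                           "<user>", "<password>")
# df = spark.read.format("jdbc").options(**opts).load()
# df.show(5)
```

If the JAR is visible to both the driver and the executors, the load should no longer raise the ClassNotFoundException.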