Unable to Integrate Hive with Spark: Different Resolutions

How to integrate (connect) Hive and Spark:

Here are solutions for how to integrate (connect) the Hive database with Spark in Hadoop development.
The first time we tried to connect Hive and Spark, we got the error below, and we found different types of resolutions for different modes.

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke
the "BONECP" plugin to create a ConnectionPool gave an error: The specified
datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please
check your CLASSPATH specification, and the name of the driver.

Different solutions for the above error:

Resolution 1:

1. Download the MySQL connector Java jar file from the official Maven repository, e.g. the below link:
https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.21
2. Paste the jar file into the jars folder, which is present in the Spark installation directory (see the shell sketch below).
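
For example, a minimal sketch from a Linux shell; the connector version and the $SPARK_HOME location are assumptions, so adjust them to your setup:

# Download the connector jar from Maven Central (version assumed for illustration)
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.21/mysql-connector-java-5.1.21.jar

# Place it where Spark picks up extra jars, so the metastore connection can load the JDBC driver
cp mysql-connector-java-5.1.21.jar $SPARK_HOME/jars/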

Resolution 2:

Without JDBC driver:

1. Go to hive-site.xml and set hive.metastore.uris in that XML file.
2. Import org.apache.spark.sql.hive.HiveContext, as it can perform SQL queries over Hive tables, then define the sqlContext value as in the below code:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
3. Finally, verify the tables in Spark SQL (see the sketch below).
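
A minimal sketch of both steps, assuming the metastore runs at localhost:9083 (the host and port are placeholders). First the hive-site.xml property:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>

Then the verification from spark-shell, where sc is the pre-built SparkContext:

// Create a Hive-aware SQL context on top of the existing SparkContext
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
// List the Hive tables visible through the metastore
sqlContext.sql("SHOW TABLES").collect().foreach(println)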

Resolution 3:





Go with Beeline for the Hive and Spark connection instead of the Hive CLI. Beeline provides high security and connects directly to a remote server; check the below two commands for Beeline with HiveServer2 configurations.

Step 1: ./bin/beeline
Step 2: !connect jdbc:hive2://remote_hive:10000
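
For example, passing a username and password on the connect line (remote_hive, hiveuser, and hivepass are placeholders for your own server and credentials):

beeline> !connect jdbc:hive2://remote_hive:10000 hiveuser hivepass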

Hadoop job (YARN staging) error while executing a simple job

In a Hadoop ecosystem, many jobs execute in a fraction of time. I was trying to execute a Hive job for data validation on the Hive server in production. While executing the Hive job from the Hive command line, I got this type of error:



at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
22:33:33 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging//.staging/job_1562044010976_0003
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging//.staging/job_1562044010976_0003/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

The above error indicates a DataNode connection problem while executing the job: at the time, the DataNode was not running properly. So find the below resolution for this issue.

Stop and restart all services:

stop-all.sh
start-all.sh

This restarts all services, including the NameNode, Secondary NameNode, DataNodes, and remaining services like Hive, Spark, etc.

If the error still appears, start the distributed file system:

start-dfs.sh

Check all the Hadoop daemons (NameNode, Secondary NameNode, DataNode, ResourceManager, NodeManager, etc.) by using the below command:

jps
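
On a healthy single-node setup, the listing looks something like the following; the process IDs are illustrative and will differ on your machine:

$ jps
2401 NameNode
2570 DataNode
2748 SecondaryNameNode
2935 ResourceManager
3120 NodeManager
3401 Jps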

Then check all node information by using "hdfs dfsadmin -report" (the older "hadoop dfsadmin -report" form also works) to see whether the DataNode is running fine or not.
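
A quick way to check just the DataNode count; the "Live datanodes" header below is taken from Hadoop 2.x report output and may differ slightly between versions:

# Count the DataNodes the NameNode currently considers alive
hdfs dfsadmin -report | grep -i "live datanodes"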

The above steps are for local, pseudo-distributed, and standalone modes of the Hadoop ecosystem only.

For Cloudera, Hortonworks, and MapR distributions, simply restart the DataNodes and services like Hive, Spark, etc.




Summary: In a Big Data environment we execute many Hadoop/Spark/Hive jobs, and sometimes they fail with the above error. When that happens we get stuck, but here is the simple solution for it.

MongoDB Error: The Program can’t start because MSVCP140.dll is missing from your computer.

Error:

The program can't start because MSVCP140.dll is missing from your computer. Try reinstalling the program to fix this problem. This error appears while installing MongoDB on the Windows operating system.

Resolutions:

Solution 1:

Step 1: Uninstall MongoDB from your Windows machine.

Step 2: Clean your junk files (using CCleaner, etc.) from your Windows machine.

Step 3: Remove all MongoDB files from your system.

Step 4: Download the latest version of MongoDB. If you need the Robo 3T studio, download that as well.

Step 5: Try to install the .exe file using Run as Administrator. After the MongoDB installation completes, restart the Windows machine.

If the error is still present after these steps, try to follow the second solution.

Solution 2:

DLL (Dynamic Link Library) files may be missing from your Windows machine. Some applications depend on these DLL files as external libraries, so restoring the files fixes this issue.

Step 1: Download the missing DLL file from the internet and copy the file into the appropriate location (C:\Windows\System32).

Step 2: After installing the missing DLL file on your local machine, try to install MongoDB or the other application again.
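
For example, from an elevated Command Prompt; the download location is an assumption, and note that on 64-bit Windows a 32-bit DLL belongs in C:\Windows\SysWOW64 instead:

copy C:\Downloads\MSVCP140.dll C:\Windows\System32\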

If it is still not working, go with the below solution.

Solution 3:

Step 1: Run the built-in System File Checker tool to find corrupted or missing files in the Windows operating system.
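
The tool is run from an elevated Command Prompt:

sfc /scannow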

Step 2: Try to repair or reinstall MongoDB or the other affected application, such as Visual Studio.

Step 3: Copy the DLL file from another Windows machine and restore it on your computer, followed by re-registering the DLL files on your computer.




Summary: In the Windows operating system, most applications depend on various DLL files to run. If the Windows OS or a piece of software finds that a required DLL file is missing or corrupted, you will receive this type of error: The program can't start because MSVCP140.dll is missing from your computer.

IntelliJ IDEA: Failed to load JVM DLL

I was trying to solve this error on the Windows operating system. While launching IntelliJ IDEA for development, some conflicts came into the picture.



ERROR:

Failed to load JVM DLL C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.1.1\jre64\bin\server\jvm.dll

If you already have a 64-bit JDK installed, define a JAVA_HOME variable in

Computer > System Properties > System Settings > Environment Variables.

But the Java path was correctly defined on the Windows operating system.

Resolutions:

Solution 1:

Set the JAVA_HOME path, including the jvm.dll path.

Find the below path on your local machine and copy that path into JAVA_HOME.

Step 1: Go to the JDK path and copy the path up to jvm.dll:

C:\Program Files\Java\jdk1.8.0_181\jre\bin\server

Step 2: Set JAVA_HOME in the environment variables:

%JAVA_HOME%\bin
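
A minimal sketch from an elevated Command Prompt; the JDK path is an example, so use your installed version (setx changes take effect in newly opened consoles):

rem Point JAVA_HOME at the JDK installation
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_181"
rem Append the JDK bin directory to the user PATH
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_181\bin"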

Step 3: If it is still not working, simply remove the below path from your System variables; it may be causing an override of JAVA_HOME:

C:\ProgramData\Oracle\Java\javapath

Solution 2:

It may sometimes be a problem with version compatibility, so try to launch the 64-bit version (idea64.exe), due to problems with the 32-bit launcher on 64-bit Windows, and create a shortcut to it on your desktop.

Note: If you are still facing this type of issue, then try the below solution.

Solution 3:

Step 1: Download the latest version of JDK 1.8 and install it.

Step 2: Set Path in the user variables and JAVA_HOME in the system variables with the full naming convention.

Step 3: Download the latest 64-bit version of IntelliJ IDEA and try to launch it on 64-bit Windows.
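
After setting the variables, you can verify from a newly opened Command Prompt that the JDK is visible (the exact output depends on the installed build):

java -version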

The above resolutions will in almost all cases solve your issue with launching JetBrains IntelliJ IDEA (or the Eclipse IDE) on the Windows operating system.




Summary: On the Windows operating system, an Integrated Development Environment plays a major role in development. Almost all such IDEs are Java-based, so you need to install a JDK. After installing the JDK, set the environment variables so it is accessible anywhere on the system.

Connection refused error while running Hive in Hadoop

When installing Hive in a single-node cluster setup on the Hadoop ecosystem, it sometimes shows an error like the one below:



Connection refused error in Hive

Exception in thread "main" java.lang.RuntimeException: Call From your domain/127.0.1.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused;

For more details see:

http://wiki.apache.org/hadoop/ConnectionRefused

at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...more

Caused by: java.net.ConnectException: Call From slthupili/127.0.1.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused;

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
...more

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
...more

Solution:

First, stop all services in Hadoop using the below command:

$ stop-all.sh

This command stops all services: NameNode, DataNode, Secondary NameNode, YARN, etc.

Second, back up the data, then use the below command:

$ hadoop namenode -format

The above command formats the NameNode and erases its metadata, so restart all services and then enter the hive command:

$ start-all.sh

$ hive
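
If the connection is back, a simple statement should now run without the connection-refused error, for example:

hive> show databases;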