[Solved] Spark Streaming jobs failed due to "Name node is in safe mode. Resources are low on NN"

In this post, we will explain how to resolve Spark Streaming job failures caused by the NameNode being in safe mode with low resources ("Name node is in safe mode. Resources are low on NN").

org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to stage failure: Task 0 in stage 65045.0 failed 4 times, most recent failure: Lost task 0.3 in stage 645342.0 (TID 3432321, dalstreamhdfm05.innovate.lan, executor 23421): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create file/nifi/streamingapp/TouchesDataTableOffset/jdbc/state/0/0/temp. Name node is in safe mode. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1422)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2697)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2586)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
 at org.apache.hadoop.ipc.Client.call(Client.java:1498)
 at org.apache.hadoop.ipc.Client.call(Client.java:1398)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
 at com.sun.proxy.$Proxy18.create(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:313)
 at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
 at com.sun.proxy.$Proxy19.create(Unknown Source)
 at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1828)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1712)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1647)
 at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480)
 at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476)
 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:491)
 at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:417)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:931)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:912)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:808)
 at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.tempDeltaFileStream$lzycompute(HDFSBackedStateStoreProvider.scala:91)
 at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.tempDeltaFileStream(HDFSBackedStateStoreProvider.scala:91)
 at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.put(HDFSBackedStateStoreProvider.scala:106)
 at org.apache.spark.sql.execution.streaming.StateStoreWriter$class.timeTakenMs(statefulOperators.scala:102)
 at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.timeTakenMs(statefulOperators.scala:251)
 at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(statefulOperators.scala:302)
 at org.apache.spark.sql.execution.streaming.StateStoreSaveExec$$anonfun$doExecute$3.apply(statefulOperators.scala:270)
 at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.apply(package.scala:67)
 at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.apply(package.scala:62)
 at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:78)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
 at org.apache.spark.rdd.CoalescedRDD.apply(CoalescedRDD.scala:100)
 at org.apache.spark.rdd.CoalescedRDD.apply(CoalescedRDD.scala:99)
 at scala.collection.Iterator.nextCur(Iterator.scala:434)
 at scala.collection.Iterator.hasNext(Iterator.scala:440)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage8.processNext(Unknown Source)
 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:109)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:

Solution:

The above error points to two possible problem areas: the NameNode being stuck in safe mode, and Ambari service issues.

1. First, check the NameNode safe mode status using the command below:

hdfs dfsadmin -safemode get
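
If safe mode is on, the command typically prints the line below (the exact wording can vary between Hadoop versions):

Safe mode is ON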

Because the error also says "Resources are low on NN", free up disk space on the NameNode host before leaving safe mode; as the message warns, turning safe mode off without adding resources makes the NameNode re-enter it immediately. Once space is freed, leave safe mode using the command below.

hdfs dfsadmin -safemode leave
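
To confirm that the NameNode metadata disk is the bottleneck, check the free space on the directories configured under dfs.namenode.name.dir in hdfs-site.xml. A minimal sketch, assuming a common HDP-style path (substitute your own dfs.namenode.name.dir value):

# /hadoop/hdfs/namenode is an assumed dfs.namenode.name.dir value
df -h /hadoop/hdfs/namenode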

2. Second, check the status of all services in Ambari. If any service is down, investigate and fix that service before restarting the jobs.
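
Service states can also be pulled from Ambari's REST API instead of the UI. A minimal sketch, assuming an Ambari server at ambari-host:8080, admin/admin credentials, and a cluster named mycluster (all placeholders):

# List every service and its current state (STARTED, INSTALLED, etc.)
curl -u admin:admin -H 'X-Requested-By: ambari' \
  'http://ambari-host:8080/api/v1/clusters/mycluster/services?fields=ServiceInfo/state'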

Once safe mode is off, restart the services below (command-line equivalents follow the list):
1. NameNode
2. DataNode
3. NiFi servers
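
If you manage the daemons from the shell rather than the Ambari UI, the restarts look roughly like this. A sketch only: the service user, the Hadoop sbin location, and the NiFi install path are assumptions that vary by environment:

# Run as the hdfs service user; hadoop-daemon.sh lives in Hadoop's sbin directory
hadoop-daemon.sh stop namenode && hadoop-daemon.sh start namenode
# Repeat on each DataNode host
hadoop-daemon.sh stop datanode && hadoop-daemon.sh start datanode

# Run on each NiFi host; /opt/nifi is an assumed install path
/opt/nifi/bin/nifi.sh restart
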
After restarting all the services, the Hortonworks cluster was healthy again. I restarted the Spark Streaming jobs, and they ran without errors.