[Resolved] Spark and Impala jobs failing continuously due to unreachable impalad

In this article, we explain how to resolve Spark and Impala jobs that fail continuously because the impalad services are unreachable.


Error log 1:

Query progress can be monitored at: https://hostname:25000/query_plan?query_id=1234fghitrir:1kfjdfk0000

ERROR: Failed due to unreachable impalad(s): cluster:22000

Error log 2:

EndDataStream() to 10.12.XX.XXX:27000 failed: Network error: Client connection negotiation failed: client connection to 10.12.XX.XXX:27000: connect: Connection refused (error 111)
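Both errors point at unreachable impalad ports: 22000 is typically the impalad backend port, 25000 the impalad debug web UI, and 27000 the KRPC data-stream port that EndDataStream() uses. Before restarting anything, a quick TCP probe confirms which ports refuse connections. The sketch below is a minimal check; the hostname you pass in is a placeholder for your impalad host.

```python
import socket

# Default impalad ports referenced in the error logs above:
# 22000 = backend, 25000 = debug web UI, 27000 = KRPC data stream
IMPALAD_PORTS = [22000, 25000, 27000]

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError (error 111) and timeouts
        return False

def probe_impalad(host: str) -> dict:
    """Probe the impalad ports and report which are reachable."""
    return {port: port_open(host, port) for port in IMPALAD_PORTS}
```

If every port reports False, the daemon itself is down (as in this case) rather than a firewall blocking a single port.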


Step 1: We logged in to the cluster with Admin privileges.

Step 2: Check the Impala service. The Impala service was down.
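On a Cloudera Manager managed cluster, the service state can also be checked over the CM REST API, which returns a serviceState field (e.g. STARTED or STOPPED) for a service. The sketch below assumes HTTP access to CM on its default port 7180; the CM host, credentials, cluster name ("Cluster 1"), and service name ("impala") are placeholders for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

def service_state_url(cm_host: str, cluster: str, service: str,
                      api_version: str = "v19", port: int = 7180) -> str:
    """Build the CM API URL that reports a service's state."""
    cluster_q = urllib.parse.quote(cluster)
    return (f"http://{cm_host}:{port}/api/{api_version}"
            f"/clusters/{cluster_q}/services/{service}")

def get_service_state(cm_host, cluster, service, user, password):
    """Fetch the serviceState field for the given service via the CM API."""
    req = urllib.request.Request(service_state_url(cm_host, cluster, service))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["serviceState"]  # e.g. "STARTED", "STOPPED"
```

A STOPPED (or BAD health) state here matches what the connection-refused errors in the job logs already suggest.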

Step 3: Because of the Impala service issue, the jobs were failing continuously with connection-refused errors.

Step 4: We restarted all Impala daemons and the Impala Catalog Server from the cluster level.
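The same restart can be issued through the CM API: a service-level restart command restarts every role of the service (Impala daemons, Catalog Server, StateStore). The helper below only builds the command URL, which would be invoked with an authenticated POST; host, cluster, and service names are placeholders.

```python
import urllib.parse

def restart_command_url(cm_host: str, cluster: str, service: str,
                        api_version: str = "v19", port: int = 7180) -> str:
    """CM API endpoint that restarts a whole service.

    Invoke with an authenticated POST; the response includes a command id
    that can be polled until the restart finishes.
    """
    cluster_q = urllib.parse.quote(cluster)
    return (f"http://{cm_host}:{port}/api/{api_version}"
            f"/clusters/{cluster_q}/services/{service}/commands/restart")
```

This is equivalent to clicking Restart on the Impala service in the CM UI, as described in this step.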

Step 5: The restart takes some time; after that, the Impala services were running fine.

Step 6: After 30 minutes, the jobs were re-triggered, and on the next follow-up the Spark and Impala jobs were running fine.
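The "wait, then re-trigger" step can be automated as a poll with backoff: keep probing the service until it is healthy, then resubmit the job. The sketch below is generic; `check` and `resubmit` are placeholders for a real health probe and a real job-submission call.

```python
import time

def retrigger_when_healthy(check, resubmit, attempts=10,
                           delay=5.0, backoff=2.0, sleep=time.sleep):
    """Poll check() until it returns True, then call resubmit().

    check    -- callable returning True once the service is reachable
    resubmit -- callable that re-triggers the failed job
    """
    for _ in range(attempts):
        if check():
            return resubmit()
        sleep(delay)
        delay *= backoff  # exponential backoff between probes
    raise RuntimeError("service did not become healthy in time")
```

This avoids hammering a still-recovering impalad with immediate resubmissions, which would only produce more connection-refused errors.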

Summary: The above resolution is a simple way to solve these types of errors in Cloudera or Hortonworks clusters, because the root cause is an Impala service issue. A connection-refused error in Impala can impact jobs, and not only these jobs: it impacts all streaming jobs as well.

Currently, many jobs are moving to Impala, Spark, and Kafka, because these are streaming-related; data warehouse and batch jobs are being converted to real-time streaming. The error indicates that the service is unreachable, or that the connection is refused, at the service or cluster level.

If the jobs still get the same error after the restart, put the Impala service into maintenance mode and stop the Impala daemons. After some time, start the Impala service again.
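On Cloudera Manager, maintenance mode, stop, and start are all exposed as service commands through the same REST API (enterMaintenanceMode, stop, start). The sketch below builds the command URLs in the order this article uses; the CM host, cluster, and service names are placeholders.

```python
import urllib.parse

def service_command_url(cm_host: str, cluster: str, service: str,
                        command: str, api_version: str = "v19",
                        port: int = 7180) -> str:
    """Build the CM API URL for a service command (invoked via POST)."""
    cluster_q = urllib.parse.quote(cluster)
    return (f"http://{cm_host}:{port}/api/{api_version}"
            f"/clusters/{cluster_q}/services/{service}/commands/{command}")

# Order used in this article: maintenance mode, stop, wait, then start.
RECOVERY_SEQUENCE = ["enterMaintenanceMode", "stop", "start"]
```

Putting the service into maintenance mode first suppresses health alerts while the daemons are deliberately stopped.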