[Resolved] Error while running task (failure): java.lang.OutOfMemoryError: Java heap space | Big Data | Hadoop

In this article, we explain how to resolve the java.lang.OutOfMemoryError: Java heap space issue in a Big Data environment.



Error:

Exception message 'ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '2' error message: 'Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 6, vertexId=vertex_112325028493241_01108_02_03, diagnostics=[Task failed, taskId=task_0811123345345534_00659, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)

Solution:

The above error is memory related and typically appears while running Spark or Hadoop jobs against large data volumes, for example monthly or yearly batch jobs. These jobs consume more containers (and more memory per container) at the YARN (Yet Another Resource Negotiator) cluster level. So how do we fix this type of issue?
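Before changing anything, it helps to confirm what the current memory settings are. In Hive (CLI or Beeline), issuing set with a property name and no value prints its current value; the property names below are the standard Hive-on-Tez memory settings, and the output values you see will depend on your cluster.

-- Inspect the current Tez container and JVM heap settings (read-only; nothing is changed).
set hive.tez.container.size;
set hive.tez.java.opts;
-- Also worth checking: the Tez application master memory.
set tez.am.resource.memory.mb;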

Step 1: First, re-run the application and monitor it to check whether the job runs through properly or fails again.

Step 2: If the application still fails, try the below Hive parameters and re-run the application (a short usage sketch follows the list):

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nonstrict;
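For context, these two settings only come into play when the failing query writes into a partitioned table. A minimal sketch of how they are typically applied in the same session is below; the table and column names (sales_stage, sales_by_month, sale_month) are hypothetical placeholders, not part of the original job.

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical example: load a partitioned table, letting Hive create the
-- month partitions dynamically from the last column of the SELECT.
INSERT OVERWRITE TABLE sales_by_month PARTITION (sale_month)
SELECT order_id, amount, sale_month
FROM sales_stage;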

Once you re-run the application with the above configurations, you may still see this type of error at the end of the log file:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
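This variant of the error comes from the MapReduce execution engine (MapRedTask) rather than Tez, so the MapReduce-side memory parameters are the ones to raise. The sketch below uses the standard Hadoop property names; the values are illustrative assumptions only, so size them to your cluster.

-- Illustrative MapReduce-engine memory settings (values are examples only).
-- The -Xmx heap is kept below the container size to leave room for off-heap memory.
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3276m;
set mapreduce.reduce.memory.mb=8192;
set mapreduce.reduce.java.opts=-Xmx6553m;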




In case the jobs are running on ADF (Azure Data Factory) pipelines, try the below parameters (a usage sketch follows):

hive.tez.container.size=6820

hive.tez.java.opts=-Xmx1000m
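Note that hive.tez.container.size is specified as a plain number of megabytes, while hive.tez.java.opts takes JVM flags such as -Xmx; a common rule of thumb is to keep the heap at roughly 80% of the container size. If you set these from a Hive session rather than an ADF pipeline definition, a sketch under that assumption looks like this (the values are examples, not recommendations for your workload):

-- Example only: size these to your cluster. Heap (-Xmx) is ~80% of the 6820 MB container.
set hive.tez.container.size=6820;
set hive.tez.java.opts=-Xmx5456m;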

If you increase the container size and Java heap memory at the cluster level, the job gets more memory headroom and can complete without hitting heap limits, whether it runs in an on-premises Big Data environment or on Azure HDInsight for Spark or Hadoop developers.

This type of error appears in different scenarios and in different tools, for example Talend, Informatica, etc., each with its own Java memory parameters. Sometimes the same limits are controlled by cloud-specific parameters that need to be passed as well: on Azure, AWS (Amazon Web Services), and GCP (Google Cloud Platform) they must be configured, otherwise long-running jobs or cron jobs will keep hitting this error. If you find any other resolutions, please share them in the comment box.