[Resolved] Memory limit exceeded in Spark and Impala jobs in Cloudera

In this article, we explain how to resolve the "Memory limit exceeded" error in Spark and Impala jobs on the Cloudera big data distribution.



ERROR:

Memory limit exceeded: Could not free memory by spilling to disk: spilling was disabled by planner. Re-enable spilling by setting the query option DISABLE_UNSAFE_SPILLS=false Error occurred on backend hostname:22000 by fragment xxxx:xxxx Memory left in process limit.
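
The error message itself points to a quick workaround: re-enable spilling to disk through the DISABLE_UNSAFE_SPILLS query option. A minimal sketch, run in impala-shell or the Hue editor in the same session as the failing query (the commented SELECT is only a placeholder):

    -- Allow queries without statistics to spill to disk again.
    SET DISABLE_UNSAFE_SPILLS=false;

    -- Re-run the failing query in the same session, e.g.:
    -- SELECT ... FROM some_table ...;

Note that spilling only masks the symptom: the planner disabled it precisely because the query is missing statistics, so the lasting fix is the procedure below.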

Solution:

Step 1: Get the query_id of the failed query from the user or developer.

Step 2: Log in to Cloudera Manager with admin privileges.

Step 3: Click on the Impala service.

Step 4: Then click on the Impala Queries tab.

Step 5: Select a time interval that covers the failed query, such as 30 minutes or 2 hours.

Step 6: Search for the query_id in the Impala Queries tab; the search can take some time.

Step 7: The dashboard shows the query-related information. In the right corner, click on Query Details.

Step 8: Review the entire query details; in this case they show a “Missing stats” warning, meaning statistics have not been computed for the tables in the query.

Step 9: Ask the user or developer to compute statistics on the affected tables at the code level, as shown below.
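
A minimal sketch of that fix, assuming a hypothetical table sales_db.transactions read by the failing query; run it from impala-shell or Hue:

    -- Check whether statistics exist; #Rows shows -1 when they are missing.
    SHOW TABLE STATS sales_db.transactions;

    -- Compute table and column statistics so the planner can size memory correctly.
    COMPUTE STATS sales_db.transactions;

    -- For large partitioned tables, incremental statistics can be cheaper:
    -- COMPUTE INCREMENTAL STATS sales_db.transactions;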

Step 10: After that, re-run the Impala query from Hue or from the command line; a sketch follows.
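
A minimal sketch of the command-line re-run, assuming a hypothetical coordinator host impalad01 and the same placeholder table; impala-shell connects on port 21000 by default:

    impala-shell -i impalad01:21000 -q "SELECT COUNT(*) FROM sales_db.transactions;"

With fresh statistics in place, the planner can estimate memory properly and the query should no longer hit the memory limit.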



Summary:

The above steps resolve the memory-limit-exceeded error in Impala queries. At present, developers use the Impala service in most cases because of its MPP (Massive Parallel Processing) architecture; the Hive service is used far less because it is much slower. In CDP (Cloudera Data Platform), Impala is considerably faster than the other services for data processing. Since data volumes grow exponentially these days, clusters need additional data nodes along with query-level performance tuning and suitable memory parameters. Day-to-day workloads keep changing, so memory settings for HDFS (Hadoop Distributed File System) storage need regular review, whether on-premises or on cloud platforms such as AWS (Amazon Web Services), Microsoft Azure, or GCP (Google Cloud Platform).
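
As one example of the memory parameters mentioned above, Impala exposes a per-query MEM_LIMIT option. A minimal sketch, assuming a session where a single heavy query needs more headroom; the 4gb value is illustrative and should be sized to the workload and cluster capacity:

    -- Raise the per-query memory cap for this session only (illustrative value).
    SET MEM_LIMIT=4gb;

    -- Re-run the heavy query here; closing the session returns the option to its default.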