Spark Job failing in Dataproc

UFM

Feb 22, 2018, 8:48:32 PM
to Google Cloud Dataproc Discussions
Dear All,

While trying to execute a standard benchmark query (Q02 from BigBench, a.k.a. TPCx-BB) on Dataproc Spark, I get the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 132 in stage 4.0 failed 4 times, most recent failure: Lost task 132.3 in stage 4.0 (TID 3718, internal, executor 31): ExecutorLostFailure (executor 31 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 24.3 GB of 24 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.


I am using n1-standard-16 instances (16 vCPUs, 60 GB memory). Does anyone here know how to fix this error? Thanks in advance!

Thanks,
-Umar

Patrick Clay

Feb 23, 2018, 4:46:41 PM
to Google Cloud Dataproc Discussions
Switching to n1-highmem machine types or tuning Spark properties should fix that.

Dataproc's default executor sizing for VMs with 16 or more vCPUs is one executor per 8 vCPUs, which comes out to ~24 GB per executor on n1-standard machine types and ~42 GB per executor on n1-highmem machine types. Machines with 8 or fewer vCPUs get smaller executors. So if you want larger executors by default, n1-highmem-16 might help.
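
Roughly where those numbers come from (a back-of-the-envelope sketch only: it assumes YARN gets about 80% of machine RAM, which is an approximation and depends on the Dataproc image version):

# Back-of-the-envelope executor sizing; the 0.8 YARN fraction is an assumption
# and the exact defaults depend on the Dataproc image version.
for name, ram_gb in [("n1-standard-16", 60), ("n1-highmem-16", 104)]:
    yarn_gb = 0.8 * ram_gb        # memory handed to the YARN NodeManager
    per_executor = yarn_gb / 2    # 16 vCPUs / 8 vCPUs per executor = 2 executors
    print(f"{name}: ~{per_executor:.0f} GB per executor container")
# n1-standard-16: ~24 GB per executor container
# n1-highmem-16: ~42 GB per executor container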

That being said, you might want to tune your Spark job properties: either increase spark.executor.memory to get larger executors, decrease spark.executor.cores to reduce the number of concurrent tasks per executor, or consult Spark's memory tuning guide.
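
For example, in PySpark those properties can be set when building the session; the values below are placeholders, not recommendations specific to this query:

from pyspark.sql import SparkSession

# Placeholder values -- tune for your own workload and machine type.
spark = (
    SparkSession.builder
    .appName("bigbench-q02")                                # hypothetical app name
    .config("spark.executor.memory", "20g")                 # larger heap per executor
    .config("spark.executor.cores", "4")                    # fewer concurrent tasks per executor
    .config("spark.yarn.executor.memoryOverhead", "4096")   # extra off-heap headroom in MB, per the error hint
    .getOrCreate()
)

The same properties can also be passed at submit time (for example via the --properties flag of gcloud dataproc jobs submit spark) instead of in code.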

Hope that helps
-Patrick

UFM

Feb 26, 2018, 4:43:37 PM
to Google Cloud Dataproc Discussions
Thanks, Patrick, for all the suggestions. Tuning helped (increasing both the driver and executor memoryOverhead). I am now able to finish that query (and the other problematic queries) on n1-standard-16 instances.
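
For reference, the properties in question are spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead (plain numbers are interpreted as MB on Spark 2.x). A sketch of setting them, with placeholder values since the exact numbers used are not given here:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Placeholder overhead values in MB -- adjust for your own workload.
conf = (
    SparkConf()
    .set("spark.yarn.driver.memoryOverhead", "2048")    # off-heap headroom for the driver container
    .set("spark.yarn.executor.memoryOverhead", "4096")  # off-heap headroom for each executor container
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()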

Thanks,
-Umar