13/07/19 13:20:53 INFO executor.StandaloneExecutorBackend: Got assigned task 4813
13/07/19 13:20:53 INFO executor.Executor: Running task ID 4813
I see this for many outputs. A bunch of these for different tasks are interleaved with:

13/07/19 13:20:53 INFO executor.Executor: Its generation is -1
13/07/19 13:20:53 INFO spark.MapOutputTracker: Don't have map outputs for shuffle 1, fetching them

in a block, and the block is followed by a bunch of exceptions:

13/07/19 13:21:03 ERROR executor.Executor: Exception in task ID 4813
java.util.NoSuchElementException
        at spark.util.TimeStampedHashMap.apply(TimeStampedHashMap.scala:56)
        at spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:135)
        at spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:16)
        at spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:10)
        at spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:31)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.rdd.MappedRDD.compute(MappedRDD.scala:12)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.scheduler.ResultTask.run(ResultTask.scala:77)
        at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Interspersed among about 15 NoSuchElementExceptions is a single:

13/07/19 13:21:03 ERROR executor.Executor: Exception in task ID 4513
spark.SparkException: Error communicating with MapOutputTracker
        at spark.MapOutputTracker.askTracker(MapOutputTracker.scala:68)
        at spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:147)
        at spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:16)
        at spark.BlockStoreShuffleFetcher.fetch(BlockStoreShuffleFetcher.scala:10)
        at spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:31)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.rdd.MappedRDD.compute(MappedRDD.scala:12)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:19)
        at spark.RDD.computeOrReadCheckpoint(RDD.scala:207)
        at spark.RDD.iterator(RDD.scala:196)
        at spark.scheduler.ResultTask.run(ResultTask.scala:77)
        at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
        at akka.dispatch.DefaultPromise.ready(Future.scala:870)
        at akka.dispatch.DefaultPromise.result(Future.scala:874)
        at akka.dispatch.Await$.result(Future.scala:74)
        at spark.MapOutputTracker.askTracker(MapOutputTracker.scala:65)
        ... 20 more

This then repeats (a block of missing shuffles, then a block of exceptions). Occasionally another exception is thrown in for good measure:

13/07/19 13:21:15 WARN storage.BlockManagerMaster: Error sending message to BlockManagerMaster in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
        at akka.dispatch.DefaultPromise.ready(Future.scala:870)
        at akka.dispatch.DefaultPromise.result(Future.scala:874)
        at akka.dispatch.Await$.result(Future.scala:74)
        at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:136)
        at spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:39)
        at spark.storage.BlockManager.spark$storage$BlockManager$$heartBeat(BlockManager.scala:115)
        at spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:142)
        at akka.actor.DefaultScheduler$$anon$1.run(Scheduler.scala:142)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:94)
        at akka.jsr166y.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1381)
        at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
        at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
        at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
        at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

All of this happens when I split a job with around 200 partitions into around 4000.
I didn't see an answer to the original question, and I think this is the same issue. Does anyone know what is going on here, and why?
Thanks,
-Nathan Kronenfeld