Spark hangs on a simple task


narashy

May 30, 2013, 9:53:59 PM
to spark...@googlegroups.com
Hi all,
  Could anyone help me figure out where the problem is? I'm launching a simple program on a Mesos cluster:

import util.Random
import scala.math
import scala.util.control.Breaks._

import spark._


object TestReadFile {

  def main(args: Array[String]) {
    val sc = new SparkContext(args(0), "TestReadFile", System.getenv("SPARK_HOME"),
      List("target/scala-2.9.2/smile_2.9.2-0.1.jar"))

    println("before read file")  // this line is executed
    val lines = sc.textFile("data.txt")
    val m = lines.count
    println(m)
  }

}

This standalone program works fine on my local machine. However, when I run it on a cluster using `run-main TestReadFile mesos://172.18.146.121:5050`, the job hangs after the first println (the second println is never executed). The file `data.txt` is a 6 MB file. Here is the output:

[info] Set current project to smile (in build file:/home/Test/SMiLe/)
[info] Running TestReadFile mesos://172.18.146.121:5050
13/05/31 01:39:28 INFO Slf4jEventHandler: Slf4jEventHandler started
13/05/31 01:39:28 INFO BlockManagerMaster: Registered BlockManagerMaster Actor
13/05/31 01:39:28 INFO MemoryStore: MemoryStore started with capacity 971.5 MB.
13/05/31 01:39:28 INFO DiskStore: Created local directory at /tmp/spark-local-20130531013928-0da8
13/05/31 01:39:28 INFO ConnectionManager: Bound socket to port 47392 with id = ConnectionManagerId(gate5.corp.com,47392)
13/05/31 01:39:28 INFO BlockManagerMaster: Trying to register BlockManager
13/05/31 01:39:28 INFO BlockManagerMaster: Registered BlockManager
13/05/31 01:39:29 INFO HttpBroadcast: Broadcast server started at http://172.18.146.121:39749
13/05/31 01:39:29 INFO MapOutputTracker: Registered MapOutputTrackerActor actor
13/05/31 01:39:29 INFO HttpFileServer: HTTP File server directory is /tmp/spark-048b9194-6ba4-49bf-a77c-7917630bf5e6
13/05/31 01:39:29 INFO IoWorker: IoWorker thread 'spray-io-worker-0' started
13/05/31 01:39:29 INFO HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:50662
13/05/31 01:39:29 INFO BlockManagerUI: Started BlockManager web UI at http://gate5.corp.com:50662
13/05/31 01:39:29 INFO SparkContext: Added JAR target/scala-2.9.2/smile_2.9.2-0.1.jar at http://172.18.146.121:51256/jars/smile_2.9.2-0.1.jar with timestamp 1369964369347
13/05/31 01:39:29 INFO MesosSchedulerBackend: Registered as framework ID 2013053023002039616172-5050-28213-0010
before read file
13/05/31 01:39:29 INFO MemoryStore: ensureFreeSpace(36160) called with curMem=0, maxMem=1018712555
13/05/31 01:39:29 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 35.3 KB, free 971.5 MB)
13/05/31 01:39:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/31 01:39:29 WARN LoadSnappy: Snappy native library not loaded
13/05/31 01:39:29 INFO FileInputFormat: Total input paths to process : 1
13/05/31 01:39:29 INFO SparkContext: Starting job: count at TestReadFile.scala:18
13/05/31 01:39:29 INFO DAGScheduler: Got job 0 (count at TestReadFile.scala:18) with 2 output partitions (allowLocal=false)
13/05/31 01:39:29 INFO DAGScheduler: Final stage: Stage 0 (textFile at TestReadFile.scala:17)
13/05/31 01:39:29 INFO DAGScheduler: Parents of final stage: List()
13/05/31 01:39:29 INFO DAGScheduler: Missing parents: List()
13/05/31 01:39:29 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at textFile at TestReadFile.scala:17), which has no missing parents
13/05/31 01:39:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at textFile at TestReadFile.scala:17)
13/05/31 01:39:29 INFO ClusterScheduler: Adding task set 0.0 with 2 tasks
13/05/31 01:39:29 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 2013053023002039616172-5050-28213-3: gate8.corp.com (preferred)
13/05/31 01:39:29 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 30 ms
13/05/31 01:39:29 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2013053023002039616172-5050-28213-4: gate9.corp.com (preferred)
13/05/31 01:39:29 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:31 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-3 from TaskSet 0.0
13/05/31 01:39:31 INFO TaskSetManager: Lost TID 0 (task 0.0:0)
13/05/31 01:39:31 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-3 (generation 0)
13/05/31 01:39:31 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-3 from BlockManagerMaster.
13/05/31 01:39:31 INFO TaskSetManager: Starting task 0.0:0 as TID 2 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:31 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 1 ms
13/05/31 01:39:31 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-3 successfully in removeExecutor
13/05/31 01:39:31 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-4 from TaskSet 0.0
13/05/31 01:39:31 INFO TaskSetManager: Lost TID 1 (task 0.0:1)
13/05/31 01:39:31 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-4 (generation 1)
13/05/31 01:39:31 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-4 from BlockManagerMaster.
13/05/31 01:39:31 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-4 successfully in removeExecutor
13/05/31 01:39:31 INFO TaskSetManager: Starting task 0.0:1 as TID 3 on executor 2013053023002039616172-5050-28213-4: gate9.corp.com (preferred)
13/05/31 01:39:31 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:32 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:32 INFO TaskSetManager: Lost TID 2 (task 0.0:0)
13/05/31 01:39:32 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 2)
13/05/31 01:39:32 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:32 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:32 INFO TaskSetManager: Starting task 0.0:0 as TID 4 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:32 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:33 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:33 INFO TaskSetManager: Lost TID 4 (task 0.0:0)
13/05/31 01:39:33 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 3)
13/05/31 01:39:33 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:33 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:33 INFO TaskSetManager: Starting task 0.0:0 as TID 5 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:33 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:33 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-4 from TaskSet 0.0
13/05/31 01:39:33 INFO TaskSetManager: Lost TID 3 (task 0.0:1)
13/05/31 01:39:33 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-4 (generation 4)
13/05/31 01:39:33 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-4 from BlockManagerMaster.
13/05/31 01:39:33 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-4 successfully in removeExecutor
13/05/31 01:39:33 INFO TaskSetManager: Starting task 0.0:1 as TID 6 on executor 2013053023002039616172-5050-28213-4: gate9.corp.com (preferred)
13/05/31 01:39:33 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:34 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:34 INFO TaskSetManager: Lost TID 5 (task 0.0:0)
13/05/31 01:39:34 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 5)
13/05/31 01:39:34 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:34 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:34 INFO TaskSetManager: Starting task 0.0:0 as TID 7 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:34 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:35 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-4 from TaskSet 0.0
13/05/31 01:39:35 INFO TaskSetManager: Lost TID 6 (task 0.0:1)
13/05/31 01:39:35 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-4 (generation 6)
13/05/31 01:39:35 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-4 from BlockManagerMaster.
13/05/31 01:39:35 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-4 successfully in removeExecutor
13/05/31 01:39:35 INFO TaskSetManager: Starting task 0.0:1 as TID 8 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:35 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:36 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:36 INFO TaskSetManager: Lost TID 7 (task 0.0:0)
13/05/31 01:39:36 INFO TaskSetManager: Lost TID 8 (task 0.0:1)
13/05/31 01:39:36 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 7)
13/05/31 01:39:36 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:36 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:36 INFO TaskSetManager: Starting task 0.0:1 as TID 9 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:36 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 1 ms
13/05/31 01:39:36 INFO TaskSetManager: Starting task 0.0:0 as TID 10 on executor 2013053023002039616172-5050-28213-3: gate8.corp.com (preferred)
13/05/31 01:39:36 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:36 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-3 from TaskSet 0.0
13/05/31 01:39:36 INFO TaskSetManager: Lost TID 10 (task 0.0:0)
13/05/31 01:39:36 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-3 (generation 8)
13/05/31 01:39:36 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-3 from BlockManagerMaster.
13/05/31 01:39:36 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-3 successfully in removeExecutor
13/05/31 01:39:36 INFO TaskSetManager: Starting task 0.0:0 as TID 11 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:36 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:37 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:37 INFO TaskSetManager: Lost TID 11 (task 0.0:0)
13/05/31 01:39:37 INFO TaskSetManager: Lost TID 9 (task 0.0:1)
13/05/31 01:39:37 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 9)
13/05/31 01:39:37 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:37 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:37 INFO TaskSetManager: Starting task 0.0:1 as TID 12 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:37 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 1 ms
13/05/31 01:39:38 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:38 INFO TaskSetManager: Lost TID 12 (task 0.0:1)
13/05/31 01:39:38 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 10)
13/05/31 01:39:38 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:38 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:38 INFO TaskSetManager: Starting task 0.0:1 as TID 13 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:38 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:38 INFO TaskSetManager: Starting task 0.0:0 as TID 14 on executor 2013053023002039616172-5050-28213-2: gate6.corp.com (preferred)
13/05/31 01:39:38 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 2 ms
13/05/31 01:39:39 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:39 INFO TaskSetManager: Lost TID 13 (task 0.0:1)
13/05/31 01:39:39 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 11)
13/05/31 01:39:39 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:39 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:39 INFO TaskSetManager: Starting task 0.0:1 as TID 15 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:39 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:39 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-2 from TaskSet 0.0
13/05/31 01:39:39 INFO TaskSetManager: Lost TID 14 (task 0.0:0)
13/05/31 01:39:39 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-2 (generation 12)
13/05/31 01:39:39 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-2 from BlockManagerMaster.
13/05/31 01:39:39 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-2 successfully in removeExecutor
13/05/31 01:39:40 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:40 INFO TaskSetManager: Lost TID 15 (task 0.0:1)
13/05/31 01:39:40 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 13)
13/05/31 01:39:40 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:40 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:40 INFO TaskSetManager: Starting task 0.0:1 as TID 16 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:40 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:41 INFO TaskSetManager: Starting task 0.0:0 as TID 17 on executor 2013053023002039616172-5050-28213-2: gate6.corp.com (preferred)
13/05/31 01:39:41 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:41 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:41 INFO TaskSetManager: Lost TID 16 (task 0.0:1)
13/05/31 01:39:41 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 14)
13/05/31 01:39:41 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:41 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:41 INFO TaskSetManager: Starting task 0.0:1 as TID 18 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:41 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 1 ms
13/05/31 01:39:41 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-2 from TaskSet 0.0
13/05/31 01:39:41 INFO TaskSetManager: Lost TID 17 (task 0.0:0)
13/05/31 01:39:41 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-2 (generation 15)
13/05/31 01:39:41 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-2 from BlockManagerMaster.
13/05/31 01:39:41 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-2 successfully in removeExecutor
13/05/31 01:39:41 INFO TaskSetManager: Starting task 0.0:0 as TID 19 on executor 2013053023002039616172-5050-28213-2: gate6.corp.com (preferred)
13/05/31 01:39:41 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:42 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:42 INFO TaskSetManager: Lost TID 18 (task 0.0:1)
13/05/31 01:39:42 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 16)
13/05/31 01:39:42 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:42 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:42 INFO TaskSetManager: Starting task 0.0:1 as TID 20 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:42 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:42 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-2 from TaskSet 0.0
13/05/31 01:39:42 INFO TaskSetManager: Lost TID 19 (task 0.0:0)
13/05/31 01:39:42 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-2 (generation 17)
13/05/31 01:39:42 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-2 from BlockManagerMaster.
13/05/31 01:39:42 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-2 successfully in removeExecutor
13/05/31 01:39:42 INFO TaskSetManager: Starting task 0.0:0 as TID 21 on executor 2013053023002039616172-5050-28213-0: gate5.corp.com (preferred)
13/05/31 01:39:42 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:43 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
13/05/31 01:39:43 INFO TaskSetManager: Lost TID 20 (task 0.0:1)
13/05/31 01:39:43 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-1 (generation 18)
13/05/31 01:39:43 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-1 from BlockManagerMaster.
13/05/31 01:39:43 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-1 successfully in removeExecutor
13/05/31 01:39:43 INFO TaskSetManager: Starting task 0.0:1 as TID 22 on executor 2013053023002039616172-5050-28213-1: gate7.corp.com (preferred)
13/05/31 01:39:43 INFO TaskSetManager: Serialized task 0.0:1 as 1493 bytes in 0 ms
13/05/31 01:39:43 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-0 from TaskSet 0.0
13/05/31 01:39:43 INFO TaskSetManager: Lost TID 21 (task 0.0:0)
13/05/31 01:39:43 INFO DAGScheduler: Executor lost: 2013053023002039616172-5050-28213-0 (generation 19)
13/05/31 01:39:43 INFO BlockManagerMasterActor: Trying to remove executor 2013053023002039616172-5050-28213-0 from BlockManagerMaster.
13/05/31 01:39:43 INFO BlockManagerMaster: Removed 2013053023002039616172-5050-28213-0 successfully in removeExecutor
13/05/31 01:39:43 INFO TaskSetManager: Starting task 0.0:0 as TID 23 on executor 2013053023002039616172-5050-28213-0: gate5.corp.com (preferred)
13/05/31 01:39:43 INFO TaskSetManager: Serialized task 0.0:0 as 1493 bytes in 0 ms
13/05/31 01:39:44 INFO TaskSetManager: Re-queueing tasks for 2013053023002039616172-5050-28213-1 from TaskSet 0.0
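
Note that the driver output above is not idle: the same two tasks are re-queued every time an executor is reported lost. As a minimal sketch (plain Scala, no Spark dependency), the following tallies the "Executor lost" lines per executor ID from a saved copy of this output. The object name `ExecutorLossTally` and the log file argument are hypothetical, and the pattern assumes the log line format shown above.

```scala
object ExecutorLossTally {
  // Matches driver log lines of the form shown above, e.g.
  // "... INFO DAGScheduler: Executor lost: <executor-id> (generation N)"
  private val Lost = """.*DAGScheduler: Executor lost: (\S+) \(generation \d+\).*""".r

  // Counts how many times each executor ID is reported lost.
  def tally(lines: Iterator[String]): Map[String, Int] =
    lines.foldLeft(Map.empty[String, Int]) { (acc, line) =>
      line match {
        case Lost(id) => acc.updated(id, acc.getOrElse(id, 0) + 1)
        case _        => acc
      }
    }

  def main(args: Array[String]): Unit = {
    // args(0): hypothetical path to a file holding the saved driver output
    val source = scala.io.Source.fromFile(args(0))
    try {
      tally(source.getLines()).toSeq.sortBy(-_._2).foreach {
        case (id, n) => println(id + "\t" + n)
      }
    } finally {
      source.close()
    }
  }
}
```

An executor that is lost again within a second or two of every relaunch is probably failing at startup, so the logs on that worker are the next place to look.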

srikanth reddy

Aug 2, 2013, 6:02:56 PM
to spark...@googlegroups.com
I am getting the same errors. Not sure why!

Ian O'Connell

Aug 2, 2013, 6:29:00 PM
to spark...@googlegroups.com
You need to look at the logs on the workers to be sure, but I suspect the workers are looking for data.txt on their own local filesystems and it doesn't exist there.
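
A minimal sketch of a pre-flight check for this situation, in plain Scala with no Spark dependency. It flags an input path that each worker would resolve against its own local filesystem when the master URL is not a local one. The object `InputPathCheck` and its methods are hypothetical helpers, not part of Spark. The check assumes the default Hadoop filesystem is the local one; if the cluster's Hadoop configuration points the default filesystem at HDFS, a scheme-less path would resolve there instead.

```scala
import java.net.URI

object InputPathCheck {
  // True when `path` has no scheme or a "file" scheme, i.e. (assuming the
  // default filesystem is local) each worker reads it from its own disk.
  // Note: java.net.URI rejects strings with spaces or backslashes.
  def isLocalPath(path: String): Boolean =
    Option(new URI(path).getScheme).forall(_ == "file")

  // True when the master URL is not a local one ("local" or "local[N]").
  def isClusterMaster(master: String): Boolean =
    !master.startsWith("local")

  // Returns a warning when a local-only path is used with a cluster master.
  def warning(master: String, path: String): Option[String] =
    if (isClusterMaster(master) && isLocalPath(path))
      Some("Input '" + path + "' is a local path: it must exist at the same " +
        "location on every worker, or be moved to shared storage such as HDFS.")
    else
      None
}
```

For the program in the first post, `InputPathCheck.warning("mesos://172.18.146.121:5050", "data.txt")` returns a warning, while a fully qualified path on shared storage (for example a hypothetical `hdfs://namenode:9000/data/data.txt`) returns `None`.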



srikanth reddy

Aug 5, 2013, 1:21:18 PM
to spark...@googlegroups.com, i...@ianoconnell.com
I get the below error messages in the worker node logs.
++++++
13/08/06 02:00:38 INFO executor.StandaloneExecutorBackend: Connecting to driver: akka://sp...@172.16.12.101:32867/user/StandaloneScheduler
13/08/06 02:00:38 INFO executor.StandaloneExecutorBackend: Successfully registered with driver
13/08/06 02:00:38 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/08/06 02:00:38 INFO storage.BlockManagerMaster: Connecting to BlockManagerMaster: akka://sp...@172.16.12.101:32867/user/BlockMasterManager
13/08/06 02:00:38 INFO storage.MemoryStore: MemoryStore started with capacity 323.9 MB.
13/08/06 02:00:38 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20130806020038-5bc4
13/08/06 02:00:38 INFO network.ConnectionManager: Bound socket to port 60413 with id = ConnectionManagerId(ibm003,60413)
13/08/06 02:00:38 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/08/06 02:00:48 WARN storage.BlockManagerMaster: Error sending message to BlockManagerMaster in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
        at akka.dispatch.DefaultPromise.ready(Future.scala:870)
        at akka.dispatch.DefaultPromise.result(Future.scala:874)
        at akka.dispatch.Await$.result(Future.scala:74)
        at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:154)
        at spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:133)
        at spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)
        at spark.storage.BlockManager.initialize(BlockManager.scala:123)
        at spark.storage.BlockManager.<init>(BlockManager.scala:108)
        at spark.storage.BlockManager.<init>(BlockManager.scala:115)
        at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:91)
        at spark.executor.Executor.initialize(Executor.scala:72)
        at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:39)
        at spark.executor.StandaloneExecutorBackend$$anonfun$receive$1.apply(StandaloneExecutorBackend.scala:36)
        at akka.actor.Actor$class.apply(Actor.scala:318)
        at spark.executor.StandaloneExecutorBackend.apply(StandaloneExecutorBackend.scala:16)
        at akka.actor.ActorCell.invoke(ActorCell.scala:626)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
        at akka.dispatch.Mailbox.run(Mailbox.scala:179)
        at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
        at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
        at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
        at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
        at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
13/08/06 02:01:01 WARN storage.BlockManagerMaster: Error sending message to BlockManagerMaster in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10000] milliseconds
        at akka.dispatch.DefaultPromise.ready(Future.scala:870)
        at akka.dispatch.DefaultPromise.result(Future.scala:874)
        at akka.dispatch.Await$.result(Future.scala:74)
        at spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:154)
        at spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:133)
        at spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)
        at spark.storage.BlockManager.initialize(BlockManager.scala:123)
        at spark.storage.BlockManager.<init>(BlockManager.scala:108)
        at spark.storage.BlockManager.<init>(BlockManager.scala:115)
        at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:91)

srikanth reddy

Aug 5, 2013, 2:00:28 PM
to spark...@googlegroups.com, i...@ianoconnell.com

After I restarted the standalone-mode cluster, the issue went away.
I'm not sure what the original problem was.