Thanks for your quick response, Matei! spark.mesos.coarse was set to true before creating sc. I've pasted more of the log leading up to the exception below. test.scala contains only a few lines of code that actually do anything, as I described in my last post.
12/12/26 23:17:59 INFO DAGScheduler: Completed ResultTask(1, 0)
12/12/26 23:17:59 INFO SparkContext: Job finished: count at test.scala:1128, took 87.189440487 s
12/12/26 23:17:59 INFO SparkContext: Starting job: collect at test.scala:1133
12/12/26 23:17:59 INFO DAGScheduler: Registering RDD 7 (map at test.scala:1133)
12/12/26 23:17:59 INFO CacheTracker: Registering RDD ID 7 with cache
12/12/26 23:17:59 INFO CacheTrackerActor: Registering RDD 7 with 2 partitions
12/12/26 23:17:59 INFO DAGScheduler: Registering parent RDD 7 (map at test.scala:1133)
12/12/26 23:17:59 INFO DAGScheduler: Registering parent RDD 6 (partitionBy at test.scala:1119)
12/12/26 23:17:59 INFO CacheTrackerActor: Asked for current cache locations
12/12/26 23:17:59 INFO DAGScheduler: Got job 2 (collect at test.scala:1133) with 2 output partitions
12/12/26 23:17:59 INFO DAGScheduler: Final stage: Stage 3 (map at test.scala:1133)
12/12/26 23:17:59 INFO DAGScheduler: Parents of final stage: List(Stage 2)
12/12/26 23:17:59 INFO DAGScheduler: Missing parents: List()
12/12/26 23:17:59 INFO DAGScheduler: Submitting Stage 3 (map at test.scala:1133), which has no missing parents
12/12/26 23:17:59 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3
12/12/26 23:17:59 INFO ClusterScheduler: Adding task set 3.0 with 2 tasks
12/12/26 23:17:59 INFO TaskSetManager: Starting task 3.0:0 as TID 40 on slave 201212262301-638519306-5050-2445-1: ip-10-191-41-16.ec2.internal (preferred)
12/12/26 23:17:59 INFO TaskSetManager: Serialized task 3.0:0 as 2982 bytes in 2 ms
12/12/26 23:17:59 INFO TaskSetManager: Starting task 3.0:1 as TID 41 on slave 201212262301-638519306-5050-2445-0: ip-10-8-85-124.ec2.internal (preferred)
12/12/26 23:17:59 INFO TaskSetManager: Serialized task 3.0:1 as 2982 bytes in 2 ms
12/12/26 23:18:39 INFO CoarseMesosSchedulerBackend: Slave 201212262301-638519306-5050-2445-0 disconnected, so removing it
12/12/26 23:18:39 INFO TaskSetManager: Re-queueing tasks for ip-10-8-85-124.ec2.internal from TaskSet 3.0
12/12/26 23:18:39 INFO TaskSetManager: Lost TID 41 (task 3.0:1)
12/12/26 23:18:39 INFO TaskSetManager: Starting task 3.0:1 as TID 42 on slave 201212262301-638519306-5050-2445-1: ip-10-191-41-16.ec2.internal (preferred)
12/12/26 23:18:39 INFO DAGScheduler: Host lost: ip-10-8-85-124.ec2.internal
12/12/26 23:18:39 INFO TaskSetManager: Serialized task 3.0:1 as 2982 bytes in 2 ms
12/12/26 23:18:39 INFO BlockManagerMasterActor: Trying to remove the host: ip-10-8-85-124.ec2.internal:10902 from BlockManagerMaster.
12/12/26 23:18:39 INFO BlockManagerMasterActor: Previous hosts: ArrayBuffer(BlockManagerId(ip-10-8-85-124.ec2.internal, 48468), BlockManagerId(ip-10-191-41-16.ec2.internal, 54751), BlockManagerId(ip-10-68-199-106.ec2.internal, 41809))
12/12/26 23:18:39 INFO BlockManagerMasterActor: Current hosts: ArrayBuffer(BlockManagerId(ip-10-8-85-124.ec2.internal, 48468), BlockManagerId(ip-10-191-41-16.ec2.internal, 54751), BlockManagerId(ip-10-68-199-106.ec2.internal, 41809))
12/12/26 23:18:39 INFO BlockManagerMaster: Removed ip-10-8-85-124.ec2.internal successfully in notifyADeadHost
12/12/26 23:18:39 INFO Stage: Stage 2 is now unavailable on ip-10-8-85-124.ec2.internal (24/36, false)
12/12/26 23:18:39 INFO CacheTrackerActor: Memory cache lost on ip-10-8-85-124.ec2.internal
12/12/26 23:18:39 INFO CacheTracker: CacheTracker successfully removed entries on ip-10-8-85-124.ec2.internal
12/12/26 23:18:39 INFO CacheTrackerActor: Asked for current cache locations
12/12/26 23:18:39 INFO CoarseMesosSchedulerBackend: Slave 201212262301-638519306-5050-2445-1 disconnected, so removing it
12/12/26 23:18:39 INFO TaskSetManager: Re-queueing tasks for ip-10-191-41-16.ec2.internal from TaskSet 3.0
12/12/26 23:18:39 INFO TaskSetManager: Lost TID 40 (task 3.0:0)
12/12/26 23:18:39 INFO TaskSetManager: Lost TID 42 (task 3.0:1)
12/12/26 23:18:39 INFO DAGScheduler: Host lost: ip-10-191-41-16.ec2.internal
12/12/26 23:18:39 INFO BlockManagerMasterActor: Trying to remove the host: ip-10-191-41-16.ec2.internal:10902 from BlockManagerMaster.
12/12/26 23:18:39 INFO TaskSetManager: Starting task 3.0:1 as TID 43 on slave 201212262301-638519306-5050-2445-2: ip-10-68-199-106.ec2.internal (preferred)
12/12/26 23:18:39 INFO BlockManagerMasterActor: Previous hosts: ArrayBuffer(BlockManagerId(ip-10-8-85-124.ec2.internal, 48468), BlockManagerId(ip-10-191-41-16.ec2.internal, 54751), BlockManagerId(ip-10-68-199-106.ec2.internal, 41809))
12/12/26 23:18:39 INFO BlockManagerMasterActor: Current hosts: ArrayBuffer(BlockManagerId(ip-10-8-85-124.ec2.internal, 48468), BlockManagerId(ip-10-191-41-16.ec2.internal, 54751), BlockManagerId(ip-10-68-199-106.ec2.internal, 41809))
12/12/26 23:18:39 INFO BlockManagerMaster: Removed ip-10-191-41-16.ec2.internal successfully in notifyADeadHost
12/12/26 23:18:39 INFO Stage: Stage 2 is now unavailable on ip-10-191-41-16.ec2.internal (12/36, false)
12/12/26 23:18:39 INFO CacheTrackerActor: Memory cache lost on ip-10-191-41-16.ec2.internal
12/12/26 23:18:39 INFO TaskSetManager: Serialized task 3.0:1 as 2982 bytes in 1 ms
12/12/26 23:18:39 INFO TaskSetManager: Starting task 3.0:0 as TID 44 on slave 201212262301-638519306-5050-2445-2: ip-10-68-199-106.ec2.internal (preferred)
12/12/26 23:18:39 INFO CacheTracker: CacheTracker successfully removed entries on ip-10-191-41-16.ec2.internal
12/12/26 23:18:39 INFO CacheTrackerActor: Asked for current cache locations
12/12/26 23:18:39 INFO TaskSetManager: Serialized task 3.0:0 as 2982 bytes in 1 ms
12/12/26 23:18:40 INFO MapOutputTrackerActor: Asked to send map output locations for shuffle 0 to ip-10-68-199-106.ec2.internal
12/12/26 23:18:40 INFO MapOutputTracker: Size of output statuses for shuffle 0 is 174 bytes
12/12/26 23:18:40 INFO TaskSetManager: Lost TID 44 (task 3.0:0)
12/12/26 23:18:40 INFO TaskSetManager: Loss was due to java.lang.NullPointerException