Hi All,
My Spark job is a simple map-only job that prints a value for each input line; it runs 10 iterations.
I have 11 nodes with 16 GB of memory each.
spark-env.sh sets SPARK_MEM=10g.
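For reference, the job is roughly like this. This is a sketch from memory, not the exact file: the map/groupByKey/saveAsTextFile structure matches the stage names in the log below, but the master URL, input/output paths, key extraction, and loop bound are simplified placeholders.

```scala
// Rough sketch of TestIteration.scala (Spark 0.7-era API).
// Paths, master URL, and the key function are placeholders.
import spark.SparkContext
import spark.SparkContext._

object TestIteration {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "TestIteration")
    for (i <- 1 to 10) {                                  // 10 iterations
      val input = sc.textFile("hdfs:///input")            // placeholder path
      // Map each line to a (key, line) pair, then group by key
      val grouped = input.map(line => (line.split("\t")(0), line))
                         .groupByKey()                    // groupByKey at TestIteration.scala:29
      grouped.map(kv => kv._1 + "\t" + kv._2.size)
             .saveAsTextFile("hdfs:///output-" + i)       // saveAsTextFile at TestIteration.scala:31
    }
  }
}
```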
The job runs fine on 20-30 GB of data, but fails when I run it on about 500 GB.
Any idea as to why this might be happening?
Shouldn't Spark jobs run like normal Hadoop jobs when the dataset can't be held in memory?
Please suggest a fix, or some pointers to what I might be doing wrong.
The console output is below.
13/06/04 05:50:18 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 3989)
13/06/04 05:50:19 INFO cluster.TaskSetManager: Finished TID 3950 in 15437 ms (progress: 3996/4000)
13/06/04 05:50:19 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 3949)
13/06/04 05:50:21 INFO cluster.TaskSetManager: Finished TID 3969 in 15175 ms (progress: 3997/4000)
13/06/04 05:50:21 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 3979)
13/06/04 05:50:21 INFO cluster.TaskSetManager: Finished TID 3946 in 17400 ms (progress: 3998/4000)
13/06/04 05:50:21 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 3932)
13/06/04 05:50:22 INFO cluster.TaskSetManager: Finished TID 3959 in 16854 ms (progress: 3999/4000)
13/06/04 05:50:22 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 3950)
13/06/04 05:50:22 INFO cluster.TaskSetManager: Finished TID 3985 in 13940 ms (progress: 4000/4000)
13/06/04 05:50:22 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(1, 3985)
13/06/04 05:50:22 INFO scheduler.DAGScheduler: Stage 1 (groupByKey at TestIteration.scala:29) finished in 464.517 s
13/06/04 05:50:22 INFO scheduler.DAGScheduler: looking for newly runnable stages
13/06/04 05:50:22 INFO scheduler.DAGScheduler: running: Set()
13/06/04 05:50:22 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
13/06/04 05:50:22 INFO scheduler.DAGScheduler: failed: Set()
13/06/04 05:50:22 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
13/06/04 05:50:22 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[7] at saveAsTextFile at TestIteration.scala:31), which is now runnable
13/06/04 05:50:22 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[7] at saveAsTextFile at TestIteration.scala:31)
13/06/04 05:50:22 INFO cluster.ClusterScheduler: Adding task set 0.0 with 1 tasks
13/06/04 05:50:22 INFO cluster.TaskSetManager: Starting task 0.0:0 as TID 4000 on executor 9: node2.example.com (preferred)
13/06/04 05:50:22 INFO cluster.TaskSetManager: Serialized task 0.0:0 as 8534 bytes in 51 ms
13/06/04 05:50:22 INFO spark.MapOutputTrackerActor: Asked to send map output locations for shuffle 0 to node2.example.com
13/06/04 05:50:23 INFO spark.MapOutputTracker: Size of output statuses for shuffle 0 is 5286 bytes
13/06/04 05:52:30 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(9, node2.example.com, 48706) with no recent heart beats
13/06/04 05:52:32 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager node2.example.com:48706 with 6.3 GB RAM
13/06/04 05:52:50 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(9, node2.example.com, 48706) with no recent heart beats
13/06/04 05:52:52 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager node2.example.com:48706 with 6.3 GB RAM
Thanks,
Austin