correct way to kill running spark taskset, without killing Spark Context?


Imran Rashid

Jan 8, 2013, 12:31:02 AM1/8/13
to spark...@googlegroups.com

Every so often I want to abort some long-running computation in Spark, but I would like to leave the SparkContext & cached data alone.  E.g., sometimes I've waited a while for a bunch of data to get loaded in memory and some intermediate calculations to happen, but then I run something that either (a) takes a really long time due to a coding error on my part, or (b) fails, but takes a long time for Spark to kill the job (or maybe Spark never realizes the stage has died).  What is the right way for me to kill just that one running computation?

Of course I'd need to have some error handling in my code to deal with this, but ideally there would be some JobKilledException or something which I could then handle, e.g.:

val sc = ...
val myBigRdd = ...
try {
  // run unsafe code here
} catch {
  case e: JobKilledException => ...
}

Any way to do this or something similar?
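(For illustration: the calling pattern being asked for -- run the unsafe computation somewhere cancellable, and surface the kill as a catchable outcome while the surrounding state stays alive -- can be sketched in plain Scala. `JobKilledException` and `runKillable` are hypothetical names, with a worker thread standing in for the Spark job; this is not Spark API.)

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

object KillableJob {
  // Hypothetical stand-in for the JobKilledException the post wishes
  // existed; Spark had no such class at the time of this thread.
  class JobKilledException extends RuntimeException("job killed")

  // Runs `body` on a worker thread and kills it after `timeoutMs`,
  // leaving the caller (the stand-in for the SparkContext) alive.
  def runKillable[T](timeoutMs: Long)(body: => T): Either[String, T] = {
    val pool = Executors.newSingleThreadExecutor()
    val future = pool.submit(new Callable[T] { def call(): T = body })
    try Right(future.get(timeoutMs, TimeUnit.MILLISECONDS))
    catch {
      case _: TimeoutException =>
        future.cancel(true)              // interrupts the worker thread
        Left("job killed, context still alive")
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val outcome = runKillable(200L) {
      // Stand-in for the long-running "unsafe" computation.
      while (!Thread.interrupted()) {}   // spins until interrupted
      throw new JobKilledException
    }
    println(outcome)
  }
}
```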

thanks,
Imran

Matei Zaharia

Jan 8, 2013, 12:43:42 AM1/8/13
to spark...@googlegroups.com
We don't have anything like this right now. There was a discussion on killing tasks here, though: https://github.com/mesos/spark/pull/209. It's a little tricky -- to do it right, we'd have to send a "job was killed" event to the workers and inject an exception at some appropriate place (wherever we read input data).
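(A sketch of the injection described above, in plain Scala with illustrative names rather than real Spark internals: a per-job flag flipped when the "job was killed" event arrives, checked by a wrapper around the task's input iterator so the kill surfaces as an exception at the read site.)

```scala
import java.util.concurrent.atomic.AtomicBoolean

object KillableRead {
  class JobKilledException extends RuntimeException("job was killed")

  // Flag a worker's "job was killed" event handler would flip;
  // illustrative only, not a real Spark internal.
  val killed = new AtomicBoolean(false)

  // Wraps the task's input iterator so the kill shows up as an
  // exception exactly where input data is read.
  def interruptible[T](underlying: Iterator[T]): Iterator[T] =
    new Iterator[T] {
      def hasNext: Boolean = {
        if (killed.get()) throw new JobKilledException
        underlying.hasNext
      }
      def next(): T = {
        if (killed.get()) throw new JobKilledException
        underlying.next()
      }
    }

  def main(args: Array[String]): Unit = {
    val input = interruptible(Iterator.from(1))
    val first = input.take(3).toList   // reads fine before the kill
    killed.set(true)                   // the "job was killed" event arrives
    val outcome =
      try { input.next(); "not killed" }
      catch { case _: JobKilledException => "task aborted at read site" }
    println((first, outcome))
  }
}
```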

Matei

Shivaram Venkataraman

Jan 8, 2013, 11:39:54 AM1/8/13
to spark...@googlegroups.com
Right - One of the goals of that effort was to support hitting
control+C and being able to just kill the tasks that are active. I
will try to revive that discussion soon.

Shivaram

Imran Rashid

Jan 8, 2013, 1:24:03 PM1/8/13
to spark...@googlegroups.com
thanks for pointing me at that pull request.  It seemed quite a bit more complicated than I was expecting, though -- does this problem become any easier if I just want to kill an entire job, not individual tasks? I was thinking we'd just need to create another event which gets handled in the run loop in the DAGScheduler.  That's what I thought Matei was suggesting in his reply as well, but the comment in the pull request sounds much more complicated -- I didn't think we'd need to create any special RDDs.
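(A sketch of that idea in plain Scala -- a kill event handled in a scheduler-style run loop, with illustrative names that are not the real DAGScheduler classes:)

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

object MiniScheduler {
  // Illustrative events, loosely modeled on the DAGScheduler's
  // event-loop style; not real Spark classes.
  sealed trait Event
  case class JobSubmitted(jobId: Int) extends Event
  case class KillJob(jobId: Int) extends Event   // the proposed new event
  case object Stop extends Event

  val events = new LinkedBlockingQueue[Event]()
  val status = mutable.Map[Int, String]()

  // The run loop: killing a job is just one more handled event, so the
  // scheduler (and everything else it tracks) keeps running.
  def run(): Unit = {
    var running = true
    while (running) {
      events.take() match {
        case JobSubmitted(id) => status(id) = "running"
        case KillJob(id)      => status(id) = "killed"
        case Stop             => running = false
      }
    }
  }

  def main(args: Array[String]): Unit = {
    events.put(JobSubmitted(1))
    events.put(JobSubmitted(2))
    events.put(KillJob(1))      // abort job 1 only; job 2 is untouched
    events.put(Stop)
    run()
    println(status.toSeq.sorted)
  }
}
```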

If the other change is necessary for speculative execution and is in the works, I can just wait for that ...