Mechanism for a rocket to catch SIGTERM (or other signals) and react to them?

David Dotson

Apr 22, 2016, 4:57:45 PM

to fireworkflows

First, this is a fantastic library, displacing much of what I used to do to get work done on our own infrastructure in just two days of using it. Thanks for that!

Our queueing system is set up such that qdel'ing a job sends a SIGTERM instead of a SIGKILL, and our submission scripts traditionally are built to catch this signal and do cleanup operations (mainly, copy back data from the worker to the fileservers). I'm making this still work in the meantime by putting the catching + cleanup into my queue script template, but does it make sense to give a Firework the ability to define tasks to run upon receipt of a certain signal? Something similar to how one can already define background processes?

I'm not sure exactly how this could be implemented at the moment, or if it's worth doing, but I'm interested in any better ways of making a Firework or rocket react gracefully upon receipt of a given signal.

Thanks!

David

Anubhav Jain

Apr 23, 2016, 12:32:44 AM

to David Dotson, fireworkflows

Hi David,

I think there are already a couple of ways to do this:

- in the "rocket_launch" of the queue script, don't use "rlaunch" but rather some other command that wraps rlaunch and does your cleanup if it catches an error.

- modify the FireTasks themselves to try and catch the error. If this results in repeated code (e.g., there are 10 different FireTasks that should clean things up in the same way), one could apply a generic function decorator to the run() function to avoid this. The decorator tries the original run() function and, if it catches an error, does something else. Then you can just add that decorator to the FireTasks.
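A minimal sketch of the decorator idea, in plain Python with no FireWorks imports. One detail worth noting: Python does not raise an exception on SIGTERM by default, so the sketch installs a handler that converts the signal into one. The names TerminationRequested and cleanup_on_failure are made up for illustration, and the decorator is generic, so it can wrap whichever method a FireTask actually uses to do its work.

```python
import functools
import signal


class TerminationRequested(Exception):
    """Hypothetical exception raised when SIGTERM arrives."""


def _raise_on_sigterm(signum, frame):
    # Turn the signal into an ordinary exception so try/except can see it.
    raise TerminationRequested(f"received signal {signum}")


def cleanup_on_failure(cleanup):
    """Decorator factory: call cleanup() if the wrapped function raises,
    including when SIGTERM is delivered while it is running."""
    def decorator(run):
        @functools.wraps(run)
        def wrapper(*args, **kwargs):
            # signal.signal() may only be called from the main thread.
            previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
            try:
                return run(*args, **kwargs)
            except BaseException:
                cleanup()
                raise  # let the caller still see the failure
            finally:
                # Restore whatever handler was installed before.
                if previous is not None:
                    signal.signal(signal.SIGTERM, previous)
        return wrapper
    return decorator
```

To use it, decorate the task's run method with @cleanup_on_failure(copy_results_back), where copy_results_back is your own function that copies data from the worker to the fileservers. Because the exception is re-raised after cleanup, the job still ends as a failure rather than silently succeeding.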

If you think having an additional feature would be useful beyond those options, perhaps you can write a bit about how it might be better (i.e., what problem it solves) and how it would be implemented. For example, one option is to have a special keyword in the spec like "_error_tasks" that links to a list of FireTasks (the same as any other FireTask so we don't need a new object, and so you can reuse existing CopyTasks if needed). In the Rocket code, if we catch the error and the _error_tasks key exists, it can execute the _error_tasks (which can also return a FWAction). There is not much harm in implementing this, but it would be nice to know a little more about the use case.
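To make the "_error_tasks" idea concrete, here is a rough sketch of the control flow only. It is not FireWorks code: tasks are stood in for by plain callables that take the spec, run_with_error_tasks is a hypothetical name, and the real Rocket would also have to deserialize the tasks and handle any FWAction they return.

```python
def run_with_error_tasks(spec, tasks, error_key="_error_tasks"):
    """Run tasks in order; if one raises, run the tasks under spec[error_key].

    Each task here is a plain callable taking the spec (an assumption made
    for this sketch, standing in for real FireTask objects).
    """
    try:
        return [task(spec) for task in tasks]
    except Exception:
        # The key is optional: with no error tasks we simply re-raise.
        for error_task in spec.get(error_key, []):
            error_task(spec)
        raise
```

In this sketch the original exception is re-raised after the error tasks run, so the failure is still reported. Whether the real feature should instead let an error task's return value decide what happens next is an open design question.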

Best,

Anubhav
