detect_lostruns while running jobs on multiple clusters


Martin Siron

Jun 12, 2019, 1:17:33 PM
to fireworkflows
When I run

lpad detect_lostruns --fizzle

on one cluster while jobs are running on several clusters (i.e., some on Savio, some on Lawrencium), every job running on a cluster other than the one where the command was executed gets marked as FIZZLED, even though it is still running there.

Is there a way to apply the command only to fireworks running on the cluster where it is executed? Alternatively, could this be fixed in the future by using the "host" field from the firework's launch document?

Thanks!

Martin

Anubhav Jain

Jun 24, 2019, 7:54:11 PM
to fireworkflows
Hi Martin

I just pushed a change so that detect_lostruns takes an optional argument called "launch_query", which lets you restrict the search on any field of the launch document.

You should be able to set launch_query="{'host':'my_host'}" or launch_query="{'host':{'$regex': 'my_host*'}}"

This should be included in the next release of FWS (v1.9.3). I haven't had a chance to test it, so let me know if it works for you.
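The following is a minimal sketch of how a host-restricted launch_query behaves, not FireWorks' own code. It builds the query dict described in the reply and, to keep the example runnable without a database, filters some hypothetical launch documents with a tiny hand-written imitation of MongoDB matching (`launch_query_for_host` and `matches` are illustrative helpers, not part of FireWorks). The hostnames are made up. Two assumptions are also made: that the "host" stored in launch documents equals `socket.gethostname()` on the node that ran the job, and that `LaunchPad.detect_lostruns` accepts `launch_query` as a keyword in v1.9.3 and later, as the reply says.

```python
import re
import socket


def launch_query_for_host(pattern=None):
    """Build a MongoDB-style filter on the launch document's "host" field.

    pattern=None -> exact match on this machine's hostname (assumes the stored
    "host" equals socket.gethostname(); check this against your own launches).
    Otherwise -> regex match, e.g. "savio" to catch every Savio node.
    """
    if pattern is None:
        return {"host": socket.gethostname()}
    return {"host": {"$regex": pattern}}


def matches(launch, query):
    """Evaluate the small subset of MongoDB matching used above (illustration only)."""
    cond = query["host"]
    host = launch.get("host", "")
    if isinstance(cond, dict):
        # MongoDB's $regex is unanchored, like re.search: it matches anywhere in the string
        return re.search(cond["$regex"], host) is not None
    return host == cond


# Hypothetical launch documents from two clusters
launches = [
    {"launch_id": 1, "host": "n0012.savio1"},
    {"launch_id": 2, "host": "n0034.lr6"},
    {"launch_id": 3, "host": "n0101.savio2"},
]

query = launch_query_for_host("savio")
savio_ids = [l["launch_id"] for l in launches if matches(l, query)]
print(savio_ids)  # [1, 3]

# Against a real LaunchPad, the same dict would be passed through, e.g.
# (assumption: FireWorks >= 1.9.3, as described in the thread):
# from fireworks import LaunchPad
# LaunchPad.auto_load().detect_lostruns(fizzle=True, launch_query=query)
```

Because `$regex` is unanchored, a pattern such as `my_host*` matches any host that contains "my_hos". If you need a stricter match, anchor the pattern explicitly, for example `^n0\d+\.savio`.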

Martin Siron

Jul 24, 2019, 6:52:12 PM
to fireworkflows
Thanks, that worked!