[slurm-users] Naive SLURM question: equivalent to LSF pre-exec

Cutts, Tim via slurm-users

Feb 14, 2024, 8:40:12 AM
to slurm...@lists.schedmd.com

Hi, I apologise if I’ve failed to find this in the documentation (and am happy to be told to RTFM), but a recent issue for one of my users resulted in a question I couldn’t answer.

LSF has a feature called a Pre-Exec where a script executes to check whether a node is ready to run a task. So, you can run arbitrary checks and go back to the queue if they fail.

For example, if I have some automounted filesystems and want to be able to detect a failure of the automounter, in an LSF world I can do:

  bsub -E "test -f /nfs/someplace/file_I_know_exists" my_job.sh
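
The string passed to -E is just a command whose exit status decides what happens: zero lets the job start, non-zero sends it back to the queue. As a sketch only, a slightly more defensive standalone version of the same check might look like this. It assumes bash and GNU coreutils timeout on the execution host; the script name check_paths.sh and the 10-second default are hypothetical. The timeout is there so that a hung automount fails the check rather than blocking it.

```shell
#!/bin/bash
# check_paths.sh (hypothetical name): succeed only if every path given on
# the command line is a regular file that can be checked within TIMEOUT
# seconds. A hung NFS/automount lookup counts as a failure, not a hang.

TIMEOUT=${TIMEOUT:-10}   # seconds allowed per path; illustrative default

check_paths() {
    local path
    for path in "$@"; do
        # The external test binary (coreutils) is run through timeout,
        # because timeout cannot wrap the shell builtin; timeout exits
        # with status 124 if the limit is reached.
        if ! timeout "$TIMEOUT" test -f "$path"; then
            echo "pre-exec check failed: $path" >&2
            return 1
        fi
    done
    return 0
}

# The script's exit status is that of its last command, i.e. check_paths.
check_paths "$@"
```

With LSF it would be used as (path hypothetical):

  bsub -E "/path/to/check_paths.sh /nfs/someplace/file_I_know_exists" my_job.sh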

What’s the equivalent in SLURM?

Thanks,

Tim

-- 

Tim Cutts

Scientific Computing Platform Lead

AstraZeneca

Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue.

AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA.

This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com

Paul Edmon via slurm-users

Feb 14, 2024, 9:34:24 AM
to slurm...@lists.schedmd.com

Paul Raines via slurm-users

Feb 14, 2024, 10:00:58 AM
to Paul Edmon, slurm-users

The Prolog will run with every job, not just "as asked for" by the user.
Also, it runs as the root or slurm user, not as the user who submitted
the job. For that one would use TaskProlog, but at that point I don't
think there is any way to abort or requeue the job from TaskProlog.

The Prolog script could check for an environment variable set by the
user, such as SLURM_USER_PROLOG, and if it exists, use 'su' to run it
as the submitting user. If that returns a non-zero value, the Prolog
exits with the same value. Even with 'su' there are security issues
one has to think through here.
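
As a rough illustration of that idea, here is a minimal Prolog sketch. It assumes that SLURM_USER_PROLOG (a hypothetical name, not a built-in Slurm variable) is actually visible in the Prolog's environment, which is worth verifying on your Slurm version because the Prolog does not necessarily inherit the job's environment; SLURM_JOB_USER is set by slurmd for the Prolog. The Prolog path, the optional PrologFlags=ForceRequeueOnFail line and the USER_CHECK_TIMEOUT cap are likewise assumptions. The security caveats just mentioned still apply.

```shell
#!/bin/bash
# prolog.sh: sketch of a node Prolog (run as root by slurmd) that runs an
# optional user-supplied check as the submitting user and returns its status.
#
# Assumptions (all hypothetical, adapt to your site):
#   slurm.conf:  Prolog=/etc/slurm/prolog.sh
#                PrologFlags=ForceRequeueOnFail   # optional; see caveats
#   SLURM_USER_PROLOG   check command set by the user; assumed to be
#                       visible in the Prolog's environment (verify first)
#   USER_CHECK_TIMEOUT  cap in seconds so a hung check cannot stall the node
# SLURM_JOB_USER is provided to the Prolog by slurmd.

USER_CHECK_TIMEOUT=${USER_CHECK_TIMEOUT:-30}

prolog_main() {
    local check="${SLURM_USER_PROLOG:-}"
    local user="${SLURM_JOB_USER:-}"
    local rc

    # No check requested: succeed so the job starts as usual.
    if [ -z "$check" ]; then
        return 0
    fi

    # Do not run anything if the job owner is unknown.
    if [ -z "$user" ]; then
        echo "prolog: SLURM_JOB_USER is empty; not running user check" >&2
        return 1
    fi

    # Run the check as the submitting user via su, not under the Prolog's
    # own root identity. timeout exits with 124 if the limit is reached.
    timeout "$USER_CHECK_TIMEOUT" su -s /bin/sh -c "$check" "$user"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "prolog: user check failed with status $rc" >&2
    fi
    return "$rc"
}

# A script's exit status is that of its last command, so slurmd sees
# prolog_main's return value.
prolog_main
```

On the submission side, assuming the variable does get through, the LSF example might become:

  export SLURM_USER_PROLOG="test -f /nfs/someplace/file_I_know_exists"
  sbatch my_job.sh

A non-zero exit here is still a Prolog failure, so the node-draining behaviour described below applies even with ForceRequeueOnFail set.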

The requeuing part is a bit tricky. I would not necessarily set
ForceRequeueOnFail, as some Prolog scripts probably really do want
certain jobs simply cancelled. Also, a failing Prolog puts the node
into a drain state, which is not necessarily what an admin wants
when a user's prolog script fails.

Not sure there is any good way to do this with safe requeuing.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)

On Wed, 14 Feb 2024 9:32am, Paul Edmon via slurm-users wrote:

> You probably want the Prolog option:
> https://slurm.schedmd.com/slurm.conf.html#OPT_Prolog
> along with:
> https://slurm.schedmd.com/slurm.conf.html#OPT_ForceRequeueOnFail
>
> -Paul Edmon-
The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Mass General Brigham Compliance HelpLine at https://www.massgeneralbrigham.org/complianceline.
Please note that this e-mail is not secure (encrypted). If you do not wish to continue communication over unencrypted e-mail, please notify the sender of this message immediately. Continuing to send or respond to e-mail after receiving this message means you understand and accept this risk and wish to continue to communicate over unencrypted e-mail.
