In my cluster right now we have:
$ sinfo
...
pubgpu-req up 7-00:00:00 7 mix cobra,l40s-[01,03],luisa,rtx-03,sevilla,shelob
pubgpu-req up 7-00:00:00 5 idle fiber,glaurung,gothmog,l40s-02,leo
The following works fine:
$ srun -p pubgpu-req -A sysadm --nodelist=cobra,l40s-02 --gres=gpu:1 -N 1 \
--ntasks-per-node=1 --mem=1G --time=1:00:00 --cpus-per-task=4 --pty /bin/bash
srun: tres_per_node => gres/gpu:1
cobra[0]:~$ exit
exit
However, just change the order of the nodelist and you get:
$ srun -p pubgpu-req -A sysadm --nodelist=l40s-02,cobra --gres=gpu:1 -N 1 \
--ntasks-per-node=1 --mem=1G --time=1:00:00 --cpus-per-task=4 --pty /bin/bash
srun: tres_per_node => gres/gpu:1
srun: error: Unable to create step for job 8255390: Requested node configuration is not available
With more experimentation, it appears that the nodelist HAS to be in
alphabetical order:
$ srun -p pubgpu-req -A sysadm --nodelist=l40s-02,leo,fiber \
--gres=gpu:1 -N 1 --ntasks-per-node=1 --mem=1G --time=1:00:00 \
--cpus-per-task=4 --pty /bin/bash
srun: tres_per_node => gres/gpu:1
srun: error: Unable to create step for job 8255401: Requested node configuration is not available
$ srun -p pubgpu-req -A sysadm --nodelist=fiber,l40s-02,leo \
--gres=gpu:1 -N 1 --ntasks-per-node=1 --mem=1G --time=1:00:00 \
--cpus-per-task=4 --pty /bin/bash
srun: tres_per_node => gres/gpu:1
fiber[0]:~$
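As a stopgap, the nodelist can be sorted before it is handed to srun. This is only a sketch based on the behaviour observed above. The helper name sort_nodelist is hypothetical, it assumes plain comma-separated hostnames with no bracket ranges such as l40s-[01,03], and it assumes that byte-wise (C locale) ordering matches the order Slurm expects, which I have not confirmed.

```shell
#!/bin/sh
# Hypothetical helper: sort a comma-separated nodelist alphabetically.
# Assumes plain hostnames only; bracket ranges like l40s-[01,03] would be
# split incorrectly on the embedded comma.
sort_nodelist() {
    # Split on commas, sort byte-wise (C locale), and rejoin with commas.
    printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort | paste -sd, -
}

# Example use (not executed here):
#   srun -p pubgpu-req -A sysadm --nodelist="$(sort_nodelist l40s-02,leo,fiber)" \
#       --gres=gpu:1 -N 1 --ntasks-per-node=1 --mem=1G --time=1:00:00 \
#       --cpus-per-task=4 --pty /bin/bash
```

If the list contains bracket ranges, expanding it first (for example with scontrol show hostnames, if available on your system) would be needed before sorting this way.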
Surely there is no good reason for this?
---------------------------------------------------------------
Paul Raines
http://help.nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street Charlestown, MA 02129 USA