[slurm-users] slurm and singularity

Groner, Rob

Feb 7, 2023, 12:32:30 PM2/7/23
to slurm...@lists.schedmd.com
I'm trying to set up the capability where a user can execute:

$: sbatch <parameters> script_to_run.sh

and the end result is that a job is created on a node, and that job will execute "singularity exec <path to container> script_to_run.sh"

Also, that they could execute:

$: salloc <parameters>

and they would end up on a node per their parameters, and instead of a bash prompt they would have the singularity prompt, because they're inside a running container.

Oddly, I ran:  salloc <parameters> /usr/bin/singularity shell <path to sif>, and it allocated, said the node was ready, and gave me an apptainer prompt...cool!  But when I checked the hostname, I was NOT on the node it said was ready; I was still on the submit node.  When I exit the apptainer shell, it ends my allocation.  So it gave me the allocation and started the apptainer shell, but somehow I was still on the submit node.

As for the batch job, I've done some experiments with using job_submit.lua to replace the submitted script with one that makes a singularity call instead, and that might hold some promise.  But I'd have to write the passed-in script to a temp file or something, and then have singularity exec that.  That MIGHT work.

Searching for "slurm and singularity" doesn't turn up anything describing what I'm trying to do.  The closest thing I can find is what Slurm touts on their website, a leftover from 2017 about a SPANK plugin that, as near as I can figure, doesn't exist.  I read through the OCI docs on the Slurm website, but they show that using singularity that way requires every command to be run with sudo.  That's not going to work.

I'm running out of ideas here.

Thanks,

Rob

Brian Andrus

Feb 7, 2023, 12:53:01 PM2/7/23
to slurm...@lists.schedmd.com

You should have the job script itself contain the singularity/apptainer command.


I am guessing you don't want your users to have to deal with that part for their scripts, so I would suggest using a wrapper script.


You could just have them run something like: cluster_run.sh <path_to_script>

Then cluster_run.sh would call sbatch along with the appropriate commands.
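
A rough sketch of what I mean (the container path is just a placeholder, and --wrap is only one way to do it; adjust for your site):

#!/bin/bash
# cluster_run.sh (sketch): submit a user's script so it runs inside a
# site-provided Singularity/Apptainer container on the allocated node.
# Usage: cluster_run.sh <path_to_script> [extra sbatch options...]
USER_SCRIPT="$1"; shift
CONTAINER=/path/to/site_image.sif    # placeholder image
sbatch "$@" --wrap "singularity exec ${CONTAINER} bash ${USER_SCRIPT}"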


Brian Andrus

Groner, Rob

Feb 7, 2023, 4:53:06 PM2/7/23
to slurm...@lists.schedmd.com
Looks like we can go the route of a wrapper script, since our users don't specifically need to know they're running an sbatch.  Thanks for the suggestion.

The remaining issue, then, is how to put them into an allocation that is actually running a singularity container.  I don't get how what I'm doing now results in an allocation where I'm in a container but still on the submit node!



Jeffrey T Frey

Feb 7, 2023, 6:16:44 PM2/7/23
to Slurm User Community List
> The remaining issue, then, is how to put them into an allocation that is actually running a singularity container.  I don't get how what I'm doing now results in an allocation where I'm in a container but still on the submit node!

Try prefixing the singularity command with "srun", e.g.


salloc <salloc-parameters> srun <srun-parameters> /usr/bin/singularity shell <path to sif>

Carl Ponder

Feb 7, 2023, 11:01:41 PM2/7/23
to Groner, Rob, Slurm User Community List


Take a look at this extension to SLURM:

https://github.com/NVIDIA/pyxis

You put the container path on the srun command line, and each rank runs inside its own copy of the image.



Markus Kötter

Feb 8, 2023, 1:53:58 AM2/8/23
to slurm...@lists.schedmd.com
Hi,


On 08.02.23 05:00, Carl Ponder wrote:
> Take a look at this extension to SLURM:
>
> https://github.com/NVIDIA/pyxis

https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf

enroot & pyxis are a great recommendation for rootless containerized runtime environments in HPC.

Free software, no license or DGX required.


Some things to consider:

Cache in /tmp so it's freed upon reboot:
# /etc/enroot/enroot.conf
ENROOT_RUNTIME_PATH /tmp/enroot/user-$(id -u)
ENROOT_CACHE_PATH /tmp/enroot-cache/user-$(id -u)
ENROOT_DATA_PATH /tmp/enroot-data/user-$(id -u)


When using a local container repo, the port in the image URL is separated using "#":

> srun … --container-image mygitlab:5005#path/pytorch:22.12-py3 …
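
Pulling from a public registry looks much the same; purely an illustration, the image, mount and command here are placeholders:

> srun … --container-image=nvcr.io#nvidia/pytorch:22.12-py3 --container-mounts=/scratch:/scratch python train.py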


Kind regards
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security

Groner, Rob

Feb 8, 2023, 9:48:03 AM2/8/23
to Slurm User Community List
I tried that, and it says the nodes have been allocated, but it never comes to an apptainer prompt.

I then tried doing them in separate steps.  Doing salloc works, I get a prompt on the node that was allocated.  I can then run "singularity shell <sif>" and get the apptainer prompt.  If I prefix that command with "srun", then it just hangs and I never get the prompt.  So that seems to be the sticking point.  I'll have to do some experiments running singularity with srun.



Jeffrey T Frey

Feb 8, 2023, 10:01:55 AM2/8/23
to Slurm User Community List
You may need srun to allocate a pty for the command.  The InteractiveStepOptions we use (that are handed to srun when no explicit command is given to salloc) are:


--interactive --pty --export=TERM


E.g. without those flags a bare srun gives a promptless session:


[(it_nss:frey)@login00.darwin ~]$ salloc -p idle srun /opt/shared/singularity/3.10.0/bin/singularity shell /opt/shared/singularity/prebuilt/postgresql/13.2.simg
salloc: Granted job allocation 3953722
salloc: Waiting for resource configuration
salloc: Nodes r1n00 are ready for job
ls -l
total 437343
-rw-r--r--  1 frey it_nss      180419 Oct 26 16:56 amd.cache
-rw-r--r--  1 frey it_nss          72 Oct 26 16:52 amd.conf
-rw-r--r--  1 frey everyone       715 Nov 12 23:39 anaconda-activate.sh
drwxr-xr-x  2 frey everyone         4 Apr 11  2022 bin
   :


With the --pty flag added:


[(it_nss:frey)@login00.darwin ~]$ salloc -p idle srun --pty /opt/shared/singularity/3.10.0/bin/singularity shell /opt/shared/singularity/prebuilt/postgresql/13.2.simg
salloc: Granted job allocation 3953723
salloc: Waiting for resource configuration
salloc: Nodes r1n00 are ready for job
Singularity>
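
(For reference, that behavior of a bare salloc comes from slurm.conf on our side; a minimal sketch, depending on your Slurm version:

LaunchParameters=use_interactive_step
InteractiveStepOptions="--interactive --pty --export=TERM"

That's what hands those flags to srun when salloc is given no command.)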




Groner, Rob

Feb 8, 2023, 10:20:22 AM2/8/23
to Slurm User Community List
Ah, thanks so much.  I'm still a slurm newbie and I've barely used srun.  I'm not sure how long it would have taken me to find and understand those parameters from the docs.  Thanks!



Brian Andrus

Feb 8, 2023, 11:46:31 AM2/8/23
to slurm...@lists.schedmd.com

If you are using a newer Slurm, using srun for an interactive shell is deprecated; salloc now defaults to a shell if no command is specified:

DESCRIPTION
salloc is used to allocate a Slurm job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node). When salloc successfully obtains the requested allocation, it then runs the command specified by the user. Finally, when the user specified command is complete, salloc relinquishes the job allocation.

The command may be any program the user wishes. Some typical commands are xterm, a shell script containing srun commands, and srun (see the EXAMPLES section). If no command is specified, then salloc runs the user's default shell.
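
So on a cluster with LaunchParameters=use_interactive_step set (which is what makes that default shell run on the allocated node rather than the submit host), something like this should be all that's needed; the path below is just a placeholder:

salloc -N1 -t 1:00:00
singularity shell /path/to/image.sif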

Brian Andrus
