The job script itself should contain the singularity/apptainer command.
I am guessing you don't want your users to have to deal with that part in their scripts, so I would suggest using a wrapper script.
You could just have them run something like: cluster_run.sh <path_to_script>
Then cluster_run.sh would call sbatch along with the appropriate commands.
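A minimal sketch of what such a wrapper might look like, assuming the site fixes the container image and the user's script is simply run inside it with singularity exec (the image path, job options, and script name below are placeholders, not anything from an actual setup):

```bash
#!/bin/bash
# cluster_run.sh -- hypothetical wrapper: submit a user's script so that it
# runs inside a Singularity/Apptainer container. All paths are placeholders.
set -euo pipefail

if [[ $# -lt 1 ]]; then
    echo "usage: $0 <path_to_script>" >&2
    exit 1
fi

USER_SCRIPT="$1"
SIF_IMAGE="/containers/site-default.sif"   # placeholder image location

# Generate a batch script on the fly and hand it to sbatch on stdin.
sbatch <<EOF
#!/bin/bash
#SBATCH --job-name=$(basename "$USER_SCRIPT")
#SBATCH --ntasks=1
singularity exec "$SIF_IMAGE" bash "$USER_SCRIPT"
EOF
```

Users would then just run something like cluster_run.sh my_analysis.sh and never touch the container command themselves.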
Brian Andrus
The remaining issue, then, is how to put them into an allocation that is actually running a singularity container. I don't understand how what I'm doing now results in an allocation where I'm still in a container on the submit node!
salloc <salloc-parameters> srun <srun-parameters> /usr/bin/singularity shell <path to sif>
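One thing worth noting: salloc runs the command it is given on the machine where salloc itself was invoked, so only what is launched through srun actually lands on the allocated node; if the container ends up on the submit node, the singularity command is probably not going through srun. A filled-in version of the pattern above, with placeholder partition, counts, and image path, would be roughly:

```bash
# Placeholders: the partition, node/task counts, and image path are examples only.
salloc -p compute -N1 -n1 \
    srun -n1 --pty singularity shell /containers/example.sif
```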
| Subject: | [slurm-users] slurm and singularity |
|---|---|
| Date: | Tue, 7 Feb 2023 17:31:45 +0000 |
| From: | Groner, Rob <rug...@psu.edu> |
| Reply-To: | Slurm User Community List <slurm...@lists.schedmd.com> |
| To: | slurm...@lists.schedmd.com <slurm...@lists.schedmd.com> |
`--interactive --pty --export=TERM`
```
[(it_nss:frey)@login00.darwin ~]$ salloc -p idle srun /opt/shared/singularity/3.10.0/bin/singularity shell /opt/shared/singularity/prebuilt/postgresql/13.2.simg
salloc: Granted job allocation 3953722
salloc: Waiting for resource configuration
salloc: Nodes r1n00 are ready for job
ls -l
total 437343
-rw-r--r-- 1 frey it_nss 180419 Oct 26 16:56 amd.cache
-rw-r--r-- 1 frey it_nss 72 Oct 26 16:52 amd.conf
-rw-r--r-- 1 frey everyone 715 Nov 12 23:39 anaconda-activate.sh
drwxr-xr-x 2 frey everyone 4 Apr 11 2022 bin
```

```
[(it_nss:frey)@login00.darwin ~]$ salloc -p idle srun --pty /opt/shared/singularity/3.10.0/bin/singularity shell /opt/shared/singularity/prebuilt/postgresql/13.2.simg
salloc: Granted job allocation 3953723
salloc: Waiting for resource configuration
salloc: Nodes r1n00 are ready for job
Singularity>
```
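The --pty in the second invocation is what gives the step a terminal; for non-interactive batch use the same pattern works without it, for example (the command after the image is only a placeholder):

```bash
# Inside an sbatch script: no --pty needed since nothing is interactive.
# The psql invocation is a placeholder command, not from the original post.
srun singularity exec /opt/shared/singularity/prebuilt/postgresql/13.2.simg psql --version
```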
On Feb 8, 2023, at 09:47, Groner, Rob <rug...@psu.edu> wrote:
I tried that, and it says the nodes have been allocated, but it never comes to an apptainer prompt.
I then tried doing them in separate steps. Doing salloc works: I get a prompt on the allocated node. I can then run "singularity shell <sif>" and get the apptainer prompt. But if I prefix that command with "srun", it just hangs and I never get the prompt. So that seems to be the sticking point. I'll have to do some experiments running singularity with srun.
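Given the transcript above, the likely fix for that experiment is simply adding --pty so the step gets a pseudo-terminal; a sketch of the srun-only test, run from inside the existing salloc session (the image path is a placeholder):

```bash
# Without --pty the shell gets no terminal, so no Singularity> prompt
# appears and it can look like a hang; with --pty the prompt shows up.
srun --pty singularity shell /containers/example.sif
```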
If you are using a newer Slurm, using srun for an interactive shell is deprecated; salloc now defaults to opening a shell if no command is specified:
DESCRIPTION
salloc is used to allocate a Slurm job allocation, which is a
set of resources (nodes), possibly with some set of constraints
(e.g. number of processors per node). When salloc successfully
obtains the requested allocation, it then runs the command
specified by the user. Finally, when the user specified command
is complete, salloc relinquishes the job allocation.
The command may be any program the user wishes. Some typical
commands are xterm, a shell script containing srun commands, and
srun (see the EXAMPLES section). If no command is specified,
then salloc runs the user's default shell.
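Put together, on a cluster where salloc's interactive step is configured to land on the allocated node (e.g. LaunchParameters=use_interactive_step in slurm.conf, which is an assumption about the site; the --interactive --pty --export=TERM options quoted earlier are, as far as I know, the defaults for that step), the flow could be as simple as:

```bash
# Assumes LaunchParameters=use_interactive_step so the shell opens on the
# compute node, not the submit node. Partition and image are placeholders.
salloc -p idle                               # no command: drop into a shell
singularity shell /containers/example.sif    # run inside that shell
exit                                         # leave the container
exit                                         # release the allocation
```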
Brian Andrus