Dear SLURM users,
I am the IT administrator of a small scientific computing center. We recently installed SLURM as the job scheduler on our cluster, and everything seems to be working fine. I just have a question about how to create temporary directories with SLURM.
We use several programs for scientific calculations (Gromacs, Gaussian, NAMD, etc.), so the process is the following:
When we need to launch a calculation, the first step is to copy all the necessary files from the local "$SLURM_SUBMIT_DIR" directory to the "/scratch" of the remote node. The second step is to change into the "/scratch" of the remote node and run the program. Finally, when the program finishes, we copy all the output files from the remote node's "/scratch" back to the local "$SLURM_SUBMIT_DIR" directory.
So, is there any way to automatically generate a temporary directory inside the "/scratch" of the remote node?
At the moment I am creating that directory manually as follows:
"export HOMEDIR=$SLURM_SUBMIT_DIR
export SCRATCHDIR=/scratch/job.$SLURM_JOB_ID.$USER
export WORKDIR=$SCRATCHDIR
mkdir -p $WORKDIR
cp $HOMEDIR/* $WORKDIR
cd $WORKDIR
$NAMD/namd2 +idlepoll +p11 run_eq.namd > run_eq.log
wait
cp $WORKDIR/* $HOMEDIR"
The main problem with creating the "/scratch" directory manually is that after the calculation ends (successfully or not), users have to go to "/scratch" and remove the directory themselves. I know I could include a line at the end of my script to delete that directory when the calculation is done, but I'm sure there must be a better way to do this.
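For example, I could add something like this, using an EXIT trap so the cleanup runs even if the program fails (a sketch based on my script above), but it still feels like a workaround:

    #!/bin/bash
    HOMEDIR="$SLURM_SUBMIT_DIR"
    WORKDIR="/scratch/job.$SLURM_JOB_ID.$USER"

    # remove the scratch directory when the script exits,
    # whether the calculation succeeded or not
    trap 'rm -rf "$WORKDIR"' EXIT

    mkdir -p "$WORKDIR"
    cp "$HOMEDIR"/* "$WORKDIR"
    cd "$WORKDIR"

    $NAMD/namd2 +idlepoll +p11 run_eq.namd > run_eq.log

    cp "$WORKDIR"/* "$HOMEDIR"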
Thanks in advance for the help.
best regards,
Alain
Hello,
You might want to have a look at the auto_tmpdir plugin
(https://github.com/University-of-Delaware-IT-RCI/auto_tmpdir),
which does pretty much what you want: it creates job-specific
temporary directories and bind-mounts them into the specified
locations, so the job sees /tmp/, /scratch/, and /dev/shm/
while these folders are actually located at /tmp/slurm-$jobid/,
/scratch/slurm-$jobid/, and /dev/shm/slurm-$jobid/.
The bind mounts are destroyed when the job exits, so there's no
need to delete them manually (which is also much safer, because
they are removed even when the job crashes).
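With the plugin in place, a job script can simply use /scratch
directly; roughly like this (a sketch, assuming the plugin is
configured to cover /scratch as described above):

    #!/bin/bash
    # /scratch here is a per-job bind mount (backed by
    # /scratch/slurm-$jobid/ on the node) that is removed
    # automatically when the job ends
    cp "$SLURM_SUBMIT_DIR"/* /scratch/
    cd /scratch

    $NAMD/namd2 +idlepoll +p11 run_eq.namd > run_eq.log

    # only the copy back to the submit directory remains
    cp /scratch/* "$SLURM_SUBMIT_DIR"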
Note: Depending on the SLURM version you're running, you might
need to check out the "dir-removal-fixup" branch instead of master
(IIRC for SLURM < 20.x).
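Fetching the branch would look like this; the plugstack.conf line
is just the usual way SPANK plugins are enabled, so the exact path
and options depend on your build and install layout:

    git clone https://github.com/University-of-Delaware-IT-RCI/auto_tmpdir.git
    cd auto_tmpdir
    git checkout dir-removal-fixup   # only for SLURM < 20.x

    # after building and installing, enable it as a SPANK plugin,
    # e.g. with a line like this in /etc/slurm/plugstack.conf:
    #   required /usr/lib64/slurm/auto_tmpdir.so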
Kind regards,
René Sitt
--
Dipl.-Chem. René Sitt
Hessisches Kompetenzzentrum für Hochleistungsrechnen
Philipps-Universität Marburg
Hans-Meerwein-Straße
35032 Marburg
Tel. +49 6421 28 23523
si...@hrz.uni-marburg.de
www.hkhlr.de