Spring on cluster with slurm


Julian Reitz

Jul 20, 2016, 9:22:30 AM
to emspring
Hi everybody,

we are trying to get Spring running on our cluster with SLURM.
However, we are running into several difficulties.

Does anybody have experience with Spring and SLURM, and could you provide us with a minimal example?

Thanks a lot,
Julian

Carsten Sachse

Jul 20, 2016, 4:11:49 PM
to emspring
Hi Julian,

Currently there are still some issues with running Spring under SLURM. It works out of the box with other queuing systems such as LSF, PBS and Sun Grid Engine. The main reason is that I don't have access to a SLURM system to debug it properly. A number of people have reported a workaround, which I can share. This way you can run the programs that are available with the _mpi suffix.

Best wishes,


Carsten
=======================
You start the job with e.g. 'sbatch segrefine_1x.sbatch'


segrefine_1x.sbatch:
-------------------------------------------------------------
#!/bin/bash

#SBATCH --job-name=segrefine
#SBATCH --partition=compute
#SBATCH --time=3-0
#SBATCH --mem-per-cpu=20000
#SBATCH --ntasks=200

#SBATCH --mail-type=BEGIN,FAIL,END
#SBATCH --output=job_%j.out
#SBATCH --error=job_%j.err

module load spring/0.84

mpirun segmentrefine3d_mpi --f segrefine_1x.par
-------------------------------------------------------------
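
Before submitting a long refinement, it may be worth checking that mpirun really starts all tasks as one MPI job. The sketch below is not part of Spring; check_mpi_launch is a hypothetical helper, and it assumes Open MPI (which sets OMPI_COMM_WORLD_SIZE for processes started by mpirun) and a SLURM allocation (which sets SLURM_NTASKS). If every process reports a world size of 1, the launcher and the MPI library are probably not talking to each other.

```shell
#!/bin/bash
# check_mpi_launch.sh
# Hypothetical helper, not part of Spring: compare the MPI world size seen by
# this process with the number of tasks SLURM allocated.
# Assumes Open MPI (OMPI_COMM_WORLD_SIZE) and SLURM (SLURM_NTASKS);
# both default to 1 here when unset.
check_mpi_launch() {
    size="${OMPI_COMM_WORLD_SIZE:-1}"
    want="${SLURM_NTASKS:-1}"
    if [ "$size" -eq "$want" ]; then
        echo "ok: MPI world size $size matches SLURM_NTASKS"
    else
        echo "mismatch: MPI world size $size, SLURM_NTASKS $want"
        return 1
    fi
}
```

Inside the sbatch script this could be called, for example, as: mpirun bash -c 'source ./check_mpi_launch.sh; check_mpi_launch'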


spring module file (we load different environments with the 'module' command, a very handy tool):
-------------------------------------------------------------
#%Module1.0##################################################################
#
set appname    [lrange [split [module-info name] {/}] 0 0]
set appversion [lrange [split [module-info name] {/}] 1 1]
set apphome    /location/$appname/$appversion

## URL of application homepage:

## Short description of package:
#module-whatis   "SPRING helical package"

## Load any needed modules:
#module load openmpi.gcc/1.8.6
module load openmpi.gcc/1.6.5

## Modify as needed, removing any variables not needed.  Non-path variables
## can be set with "setenv VARIABLE value".
prepend-path    PATH $apphome/bin:$apphome/parts/EMAN2/bin/
prepend-path    LD_LIBRARY_PATH $apphome/lib:$apphome/EMAN2/lib:$apphome/parts/EMAN2/lib
prepend-path    PYTHONPATH $apphome/parts/EMAN2/Python-2.7-ucs4/lib/python2.7:$apphome/lib/python2.7:$apphome/parts/EMAN2/lib/
prepend-path    PYTHONHOME $apphome/parts/EMAN2/Python-2.7-ucs4
prepend-path    MANPATH         $apphome/share/man
#prepend-path    CPATH           $apphome/include
#prepend-path    FPATH           $apphome/include
#prepend-path    PKG_CONFIG_PATH $apphome/lib/pkgconfig

## These lines are for logging module usage.  Don't remove them:
set modulefile [lrange [split [module-info name] {/}] 0 0]
set version    [lrange [split [module-info name] {/}] 1 1]
set action     [module-info mode]
system logger -t module -p local6.info DATE=\$(date +%FT%T),USER=\$USER,JOB=\$\{SLURM_JOB_ID=NOJOB\},APP=$modulefile,VERSION=$version,ACTION=$action
## Don't remove this line!  For some reason, it has to be here...
-------------------------------------------------------------
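
To see whether the module file above puts the expected executables on PATH, a small check along these lines might help. It is only a sketch: check_on_path is a hypothetical helper, not something shipped with Spring or with the module command, and it relies only on the shell built-in 'command -v'.

```shell
#!/bin/bash
# Hypothetical helper, not part of Spring or Environment Modules:
# print whether each given command is found on PATH and
# return non-zero if at least one is missing.
check_on_path() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

After 'module load spring/0.84' one could then run, for example: check_on_path mpirun segmentrefine3d_mpi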

mvalle

Sep 23, 2016, 5:58:36 AM
to emspring
Dear Carsten,

we have moved our cluster to SLURM. Bad move indeed.

I have used your module and Spring now at least starts, but it does not run properly. Instead of creating a single directory, it creates one per processor, and it finally crashes. I guess that the synchronization and/or communication between CPUs is not working.

Any suggestion?

Thanks in advance.