Hi,
I'm trying to help a local group run a genome assembly package with
SLURM job arrays. The software was written by another group and
supports SGE, PBS and LSF. The code sections below include the SGE and
PBS versions. The authors provided notes on how to edit the code, but
since I have not used job arrays yet, this is unfamiliar to me.
Could anyone help convert the code section to work with SLURM?
Any assistance is appreciated.
Thank you,
Kevin
Notes from the author, with the code sections that need to be modified:
========================================================
Software source:
========================================================
http://www.cbcb.umd.edu/software/PBcR/MHAP/
http://wgs-assembler.sourceforge.net/wiki/index.php?title=PBcR
========================================================
The code is all public and open source so you can share it on the dev
email list.
I forgot to mention that not all options must be defined. It’s possible
you can significantly simplify your options if SLURM supports jobs
holding for other jobs by name (SGE supports this, but neither PBS nor
LSF does). That is what the gridEngineNameToJobIDCommand* options are
for, and they are undefined on SGE. SGE also doesn’t differentiate holds
for array jobs and regular jobs (LSF and PBS both do), so if SLURM is
more like SGE in that sense you may also only need to define
gridEngineHoldOption, not gridEngineHoldOptionNoArray. Here is the SGE
chunk of code; you can see several undefined variables:
if (($var eq "gridEngine") && ($val eq "SGE")) {
    setGlobal("gridEngineSubmitCommand", "qsub");
    setGlobal("gridEngineHoldOption", "-hold_jid \"WAIT_TAG\"");
    setGlobal("gridEngineHoldOptionNoArray", undef);
    setGlobal("gridEngineSyncOption", "-sync y");
    setGlobal("gridEngineNameOption", "-cwd -N");
    setGlobal("gridEngineArrayOption", "-t ARRAY_JOBS");
    setGlobal("gridEngineArrayName", "ARRAY_NAME");
    setGlobal("gridEngineOutputOption", "-j y -o");
    setGlobal("gridEnginePropagateCommand", "qalter -hold_jid \"WAIT_TAG\"");
    setGlobal("gridEngineNameToJobIDCommand", undef);
    setGlobal("gridEngineNameToJobIDCommandNoArray", undef);
    setGlobal("gridEngineTaskID", "SGE_TASK_ID");
    setGlobal("gridEngineArraySubmitID", "\\\$TASK_ID");
    setGlobal("gridEngineJobID", "JOB_ID");
}
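
To make the placeholders in the SGE options easier to follow, here is a minimal sketch of how the option strings could be combined into one submit command. It assumes WAIT_TAG and ARRAY_JOBS are replaced by plain text substitution; how runCA really assembles its command line is not shown in these notes, so buildSubmitCommand, the job names, the log file and the script name are all hypothetical.

```perl
use strict;
use warnings;

# SGE option strings copied from the SGE settings in the notes.
my %opt = (
    submit => "qsub",
    hold   => "-hold_jid \"WAIT_TAG\"",
    name   => "-cwd -N",
    array  => "-t ARRAY_JOBS",
    output => "-j y -o",
);

# Hypothetical helper (not part of runCA): fill in the placeholders and
# join the pieces into a single command line.
sub buildSubmitCommand {
    my ($jobName, $waitFor, $tasks, $logFile, $script) = @_;

    my $hold = $opt{hold};
    $hold =~ s/WAIT_TAG/$waitFor/g;      # name of the job to wait for

    my $array = $opt{array};
    $array =~ s/ARRAY_JOBS/$tasks/g;     # task range, e.g. 1-20

    return join(" ",
        $opt{submit},
        $hold,
        "$opt{name} \"$jobName\"",
        $array,
        "$opt{output} $logFile",
        $script);
}

# Illustrative values only.
my $cmd = buildSubmitCommand("ovl_asm", "prep_asm", "1-20", "ovl.out", "overlap.sh");
print "$cmd\n";
```

On SGE the hold refers to the earlier job by name ("prep_asm" here), which is why the name-to-job-ID commands can stay undefined for that scheduler.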
========================================================
CA has some generic mechanisms to support multiple grid engines, which
it now uses for LSF/PBS/SGE. It’s not the cleanest implementation, but
it’s hard to keep it generic when the clusters vary so much in how they
submit/schedule jobs. The basic requirements for a scheduler system to
run CA are that it supports array jobs, supports individual nodes
submitting jobs (i.e. all nodes on the cluster have to be able to submit
a job), supports a job holding for other jobs, and supports altering a
running job to change its hold/other stats. There are three blocks of
code in runCA that configure a set of options for each grid. For
example, here is the PBS block:
>> if (($var eq "gridEngine") && ($val eq "PBS")) {
>>     setGlobal("gridEngineSubmitCommand", "qsub");
>>     setGlobal("gridEngineHoldOption", "-W depend=afteranyarray:\"WAIT_TAG\"");
>>     setGlobal("gridEngineHoldOptionNoArray", "-W depend=afterany:\"WAIT_TAG\"");
>>     setGlobal("gridEngineSyncOption", "");
>>     setGlobal("gridEngineNameOption", "-d `pwd` -N");
>>     setGlobal("gridEngineArrayOption", "-t ARRAY_JOBS");
>>     setGlobal("gridEngineArrayName", "ARRAY_NAME\[ARRAY_JOBS\]");
>>     setGlobal("gridEngineOutputOption", "-j oe -o");
>>     setGlobal("gridEnginePropagateCommand", "qalter -W depend=afterany:\"WAIT_TAG\"");
>>     setGlobal("gridEngineNameToJobIDCommand", "qstat -f |grep -F -B 1 WAIT_TAG | grep Id: | grep -F [] |awk '{print \$NF}'");
>>     setGlobal("gridEngineNameToJobIDCommandNoArray", "qstat -f |grep -F -B 1 WAIT_TAG | grep Id: |awk '{print \$NF}'");
>>     setGlobal("gridEngineTaskID", "PBS_ARRAYID");
>>     setGlobal("gridEngineArraySubmitID", "\\\$PBS_ARRAYID");
>>     setGlobal("gridEngineJobID", "PBS_JOBID");
>> }
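
For readers unfamiliar with the grep/awk pipelines in the PBS block, the sketch below approximately mirrors what they do, written as a plain Perl function over a text sample. The sample is made up in the general shape of `qstat -f` output (a "Job Id:" line followed directly by the "Job_Name" line) and is not taken from a real cluster; jobIDsForName is a hypothetical name, not something in runCA.

```perl
use strict;
use warnings;

# Made-up sample shaped like `qstat -f` output; illustrative only.
my $qstat = <<'END';
Job Id: 101[].head
    Job_Name = prep_asm
    job_state = R
Job Id: 102.head
    Job_Name = prep_asm
    job_state = Q
Job Id: 103[].head
    Job_Name = other_job
    job_state = R
END

# Return the IDs of jobs whose name line contains $name. With $arrayOnly
# set, keep only IDs containing "[]", like the extra `grep -F []` step in
# the array variant of the command.
sub jobIDsForName {
    my ($text, $name, $arrayOnly) = @_;
    my @ids;
    my $prev = "";
    foreach my $line (split /\n/, $text) {
        # grep -F -B 1 NAME | grep Id: keeps the "Job Id:" line that sits
        # directly before a line mentioning the job name.
        if (index($line, $name) >= 0 && $prev =~ /Id:/) {
            my @fields = split ' ', $prev;
            my $id = $fields[-1];            # awk '{print $NF}'
            push @ids, $id if !$arrayOnly || index($id, "[]") >= 0;
        }
        $prev = $line;
    }
    return @ids;
}

my @arrayIDs = jobIDsForName($qstat, "prep_asm", 1);
my @allIDs   = jobIDsForName($qstat, "prep_asm", 0);
print "array jobs: @arrayIDs\n";
print "all jobs:   @allIDs\n";
```

The point is that PBS holds need job IDs, so the job name stored in WAIT_TAG has to be translated into IDs first; the array and non-array variants differ only in the "[]" filter.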
Basically, it has options that tell it how to submit jobs, how to hold
for array and non-array jobs, and how to specify array jobs, along with
how to get identifiers for a running job (the grep/awk command) for
systems that do not support holds based on job names. You can look at
the code to see the options for LSF and SGE as well. I’ve never used
SLURM, so I don’t know how similar it is to any of the other engines and
whether you can fit it into the above framework. If you can, then it
should be relatively straightforward to customize the above parameters
for it.
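
As a starting point for discussion, here is an untested sketch of what a SLURM block might look like in the same framework. It is not from the CA authors. The values use standard sbatch, squeue and scontrol options and SLURM environment variables, but how runCA consumes each setting is an assumption based only on the SGE and PBS blocks above. The setGlobal stub and %global hash exist only so the sketch runs by itself.

```perl
use strict;
use warnings;

# Stand-in for runCA's own setGlobal(), so this sketch is self-contained.
my %global;
sub setGlobal {
    my ($name, $value) = @_;
    $global{$name} = $value;
}

my ($var, $val) = ("gridEngine", "SLURM");

if (($var eq "gridEngine") && ($val eq "SLURM")) {
    setGlobal("gridEngineSubmitCommand", "sbatch");
    # SLURM dependencies take job IDs, not names, and use the same
    # syntax for array and non-array jobs (assumption: runCA accepts
    # identical values for both hold options).
    setGlobal("gridEngineHoldOption", "--dependency=afterany:WAIT_TAG");
    setGlobal("gridEngineHoldOptionNoArray", "--dependency=afterany:WAIT_TAG");
    # Left empty as in the PBS block; newer sbatch releases also have --wait.
    setGlobal("gridEngineSyncOption", "");
    setGlobal("gridEngineNameOption", "-D `pwd` -J");
    setGlobal("gridEngineArrayOption", "-a ARRAY_JOBS");
    setGlobal("gridEngineArrayName", "ARRAY_NAME");
    # With only -o given, sbatch writes stdout and stderr to the same file.
    setGlobal("gridEngineOutputOption", "-o");
    # Left undefined on purpose: the SLURM counterpart of qalter is
    # `scontrol update JobId=<id> Dependency=afterany:<id>`, but how runCA
    # appends the target job to this string needs checking in the source.
    setGlobal("gridEnginePropagateCommand", undef);
    # %F prints the base job ID of an array job; %i prints the job ID.
    setGlobal("gridEngineNameToJobIDCommand", "squeue -h -o %F -n \"WAIT_TAG\" | sort -u");
    setGlobal("gridEngineNameToJobIDCommandNoArray", "squeue -h -o %i -n \"WAIT_TAG\"");
    setGlobal("gridEngineTaskID", "SLURM_ARRAY_TASK_ID");
    # %a is sbatch's file name pattern for the array task index.
    setGlobal("gridEngineArraySubmitID", "%a");
    setGlobal("gridEngineJobID", "SLURM_JOB_ID");
}

# Print the resulting settings for review.
foreach my $name (sort keys %global) {
    my $value = defined $global{$name} ? $global{$name} : "(undef)";
    print "$name = $value\n";
}
```

Since SLURM holds are by job ID rather than by name, this follows the PBS pattern (name-to-ID commands defined) rather than the simpler SGE one. Each value would still need checking against how runCA builds its command lines and against the local SLURM version.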
The bugs in CA 8.3rc1 with PBS were all because we didn’t have access to
a PBS system and a collaborator had given us the changes required to run
on their PBS system. However, we had no way to validate that it worked
on another PBS system. A different collaborator encountered bugs with
their PBS system, and I fixed instances where the code was not generic
and jobs were not getting proper parameters passed to them (for example
the job array ID), causing errors. This was all a matter of knowing how
PBS handles its jobs in a way that is generic across versions (on which
the documentation is often misleading).
========================================================
>>> http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes
>>> --------------------------------------------
>>> Changes in CA 8.3
>>> Fixed support for PBS/Torque in addition to LSF and SGE
>>> --------------------------------------------
>>> Bug Fixes
>>> Fix bug that would cause array jobs to fail on PBS
>>> Fix bug that would cause errors when submitting jobs to PBS
>>> --------------------------------------------
>>>
--
Kevin Abbey
Systems Administrator
Center for Computational and Integrative Biology (CCIB)
http://ccib.camden.rutgers.edu/
Rutgers University - Science Building
315 Penn St.
Camden, NJ 08102
Telephone: (856) 225-6770
Fax: (856) 225-6312
Email: kevin...@rutgers.edu