[slurm-users] Use a portion of resources already allocated for a script

Michael Lamparski

Sep 20, 2018, 7:02:40 PM
to slurm...@lists.schedmd.com
Hello all,

For years I've been looking for what I might consider to be the holy grail of composable resource allocation in slurm jobs:

* A command that can be run inside of an sbatch script...
* ...which immediately and synchronously invokes another sbatch script (which may or may not invoke mpirun in turn)...
* ...using a subset of the currently allocated resources.

This is the smallest unit of functionality that would compose well with existing tools in UNIX for orchestration.  For instance, I could use xargs as a semaphore to let each node work on one input at a time, and for a given input I could have an arbitrarily complex python script decide dynamically what computations to run.
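
To make the xargs-as-semaphore idea concrete, here is a minimal Python sketch of the same pattern: one blocking command per input, with a fixed cap on how many run at once. The names `run_limited` and `build_command` are made up for illustration, and the sketch only works if the command it launches runs synchronously, which is exactly the missing piece described in this message.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_limited(inputs, build_command, max_parallel):
    """Run one command per input, at most max_parallel at a time.

    build_command is a hypothetical callback that maps an input to an
    argv list. Exit codes are returned in the same order as inputs.
    """
    def run_one(item):
        # subprocess.run blocks until the command exits, so each worker
        # thread holds its slot for the whole lifetime of the command.
        return subprocess.run(build_command(item)).returncode

    # The pool size plays the role of the semaphore (like `xargs -P`).
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, inputs))
```

Inside a job this might be called as `run_limited(paths, lambda p: ["./process_input.sh", p], nodes)`, where `./process_input.sh` is a placeholder for the synchronous per-input script and `nodes` is the node count of the allocation (for example read from `SLURM_JOB_NUM_NODES`, assuming the installed Slurm version exports it).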

Years of Google and manpage searches have continually failed me.

* salloc can synchronously run an sbatch script, but as far as I can tell, it cannot make job steps, only jobs.
* srun can run synchronously and make job steps, but as far as I can tell, it cannot call a script which calls mpirun (it insists on *replacing* mpirun).
* Today I discovered that sbatch can also create job steps (albeit awkwardly, via --jobid), and it can obviously run sbatch scripts... but as far as I can tell, it cannot run synchronously!

One can't help but wonder whether this is a deliberate omission or just criminal oversight!

Today I snapped and started working on a synchronous wrapper around sbatch; the plan is to use --jobid=$SLURM_JOB_ID, find out the job step id (somehow...), and then sattach to it.  I say this knowing it'll probably make your skin crawl.  And I ask: What do you think I ought to do instead?

Michael

Michael Lamparski

Sep 21, 2018, 10:47:10 AM
to slurm...@lists.schedmd.com
> Today I discovered that sbatch can also create job steps (albeit awkwardly, via --jobid), and it can obviously run sbatch scripts... but as far as I can tell, it cannot run synchronously!

Rats.  I just discovered that sbatch has a "--wait" option, but it must have been added after 2.2.4, because it isn't available on this machine.

Michael