[slurm-users] sbatch - accept jobs above limits


z1...@arcor.de

Feb 8, 2022, 5:24:03 PM
to slurm-users

Dear all,

sbatch jobs are immediately rejected if no suitable node is available in
the configuration.


sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available

These jobs should be accepted if a suitable node will become active soon.
For example, they could wait in the queue with the pending reason PartitionConfig.

Is that configurable?


Many thanks,

Mike

Stephen Cousins

Feb 8, 2022, 6:12:31 PM
to Slurm User Community List, slurm-users
I think this message comes up when no node in that partition has the resources to meet the job's requirements. Can you show the partition definition in slurm.conf along with what the job is asking for?

z1...@arcor.de

Feb 8, 2022, 7:34:31 PM
to slurm...@lists.schedmd.com
Yes, the partition does not meet the requirements now.

The job should still be accepted and wait until the requirements can be met.

Christopher Samuel

Feb 8, 2022, 7:46:44 PM
to slurm...@lists.schedmd.com
On 2/8/22 2:26 pm, z1...@arcor.de wrote:

> These jobs should be accepted, if a suitable node will be active soon.
> For example, these jobs could be in PartitionConfig.

From memory, if you submit jobs with the `--hold` option then you should
find they are successfully accepted; I've used that in the past (and
just checked that it still works with 20.11.8, assuming nobody has snuck
a node with 2TB of RAM in whilst I wasn't looking).

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
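A minimal Python sketch of the `--hold` workflow Chris describes, assuming Slurm's `sbatch` and `scontrol` are on PATH. `sbatch --hold` submits the job with priority zero, and `scontrol release <jobid>` makes it eligible to run later, for example once a suitable node has been added. The helper names, the script name `job.sh` and the `--mem=2000G` request are hypothetical, and the claim that a held job is accepted despite an unsatisfiable request rests only on Chris's report for 20.11.8.

```python
from __future__ import annotations

import shlex
import subprocess


def held_submit_cmd(script: str, *sbatch_opts: str) -> list[str]:
    """Build an sbatch command line that submits the job in a held state."""
    # --hold: queue the job with priority 0 so it cannot start until released.
    # --parsable: print only "jobid" (or "jobid;cluster") for easy parsing.
    # sbatch options must come before the script name.
    return ["sbatch", "--hold", "--parsable", *sbatch_opts, script]


def release_cmd(job_id: str) -> list[str]:
    """Build the scontrol command that releases a previously held job."""
    return ["scontrol", "release", job_id]


def submit_held(script: str, *sbatch_opts: str) -> str:
    """Submit a held job and return its job ID (needs a working Slurm setup)."""
    out = subprocess.run(
        held_submit_cmd(script, *sbatch_opts),
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # On multi-cluster setups --parsable appends ";cluster"; keep only the ID.
    return out.split(";")[0]


if __name__ == "__main__":
    # Only prints the commands; it does not contact Slurm.
    print(shlex.join(held_submit_cmd("job.sh", "--mem=2000G")))
    print(shlex.join(release_cmd("12345")))
```

Keeping command construction separate from `subprocess.run` makes the argument order easy to check without a cluster.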

Stephen Cousins

Feb 8, 2022, 8:41:21 PM
to Slurm User Community List
What I'm saying is that the job might not be able to run in that partition. Ever. The job might be asking for more resources than the partition can provide. Maybe I'm wrong, but it would help to know the partition definition and the resources specified for the nodes in that partition (both in slurm.conf), and then what the job is asking for.

Ryan Novosielski

Feb 8, 2022, 8:45:48 PM
to Slurm User Community List, slurm-users
I’m not 100% certain that this affects this situation, but there’s a slurm.conf setting called EnforcePartLimits that you might want to change.

--
#BlackLivesMatter
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novo...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
`'

Stephen Cousins

Feb 9, 2022, 1:29:45 AM
to Slurm User Community List
I can reproduce this error word for word by submitting a job that asks for 150 GB of memory when the nodes in that partition have a maximum of 128 GB.

Take a look at the memory values in your node specifications and in your job script or command line. Maybe there is a typo.
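The situation Stephen reproduces can be modelled roughly as below. This is a simplified Python sketch, not Slurm's actual code: it only compares a per-node `--mem` request with each node's `RealMemory` (in megabytes, with `G` meaning 1024 MB), whereas Slurm also checks CPUs, GRES, features and other limits. The node names and the 128 GB `RealMemory` values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Multipliers to megabytes for --mem style suffixes (1G = 1024M in Slurm).
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def mem_to_mb(spec: str) -> float:
    """Convert a --mem style value such as '150G' or '4000' to megabytes."""
    spec = spec.strip().upper()
    if spec and spec[-1] in _UNIT_TO_MB:
        return float(spec[:-1]) * _UNIT_TO_MB[spec[-1]]
    return float(spec)  # no suffix: megabytes, the sbatch --mem default


@dataclass
class Node:
    name: str
    real_memory_mb: int  # RealMemory= from the node's slurm.conf line


def fits_partition(mem_spec: str, nodes: list[Node]) -> bool:
    """True if at least one node could ever satisfy the per-node memory request."""
    need = mem_to_mb(mem_spec)
    return any(node.real_memory_mb >= need for node in nodes)


if __name__ == "__main__":
    partition = [Node("n01", 128 * 1024), Node("n02", 128 * 1024)]
    print(fits_partition("150G", partition))  # no node can ever fit this
    print(fits_partition("120G", partition))  # fits on either node
```

When the check fails for every node in the partition, waiting in the queue cannot help, which is why sbatch rejects the job at submission rather than leaving it pending.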

Alexander Block

Feb 9, 2022, 2:41:41 AM
to z1...@arcor.de, Slurm User Community List
Hi Mike,

I'm currently discussing a similar case with SchedMD (ticket
13309). But it seems that Slurm does not allow submitting jobs that
request features or configurations that are not available at the moment
of submission.

Cheers,

Alexander

Brian Andrus

Feb 9, 2022, 11:23:07 AM
to slurm...@lists.schedmd.com

Just curious as to expectations out here.

When should slurm immediately reject a job?

Brian Andrus

Ryan Cox

Feb 9, 2022, 12:01:48 PM
to Slurm User Community List
Mike,

You could potentially add a non-existent node (or nodes) to the
configuration that has a million cores, petabytes of RAM, and all the
features in the world. Then it "exists" in Slurm. I don't know whether
State=FUTURE would work, but if you can tolerate having a DOWN node in
sinfo, that could work.

Ryan
--
Ryan Cox
Director
Office of Research Computing
Brigham Young University
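A Python sketch that renders the slurm.conf lines for Ryan's placeholder-node idea. `NodeName`, `CPUs`, `RealMemory` (in megabytes), `Features`, `State`, `PartitionName` and `Nodes` are standard slurm.conf keywords, but the node name, partition name, feature names and sizes are hypothetical, and whether this actually makes sbatch accept otherwise unsatisfiable jobs (or whether `State=FUTURE` behaves like `DOWN` here) is untested, as Ryan himself says.

```python
from __future__ import annotations


def placeholder_node_line(name: str, cpus: int, real_memory_mb: int,
                          features: list[str], state: str = "DOWN") -> str:
    """Render a slurm.conf NodeName line for a node that does not physically exist."""
    # Untested idea from the thread: DOWN shows up in sinfo; FUTURE may or may not work.
    if state not in ("DOWN", "FUTURE"):
        raise ValueError("state must be DOWN or FUTURE for a placeholder node")
    parts = [f"NodeName={name}", f"CPUs={cpus}", f"RealMemory={real_memory_mb}"]
    if features:
        parts.append("Features=" + ",".join(features))
    parts.append(f"State={state}")
    return " ".join(parts)


def partition_line(name: str, nodes: list[str]) -> str:
    """Render a PartitionName line that includes the placeholder node."""
    return f"PartitionName={name} Nodes={','.join(nodes)}"


if __name__ == "__main__":
    # Hypothetical sizes: larger than any job the partition should accept.
    print(placeholder_node_line("placeholder01", 256, 4 * 1024 * 1024,
                                ["bigmem", "gpu"]))
    print(partition_line("big", ["n[01-02]", "placeholder01"]))
```

Generating the lines from one helper keeps the placeholder's limits in a single place, so they can be raised later without hand-editing several slurm.conf entries.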


Christopher Samuel

Feb 9, 2022, 1:14:07 PM
to slurm...@lists.schedmd.com
On 2/8/22 11:41 pm, Alexander Block wrote:

> I'm just discussing a familiar case with SchedMD right now (ticket
> 13309). But it seems that it is not possible with Slurm to submit jobs
> that request features/configuration that are not available at the moment
> of submission.

Does --hold not allow that for you?