[slurm-users] Weirdness with partitions

Diego Zuccato

Sep 21, 2023, 3:01:06 AM
to Slurm User Community List
Hello all.

We have one partition (b4) that's reserved for an account while the
others are "free for all".
The problem is that
sbatch --partition=b1,b2,b3,b4,b5 test.sh
fails with
sbatch: error: Batch job submission failed: Invalid account or
account/partition combination specified
while
sbatch --partition=b1,b2,b3,b5 test.sh
succeeds.

Shouldn't Slurm (22.05.6) just "filter out" the inaccessible partition and
consider only the others? Just like it does when I request more cores than
are available on a node.

I'd really like to avoid having to replicate scheduler logic in
job_submit.lua... :)
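
(For the record, the kind of logic I'm trying to avoid maintaining looks
roughly like the untested sketch below; "b4" and "resacct" are placeholders
for the reserved partition and the account allowed to use it.)

function slurm_job_submit(job_desc, part_list, submit_uid)
   -- job_desc.account may be nil when the user relies on the default account
   if job_desc.partition ~= nil and job_desc.account ~= "resacct" then
      local kept = {}
      for p in string.gmatch(job_desc.partition, "[^,]+") do
         if p ~= "b4" then
            -- keep only the partitions this account may actually use
            table.insert(kept, p)
         end
      end
      if #kept > 0 then
         job_desc.partition = table.concat(kept, ",")
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, submit_uid)
   return slurm.SUCCESS
end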

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

David

Sep 21, 2023, 8:42:34 AM
to Slurm User Community List
I would think that Slurm would only filter it out, potentially, if the partition in question (b4) were marked as "hidden" and only accessible by the correct account.
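
Something along these lines in slurm.conf (untested; the node range and account name here are made up):

PartitionName=b4 Nodes=node[01-04] AllowAccounts=special_acct Hidden=YES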
--
David Rhey
---------------
Advanced Research Computing
University of Michigan

Diego Zuccato

Sep 21, 2023, 9:15:53 AM
to slurm...@lists.schedmd.com
Huh? It's not a problem if other users can see that there are jobs in the
partition (IIUC that's what 'hidden' is for), even if they can't use it.

The problem is that including it in --partition prevents the job from being
queued at all!
Nothing in the documentation for --partition made me think that being denied
access to one of the listed partitions would make the whole job unqueueable...

Diego

David

Sep 21, 2023, 9:47:34 AM
to Slurm User Community List
Slurm is working as it should. Your own examples prove that: by not submitting to b4, the job works. However, looking at man sbatch:

       -p, --partition=<partition_names>
              Request a specific partition for the resource allocation. If not specified, the default behavior is to allow the slurm controller to
              select the default partition as designated by the system administrator. If the job can use more than one partition, specify their names
              in a comma separate list and the one offering earliest initiation will be used with no regard given to the partition name ordering
              (although higher priority partitions will be considered first). When the job is initiated, the name of the partition used will be
              placed first in the job record partition string.

In your example, the job can NOT use more than one partition (given the restrictions defined on the partition itself, precluding certain accounts from using it). This, to me, seems either like a user education issue (i.e., don't have them submit to every partition), or you can try the job_submit.lua route - or perhaps the hidden partition route (which I've not tested).

Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)

Sep 21, 2023, 10:26:00 AM
to Slurm User Community List
On Sep 21, 2023, at 9:46 AM, David <dr...@umich.edu> wrote:

> Slurm is working as it should. Your own examples prove that: by not submitting to b4, the job works. However, looking at man sbatch:
>
>        -p, --partition=<partition_names>
>               Request a specific partition for the resource allocation. If not specified, the default behavior is to allow the slurm controller to
>               select the default partition as designated by the system administrator. If the job can use more than one partition, specify their names
>               in a comma separate list and the one offering earliest initiation will be used with no regard given to the partition name ordering
>               (although higher priority partitions will be considered first). When the job is initiated, the name of the partition used will be
>               placed first in the job record partition string.
>
> In your example, the job can NOT use more than one partition (given the restrictions defined on the partition itself, precluding certain accounts from using it). This, to me, seems either like a user education issue (i.e., don't have them submit to every partition), or you can try the job_submit.lua route - or perhaps the hidden partition route (which I've not tested).

That's not at all how I interpreted this man page description. By "If the job can use more than..." I thought it obviously (although perhaps wrongly, if your interpretation is correct; it never crossed my mind) referred to whether the _submitting user_ is OK with the job using more than one partition. A partition the user is forbidden from (because of the partition's allowed accounts) should simply never be the one offering the earliest initiation (because the job will never initiate there), and therefore the job won't run there, but it should still be able to run on the other partitions listed in the batch script.

I think it's completely counter-intuitive that telling the scheduler it's OK to run on any one of a few partitions, where one of those partitions happens to be forbidden to the submitting user, means the job won't run at all. What if you list multiple partitions and increase the number of nodes so that there aren't enough in one of the partitions, without realizing this problem? Would you expect that to prevent the job from ever running on any partition?

Noam 

David

Sep 21, 2023, 10:38:56 AM
to Slurm User Community List
> That's not at all how I interpreted this man page description. By "If the job can use more than..." I thought it obviously (although perhaps wrongly, if your interpretation is correct; it never crossed my mind) referred to whether the _submitting user_ is OK with the job using more than one partition. A partition the user is forbidden from (because of the partition's allowed accounts) should simply never be the one offering the earliest initiation (because the job will never initiate there), and therefore the job won't run there, but it should still be able to run on the other partitions listed in the batch script.

That's fair. I was considering this only in light of the fact that we know the user doesn't have access to the partition (that isn't the surprise here) and that Slurm communicates that as the reason pretty clearly. I can see how a user submitting against multiple partitions might hope that, if the job couldn't run in one of them, the scheduler would consider all the others *before* dying outright at the first rejection.

Jason Simms

Sep 21, 2023, 11:25:36 AM
to Slurm User Community List
I personally don't think we should assume users will always know which partitions are available to them. Ideally, of course, they would, but I think it's fine to assume users should be able to submit a list of partitions they would be fine running their jobs on, and if one is forbidden for whatever reason, Slurm just selects another of the choices. I'd expect similar behavior if a particular partition were down or had been removed: as long as an acceptable specified partition is available, run the job there and don't kill it. Seems really reasonable to me.

Jason
--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
Schedule a meeting: https://calendly.com/jlsimms

Feng Zhang

Sep 21, 2023, 11:37:39 AM
to Slurm User Community List
Setting the slurm.conf parameter EnforcePartLimits=ANY (or NO) may help with this; not sure.
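
You can check what the cluster currently uses with:

scontrol show config | grep EnforcePartLimits

and, if needed, set EnforcePartLimits=ANY in slurm.conf.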

Best,

Feng

Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)

Sep 21, 2023, 11:46:06 AM
to Slurm User Community List
On Sep 21, 2023, at 11:37 AM, Feng Zhang <prod...@gmail.com> wrote:

> Setting the slurm.conf parameter EnforcePartLimits=ANY (or NO) may help with this; not sure.

Hmm, interesting, but it looks like this is just a check at submission time. The slurm.conf web page doesn't indicate that it affects the actual queuing decision, just whether or not a job that will never run (at all, or just on some of the listed partitions) can be submitted. If it does help, then I think the slurm.conf description is misleading.

Noam

Feng Zhang

Sep 21, 2023, 2:33:59 PM
to Slurm User Community List
As I said, I am not sure; it depends on the algorithm and the code
structure of Slurm (no chance to dig into it...). My guess at how
Slurm works is:

check limits on b1: OK; b2: OK; b3: OK; then b4: not OK... (or whatever order Slurm uses)

If it does work with EnforcePartLimits=ANY or NO, yeah, that would be a surprise...

(This use case might not have been considered in the original design of Slurm, I guess.)

"NOTE: The partition limits being considered are its configured
MaxMemPerCPU, MaxMemPerNode, MinNodes, MaxNodes, MaxTime, AllocNodes,
AllowAccounts, AllowGroups, AllowQOS, and QOS usage threshold."

Best,

Feng

Feng Zhang

Sep 21, 2023, 2:42:00 PM
to Slurm User Community List
Reading the pasted slurm.conf info again, it includes "AllowAccounts,
AllowGroups", so it seems Slurm actually takes these into account. So I
think it should work...

Best,

Feng

Diego Zuccato

Sep 22, 2023, 12:09:50 AM
to slurm...@lists.schedmd.com
On 21/09/2023 16:25, Bernstein, Noam CIV USN NRL (6393) Washington DC
(USA) wrote:

> What if you list multiple partitions and increase the number of nodes so
> that there aren't enough in one of the partitions, without realizing this
> problem?
That's exactly the case that led me to write that snippet in
job_submit.lua ...

> Would you expect that to prevent the job from ever running on
> any partition?
Currently (and, I think, wrongly) that's exactly what happens.

Diego Zuccato

Sep 22, 2023, 12:24:04 AM
to slurm...@lists.schedmd.com
Thanks. It seems EnforcePartLimits=ANY is what I need:
If set to "ANY" a job must satisfy any of the requested partitions to be
submitted.
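
So with ANY, the original

sbatch --partition=b1,b2,b3,b4,b5 test.sh

should (if I understand the description correctly) be accepted again and simply never start on b4.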

Probably it got changed by whoever reinstalled the cluster and I didn't
notice :(

And Slurm was doing what it was told to do. As usual :)

Tks again
Diego