[slurm-dev] Inconsistencies in documentation and logic regarding PreemptType and PreemptMode

Bill....@bull.com

Jul 12, 2011, 11:17:13 AM
to slur...@lists.llnl.gov

There are inconsistencies in both the documentation and the behavior of slurm regarding the combination of PreemptType=preempt/qos and PreemptMode=suspend,gang.  Some documentation
says PreemptType=preempt/qos isn't compatible with PreemptMode=SUSPEND, while other documentation shows it as a valid combination.  The system does not allow PreemptType=preempt/qos
together with PreemptMode=suspend,gang in slurm.conf, but sacctmgr allows modifying a QOS to set PreemptMode=suspend when the system is configured with PreemptType=preempt/qos.

slurm.conf man page for PreemptType=preempt/qos:

        This is not compatible with PreemptMode=OFF or PreemptMode=SUSPEND
        (i.e. preempted jobs must be removed from the resources).

sacctmgr man page, under SPECIFICATIONS FOR QOS:

PreemptMode
        Mechanism used to preempt jobs of this QOS if the clusters
        PreemptType is configured to preempt/qos.  The default preemption
        mechanism is specified by the cluster-wide PreemptMode configuration
        parameter.  Possible values are "Cluster" (meaning use cluster
        default), "Cancel", "Checkpoint", "Requeue" and "Suspend".

preempt.html page:

        preempt/qos indicates that jobs from one Quality Of Service (QOS)
        can preempt jobs from a lower QOS. These jobs can be in the same
        partition or different partitions. PreemptMode must be set to CANCEL,
        CHECKPOINT, SUSPEND or REQUEUE.

I ran some experiments to see how slurm would respond.  First, I set PreemptType=preempt/qos with PreemptMode=suspend,gang.  When I started slurm with these options, I saw the following message:

slurmd: fatal: PreemptType and PreemptMode values incompatible
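
For reference, the startup check that produces this message can be approximated from the slurm.conf man page text quoted above. The following is only a minimal sketch in Python, not slurm's actual C code; the function name check_preempt_config is hypothetical, and it assumes the only rule is the one the man page states (preempt/qos rejects OFF and SUSPEND).

```python
def check_preempt_config(preempt_type: str, preempt_mode: str) -> None:
    """Raise ValueError for combinations the quoted slurm.conf text rules out.

    Hypothetical helper: it models only the rule quoted from the man page,
    not the full set of checks slurm performs at startup.
    """
    # PreemptMode may be a comma-separated list such as "suspend,gang".
    modes = {m.strip().upper() for m in preempt_mode.split(",")}
    if preempt_type.strip().lower() == "preempt/qos" and modes & {"OFF", "SUSPEND"}:
        raise ValueError("PreemptType and PreemptMode values incompatible")


# The combination tried above is rejected; a mode that removes the job is not.
check_preempt_config("preempt/qos", "requeue")
try:
    check_preempt_config("preempt/qos", "suspend,gang")
except ValueError as err:
    print(f"fatal: {err}")
```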

I changed those options so that slurm would start and issued the following command:

'sacctmgr modify qos where name=lowpri set preemptmode=suspend'

This modification was accepted, and when I issued 'sacctmgr show qos' it displayed a PreemptMode of 'suspend'.

I am willing to make changes to make this consistent, but I need to know whether the intent is to support PreemptType=preempt/qos with PreemptMode=suspend,gang.  If not, I need to know the
reasoning so I can update the documentation and logic accordingly.

Best Regards,
Bill

Danny Auble

Jul 12, 2011, 11:54:57 AM
to slur...@lists.llnl.gov

Bill, it would be OK to update the docs, but please only add an error in sacctmgr when suspend is requested for a QOS, instead of changing all the other logic there. QOS can support gang scheduling in the future, and I wouldn't want to rewrite all that code to make it possible in accounting ;).


Danny
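
The narrow change suggested here, rejecting only the suspend value at the sacctmgr level and leaving the rest of the QOS logic alone, might look roughly like the sketch below. It is written in Python purely for illustration (sacctmgr itself is C); the function name validate_qos_preempt_mode and the error wording are hypothetical, and the list of accepted values is taken from the sacctmgr man page excerpt quoted earlier in the thread.

```python
# Values the sacctmgr man page lists for a QOS PreemptMode.
DOCUMENTED_QOS_MODES = {"cluster", "cancel", "checkpoint", "requeue", "suspend"}


def validate_qos_preempt_mode(mode: str) -> str:
    """Return the normalized mode, or raise ValueError if it is not accepted.

    Hypothetical helper, not part of sacctmgr.
    """
    normalized = mode.strip().lower()
    if normalized not in DOCUMENTED_QOS_MODES:
        raise ValueError(f"unknown PreemptMode: {mode!r}")
    if normalized == "suspend":
        # Assumption: this is the single new error; the value stays in the
        # documented set so nothing else has to change if QOS gains gang
        # scheduling support later.
        raise ValueError("PreemptMode=suspend is not currently supported for a QOS")
    return normalized
```

Under this sketch, validate_qos_preempt_mode("Requeue") returns "requeue", while the 'sacctmgr modify qos ... set preemptmode=suspend' request shown earlier would be refused with an error.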

je...@schedmd.com

Jul 18, 2011, 4:37:24 PM
to slur...@lists.llnl.gov, Bill....@bull.com

Hi Bill,

The logic supporting gang scheduling is also used to resume jobs which
have been suspended for a higher priority job. All of the data
structures in that module (src/slurmctld/gang.c) are designed to
support preemption based upon job partition and there is no logic
present to support preemption based upon QOS. It certainly could be
added at some time, but is completely absent today.

It would be great if you would make the documentation changes to
reflect the current behavior. If you send a patch, we'll get it into
the next release.

Thanks,
Moe Jette
