I suspect your setting for "MaxJobCount" is too low:
MaxJobCount
The maximum number of jobs SLURM can have in its active database
at one time. Set the values of MaxJobCount and MinJobAge to
ensure the slurmctld daemon does not exhaust its memory or other
resources. Once this limit is reached, requests to submit
additional jobs will fail. The default value is 5000 jobs. This
value may not be reset via "scontrol reconfig". It only takes
effect upon restart of the slurmctld daemon. May not exceed
65533.
So if you already have 5000 jobs in the active database (the default limit), any additional submissions aren't even looked at.
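If that is the limit you're hitting, a minimal sketch of the change in slurm.conf might look like the following (the 50000 figure is an illustrative assumption, not a recommendation; size it for your workload and the controller's memory, and remember it may not exceed 65533):

    MaxJobCount=50000    # raise the active-job limit above the 5000 default (illustrative value)
    MinJobAge=300        # the default; seconds a completed job stays in the active database

As the man page says, a new MaxJobCount only takes effect when slurmctld is restarted (e.g. "systemctl restart slurmctld"), not via "scontrol reconfig".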
Brian Andrus
Have you looked at the High Throughput Computing Administration Guide? https://slurm.schedmd.com/high_throughput.html
In particular, the fix for this problem may be to look at SchedulerParameters. I believe the scheduler defaults are very conservative, and it will stop looking for jobs to run pretty quickly.
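For example, both the main scheduling loop and the backfill scheduler only examine a bounded number of pending jobs per pass, controlled by SchedulerParameters options such as default_queue_depth, partition_job_depth and bf_max_job_test. A hedged sketch of what such tuning could look like in slurm.conf (the values below are illustrative assumptions, not the guide's recommendations):

    # illustrative SchedulerParameters tuning; adjust for your site and Slurm version
    SchedulerParameters=default_queue_depth=1000,partition_job_depth=500,bf_max_job_test=1000,bf_continue

You can check what the controller is currently using with "scontrol show config | grep SchedulerParameters" before and after changing it.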
Mike Robbert
Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research Computing
Information and Technology Solutions (ITS)
303-273-3786 | mrob...@mines.edu
Our values: Trust | Integrity | Respect | Responsibility
From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of David Henkemeyer <david.he...@gmail.com>
Date: Thursday, May 12, 2022 at 12:34
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: [External] Re: [slurm-users] Question about having 2 partitions that are mutually exclusive, but have unexpected interactions