Hi Bill,
I think Moe gave you the right answer, but it was so concise that it can
easily be misunderstood.
If we take the situation you describe and analyse it only from the
backfilling algorithm's point of view, the answer is that job 300
should be scheduled without any impact on jobs 201 and 202. However,
what I think Moe tried to say is that there are other details to take
into account, not just the total number of free cores. Those cores
could really be free and still be unusable, for example because of
per-node memory requirements. Or maybe you have reservations holding
some cores, which you cannot see just by looking at free cores. Or you
have license or partition limits. Or your system does not allow nodes
to be shared, so free cores do not mean you can use them. All this
assumes you do not have other pending jobs between job 201 and job 300.

There is a backfilling parameter, max_job_bf, which limits the number
of jobs processed by the algorithm. The default is 50. Also, as
backfilling is so demanding, it is suspended after some time. Before
resuming, if something has changed in the system, the backfilling
algorithm starts again from scratch. You can avoid this with the
bf_continue parameter.
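
As a rough illustration, both backfill options mentioned above go in the
SchedulerParameters line of slurm.conf. The value 200 below is only a
placeholder, not a recommendation, and newer Slurm releases may use the
name bf_max_job_test in place of max_job_bf, so check the slurm.conf man
page for your version:

```
# slurm.conf (fragment; values are illustrative assumptions)
SchedulerType=sched/backfill
# Test up to 200 pending jobs per backfill cycle, and continue a cycle
# after it yields its locks rather than restarting from the top.
SchedulerParameters=max_job_bf=200,bf_continue
```
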
As you can see, there are a lot of details which could have an impact.
We have run into this situation in the past, and it is not always
trivial to see the reason behind a scheduling decision. I added extra
debug information to the backfilling algorithm to see how resources
were being reserved by pending jobs, and it was helpful. Maybe it would
be interesting to have some way of knowing why a job cannot be
scheduled. Other resource managers give this detailed information, but
it would have a cost, of course.
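
To make the point about free cores concrete, here is a toy Python sketch
of the kind of "why can this job not run" report described above. It is
not Slurm code: the Node and Job classes, their fields, and the reason
strings are all made up for the example, and it only models per-node
memory, reservations, and node sharing.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_mb: int
    reserved: bool = False  # held by a reservation this job cannot use
    busy: bool = False      # already running at least one other job


@dataclass
class Job:
    job_id: int
    nodes: int
    cores_per_node: int
    mem_per_node_mb: int
    exclusive: bool = False  # job or site policy forbids node sharing


def why_node_unusable(node, job):
    """Return the reasons this node cannot host one node's share of the job."""
    reasons = []
    if node.reserved:
        reasons.append("reserved")
    if node.free_cores < job.cores_per_node:
        reasons.append("not enough cores")
    if node.free_mem_mb < job.mem_per_node_mb:
        reasons.append("not enough memory")
    if job.exclusive and node.busy:
        reasons.append("node is shared")
    return reasons


def explain(job, nodes):
    """Return (fits, report); report maps node name to its list of reasons."""
    report = {n.name: why_node_unusable(n, job) for n in nodes}
    usable = [name for name, reasons in report.items() if not reasons]
    return len(usable) >= job.nodes, report


# Hypothetical cluster state: 20 free cores in total, none of them usable.
nodes = [
    Node("n1", free_cores=8, free_mem_mb=2000),
    Node("n2", free_cores=8, free_mem_mb=16000, reserved=True),
    Node("n3", free_cores=4, free_mem_mb=16000, busy=True),
]
job = Job(job_id=300, nodes=1, cores_per_node=4,
          mem_per_node_mb=8000, exclusive=True)

fits, report = explain(job, nodes)
print("job", job.job_id, "fits:", fits)
for name, reasons in report.items():
    print(name, "->", ", ".join(reasons) or "usable")
```

In this made-up state the job needs only 4 cores and 20 are free, yet
every node is ruled out for a different reason, which is exactly what
the total free core count hides.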