Yes, I wasn't aware of that, but it might be useful for us, too.
>> The question is why you actually need partitions with different
>> maximum runtimes.
>
> We would like to have only a subset of the nodes in a partition for
> long-running jobs, so that there are enough nodes available for short
> jobs.
>
> The nodes for the long partition, however, are also part of the short
> partition, so they can also be utilized when no long jobs are running.
>
> That's our idea....
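If I understand correctly, that would correspond to something like the
following slurm.conf sketch; the node names and time limits here are just
placeholders, not a recommendation:

    # "short" spans all nodes; "long" is restricted to a subset, so the
    # remaining nodes always stay reachable for short jobs.
    PartitionName=short Nodes=node[001-100] MaxTime=12:00:00 Default=YES
    PartitionName=long  Nodes=node[001-050] MaxTime=14-00:00:00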
If you have plenty of short-running jobs, that is probably a reasonable
approach. On our system, the number of short-running jobs would
probably dip significantly over weekends and public holidays, so
resources could end up being blocked from the long-running jobs. On
the other hand, long-running jobs on our system often run for days, so
one day here or there might not be so significant. And if the
long-running jobs were able to start in the short partition, they could
block short jobs.
The other thing to think about with regard to short jobs is backfilling.
With our mix of jobs, unless a job needs a large amount of memory or a
large number of cores, those with a run-time of only a few hours should
be backfilled fairly efficiently.
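For reference, backfill is the default scheduler in recent Slurm versions
and is tuned via SchedulerParameters; the values below are purely
illustrative:

    SchedulerType=sched/backfill
    # bf_window is in minutes; 20160 = 14 days, i.e. at least the longest time limit
    SchedulerParameters=bf_window=20160,bf_continue

The main prerequisite on the user side is a realistic time request, e.g.
sbatch --time=02:00:00, so that the scheduler can actually fit the job
into a gap ahead of larger jobs.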
Regards
Loris
>> In our case, a university cluster with a very wide range of codes and
>> usage patterns, multiple partitions would probably lead to fragmentation
>> and wastage of resources due to the job mix not always fitting well to
>> the various partitions. Therefore, I am a member of the "as few
>> partitions as possible" camp, and so in our set-up we have essentially
>> only one partition with a DefaultTime of 14 days. We do, however, let
>> users set a QOS to gain a priority boost in return for accepting a
>> shorter run-time and a reduced maximum number of cores.
>
> We haven't looked into QOS yet, but this might also be a way to go, thanks.
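Roughly, setting up such a QOS looks something like this; the name and the
limits are just placeholders, and the exact values would of course depend
on the site:

    # QOS granting a priority boost in exchange for a shorter maximum
    # run time and fewer cores per user
    # (PriorityWeightQOS in slurm.conf must be non-zero for the boost to count)
    sacctmgr add qos boost Priority=1000 MaxWall=1-00:00:00 MaxTRESPerUser=cpu=128

    # Allow a user's association to use it, then request it at submit time:
    sacctmgr modify user alice set qos+=boost
    sbatch --qos=boost job.sh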
>
>> Occasionally people complain about short jobs having to wait in the
>> queue for too long, but I have generally been successful in solving the
>> problem by having them estimate their resource requirements better or
>> bundle their work in order to increase the run-time-to-wait-time
>> ratio.
>>
>
> Dietmar
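P.S. Regarding the bundling mentioned above: instead of submitting many
very short jobs, a single job can work through a batch of inputs, which
improves the run-time-to-wait-time ratio. A minimal sketch, with
placeholder paths and times:

    #!/bin/bash
    #SBATCH --time=04:00:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=1

    # Process many small inputs inside one job instead of one job per input.
    for input in data/chunk_*.dat; do
        ./process "$input"
    done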