Yes, we've seen the same thing with mosaic/heterogeneous partitions. Our solution is to split partitions by hardware type.
Having a bunch of partitions may seem unwieldy, but the scheduler can handle it. For instance, we have 110 partitions and the scheduler handles them fine (most of those are hardware owned by specific groups, not public partitions everyone can see). We've adopted the convention of naming our partitions after the hardware type. For instance, we have a gpu partition (our A100s) and a gpu_h200 partition, which makes it easy for people to identify the hardware. People who can use both leverage multi-partition submission, a la #SBATCH -p gpu,gpu_h200.
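As a rough sketch of what such a multi-partition submission could look like, here is a minimal batch script. The partition names come from the setup described above; the job name, GPU count, and time limit are placeholder assumptions, not settings from our cluster. When several partitions are listed, Slurm starts the job in whichever one can run it earliest, and SLURM_JOB_PARTITION reports which one that was.

```shell
#!/bin/bash
#SBATCH --job-name=multi_part_demo   # hypothetical job name
#SBATCH -p gpu,gpu_h200              # eligible partitions, as named in this thread
#SBATCH --gres=gpu:1                 # assumption: the job needs one GPU
#SBATCH --time=00:10:00              # assumption: short placeholder time limit

# Slurm sets SLURM_JOB_PARTITION to the partition the job actually runs in.
# Outside of a Slurm job the variable is unset, so fall back to "unknown".
partition="${SLURM_JOB_PARTITION:-unknown}"
echo "Running in partition: ${partition}"
```

Submitted with sbatch, the echoed line lands in the job's output file, so a user can see after the fact which hardware type the job ended up on.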
I don't know of a good solution if you want to keep the mosaic partition, as it really requires your users to think at a higher level and realize there is vacant hardware that could be used if they just selected a different GPU type. Having a separate partition makes that much easier to see.
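To illustrate why separate partitions make idle hardware easier to spot, here is a small sketch using sinfo restricted to the two GPU partitions. The partition names are the ones from this thread, and the chosen output columns are just one reasonable selection, not something we prescribe.

```shell
# Assumption: partitions are named gpu and gpu_h200, as in this thread.
partitions="gpu,gpu_h200"

if command -v sinfo >/dev/null 2>&1; then
    # %P partition, %a availability, %D node count, %t node state, %G generic resources
    sinfo -p "$partitions" -o "%P %a %D %t %G"
else
    # Not on a Slurm host; nothing to query.
    echo "sinfo not found; run this on a Slurm login node"
fi
```

With one partition per hardware type, each row maps directly to a GPU model, so nodes in the idle state on the other partition stand out at a glance.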
-Paul Edmon-