Yes, we've seen the same thing with mosaic (heterogeneous) partitions. Our solution is to split them by hardware type.
Having many partitions may seem unwieldy, but the scheduler can handle it. For instance, we have 110 partitions and the scheduler handles them fine (most of those are hardware owned by specific groups, not public partitions everyone can see). We've adopted the convention of naming our partitions after the hardware type: for instance, we have a gpu partition (our A100s) and a gpu_h200 partition, which makes it easy for people to identify the hardware. People who can use both take advantage of multi-partition submission, e.g. #SBATCH -p gpu,gpu_h200.
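For anyone wanting to try the same split, here is a rough sketch of what one-partition-per-hardware-type might look like in slurm.conf. The node names, node counts, and MaxTime values are hypothetical placeholders, not our actual configuration; only the partition names gpu and gpu_h200 come from the description above.

```
# Hypothetical node names; replace with the NodeName entries defined
# elsewhere in your slurm.conf. One partition per GPU hardware type.
PartitionName=gpu      Nodes=a100-[01-08] MaxTime=3-00:00:00 State=UP
PartitionName=gpu_h200 Nodes=h200-[01-04] MaxTime=3-00:00:00 State=UP
```

Users who are happy with either GPU type then submit with sbatch -p gpu,gpu_h200 job.sh and the job starts in whichever listed partition can run it first, while sinfo -p gpu_h200 (or sinfo -s for a per-partition summary) shows at a glance whether a given hardware type has idle nodes.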
I don't know of a good solution if you want to keep the mosaic partition, as it really requires your users to think at a higher level and realize that there is vacant hardware they could use if they just selected a different GPU type. Having a separate partition makes that much easier to see.
-Paul Edmon-
--
slurm-users mailing list -- slurm...@lists.schedmd.com