But if I also configure the GPUs by name (type) in slurm.conf, like this:
NodeName=koala NodeAddr=10.194.132.190 Boards=1 SocketsPerBoard=2 CoresPerSocket=14 ThreadsPerCore=2 Gres=gpu:A5000:3,gpu:RTX5000:1,shard:88 Feature=gpu,ht
and run 7 jobs, each requesting 12 shards, it does NOT work. Slurm starts two jobs on each of the first two A5000s, two jobs on the RTX5000, and only one job on the last A5000. Strangely, it still knows that it should not
start more jobs - subsequent jobs remain queued.
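One way to double-check how slurmd itself pairs the shards with the typed GPUs is to have it print the GRES configuration it detects and exit (just a diagnostic sketch; run it on the node itself):

# Print the GRES configuration exactly as slurmd detects it on this node, then exit.
# It should list the typed GPUs and the shard count attached to each device.
sudo slurmd -G

For reference, this is the nvidia-smi process list while the seven jobs are running: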
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 904552 C ...ing_proj/.venv/bin/python 204MiB |
| 0 N/A N/A 1176564 C ...-2020-ubuntu20.04/bin/gmx 258MiB |
| 0 N/A N/A 1176565 C ...-2020-ubuntu20.04/bin/gmx 258MiB |
| 1 N/A N/A 1176562 C ...-2020-ubuntu20.04/bin/gmx 258MiB |
| 1 N/A N/A 1176566 C ...-2020-ubuntu20.04/bin/gmx 258MiB |
| 2 N/A N/A 1176560 C ...-2020-ubuntu20.04/bin/gmx 172MiB |
| 2 N/A N/A 1176561 C ...-2020-ubuntu20.04/bin/gmx 172MiB |
| 3 N/A N/A 1176563 C ...-2020-ubuntu20.04/bin/gmx 258MiB |
+-----------------------------------------------------------------------------+
It is also strange that "scontrol show node" seems to list the shards correctly, even in this case:
NodeName=koala Arch=x86_64 CoresPerSocket=14
CPUAlloc=0 CPUEfctv=56 CPUTot=56 CPULoad=22.16
AvailableFeatures=gpu,ht
ActiveFeatures=gpu,ht
Gres=gpu:A5000:3(S:0-1),gpu:RTX5000:1(S:0-1),shard:A5000:72(S:0-1),shard:RTX5000:16(S:0-1)
NodeAddr=10.194.132.190 NodeHostName=koala Version=22.05.7
OS=Linux 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022
RealMemory=1 AllocMem=0 FreeMem=390036 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=urgent,high,medium,low
BootTime=2023-01-03T12:37:17 SlurmdStartTime=2023-01-05T16:24:53
LastBusyTime=2023-01-05T16:37:24
CfgTRES=cpu=56,mem=1M,billing=56,gres/gpu=4,gres/shard=88
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
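To see which physical GPU each job's shards were actually bound to, I assume the detailed job view reports the GRES device index for shards the same way it does for whole GPUs, so a check along these lines should show the placement:

# For every running job of mine, print its ID and the detailed GRES assignment;
# with -d, scontrol includes the index of the device(s) backing the allocation.
for j in $(squeue -h -u "$USER" -t R -o %i); do
    scontrol -d show job "$j" | grep -E 'JobId|GRES'
done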
In all cases, my jobs are submitted with commands like this:
sbatch --gres=shard:12 --wrap 'bash -c " ... (command goes here) ... "'
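As a self-contained reproducer (the sleep payload below is only a stand-in for my real command), submitting seven such jobs in a row is enough to trigger the placement shown above:

# Submit 7 jobs, each requesting 12 shards (84 of the 88 configured).
# "sleep 600" is a placeholder workload used purely for illustration.
for i in $(seq 1 7); do
    sbatch --gres=shard:12 --wrap 'bash -c "sleep 600"'
done

The resulting shard-to-GPU assignment can then be inspected with the scontrol check above, independent of what the payload actually does on the GPU.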
The behavior is very consistent. I have played around with adding CUDA_DEVICE_ORDER=PCI_BUS_ID to the environment of slurmd and slurmctld,
but it makes no difference.
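For reference, one way to inject that variable into both daemons is a systemd drop-in along these lines (the unit names here are assumed to be the stock slurmd.service and slurmctld.service):

# /etc/systemd/system/slurmd.service.d/cuda-order.conf
# (same drop-in under slurmctld.service.d/ on the controller node)
[Service]
Environment="CUDA_DEVICE_ORDER=PCI_BUS_ID"

followed by "systemctl daemon-reload" and a restart of both daemons.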
Is this a bug or a feature?