Hi all,
I'm trying to get the gathering of gres/gpumem and
gres/gpuutil working on Slurm 23.02.2, but with no success so far.
We have:
AccountingStorageTRES=cpu,mem,gres/gpu
in slurm.conf, and Slurm is built with NVML support. We also have
Autodetect=NVML
in gres.conf.
gres/gpumem and gres/gpuutil now appear in the
sacct TRESUsageInAve record, but with zero values:
sacct -j 6056927_51 -Pno TRESUsageInAve
cpu=00:00:07,energy=0,fs/disk=14073059,gres/gpumem=0,gres/gpuutil=0,mem=6456K,pages=0,vmem=7052K
cpu=00:00:00,energy=0,fs/disk=2332,gres/gpumem=0,gres/gpuutil=0,mem=44K,pages=0,vmem=44K
cpu=05:18:51,energy=0,fs/disk=708800,gres/gpumem=0,gres/gpuutil=0,mem=2565376K,pages=0,vmem=2961244K
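In case anyone wants to run the same check over many jobs, here is a minimal Python sketch that parses TRESUsageInAve records like the ones above and flags steps where both GPU fields are zero. The helper names parse_tres and gpu_fields_all_zero are made up for illustration and are not part of Slurm; the sketch only assumes the comma-separated name=value layout shown in the sacct output.

```python
# Sketch only: parses the name=value layout seen in the sacct output above.
# parse_tres and gpu_fields_all_zero are hypothetical helpers, not Slurm APIs.

GPU_FIELDS = ("gres/gpumem", "gres/gpuutil")


def parse_tres(record):
    """Split a TRES usage string such as 'cpu=00:00:07,mem=44K' into a dict."""
    fields = {}
    for item in record.strip().split(","):
        if not item:
            continue
        name, _, value = item.partition("=")
        fields[name] = value
    return fields


def gpu_fields_all_zero(record):
    """Return True if the record has GPU fields and every one of them is '0'."""
    tres = parse_tres(record)
    gpu = {k: v for k, v in tres.items() if k in GPU_FIELDS}
    return bool(gpu) and all(v == "0" for v in gpu.values())


# The three step records from the sacct output above.
records = [
    "cpu=00:00:07,energy=0,fs/disk=14073059,gres/gpumem=0,gres/gpuutil=0,mem=6456K,pages=0,vmem=7052K",
    "cpu=00:00:00,energy=0,fs/disk=2332,gres/gpumem=0,gres/gpuutil=0,mem=44K,pages=0,vmem=44K",
    "cpu=05:18:51,energy=0,fs/disk=708800,gres/gpumem=0,gres/gpuutil=0,mem=2565376K,pages=0,vmem=2961244K",
]

for record in records:
    tres = parse_tres(record)
    print(tres["cpu"], tres["gres/gpumem"], tres["gres/gpuutil"], gpu_fields_all_zero(record))
```

The records could instead be read from the output of the sacct command shown above, one line per job step.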
We are using NVIDIA Tesla V100 and A100 GPUs with driver version
530.30.02. dcgm-exporter is working on the nodes.
Is anything else needed to get it working?
Thanks in advance.
Daniel Vecerka