[slurm-users] How to view GPU indices of the completed jobs?


Kota Tsuyuzaki

Jun 4, 2020, 3:58:48 AM
to slurm...@lists.schedmd.com
Hello Guys,

We are running GPU clusters with Slurm and SlurmDBD (19.05 series), and some of the GPUs seem to be having trouble with the jobs attached to them. To investigate whether the trouble keeps occurring on the same GPUs, I'd like to get the GPU indices of completed jobs.

In my understanding, `scontrol show job` can show the indices (as IDX in the gres info) but cannot be used for completed jobs. `sacct -j`, on the other hand, works for completed jobs but won't print the indices.

Is there any way (commands, configurations, etc...) to see the allocated GPU indices for completed jobs?

Best regards,

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsu...@hco.ntt.co.jp
NTTソフトウェアイノベーションセンタ
分散処理基盤技術プロジェクト
0422-59-2837
---------------------------------------------





sathish

Jun 8, 2020, 10:07:48 AM
to Slurm User Community List
Using sacct you can find that information; try the options below and see if that works.

sacct -j <job id>  --format=jobid,ReqTRES%50,ReqGres
--
Regards.....
Sathish

Kota Tsuyuzaki

Jun 9, 2020, 10:37:25 PM
to Slurm User Community List
> Using sacct you can find those information, try the below options and see if that works.
>
> sacct -j <job id> --format=jobid,ReqTRES%50,ReqGres

Thanks, I tried that command but it appears to show the requested number of GPUs instead of the GPU indices. I tried `sacct -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.

Do I have to turn on some configuration to track the detailed GPU information? Am I missing something?

Best regards,

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsu...@hco.ntt.co.jp
NTTソフトウェアイノベーションセンタ
分散処理基盤技術プロジェクト
0422-59-2837
---------------------------------------------


Kota Tsuyuzaki

Jun 10, 2020, 2:57:28 AM
to Slurm User Community List
> -j <job id> -l` too. However, it seems to include any GPU index information even in AllocGres and AllocTres columns.

It DOES NOT seem to include any GPU index, I meant. Sorry.

Best.

Michael Di Domenico

Jun 10, 2020, 1:35:18 PM
to Slurm User Community List
I don't know the answer, but have you checked the SQL tables in the
database to see whether the data you want is even being kept? It's possible
Slurm is just throwing that value away. (I agree it would be nice if
it were retrievable.)
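
For example, something along these lines against the accounting database would show whether anything beyond the TRES counts is being stored. This is only a sketch: it assumes a MySQL backend, the default slurm_acct_db database, and the usual <cluster>_job_table naming, and the job id is a placeholder.

# Illustrative only: inspect what slurmdbd actually stores for a given job.
# Replace "mycluster" with the ClusterName from slurm.conf.
mysql -u slurm -p slurm_acct_db \
  -e "SELECT id_job, tres_req, tres_alloc FROM mycluster_job_table WHERE id_job = 12345\G"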

David Braun

Jun 10, 2020, 2:50:24 PM
to Slurm User Community List
Hi Kota,

This is from the job template that I give to my users:

# Collect some information about the execution environment that may
# be useful should we need to do some debugging.  

echo "CREATING DEBUG DIRECTORY"
echo

mkdir .debug_info
module list > .debug_info/environ_modules 2>&1
ulimit -a > .debug_info/limits 2>&1
hostname > .debug_info/environ_hostname 2>&1
env |grep SLURM > .debug_info/environ_slurm 2>&1
env |grep OMP |grep -v OMPI > .debug_info/environ_omp 2>&1
env |grep OMPI > .debug_info/environ_openmpi 2>&1
env > .debug_info/environ 2>&1

if [ -n "${CUDA_VISIBLE_DEVICES+x}" ]; then
        echo "SAVING CUDA ENVIRONMENT"
        echo
        env |grep CUDA > .debug_info/environ_cuda 2>&1
fi

You could add something like this to one of the Slurm prologs to save the GPU list for each job.
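
As a minimal sketch of the prolog variant (treat it as illustrative only: which variables are actually set depends on which prolog it runs in and on the Slurm version, and the log path is just an example):

#!/bin/bash
# Illustrative prolog/epilog fragment: record the GPU assignment per job and node.
# SLURM_JOB_GPUS is only set in some prolog contexts; CUDA_VISIBLE_DEVICES is
# normally only present in the task environment.
echo "$(date -Is) job=${SLURM_JOB_ID} node=$(hostname -s) gpus=${SLURM_JOB_GPUS:-${CUDA_VISIBLE_DEVICES:-unknown}}" \
    >> /var/log/slurm/gpu_assignments.log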

Best,

David

Kota Tsuyuzaki

Jun 11, 2020, 10:23:32 PM
to Slurm User Community List
Thank you, David! Let me try it.
Thinking about our case, I'll probably dump the debug info somewhere like syslog. Either way, the idea should help improve our system monitoring. Much appreciated.
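
As a rough sketch of what I have in mind (the syslog tag is just a placeholder):

# Rough sketch: push the job's GPU-related environment into syslog from the job script
env | grep -E 'SLURM|CUDA' | logger -t "slurm-job-${SLURM_JOB_ID}"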

Best,
Kota

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsu...@hco.ntt.co.jp
NTTソフトウェアイノベーションセンタ
分散処理基盤技術プロジェクト
0422-59-2837
---------------------------------------------

Marcus Wagner

Jun 16, 2020, 8:17:20 AM
to slurm...@lists.schedmd.com
Hi David,

if I remember right, CUDA_VISIBLE_DEVICES always starts from zero when you
use cgroups. So this is NOT the index of the GPU.

Just verified it:
$> nvidia-smi
Tue Jun 16 13:28:47 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
...
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17269      C   gmx_mpi                                     679MiB  |
|    1     19246      C   gmx_mpi                                     513MiB  |
+-----------------------------------------------------------------------------+

$> squeue -w nrg04
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  14560009  c18g_low     egf5 bk449967  R 1-00:17:48      1 nrg04
  14560005  c18g_low     egf1 bk449967  R 1-00:20:23      1 nrg04


$> scontrol show job -d 14560005
...
Socks/Node=* NtasksPerN:B:S:C=24:0:*:* CoreSpec=*
Nodes=nrg04 CPU_IDs=0-23 Mem=93600 GRES_IDX=gpu(IDX:0)

$> scontrol show job -d 14560009
JobId=14560009 JobName=egf5
...
Socks/Node=* NtasksPerN:B:S:C=24:0:*:* CoreSpec=*
Nodes=nrg04 CPU_IDs=24-47 Mem=93600 GRES_IDX=gpu(IDX:1)

From the PIDs in the nvidia-smi output:

$> xargs --null --max-args=1 echo < /proc/17269/environ | grep CUDA_VISIBLE
CUDA_VISIBLE_DEVICES=0

$> xargs --null --max-args=1 echo < /proc/19246/environ | grep CUDA_VISIBLE
CUDA_VISIBLE_DEVICES=0


So this is only a way to see how MANY devices were used, not which.


Best
Marcus

--
Dipl.-Inf. Marcus Wagner

IT Center
Gruppe: Systemgruppe Linux
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de

Social media channels of the IT Center:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

Kota Tsuyuzaki

Jun 22, 2020, 10:52:19 PM
to Slurm User Community List
> if I remember right, if you use cgroups, CUDA_VISIBLE_DEVICES always
> starts from zero. So this is NOT the index of the GPU.

Thanks. Just FYI, when I tested the environment variables with Slurm 19.05.2 + the proctrack/cgroup configuration, CUDA_VISIBLE_DEVICES appeared to match the indices of the host devices (i.e. it did not start from zero). I'm not sure whether the behavior has changed in newer Slurm versions, though.

I also found that SLURM_JOB_GPUS and GPU_DEVICE_ORDINAL were set in the environment, which can be useful. In my tests so far, those variables had the same values as CUDA_VISIBLE_DEVICES.

Any advice on what I should look for is always welcome.

Best,
Kota

Marcus Wagner

Jun 23, 2020, 9:02:47 AM
to slurm...@lists.schedmd.com
Hi Kota,

thanks for the hint.

Yet I'm still a little astonished, because if I remember right,
CUDA_VISIBLE_DEVICES inside a cgroup always starts from zero. That was
already the case years ago, when we were still using LSF.

But SLURM_JOB_GPUS seems to be the right thing:

Same node, two different users (and therefore two different jobs):


$> xargs --null --max-args=1 echo < /proc/32719/environ | egrep "GPU|CUDA"
SLURM_JOB_GPUS=0
CUDA_VISIBLE_DEVICES=0
GPU_DEVICE_ORDINAL=0

$> xargs --null --max-args=1 echo < /proc/109479/environ | egrep "GPU|CUDA"
SLURM_MEM_PER_GPU=6144
SLURM_JOB_GPUS=1
CUDA_VISIBLE_DEVICES=0
GPU_DEVICE_ORDINAL=0
CUDA_ROOT=/usr/local_rwth/sw/cuda/10.1.243
CUDA_PATH=/usr/local_rwth/sw/cuda/10.1.243
CUDA_VERSION=101

SLURM_JOB_GPUS differs:

$> scontrol show -d job 14658274
...
Nodes=nrg02 CPU_IDs=24 Mem=8192 GRES_IDX=gpu:volta(IDX:1)

$> scontrol show -d job 14673550
...
Nodes=nrg02 CPU_IDs=0 Mem=8192 GRES_IDX=gpu:volta(IDX:0)



Is there anyone out there who can confirm this besides me?


Best
Marcus

Taras Shapovalov

Jun 23, 2020, 9:42:09 AM
to Slurm User Community List
Hi Marcus,

This may depend on ConstrainDevices in cgroup.conf. I guess it is set to "no" in your case.

Best regards,
Taras

Marcus Wagner

Jun 24, 2020, 1:06:30 AM
to slurm...@lists.schedmd.com
Hi Taras,

No, we have ConstrainDevices set to "yes",
and that is why CUDA_VISIBLE_DEVICES starts from zero.

Otherwise both of the jobs mentioned in my earlier post would have run on one
GPU, but as nvidia-smi clearly shows (I did not include the output this time,
see the earlier post), both GPUs are in use even though the environment of
both jobs contains CUDA_VISIBLE_DEVICES=0.

Kota, might it be that you did not configure ConstrainDevices in
cgroup.conf? The default is "no" according to the man page.
In that case a user could set CUDA_VISIBLE_DEVICES in their job and thereby
use GPUs they did not request.
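
For reference, the relevant setting in cgroup.conf is simply (other site-specific settings omitted):

# cgroup.conf (excerpt)
ConstrainDevices=yes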

Best
Marcus

Stephan Roth

Jun 26, 2020, 9:19:44 AM
to slurm...@lists.schedmd.com
In regard to Kota's initial question

... "Is there any way (commands, configurations, etc...) to see the
allocated GPU indices for completed jobs?" ...

I was in need of the same kind of information and found the following:

If

- ConstrainDevices is on
- SlurmdDebug is set to at least "debug"

then the device number used by the NVIDIA kernel driver can be found by
grepping for the job id (1143 in this example) and "Allowing access to device"
in slurmd.log on a GPU node:

[2020-06-25T20:51:47.219] [1143.0] debug: Allowing access to device c
195:0 rwm(/dev/nvidia0) for job
[2020-06-25T20:51:47.220] [1143.0] debug: Allowing access to device c
195:0 rwm(/dev/nvidia0) for step
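
For reference, a grep along these lines pulls those entries out (the exact log path depends on your SlurmdLogFile setting):

# Illustrative: list the /dev/nvidia* devices job 1143 was granted on this node
grep '\[1143\.' /var/log/slurm/slurmd.log | grep 'Allowing access to device'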

As far as I have observed, the device number matches the PCI device minor
number. If anyone can confirm the inner workings of the (proprietary)
kernel module, I'd be glad to know.

The rest of the information I needed I got from
/proc/driver/nvidia/gpus/<PCI BUS location>/information.
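
For example (the PCI location shown is just a placeholder):

cat /proc/driver/nvidia/gpus/0000:3b:00.0/information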

Cheers,
Stephan