[slurm-users] CUDA vs OpenCL


Valerio Bellizzomi

Apr 28, 2021, 4:56:48 AM
to Slurm Users
Greetings,
I see here https://slurm.schedmd.com/gres.html#GPU_Management that
CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs, what about OpenCL
GPUs?

Is there an OPENCL_VISIBLE_DEVICES?


--
Valerio Bellizzomi
https://www.selroc.systems
http://www.selnet.org
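For reference, the masking that CUDA_VISIBLE_DEVICES asks of the CUDA runtime can be sketched as below: only the listed devices are enumerated, and they are renumbered from 0 in list order. This is a minimal illustration that assumes a plain comma-separated list of integer indices (the real variable also accepts other forms, such as GPU UUIDs, which are not handled here); the helper name `logical_to_physical` is hypothetical.

```python
import os


def logical_to_physical(mask):
    """Map the logical device numbers an application sees to the physical
    indices listed in a *_VISIBLE_DEVICES-style mask such as "2,3".

    Assumption: the mask is a comma-separated list of integer indices.
    """
    # Skip empty tokens so that an empty mask yields no devices.
    physical = [int(tok) for tok in mask.split(",") if tok.strip()]
    # Visible devices are renumbered from 0 in the order they are listed.
    return dict(enumerate(physical))


if __name__ == "__main__":
    mask = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    for logical, phys in logical_to_physical(mask).items():
        print(f"logical device {logical} -> physical device {phys}")
```

So with a mask of "2,3", an application asking for device 0 actually gets physical device 2.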



Valerio Bellizzomi

May 6, 2021, 3:21:48 AM
to slurm...@lists.schedmd.com
On Wed, 2021-04-28 at 10:56 +0200, Valerio Bellizzomi wrote:
> Greetings,
> I see here https://slurm.schedmd.com/gres.html#GPU_Management that
> CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs, what about OpenCL
> GPUs?
>
> Is there an OPENCL_VISIBLE_DEVICES?
>
>


The lack of follow-up leads me to conclude that there is no OpenCL
equivalent of CUDA_VISIBLE_DEVICES. It is unfortunate that this
open-source software is committed to a single GPU supplier.



Williams, Gareth (IM&T, Black Mountain)

May 6, 2021, 4:00:57 AM
to Slurm User Community List
The post got me thinking, so I did a little searching... AMD have an offering that supports OpenCL, and they are not NVIDIA. They use a different approach:
https://rocmdocs.amd.com/en/latest/Programming_Guides/Opencl-programming-guide.html#masking-visible-devices
FWIW, I have not yet seen anything there about cgroups and enforced device visibility/constraints, as opposed to relying on software playing nicely with environment variables.

For reference, I have no AMD affiliation and little to no direct experience.

It is also easy to find out what else supports OpenCL (Wikipedia?). Which environment variables to honor seems to me to be mostly a software choice, and most of that software comes from vendors, although it is sometimes open source or built on open-source components or layers.

Gareth

Valerio Bellizzomi

May 6, 2021, 4:35:39 AM
to slurm...@lists.schedmd.com
On Thu, 2021-05-06 at 08:00 +0000, Williams, Gareth (IM&T, Black
Mountain) wrote:
> The post has me thinking so I did a little searching... AMD have an
> offering that supports OpenCL and they are not NVIDIA. They use a
> different approach:
> https://rocmdocs.amd.com/en/latest/Programming_Guides/Opencl-programming-guide.html#masking-visible-devices


Thank you for the pointer. It seems to me that they just name the
variable differently (GPU_DEVICE_ORDINAL), but the approach is the same.
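If only the name differs, one workaround is a small job wrapper that copies the device list Slurm assigned into the variable another runtime reads. A minimal sketch, assuming the allocation arrives in CUDA_VISIBLE_DEVICES and that the other runtime numbers the devices in the same order, which is not guaranteed across vendors or drivers. The helper names and the wrapper itself are hypothetical, not Slurm behavior. Only one target variable is set on purpose: as far as I understand the ROCm documentation, masks applied at different runtime layers can compose (a second mask indexes into the already filtered list), so duplicating the same list into several variables could hide devices.

```python
import os
import subprocess
import sys


def mirror_device_mask(env, target="GPU_DEVICE_ORDINAL",
                       source="CUDA_VISIBLE_DEVICES"):
    """Return a copy of env in which `target` carries the same device list
    as `source`. An existing value of `target` is left untouched."""
    new_env = dict(env)
    mask = env.get(source)
    if mask is not None:
        # Set a single target only; stacked masks may compose and hide GPUs.
        new_env.setdefault(target, mask)
    return new_env


def main(argv):
    """Run argv as a child process with the mirrored environment."""
    return subprocess.call(argv, env=mirror_device_mask(os.environ))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: srun python wrap.py ./my_opencl_app args...
    sys.exit(main(sys.argv[1:]))
```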


> FWIW I did not yet see anything there about cgroups and enforced
> device visibility/constraints vs playing nicely with environment
> variables.


Here is the documentation on device cgroups:
https://rocmdocs.amd.com/en/latest/ROCm_System_Managment/ROCm-System-Managment.html?highlight=device%20cgroups#device-cgroup
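Environment variables only ask the runtime to hide devices; a device cgroup enforces the restriction in the kernel by allowlisting character devices by major:minor number. A minimal sketch of building entries in the cgroup v1 `devices.allow` format (`c MAJOR:MINOR rwm`), assuming the GPU is reached through device files such as /dev/kfd and a /dev/dri/renderD* node, as with ROCm. The paths are examples only, and with Slurm the usual route is `ConstrainDevices=yes` in cgroup.conf rather than writing such entries by hand. cgroup v2 controls devices differently (via eBPF), so this format does not apply there.

```python
import os
import stat


def devices_allow_entry(major, minor, access="rwm"):
    """Format one cgroup v1 devices.allow line for a character device."""
    return f"c {major}:{minor} {access}"


def entry_for_path(path, access="rwm"):
    """Build the devices.allow line for an existing character-device file."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return devices_allow_entry(os.major(st.st_rdev), os.minor(st.st_rdev), access)


if __name__ == "__main__":
    # Example paths only; which nodes belong to which GPU depends on the system.
    for dev in ("/dev/kfd", "/dev/dri/renderD128"):
        if os.path.exists(dev):
            print(entry_for_path(dev))
```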

Williams, Gareth (IM&T, Black Mountain)

May 6, 2021, 4:59:24 AM
to Slurm User Community List
ROCR_VISIBLE_DEVICES is the closer analogy. GPU_DEVICE_ORDINAL is, in principle, more generic (though it does have GPU in the name). OpenCL could in principle (and can!) run on other devices, which could (and can) have more exotic topologies, but for the sake of simplicity those are likely to be presented as a list of devices...

Valerio Bellizzomi

May 6, 2021, 6:00:56 AM
to slurm...@lists.schedmd.com
On Thu, 2021-05-06 at 08:58 +0000, Williams, Gareth (IM&T, Black
Mountain) wrote:
> ROCR_VISIBLE_DEVICES is the closer analogy. GPU_DEVICE_ORDINAL is in
> principle more generic (though does have GPU in the name). OpenCL
> could in principle (can!) run on other devices which could/can have
> more exotic topology, but for the sake of simplicity are likely to be
> presented as a list of devices...
>
> Gareth

Here is a ROCm issue discussion on device selection:
https://github.com/RadeonOpenCompute/ROCm/issues/994

ROCm also offers a different way to identify devices, by serial number
(unique ID), through the rocm-smi interface; this approach is much more
reliable than using device ordinals:
https://rocmdocs.amd.com/en/latest/ROCm_System_Managment/ROCm-SMI-CLI.html?highlight=showuniqueid
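The advantage of a unique ID is that it stays with the board, while an ordinal depends on enumeration order, which is not guaranteed to match across runtimes or configurations. One way to combine the two is to resolve the wanted IDs to their current ordinals just before building a visible-devices mask. A minimal sketch, assuming an ID-to-ordinal table has already been collected (for example from `rocm-smi --showuniqueid`, whose output is not parsed here) and that those ordinals match the ones the target runtime uses; the IDs, table and helper name are hypothetical.

```python
def mask_from_unique_ids(wanted_ids, id_to_ordinal):
    """Build a comma-separated device mask from unique IDs.

    id_to_ordinal maps each unique ID to the ordinal the runtime currently
    assigns it; the order of wanted_ids is preserved in the mask.
    """
    missing = [uid for uid in wanted_ids if uid not in id_to_ordinal]
    if missing:
        # Fail loudly rather than silently exposing the wrong devices.
        raise KeyError(f"unknown device IDs: {', '.join(missing)}")
    return ",".join(str(id_to_ordinal[uid]) for uid in wanted_ids)


# Hypothetical table; real IDs would come from the node's own inventory.
EXAMPLE_TABLE = {"0x1a2b3c4d": 0, "0x5e6f7a8b": 1, "0x9c0d1e2f": 2}

if __name__ == "__main__":
    print(mask_from_unique_ids(["0x9c0d1e2f", "0x1a2b3c4d"], EXAMPLE_TABLE))
```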