[slurm-users] Heterogeneous GPU Node?


Jason Simms

Jun 23, 2022, 3:50:51 PM
to Slurm User Community List
Hello all,

Slightly OT, but I'm hoping the hive mind here can share some advice.

We have a GPU node with three RTX8000 GPUs installed. The node has a capacity of 8 cards in total. I have a researcher who possibly wants to add an A100. I recall asking our vendor a while back whether it's possible (or advisable) to add that card to the existing node, which would result in a heterogeneous mix of GPUs in a single system. They indicated that it's not recommended to do so, but I'm wondering whether anyone has direct experience with this.

And, apropos of this list, if it's fine to move forward with this, are there any Slurm configuration issues I should be aware of?

Warmest regards,
Jason

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632

Kamil Wilczek

Jun 23, 2022, 4:41:10 PM
to Slurm User Community List, Jason Simms
Hello,

We have both homogeneous and heterogeneous GPU servers, and all of them
work without problems. We have mixed GTX 1080 Ti, Titan V and Titan X
cards, but not the more powerful cards (we have only a few of those, and
they all sit in the same machine).

If the server has adequate cooling, enough PCIe lanes (I have no
experience with NVLink) and a power supply with enough power
connectors, you should not see any hardware-related issues. The cards
should not be a "gaming" build with large fans on the flat side of the
GPU; they should use front-to-back airflow, consistent with the airflow
in the server.

From the software point of view, the NVIDIA driver should support
both cards. In the Slurm configuration, I marked them as
different GRES types:

# gres.conf

Name=gpu Type=1080ti File=/dev/nvidia0
Name=gpu Type=1080ti File=/dev/nvidia1
Name=gpu Type=titanv File=/dev/nvidia2
Name=gpu Type=titanv File=/dev/nvidia3
Name=gpu Type=titanv File=/dev/nvidia4
Name=gpu Type=titanv File=/dev/nvidia5
Name=gpu Type=titanv File=/dev/nvidia6
Name=gpu Type=titanv File=/dev/nvidia7

# slurm.conf

NodeName=... NodeAddr=... CPUs=40 Gres=gpu:1080ti:2,gpu:titanv:6 ...

I'm not aware of any side effects of that setup. If there are any,
I would also like to know about them :)
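
The pattern above (one gres.conf line per device, plus a matching
per-type count in slurm.conf) can be sketched for the node in the
original question. This is only an illustration: render_gres is a
hypothetical helper, the type labels rtx8000 and a100 are made-up names,
and the device order is an assumption that has to be checked on the
real host.

```python
from collections import Counter


def render_gres(gpu_types):
    """Build gres.conf lines and the slurm.conf Gres= value.

    gpu_types holds one type label per device, listed in /dev/nvidiaN order.
    """
    lines = [
        f"Name=gpu Type={gpu_type} File=/dev/nvidia{index}"
        for index, gpu_type in enumerate(gpu_types)
    ]
    # Counter keeps first-seen order, so types appear in device order.
    counts = Counter(gpu_types)
    gres = ",".join(f"gpu:{gpu_type}:{count}" for gpu_type, count in counts.items())
    return lines, gres


# Assumed layout: three RTX 8000 cards on nvidia0-2 and one A100 on nvidia3.
lines, gres = render_gres(["rtx8000"] * 3 + ["a100"])
print("# gres.conf")
print("\n".join(lines))
print("# slurm.conf (value for the node's Gres= field)")
print(f"Gres={gres}")
```

With typed GRES like this, a job can ask for one kind of card, for
example with --gres=gpu:a100:1, while a plain --gres=gpu:1 accepts
either type. slurm.conf also needs GresTypes=gpu for the gpu GRES to be
recognized.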

Kind Regards

--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]

Kamil Wilczek

Jun 23, 2022, 4:47:46 PM
to Slurm User Community List, Jason Simms
I forgot about the RTX 2080 Ti cards; they work fine alongside Titan X:

Name=gpu Type=rtx2080ti File=/dev/nvidia0
Name=gpu Type=rtx2080ti File=/dev/nvidia1
Name=gpu Type=titanx File=/dev/nvidia2
Name=gpu Type=titanx File=/dev/nvidia3
Name=gpu Type=titanx File=/dev/nvidia4
Name=gpu Type=titanx File=/dev/nvidia5
Name=gpu Type=titanx File=/dev/nvidia6
Name=gpu Type=titanx File=/dev/nvidia7

--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]
Laboratorium Komputerowe
Wydział Matematyki, Informatyki i Mechaniki
Uniwersytet Warszawski

ul. Banacha 2
02-097 Warszawa

Tel.: 22 55 44 392
https://www.mimuw.edu.pl/
https://www.uw.edu.pl/