I don't see how that bug is related. That bug is about requiring
the libnvidia-ml.so library for an RPM that was built with NVML
Autodetect enabled. His problem is the opposite - he's already
using NVML autodetect, but wants to disable that feature on a
single node, where it looks like that node isn't using RPMs with
NVML support.
Prentice
-- Prentice Bisbal Lead Software Engineer Research Computing Princeton Plasma Physics Laboratory http://www.pppl.gov
How many nodes are we talking about here? What if you gave each node its own gres.conf file, where all of them said
AutoDetect=nvml
Except the one you want to exclude, which would have this in gres.conf :
NodeName=a1-10 AutoDetect=off Name=gpu File=/dev/nvidia0
It seems to me like AutoDetect=nvml and AutoDetect=off are mutually
exclusive in the same gres.conf file, but maybe my suggestion would work. If
you have a small number of GPU nodes, or use a configuration
management tool like Ansible, Chef, or Puppet, it might be worth a
shot.
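If you do go the config-management route, here is a rough, untested-on-a-real-cluster sketch of the idea in Python: it just renders the gres.conf text each node would get. The function name, the EXCLUDED table, and the a1-09 hostname are all made up for illustration; only a1-10 and /dev/nvidia0 come from the example above.

```python
# Hypothetical sketch: render per-node gres.conf contents.
# Node -> GPU device file for nodes that should NOT use NVML autodetect.
EXCLUDED = {"a1-10": "/dev/nvidia0"}


def render_gres_conf(node, excluded=EXCLUDED):
    """Return the gres.conf text for a single node."""
    if node in excluded:
        # Excluded node: turn autodetect off and name the device explicitly.
        return f"NodeName={node} AutoDetect=off Name=gpu File={excluded[node]}\n"
    # Every other node relies on NVML autodetection.
    return "AutoDetect=nvml\n"


if __name__ == "__main__":
    for node in ["a1-09", "a1-10"]:  # a1-09 is an invented neighbor node
        print(f"# gres.conf for {node}")
        print(render_gres_conf(node), end="")
```

Your Ansible/Chef/Puppet template would do the same thing; the point is only that each node ends up with exactly one of the two forms.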
Prentice
Correction/addendum: If the node you want to exclude has RPMs
that were built without NVML autodetection, you probably want that
gres.conf to look like this:
NodeName=a1-10 Name=gpu File=/dev/nvidia0
I'm guessing that if it was built without NVML
autodetection, the AutoDetect=off option wouldn't be understood,
or would be pointless.
Hardly an expert on GRES configuration, so just spitballing here...
Prentice