Hi All,
We have a Bright Computing cluster running RHEL 7.4. We are running Bright-packaged singularity 2.4.2 and CUDA 9.0 Toolkit (from which our nvidia-smi comes).
The nvidia-smi binary lives in a nonstandard location: /cm/local/apps/cuda-driver/lib/current/bin (the CUDA libs likewise live under /cm/local/apps/).
When we try to run using "singularity run --nv", either by first building a Singularity image then running it, or running the Docker image "on the fly", we get a "no nvidia-smi" error as shown below:
$ singularity build tensorflow_xxx.img docker://reg.xxxx.com:5000/tensorflow_xxx:1cedc37_2018-01-13
pbt $ singularity run --nv tensorflow_xxx.img
which: no nvidia-smi in (/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin)
WARNING: Could not find the Nvidia SMI binary to bind into container
...
We do bind the path "/cm/local/apps/cuda-driver" into the container via /etc/singularity/singularity.conf. We also set SINGULARITYENV_PATH in /etc/singularity/init so that it includes the path to nvidia-smi.
One can see from the debug output (singularity --debug run --nv) that:
- the 'nvidia-smi not found' occurs very early in the output.
- later in the debug output, one sees:
DEBUG [U=35035,P=18620] singularity_runtime_environment() Evaluating envar to clean: SINGULARITYENV_PATH=/cm/local/apps/cuda/libs/current/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
...
DEBUG [U=35035,P=18620] singularity_runtime_environment() Converting envar 'SINGULARITYENV_PATH' to 'PATH' = '/cm/local/apps/cuda/libs/current/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin'
so it appears that singularity is "trying" to set PATH. However, one can verify (once the container gets to a prompt) that PATH is just set to the standard "/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin".
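The early "which: no nvidia-smi" message can be imitated outside Singularity. This is a minimal sketch, assuming the check is an ordinary PATH search over exactly the directories listed in that message; it is not taken from Singularity's source. The temporary directory, the stub script, and the lookup helper are hypothetical stand-ins for the real driver directory and binary.

```shell
#!/bin/sh
# Hypothetical stand-in for the nonstandard driver directory; the stub
# below is NOT the real nvidia-smi.
fake_dir=$(mktemp -d)
trap 'rm -rf "$fake_dir"' EXIT
printf '#!/bin/sh\necho stub\n' > "$fake_dir/nvidia-smi"
chmod +x "$fake_dir/nvidia-smi"

# Search path copied from the "which: no nvidia-smi in (...)" message.
std_path=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin

# Resolve nvidia-smi using only the PATH passed as $1. The subshell keeps
# the caller's PATH untouched.
lookup() {
    ( PATH=$1; command -v nvidia-smi )
}

# Standard directories only: the lookup fails unless a copy or link of
# nvidia-smi exists in one of them.
lookup "$std_path" || echo "nvidia-smi not found on the standard path"

# With the extra directory prepended, the same lookup resolves the stub.
lookup "$fake_dir:$std_path"
```

If the real check behaves like this, it would be consistent with the link or copy into /usr/local/bin making the warning go away, since that directory is already on the standard path.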
If I link or copy nvidia-smi to /usr/local/bin/nvidia-smi, then I don't see the problem. Any ideas what to check here? Is there perhaps a bug in singularity when it comes to setting PATH, at least when using the --nv option?
Thanks,
Keith