How to use GPU in singularity?


Igor

Jul 28, 2016, 8:08:19 PM
to singularity
Hi All,

I am trying to use GPU-enabled TensorFlow and it cannot find the GPU cards from inside the container.

On the host:
$ lspci | grep -i nvidia
20:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40m] (rev a1)
8b:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40m] (rev a1)

$ nvidia-smi
Thu Jul 28 19:01:42 2016        
+------------------------------------------------------+                        
| NVIDIA-SMI 346.47     Driver Version: 346.47         |                        
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:20:00.0     Off |                    0 |
| N/A   30C    P8    20W / 235W |     66MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:8B:00.0     Off |                    0 |
| N/A   26C    P8    19W / 235W |     60MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     11671    G   /usr/bin/X                                       9MiB |
|    1     11671    G   /usr/bin/X                                       3MiB |
+-----------------------------------------------------------------------------+

Inside singularity:
$ singularity shell /software/src/singularity_images/tensorflow_0.9.img

Singularity/tensorflow_0.9.img> lspci | grep -i nvidia
bash: lspci: command not found
Singularity/tensorflow_0.9.img> nvidia-smi
bash: nvidia-smi: command not found
Singularity/tensorflow_0.9.img> python
Python 2.7.12 (default, Jul  1 2016, 15:12:24)  
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> sess = tf.Session()
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: midway-l34-01
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: midway-l34-01
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  346.47  Thu Feb 19 18:56:03 PST 2015
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)  
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 346.47.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.

Must the nvidia driver be installed inside the container? Outside? The container shares the kernel with the host and the nvidia kernel module needs to be loaded... How is this handled? Are there any requirements on nvidia driver and CUDA versions inside and outside of the container?

Thank you,
Igor


Nathan Lin

Jul 28, 2016, 8:10:29 PM
to singu...@lbl.gov
Hello,

Yes you are correct. The NVIDIA driver must be installed on your image as well. However, you honestly only need the libcuda.so.###.## library and the appropriate links for that library. Once you have those installed in your image, it should work. 
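
The link layout described above can be sketched as follows (a minimal sketch with a dummy file in a temp directory; 346.47 is the driver version from Igor's host, and on a real image the libcuda.so.346.47 file would come from the extracted NVIDIA installer):

```shell
# Sketch: the loader resolves libcuda.so -> libcuda.so.1 -> libcuda.so.<ver>.
# The touched file is a dummy stand-in for the real driver library.
libdir=$(mktemp -d)
touch "$libdir/libcuda.so.346.47"
ln -s libcuda.so.346.47 "$libdir/libcuda.so.1"
ln -s libcuda.so.1 "$libdir/libcuda.so"
readlink "$libdir/libcuda.so"      # libcuda.so.1
readlink "$libdir/libcuda.so.1"    # libcuda.so.346.47
```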

Best,
Nathan 
--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity...@lbl.gov.

Nathan Lin

Jul 28, 2016, 8:18:26 PM
to singu...@lbl.gov
Also, if you are using the binary installation of TensorFlow, you need CUDA toolkit 7.5 and cuDNN v4. These only need to be installed on your image. As I mentioned earlier, you will need the libcuda.so.###.## library on your image. It is very important that this is the same version as the NVIDIA driver you have on your node (seemingly 346.47 for you). I should also have mentioned that you want the libcuda.so.###.## library that you get from extracting the NVIDIA installer. It will not work if you try to copy the libcuda.so library from your node.

Let me know if you have any more questions. 

Best,
Nathan

Igor

Jul 28, 2016, 8:34:55 PM
to singularity
Hi Nathan,
When I try to install the driver by running the NVIDIA*.run script inside the image, it fails, probably because it tries to modify the kernel, which belongs to the host?
How do I extract just the libcuda.so.###.## library without installing the driver (which is obviously problematic), and why would copying the library from the host not work?
Thank you,
Igor

Gregory M. Kurtzer

Jul 28, 2016, 8:48:08 PM
to singularity
BTW: That might be a very cool bootstrap/overlay script to provide... (a script to extract the libs and put them into the right place inside the container).







--
Gregory M. Kurtzer
High Performance Computing Services (HPCS)
University of California
Lawrence Berkeley National Laboratory
One Cyclotron Road, Berkeley, CA 94720

Igor

Jul 28, 2016, 8:51:53 PM
to singularity
I mean I am using this file from the NVIDIA website, cuda_7.5.18_linux.run, to install the driver, OpenGL, and CUDA. The driver installation fails; CUDA succeeds.
Also, when I run
sh cuda_7.5.18_linux.run
it offers to install driver version 352.39, while on the host it is 346.47. I cannot upgrade the host. Any idea where I can get 346.47?
I tried using the same link, substituting 18 with something else, but have not found the files.

Nathan Lin

Jul 28, 2016, 10:34:32 PM
to singu...@lbl.gov
I am not sure how to find the correct driver version, but from my testing the version must match exactly. I will admit that I have had problems finding specific versions of the driver on NVIDIA's website; I had to ask a sysadmin for the installer that they used. In order to extract the files, you need to use the --extract-only option. For instance, you will have to run something like 'sh NVIDIA-Linux-x86_64-352.63.run --extract-only'. You will then be given a directory with all the libraries that would have been installed. You will need to copy the libcuda.so.###.## library (and you can copy any NVIDIA executables that you want, such as nvidia-smi). Good luck!



Igor Yakushin

Jul 31, 2016, 12:17:36 AM
to singu...@lbl.gov
Hi Nathan,

I have found exactly the same version of the NVIDIA driver, extracted the libraries and nvidia executables from it, copied them into /usr/lib64/nvidia and /usr/bin, and created the corresponding symbolic links. However, I still cannot use the GPU inside Singularity: nvidia-smi says "GPU access blocked by the operating system" (does it work in your case?), and when a TensorFlow session starts it also complains that "No GPU devices available on machine". Notice, however, that TensorFlow seems to think a different version of the NVIDIA driver is in use. I am not sure where that is coming from. The machine on which the image was built has version 361.42.

============
Python 2.7.12 (default, Jul  1 2016, 15:12:24)  

[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> ss = tensorflow.Session()
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: midway230
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: midway230
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:347] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.55  Thu Oct  8 15:18:00 PDT 2015
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)  
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.55.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
>>>  
Singularity/tensorflow_0.9.img> nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
===========

Thank you,
Igor

Nathan Lin

Jul 31, 2016, 9:40:04 AM
to singu...@lbl.gov
Hi Igor,

I don't necessarily have a great answer for you. It seems like you are doing everything right, yet it is still not working. In my case, yes, nvidia-smi as well as TensorFlow both work correctly. I feel like your error still has to do with the version of libcuda.so you are using. Notice how Python seems to correctly load libcuda.so, yet later there is an error that it is unable to find libcuda.so. My first suspicion is that there is still a version mismatch between the drivers installed on the image and on the host. If you are sure that is not the case, it may be that the version of the driver installed on the machine isn't new enough for the GPU. That actually occurred on our cluster, and after a sysadmin updated the driver, it worked. Barring that, I am not too sure. Maybe if you provide me with the full details of your installation (the versions of the packages you have installed, the OS of your image and host), I might be able to think of something, but my suspicion is that the driver version on your host machine may not be new enough.

Best,
Nathan 

Igor Yakushin

Jul 31, 2016, 10:36:28 AM
to singu...@lbl.gov
Hi Nathan,
When installing the CUDA libraries and TensorFlow into the Singularity image, is it important to be on a host with the same version of CUDA/OS as the one on which you are going to run later?
I do not have root on the machine I am going to run on later, and I prepare the image on a different machine with a different version of the nvidia driver and a different flavor of Linux.
Thank you,
Igor

Igor Yakushin

Jul 31, 2016, 1:51:35 PM
to singu...@lbl.gov
Hi Nathan,
I got a little bit further: nvidia-smi is working now but tensorflow still complains:
=========
Singularity/ubuntu_14.04.img> nvidia-smi
Sun Jul 31 17:33:44 2016        
+------------------------------------------------------+                        
| NVIDIA-SMI 352.55     Driver Version: 352.55         |                        

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:20:00.0     Off |                    0 |
| N/A   45C    P0    79W / 235W |    158MiB / 11519MiB |     45%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          Off  | 0000:8B:00.0     Off |                    0 |
| N/A   23C    P8    18W / 235W |     61MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Singularity/ubuntu_14.04.img> python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)  
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> ss = tensorflow.Session()
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: midway-l34-02
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: midway-l34-02
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 352.93.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.55  Thu Oct  8 15:18:00 PDT 2015
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)  
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.55.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:296] kernel version 352.55.0 does not match DSO version 352.93.0 -- cannot find working devices in this configuration
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
>>>
=================
As far as I understand, the problem is that cuda-7.5 was built against, or relies on, nvidia 352.93, while I have NVIDIA driver 352.55 both on the host and in the container. So far I could not find cuda-7.5 built with 352.55.
cuda-7.5 has a stubs directory containing a libcuda.so, and the problem is probably coming from there. However, I doubt I can just replace libcuda.so in the stubs directory with a different version, or turn it into a symbolic link to a different version of the driver, because its size is much smaller than the size of the real libcuda.so in the driver. So I suspect it is really only some kind of interface to the real library.
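
The comparison TensorFlow performs can be sketched in shell (my own illustration; the NVRM line is copied from the log above, and on a real host it would come from /proc/driver/nvidia/version, while the DSO version comes from the libcuda file name):

```shell
# Compare the kernel module version against the version baked into the
# libcuda DSO name -- a mismatch is exactly the error in the log above.
nvrm_line='NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.55  Thu Oct  8 15:18:00 PDT 2015'
dso_name='libcuda.so.352.93'

kernel_ver=$(echo "$nvrm_line" | grep -o 'Kernel Module *[0-9.]*' | grep -o '[0-9.]*$')
dso_ver=${dso_name#libcuda.so.}

if [ "$kernel_ver" = "$dso_ver" ]; then
    echo "driver versions match"
else
    echo "mismatch: kernel $kernel_ver vs DSO $dso_ver"
fi
```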

Thank you,
Igor

Nathan Lin

Jul 31, 2016, 2:26:10 PM
to singu...@lbl.gov
Hi Igor,

In regards to your first question, the OS/drivers of your build computer should not matter. I built an Ubuntu 14.04 image on my RHEL 7 box for our RHEL 6 cluster. I'm not sure that the toolkit is that version specific; my image seems to work fine and it's running 352.63. There is one thing that I do that may be helpful. I read it somewhere online and am not actually sure if it does anything, but I've included it in my image definitions just in case. Apparently there is something about initializing the CUDA Toolkit: as part of my definition file I run 'make' on the CUDA sample 'deviceQuery'. Maybe that will help?

Best,
Nathan

Igor Yakushin

Jul 31, 2016, 5:03:15 PM
to singu...@lbl.gov
Nathan,
When you import tensorflow in python, does it tell you what cuda libraries it is loading or not?
Do you see these messages:
======
>>> import tensorflow as tf 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally 
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally 
======
Thank you,
Igor

Nathan Lin

Jul 31, 2016, 6:18:01 PM
to singu...@lbl.gov
Yes I do

Igor Yakushin

Aug 1, 2016, 1:20:11 AM
to singu...@lbl.gov
Thank you, Nathan. It finally works!

Rick Wagner

Aug 1, 2016, 1:44:00 AM
to singu...@lbl.gov
Igor,

If you have a chance to post your definition file or the steps you took, I know several of us would appreciate it. Getting TensorFlow running on CentOS was a huge effort for our support staff, and that's just one of many GPU-enabled applications.

--Rick

Nathan Lin

Aug 1, 2016, 9:56:37 AM
to singu...@lbl.gov
That's great to hear Igor! What ended up being the problem?

Igor Yakushin

Aug 1, 2016, 10:21:48 PM
to singu...@lbl.gov
Hi Rick,
0) Don't use *.deb for either the NVIDIA driver or CUDA; use *.run.
1) Get the same version of the NVIDIA driver as on the host and extract it (there is an option for that) into a single directory in /usr/local without installing. Also pay attention to getting the NVIDIA*.run for your hardware; apparently Tesla cards require a different driver than consumer cards.
2) Set both PATH and LD_LIBRARY_PATH to the directory where the NVIDIA driver is unpacked. After that, nvidia-smi should work in the Singularity shell.
3) Get the latest CUDA 7.5 *.run file and install only CUDA and not the driver (there is an option for that; by default it would install the driver as well, which would most likely be a different version than what you need and would conflict with the version extracted from the NVIDIA file).
4) CUDA by default will also be installed into /usr/local; set LD_LIBRARY_PATH to point there.
5) Install the latest GPU-enabled TensorFlow with pip, getting it from their website.
I'll try to put all this into a *.def file.
There might be some important prerequisite *.deb files that I installed to make all this work and forgot about, so I would need to reproduce it from scratch to rediscover them.
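
A rough shell rendering of the steps above (a sketch, not verified against this exact setup: the .run filenames and install locations are examples, and the file-existence guards are mine; --extract-only and the CUDA installer's --silent/--toolkit mode are the options discussed in this thread):

```shell
# Filenames are examples -- the driver .run must match the host's driver
# version (346.47 here) and GPU family (Tesla vs consumer).
DRIVER_RUN=NVIDIA-Linux-x86_64-346.47.run
CUDA_RUN=cuda_7.5.18_linux.run
NVDIR=/usr/local/nvidia

# 1) unpack the driver libraries without installing (no kernel module build)
if [ -f "$DRIVER_RUN" ]; then
    sh "$DRIVER_RUN" --extract-only
    mv "${DRIVER_RUN%.run}" "$NVDIR"
fi

# 2) make nvidia-smi and libcuda.so.* visible
export PATH="$NVDIR:$PATH"
export LD_LIBRARY_PATH="$NVDIR"

# 3) toolkit only -- skip the bundled driver (352.39, the wrong version here)
if [ -f "$CUDA_RUN" ]; then
    sh "$CUDA_RUN" --silent --toolkit
fi

# 4) CUDA lands under /usr/local; point LD_LIBRARY_PATH there too
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH"

# 5) GPU-enabled TensorFlow via pip (wheel URL from the TensorFlow site)
# pip install <tensorflow-gpu wheel URL>
```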
Thank you,
Igor

Igor Yakushin

Aug 1, 2016, 10:30:15 PM
to singu...@lbl.gov
Hi Nathan,
The main problem was that if you try to install CUDA, by default it installs the driver as well, which might be a different version than the driver installed with the NVIDIA*.run file. So when installing CUDA, use the option not to install the driver. It is much easier to find an NVIDIA*.run file of the version you need than a cuda*.run with the right driver. When downloading NVIDIA*.run, pay attention that you are asking for a Tesla card (if that's what you have). Consumer cards have a different driver (I do not remember if it is reflected in the file name, but I suspect not, because I made this mistake).
Thank you,
Igor

Igor Yakushin

Aug 1, 2016, 10:44:00 PM
to singu...@lbl.gov
BTW, Rick: Are there any notes available about how you built singularity images for Comet?
How did you handle TensorFlow?

Rick Wagner

Aug 1, 2016, 11:02:33 PM
to singu...@lbl.gov, Mahidhar Tatineni
Hi Igor,

Mahidhar did the “by hand” deployment of TensorFlow with CUDA onto our CentOS systems. We haven’t actually built Singularity images for this, hence our gratitude for your feedback. This seems like a great tutorial for the web site.

—Rick

Bernard Li

Aug 2, 2016, 12:55:07 AM
to singu...@lbl.gov
Hey Igor:

If you can make the tensorflow Singularity container available, I'd like to try that out on our cluster.

Thanks,

Bernard

Nathan Lin

Aug 2, 2016, 1:01:19 AM
to singu...@lbl.gov
Hi Igor,

Yeah, that's something that definitely comes up. Looks like you already fixed it, but you should always look at all the options when you're running a silent install. Glad you got it to work though!

Best,
Nathan

Igor Yakushin

Aug 2, 2016, 1:21:18 AM
to singu...@lbl.gov

Hi Bernard,
What nvidia driver version is on your host?  What card model?
Thank you,
Igor

Bernard Li

Aug 2, 2016, 1:53:35 AM
to singu...@lbl.gov
Hi Igor:

352.39 and Tesla K80.

Thanks,

Bernard

Gregory M. Kurtzer

Aug 2, 2016, 10:35:48 AM
to singu...@lbl.gov
I think the best way of doing this is to have the host provide, in a directory, the CUDA drivers that properly match the kernel drivers it has installed. Then use "bind path" to link those drivers into the container and set LD_LIBRARY_PATH in /etc/singularity/init to match that directory.

This feature will be MUCH better when we are able to link arbitrary directories into the container without having to rely on the bind point existing.

Greg

Igor Yakushin

Aug 2, 2016, 10:58:39 AM
to singu...@lbl.gov
Hi Greg,
I got the impression that Nathan was saying that you cannot just copy the host's NVIDIA drivers. Did I misunderstand? How do you use "bind path"? Is it a feature of Singularity?
Also, would this solution be portable? Do you mean that this "bind path" happens dynamically, so that as long as the host has some NVIDIA driver it would work with any version, without having to rebuild the container?
Thank you,
Igor

Gregory M. Kurtzer

Aug 2, 2016, 11:05:34 AM
to singularity
On Tue, Aug 2, 2016 at 7:58 AM, Igor Yakushin <igor...@gmail.com> wrote:
Hi Greg,
I got an impression that Nathan was saying that you cannot just copy host NVIDIA drivers.

I'm not sure how "copy-able" the drivers are, but I have the impression they are compiled against an older version of libc, so they should be relatively portable... I think this is how nvidia-docker works.
 
Did I misunderstand it? How do you use "bind path"? Is it a feature of Singularity?

"bind path" is in your configuration file (singularity.conf) and will bind files/directories outside the container to inside.
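
For reference, such a directive would look like the following in singularity.conf (the directory is just an example; each `bind path` line names a host file or directory to bind into containers):

```
# singularity.conf (excerpt) -- bind a host directory into every container
bind path = /usr/local/nvidia
```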

 
Also, would this solution be portable? Do you mean that this "bind path" happens dynamically as long as the host has some NVIDIA driver and would work with any version without having to rebuild the container?

That would be the idea yes. It should be as portable as Nvidia's solution (as I understand it) lol. Maybe someone else has more information on this who can chime in.
 
Thank you,

My pleasure!

Nathan Lin

Aug 2, 2016, 11:29:48 AM
to singu...@lbl.gov
Hi,

From my experience, attempting to use the NVIDIA drivers on the host was not the most successful approach. First of all, I think it is important to distinguish between the CUDA drivers and the NVIDIA drivers. This is made more confusing because the CUDA toolkit wants to install its own version of the NVIDIA drivers, but the point is that the CUDA drivers are relatively portable while the NVIDIA drivers are not so much.

The errors I ran into with the NVIDIA driver (specifically the library libcuda.so.352.63) were that there seemed to be a difference between the version of the library installed on the cluster and the version I got from extracting the installer without installing the driver (using the --extract-only option). I think this has to do with the fact that something is actually compiled when you install the driver, but the point is that the two versions of the library have different dependencies, and it seems that the version on the cluster depends on some libraries that are installed on the cluster. Although I did not mess with my Singularity install (with bind path and such), there didn't seem to be a good resolution for this, because the reason we are using Singularity is precisely that we want our image to have a newer version of libc. Thus, the ability for our image to get the libraries it needs from the host computer is pretty much nonexistent. Instead, I've set up my image to have multiple versions of the libcuda.so library, and depending on the host machine, the image adds the appropriate driver to LD_LIBRARY_PATH. This isn't the greatest solution, but it was the best I could think of.
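
The per-host selection idea can be sketched like this (my own illustration, not Nathan's actual script: the directory layout and the use of /proc/driver/nvidia/version are assumptions, demonstrated here with a hard-coded version line and temp directories):

```shell
# Keep one extracted-driver directory per version; pick the one matching
# the host's kernel module and put it on LD_LIBRARY_PATH.
base=$(mktemp -d)
mkdir -p "$base/nvidia-346.47" "$base/nvidia-352.55"

# On a real host: ver_line=$(cat /proc/driver/nvidia/version)
ver_line='NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.55  Thu Oct  8 15:18:00 PDT 2015'
host_ver=$(echo "$ver_line" | grep -o 'Kernel Module *[0-9.]*' | grep -o '[0-9.]*$')

driver_dir="$base/nvidia-$host_ver"
if [ -d "$driver_dir" ]; then
    export LD_LIBRARY_PATH="$driver_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
echo "$driver_dir"
```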

I hope that was helpful, and good luck!

Best,
Nathan

Igor Yakushin

Aug 2, 2016, 11:39:28 AM
to singu...@lbl.gov
Nathan,
Considering that Singularity ignores $HOME/.bashrc and in version 2.1 gives you sh, not bash, by default, how do you make Singularity set LD_LIBRARY_PATH automatically? Do you wrap bash, or are there some other hooks? On my cluster, different nodes have different versions of the nvidia driver and different Tesla cards, so it would be useful for an image to be able to automatically set LD_LIBRARY_PATH and PATH to point to the correct version of the driver.
Thank you,
Igor

Gregory M. Kurtzer

Aug 2, 2016, 11:46:47 AM
to singularity
Hi Nathan,

My initial brainstorming with Nvidia was along the lines that I described, but my brainstorming was all theoretical (as my personal experience with GPUs is minimal). In theory there is no difference between practice and theory, but in practice there is!

I am curious if there is a problem with the design, my explanation of it, or something different we need to test.

Thanks!

Nathan Lin

Aug 2, 2016, 11:47:16 AM
to singu...@lbl.gov
Hi Igor,

Sorry, I was going to include this detail, but I was typing on my phone and got lazy. The setting of LD_LIBRARY_PATH is not really automatic. On our clusters we have modules, and basically when we load the TensorFlow module, it parses the /lib folder for the correct NVIDIA driver version for the computer and stores it as a variable. Then part of my runscript sets LD_LIBRARY_PATH to include the path to the driver I installed on the image. Not the most elegant solution, but it works. On our cluster we are trying to have the image be as similar to a Python interpreter as possible, so using runscripts works well for us.

Best,
Nathan

Gregory M. Kurtzer

Aug 2, 2016, 11:49:23 AM
to singularity
You can also set any environment variables in your current shell and they will be shared with the container; you can edit /etc/singularity/init so they automatically get pushed into all containers; or, if an environment variable is container specific, you can put it inside the container (for a container bootstrapped with 2.1 or newer) in a file at /environment.

Greg

Nathan Lin

Aug 2, 2016, 11:50:18 AM
to singu...@lbl.gov
Hi Greg,

I think that something along the lines of what you brainstormed would work, but it would probably require that the image and the host computer are using the same version of libc. At this point, I just can't seem to think of a different way to get around the inherent difference in the dependencies of the two versions of the library. But I'm also not familiar with this kind of stuff (I've really only been working with Linux and containers this summer), so maybe there is a solution out there!

Best,
Nathan

Nathan Lin

Aug 2, 2016, 11:53:08 AM
to singu...@lbl.gov
Hi Greg,

Sorry for this email, I just saw your recent reply. Can you tell me a bit more about the /environment file? That seems like it would definitely solve one of the major issues I've been running into. Should it be formatted similarly to a standard bash profile file?

Thanks!
Nathan

Gregory M. Kurtzer

Aug 2, 2016, 12:06:17 PM
to singularity
Hi Nathan,

This isn't exactly a container issue so much as just cross-library compatibility. Something the Nvidia reps told me was that they are compiling CUDA (and other libs) against an older version of libc, so using it on a newer distro wouldn't be a problem (e.g. compiled on RHEL5, it should work on RHEL7 just fine, but not vice versa). If that is true, then the libraries are "kinda" portable, and this is (as I understood) how they are currently doing it for nvidia-docker. "In theory"... lol

Greg

Gregory M. Kurtzer

Aug 2, 2016, 12:08:21 PM
to singularity
On Tue, Aug 2, 2016 at 8:52 AM, Nathan Lin <nathan...@gmail.com> wrote:
Hi Greg,

Sorry for this email, just saw your recent reply. Can you tell me a bit more about the /environment file? That seems like it would definitely solve one of my major issues I've been running into. Should it be formatted similar to a standard bash profile file?

Containers built/bootstrapped with Singularity v2.1 (or newer) will have the file /environment which will get sourced automatically when invoking the container. You are correct, you can format it similar to your standard bash/shell profile script. In order to be portable, just make sure you follow Bourne semantics (not Bash) so don't do anything fancy. lol
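
A minimal sketch of such a file, following the Bourne-only advice (the contents are an example, and the nvidia/cuda paths are assumptions; it is written to a temp file here so the sketch is self-contained, whereas in a real container it would live at /environment):

```shell
# Create and source an example /environment file. Plain assignments and
# exports only -- nothing bash-specific, per the Bourne-semantics advice.
envfile=$(mktemp)
cat > "$envfile" <<'EOF'
# /environment -- sourced automatically when the container is invoked
PATH=/usr/local/nvidia:$PATH
LD_LIBRARY_PATH=/usr/local/nvidia:/usr/local/cuda-7.5/lib64
export PATH LD_LIBRARY_PATH
EOF
. "$envfile"
echo "$LD_LIBRARY_PATH"
```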

Hope that helps!

Greg

Nathan Lin

Aug 2, 2016, 12:17:14 PM
to singu...@lbl.gov
Hi Greg,

Thanks for the info on /environment! And I understand that the issue is more of a library issue. The thing is, when I ran ldd on the two versions of the libcuda.so.352.63 library I had, the installed version actually had a dependency on a different set of libraries than the other version. Thus, the error I was running into didn't seem to be solely due to different libc versions, but rather that there were certain libraries one version depended on that the other didn't.

I just ran the ldd again and here is what I got for the installed version:

linux-gate.so.1 =>  (0x55575000)
libdl.so.2 => /lib/libdl.so.2 (0x5634c000)
libpthread.so.0 => /lib/libpthread.so.0 (0x56351000)
libm.so.6 => /lib/libm.so.6 (0x5636c000)
libc.so.6 => /lib/libc.so.6 (0x56396000)
/lib/ld-linux.so.2 (0x55555000)

And here is what I got for the extracted version:

linux-vdso.so.1 =>  (0x00007fff069f6000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b5fea31c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b5fea520000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00002b5fea73f000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b5fea947000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b5feac4d000)
/lib64/ld-linux-x86-64.so.2 (0x00002b5fe913d000)

Either way, because of these differences, I didn't have much success utilizing the libcuda.so.352.63 that was installed on my host computer. Any ideas why there might be these differences?

Best,

Nathan

Igor Yakushin

Aug 2, 2016, 11:14:04 PM
to singu...@lbl.gov
Bernard,
Do you have the corresponding NVIDIA*.run file for the Tesla K80? I could not find it on the NVIDIA site.
Thank you,
Igor

Jason Stover

unread,
Aug 2, 2016, 11:31:00 PM8/2/16
to singu...@lbl.gov
Hi Igor,

If you can dig up the file: cuda_7.5.18_linux.run

That has the 352.39 driver in it.

$ sh ./cuda_7.5.18_linux.run --extract=`pwd`/tmp/
$ ls tmp/
NVIDIA-Linux-x86_64-352.39.run cuda-linux64-rel-7.5.18-19867135.run
cuda-samples-linux-7.5.18-19867135.run

-J

Igor Yakushin

unread,
Aug 3, 2016, 3:56:41 PM8/3/16
to singu...@lbl.gov
Hi Jason,
How do you know that this is for Tesla cards and not for consumer cards? When one downloads a driver from NVIDIA, those appear to be different drivers, yet they are named the same. Is there really any difference? Is there a way to query a driver to find out which cards it supports?
Thank you,
Igor

Igor Yakushin

unread,
Aug 4, 2016, 10:34:14 AM8/4/16
to singu...@lbl.gov
Hi Rick,

The scripts are attached; the driver script that calls the others is build.sh. I found it easiest to work with Scientific Linux 7; I had some issues with Ubuntu and CentOS containers. Several NVIDIA drivers and CUDA must be put in the same directory as the scripts for them to work.
Once the image is built, you can choose which driver to use by executing:
source /usr/local/nvidia.sh <driver version>
inside the container. See /usr/local/NVIDIA* for which drivers are provided, but you can easily extend the script to other drivers.
The container is built under Singularity 2.1.
I'll publish the image somewhere later today.
The image contains a lot of other things not needed for TensorFlow.
I tested it on my laptop (GeForce GTX 960M, driver 361.42) and on a few nodes on the cluster (Tesla K40m and K80, driver versions 352.55 and 346.47). At first glance TensorFlow works, but I do not know it well enough to try all the features.
One thing that is still not clear to me: how does NVIDIA distinguish drivers for different hardware? My guess is by version number, since otherwise the names are the same for consumer cards and Tesla; yet when you try to download a driver from NVIDIA, some drivers support only Tesla and some only consumer cards.
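The selector approach Igor describes could be sketched roughly like this (a hedged reconstruction, not the attached nvidia.sh; the directory layout is an assumption):

```shell
# Hypothetical /usr/local/nvidia.sh: select one of several extracted
# driver trees by version and put it on the search paths.
# Must be sourced, not executed:  . /usr/local/nvidia.sh 352.39
version="$1"
driver_dir="/usr/local/NVIDIA-Linux-x86_64-$version"
if [ ! -d "$driver_dir" ]; then
    echo "no such driver tree: $driver_dir" >&2
else
    LD_LIBRARY_PATH="$driver_dir:$LD_LIBRARY_PATH"
    PATH="$driver_dir:$PATH"
    export LD_LIBRARY_PATH PATH
fi
```

Because it only manipulates environment variables, sourcing it with a different version in a fresh shell switches drivers without touching the image.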

Igor


--
You received this message because you are subscribed to the Google Groups "singularity" group.
To unsubscribe from this group and stop receiving emails from it, send an email to singularity+unsubscribe@lbl.gov.


build.sh
links.sh
nvidia.sh
sl7.def

Rick Wagner

unread,
Aug 4, 2016, 11:20:35 AM8/4/16
to singu...@lbl.gov, Mahidhar Tatineni
Hi Igor,

Thank you so much for taking the time to do this! We’ll test things on Comet right away and help with any feedback we have.

Regarding the drivers, we’ll need to talk with some of our team with in-depth GPU experience and get back to you.

—Rick


Jason Stover

unread,
Aug 4, 2016, 11:32:41 AM8/4/16
to singu...@lbl.gov
Hi Igor,

Sorry, I missed this. This is what I'm using with some of our K80's here.

-J

Jason Stover

unread,
Aug 4, 2016, 11:38:38 AM8/4/16
to singu...@lbl.gov
Ugh... and since I'm an idiot, here's the card-specific info:

# lspci | grep -i 'nvidia'
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
(...)
# lspci -n -s 04:00.0
04:00.0 0302: 10de:102d (rev a1)

# ls -l /usr/lib64/libGL.so.1
lrwxrwxrwx 1 root root 15 Jul 1 06:33 /usr/lib64/libGL.so.1 -> libGL.so.352.39

# nvidia-smi -i 0
Thu Aug 4 10:36:24 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:04:00.0     Off |                    0 |
| N/A   39C    P0   108W / 149W | 11013MiB / 11519MiB  |     58%      Default |
+-------------------------------+----------------------+----------------------+

-J

Igor Yakushin

unread,
Aug 4, 2016, 11:59:58 AM8/4/16
to singu...@lbl.gov
Hi Jason,
What I meant is: given an NVIDIA*.run file, how do you know which card it is for? Is the version number different for consumer cards and Tesla cards, or can there be two kinds of drivers with the same version? My guess is that the version numbers differ between consumer cards and Tesla, because otherwise the file names are the same for my laptop and for Tesla; yet when I try to download a driver from NVIDIA, it asks which card you need it for, and apparently some drivers only support Tesla and some only support consumer cards.
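(An aside: one empirical way to check which cards a given .run file supports is to extract it without installing and search the bundled README, whose supported-products appendix lists the chips; the file layout below is from the 352.xx-era installers and may differ for other generations.)

```shell
# Extract the installer's contents without installing (no root needed),
# then search the bundled README for the card of interest.
sh NVIDIA-Linux-x86_64-352.39.run -x
grep -i "tesla k80" NVIDIA-Linux-x86_64-352.39/README.txt
```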
Thank you,
Igor



Jason Stover

unread,
Aug 4, 2016, 1:35:06 PM8/4/16
to singu...@lbl.gov
Hi Igor,

That I don't know. That run file is from the CUDA installer, so I would hope it supports the Tesla cards. I have installed cuda-7.0 and cuda-7.5 side by side; 7.5 was installed second since it was meant to be the default. But I haven't run into any issues with the driver not supporting a card unless it was a really old card...

-J

Igor Yakushin

unread,
Aug 4, 2016, 3:14:44 PM8/4/16
to singu...@lbl.gov
Hi Bernard,

Here is the container:
https://uchicago.box.com/s/g2dwl6s8awvk96bku5ebifhyi396qd7u
It supports several driver versions.
Once you get into singularity shell, set the environment as follows:
source /usr/local/nvidia.sh 352.39
After that nvidia-smi should work and you can start python and do something like:

import tensorflow as tf
s = tf.Session()

It should detect your card.
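A one-line smoke test of the same thing (hedged: this assumes the container's python has the GPU build, and the exact log wording varies by TensorFlow version) is to create the session from the shell and look for a GPU device in the log output:

```shell
# Creating a Session makes TensorFlow log the devices it found on
# stderr; on a working GPU setup the log mentions a gpu device
# (e.g. something like "/gpu:0", depending on the TF version).
python -c "import tensorflow as tf; tf.Session()" 2>&1 | grep -i "gpu"
```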

Let me know if it works on your cluster. So far I tested it on my laptop and on a few cluster nodes with Tesla K40m with different driver versions.
Thank you,
Igor




Igor Yakushin

unread,
Aug 4, 2016, 4:02:41 PM8/4/16
to singu...@lbl.gov
Forgot to attach one more file:




sl7.sh

Jianxiong Dong

unread,
Oct 8, 2016, 7:12:13 PM10/8/16
to singularity
Hi
Based on Igor's work, I created a bootstrap definition file for Ubuntu TensorFlow with GPU support (see: https://github.com/jdongca2003/Tensorflow-singularity-container-with-GPU-support ). Here I used Greg's suggestion to use /environment to export the environment variables for the GPU driver paths inside the container. To my surprise, I can only do it from under the /root folder. For other folders, I always get an error (file cannot be found). It seems that with root access (sudo), Singularity only mounts the /root folder (note: as an ordinary user, I can always see my current folder inside the container). Any clue?

Thanks

Jianxiong Dong

vanessa s

unread,
Oct 8, 2016, 7:25:12 PM10/8/16
to singu...@lbl.gov
Did you try adding the tensorflow.sh stuff to the actual bootstrap, when you would have sudo?

Best,

Vanessa

Jianxiong Dong

unread,
Oct 8, 2016, 7:56:34 PM10/8/16
to singularity
Hi, Vanessa,

> Did you try adding the tensorflow.sh stuffs to the actual bootstrap, when you would have sudo?
No. build.sh contains the line:

singularity exec -w ubuntu_tensorflow_GPU.img sh ./tensorflow.sh

where tensorflow.sh is in the current folder of the host machine. When I ran it under the root account and the current folder was not /root, I got the error that "tensorflow.sh" could not be found. Do you know how to fix it?

Thanks

Jianxiong

vanessa s

unread,
Oct 8, 2016, 8:12:10 PM10/8/16
to singu...@lbl.gov
I think you would need to add tensorflow.sh (and the other dependencies) to the image first. I know we used to have some kind of add command, but I'm not sure we do with the new %post section (note that Greg is actively working on docs, and these will come out soon!). What I would do (and this might be silly, but it's worth a go) is clone your repo in the %post section and move the files to where you need them in the image. That way you can continue testing (and tensorflow.sh should be found!).

Just out of curiosity - did you try generating from the tensorflow gpu docker image?

It could be a good base to start with, something like this:


and then make tweaks to it in %post. At least for the regular CPU image, we had to change permissions of the whl files first.





--
Vanessa Villamia Sochat
Stanford University '16

Jianxiong Dong

unread,
Oct 8, 2016, 9:12:58 PM10/8/16
to singularity
Hi, Venessa,

> Just out of curiosity - did you try generating from the tensorflow gpu docker image?
No. The idea is simple: first build an Ubuntu container base image, then install the NVIDIA driver and the TensorFlow binary packages to update the image.

build.sh
======
rm -f ubuntu_tensorflow_GPU.img
singularity create ubuntu_tensorflow_GPU.img
singularity expand --size 5000 ubuntu_tensorflow_GPU.img
singularity bootstrap ubuntu_tensorflow_GPU.img ubuntu.def

#install nvidia GPU driver/cuda/cudnn/tensorflow


singularity exec -w ubuntu_tensorflow_GPU.img sh ./tensorflow.sh

tensorflow.sh
---------
driver_version=352.93
sh NVIDIA-Linux-x86_64-$driver_version.run -x
mv NVIDIA-Linux-x86_64-$driver_version /usr/local/
sh links.sh $driver_version

sh ./cuda_7.5.18_linux.run --toolkit --silent
tar xvf ./cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local

driver_path=/usr/local/NVIDIA-Linux-x86_64-$driver_version
# escape \$LD_LIBRARY_PATH so it expands at container run time,
# matching the \$PATH line below (unescaped, it would be frozen
# to the build machine's value)
echo "LD_LIBRARY_PATH=/usr/local/cuda/lib64:$driver_path:\$LD_LIBRARY_PATH" >> /environment
echo "PATH=$driver_path:\$PATH" >> /environment
echo "export PATH LD_LIBRARY_PATH" >> /environment

pip install --upgrade pip
pip install matplotlib
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0-cp27-none-linux_x86_64.whl


> What I would do (and this might be silly, but it's worth a go) is to clone your repo in the %post section, and move the files to where you need them in the image

Thank for your suggestion. I will try it.

Jianxiong

Jianxiong Dong

unread,
Oct 9, 2016, 12:05:27 AM10/9/16
to singularity
Hi, Venessa,

The trick is to bind the current folder to /mnt inside the container using '-B':

sudo singularity exec -B `pwd`:/mnt -w ubuntu_tensorflow_GPU.img sh /mnt/tensorflow.sh

The downside is that I need to use /mnt in the path.

tensorflow.sh
-----
driver_version=352.93
sh /mnt/NVIDIA-Linux-x86_64-$driver_version.run -x

Maybe Greg can provide the best practice for doing this.

Thanks
Jianxiong

Stack Kororā

unread,
Oct 27, 2016, 12:13:50 PM10/27/16
to singularity
Greetings,

Fairly new to Singularity but slowly learning. I am running the latest out of Git on a Scientific Linux 6 cluster. Basic apps I have already got working; GPUs I am struggling with. Funnily enough, I am also trying to get TensorFlow to work.

If I use Igor's scripts to build, I get this:
Bootstrap initialization
Checking bootstrap definition
Executing Prebootstrap module
Executing Bootstrap 'yum' module
Found YUM at: /usr/bin/yum
Setting up Install Process
base   | 3.7 kB     00:00    
base/primary_db   | 4.7 MB     00:00    
Error: xz compression not available
ERROR  : Bootstrap failed... exiting
ERROR  : Aborting with RETVAL=255



I have no idea what is wrong there... but even the most basic RHEL7 examples I can find fail, so I am wondering whether I have something wrong in my environment or whether there is an issue with bootstrapping RHEL7 on RHEL6.

Then I tried Jianxiong's Git repo for Ubuntu. However, I had to make significant changes, as the def file is in the "old" format and throws errors; I updated it to the "new" def format first. We are already using a newer driver and CUDA 8 on the host cluster, so I updated the scripts accordingly. The scripts all worked until they got to pip, where they failed. I couldn't get the pip commands to work in the Singularity container until I switched to python3 and pip3, which works better anyway since my user base is asking for Python 3.5. After that the scripts worked. I then updated the link to the proper TensorFlow version and it failed with:
tensorflow-0.11.0rc1-cp35-cp35m-linux_x86_64.whl is not a supported wheel on this platform.

I am now at the point where I can't seem to get TensorFlow to install no matter what I do.

I am going to try one more time with the xenial build instead of trusty. Any other thoughts on something that I can try?

Thanks!


David Godlove

unread,
Oct 27, 2016, 12:22:54 PM10/27/16
to singularity
As long as you are throwing things at the wall to see what sticks... :-)  You might have a look here:
https://hpc.nih.gov/apps/singularity.html#gpu


It's a guide I wrote for our users to get a GPU up and running with a TensorFlow example.  Might not work exactly in your case b/c it is specific to our hardware.  But you might be able to tweak things a little to get up and running.  

Tru Huynh

unread,
Oct 27, 2016, 12:43:17 PM10/27/16
to singu...@lbl.gov
On Thu, Oct 27, 2016 at 09:22:53AM -0700, David Godlove wrote:
> As long as you are throwing things at the wall to see what sticks... :-)
> You might have a look here:
>
> https://hpc.nih.gov/apps/singularity.html#gpu
>

Here is my WIP recipe :)

tf-0.11.0rc1-gpu.def:
# docker bootstrap

Bootstrap: docker
From: tensorflow/tensorflow:0.11.0rc1-gpu
Registry: gcr.io
Token: no

IncludeCmd: yes

%runscript
echo "This is what happens when you run the container..."
export PATH=/usr/local/bin:$PATH
%post
apt-get update && apt-get -y upgrade
# my host is using 367.57 for cud8 support
curl -k -L -o /tmp/NVIDIA-driver.run http://us.download.nvidia.com/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run
sh /tmp/NVIDIA-driver.run --silent --no-kernel-module --no-x-check --no-install-compat32-libs && /bin/rm /tmp/NVIDIA-driver.run
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key F60F4B3D7FA2AF80

tf-0.11.0rc1-gpu.sh:
#!/bin/sh
IMG_DIR=/home/tru/singularity-img
DEF_DIR=/home/tru/singularity.d
O=`basename $0 .sh`
IMG_FILE=${IMG_DIR}/${O}.img
IMG_DEF=${DEF_DIR}/${O}.def
IMG_LOG=${DEF_DIR}/${O}.log
echo ${IMG_FILE}
\rm -f ${IMG_FILE}
sudo singularity create --size 3600 ${IMG_FILE}
sudo singularity bootstrap ${IMG_FILE} ${IMG_DEF} 2>&1 | tee ${IMG_LOG}

It will create the container from the docker image and put the drivers inside.

Works here on CentOS-6 host, ymmv.

Cheers

Tru
--
Dr Tru Huynh | http://www.pasteur.fr/research/bis
mailto:t...@pasteur.fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France

Stack Kororā

unread,
Oct 27, 2016, 2:01:16 PM10/27/16
to singularity
Thanks David and Tru!

I picked a few tidbits out of both your configs, plus using xenial instead of trusty, and it works!!! Whoooo!! I couldn't seem to get either to work by itself, though, as they use the old configuration format and my version of Singularity gets fussy with it; I have to use the new one.

Now I have to figure out how to clean it up....

I can *not* get it to keep my environment settings. Even after editing /environment, I still have to set LD_LIBRARY_PATH and PATH.
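For what it's worth, in Singularity 2.x the container's /environment file is sourced at run time, so exports placed there should stick. A sketch (the CUDA paths here are assumptions for illustration, adjust to where the toolkit actually lives):

```shell
# /environment inside the container -- sourced by shell/exec/run.
# CUDA install location below is an assumption, not taken from the thread.
export PATH=/usr/local/cuda/bin:/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```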

Also, I am not sure how to best pass in a python script for it to execute. Ultimately the users are going to have to interact with it by SLURM submission and I would prefer it not be an interactive one.
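For the SLURM piece, a non-interactive submission can just wrap `singularity exec` in a batch script. A minimal sketch (the image name, script path, and gres syntax are assumptions for your site, not tested here):

```shell
#!/bin/bash
#SBATCH --job-name=tf-gpu
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00

# Hypothetical paths: bind the submit directory into the container and run
# the user's script with the container's python3 -- no interactive shell needed.
singularity exec -B "$PWD":/mnt ubuntu_tensorflow_GPU.img /usr/bin/python3 /mnt/train.py
```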

But these are the things to figure out as I learn. :-P

Thanks!

Stack Kororā

unread,
Oct 27, 2016, 2:37:25 PM10/27/16
to singularity
Editing /singularity inside the container is letting me run events (including setting up the environment).

However, I am not getting it to pass the file along per the documentation here: http://singularity.lbl.gov/docs-run


exec /usr/bin/python3 "%@"

/usr/bin/python3: can't open file '%@': [Errno 2] No such file or directory

But I am getting closer!

Jason Stover

unread,
Oct 27, 2016, 3:30:00 PM10/27/16
to singu...@lbl.gov
Isn't the %runscript supposed to be shell syntax? So that should be:

exec /usr/bin/python3 "$@"

??

-J
> --
> You received this message because you are subscribed to the Google Groups
> "singularity" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to singularity...@lbl.gov.

Stack Kororā

unread,
Oct 27, 2016, 3:42:55 PM10/27/16
to singularity

On Thursday, October 27, 2016 at 2:30:00 PM UTC-5, Jason Stover wrote:
Isn't the %runscript supposed to be shell syntax? So that should be:

exec /usr/bin/python3 "$@"

??


BWAAAAHAHAHAAHAHAHAHAHA!!!

That is what I get for cutting/pasting out of the documentation. If I had stopped to think about what the error was telling me, I could have figured that out. But nope, I went and was poking around in other things...

Thanks Jason. I appreciate it.

Looks like the documentation needs to have a typo corrected : http://singularity.lbl.gov/docs-run
:-)

Gregory M. Kurtzer

unread,
Oct 27, 2016, 8:21:42 PM10/27/16
to singularity
Haha, good catch on the documentation issue. Fixed with apologies!

Greg


--
Gregory M. Kurtzer
HPC Systems Architect and Technology Developer
Lawrence Berkeley National Laboratory HPCS
University of California Berkeley Research IT
Singularity Linux Containers (http://singularity.lbl.gov/)
Warewulf Cluster Management (http://warewulf.lbl.gov/)

Tyler Trafford

unread,
Nov 3, 2016, 2:44:52 PM11/3/16
to singularity
I just noticed this thread.  What I have working is to start with a Docker imported image, and then run (with overlays enabled):

singularity exec \
  -B /usr/lib64/libcuda.so.$(/sbin/modinfo -F version nvidia):/usr/lib/libcuda.so.1 \
  -B /usr/lib64/libnvidia-fatbinaryloader.so.$(/sbin/modinfo -F version nvidia):/usr/lib/libnvidia-fatbinaryloader.so.$(/sbin/modinfo -F version nvidia) \
  project/tensorflow.img ipython
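The repeated `modinfo` calls can be factored out. Here is a sketch that queries the driver version once and assembles the bind flags; the fallback version is only so the snippet runs on a machine without the nvidia module loaded:

```shell
# Query the host driver version once; fall back to a placeholder version
# if the nvidia kernel module isn't present (illustration only).
ver=$(/sbin/modinfo -F version nvidia 2>/dev/null || echo 367.57)
binds="-B /usr/lib64/libcuda.so.${ver}:/usr/lib/libcuda.so.1"
binds="$binds -B /usr/lib64/libnvidia-fatbinaryloader.so.${ver}:/usr/lib/libnvidia-fatbinaryloader.so.${ver}"
echo singularity exec $binds project/tensorflow.img ipython
```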

-Tyler

vanessa s

unread,
Nov 3, 2016, 3:16:31 PM11/3/16
to singu...@lbl.gov
We have a user that is getting this to work nicely on Sherlock, in case this helps!


aka, it's just what you are saying - using the Docker bootstrap to handle the hard stuffs :)

