TensorFlow Docker in Singularity Doesn't See GPU


Bharath Ramsundar

Dec 25, 2016, 2:39:42 AM
to singularity
Hi,

I'm attempting to run TensorFlow inside Singularity on an Ubuntu 16.04 machine, bootstrapping from the TensorFlow Docker image (I've been using https://github.com/drorlab/tf-singularity and https://github.com/jdongca2003/Tensorflow-singularity-container-with-GPU-support as guides). I can successfully build the Singularity image and run TensorFlow inside it, but only on CPU. I've checked that the NVIDIA driver and CUDA versions inside Singularity match those on the host machine, so I don't think that's the issue. I can run nvidia-smi from within the image and see the GPUs, but TensorFlow itself can't access them. I've listed my output below, along with the code I used to generate the image. Any advice would be much appreciated :-)

rbharath:~/Tensorflow-singularity-container-with-GPU-support$ sudo singularity shell -w ubuntu_tensorflow_GPU.img 
Singularity: Invoking an interactive shell within container...

Singularity.ubuntu_tensorflow_GPU.img> # nvidia-smi
Sun Dec 25 07:35:43 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 27%   37C    P8    12W / 180W |     62MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 27%   34C    P8    11W / 180W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3509    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+
Singularity.ubuntu_tensorflow_GPU.img> # ldconfig -p | grep libcuda
        libcudart.so.8.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.8.0
        libcudart.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so
        libcuda.so.1 (libc6,x86-64) => /usr/local/NVIDIA-Linux-x86_64-367.57/libcuda.so.1
        libcuda.so (libc6,x86-64) => /usr/local/NVIDIA-Linux-x86_64-367.57/libcuda.so
Singularity.ubuntu_tensorflow_GPU.img> # ipython
/usr/local/lib/python2.7/dist-packages/IPython/paths.py:69: UserWarning: IPython parent '/home/rbharath' is not a writable location, using a temp directory.
  " using a temp directory.".format(parent))
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import tensorflow as tf

In [2]: tf.Session(config=tf.ConfigProto(log_device_placement=True))
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:

Out[2]: <tensorflow.python.client.session.Session at 0x7f97c853b210>
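As an aside, the `ldconfig -p` check earlier in the transcript can be automated. This is a minimal sketch (the `parse_ldconfig` helper is my own illustration, not part of any tool) that maps each soname in the output to the path it resolves to:

```python
# Hypothetical helper: parse `ldconfig -p`-style output and report where
# each library (e.g. libcuda.so.1) actually resolves.
def parse_ldconfig(text):
    """Map each library soname to the path it resolves to."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # skip header or malformed lines
        name_part, _, path = line.partition("=>")
        # name_part looks like "libcuda.so.1 (libc6,x86-64) "
        soname = name_part.split()[0]
        entries[soname] = path.strip()
    return entries

# Sample lines in the format shown above:
sample = (
    "\tlibcudart.so.8.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.8.0\n"
    "\tlibcuda.so.1 (libc6,x86-64) => /usr/local/NVIDIA-Linux-x86_64-367.57/libcuda.so.1\n"
)
libs = parse_ldconfig(sample)
print(libs["libcuda.so.1"])
```

In practice you would feed it the real output, e.g. `parse_ldconfig(subprocess.check_output(["ldconfig", "-p"]).decode())`, and confirm `libcuda.so.1` points at the driver directory you expect.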

For completeness, I've listed below the code I use to build the image. The entrypoint is running 

sh build.sh

############# build.sh
echo "Removing old GPU image"
sudo rm -f ubuntu_tensorflow_GPU.img
echo "Creating GPU image"
sudo singularity create -s 5000 ubuntu_tensorflow_GPU.img
echo "Bootstrapping image"
sudo singularity bootstrap ubuntu_tensorflow_GPU.img tf-gpu.def
echo "Running tensorflow install script"
sudo singularity exec -B `pwd`:/mnt -w ubuntu_tensorflow_GPU.img sh /mnt/tensorflow.sh

############# tf-gpu.def
# Copyright (c) 2015-2016, Gregory M. Kurtzer. All rights reserved.
# "Singularity" Copyright (c) 2016, The Regents of the University of California,
# through Lawrence Berkeley National Laboratory (subject to receipt of any
# required approvals from the U.S. Dept. of Energy).  All rights reserved.

BootStrap: docker
From: tensorflow/tensorflow:latest-gpu 
IncludeCmd: yes

%runscript
    exec /usr/bin/python "$@"

%post
    apt-get update && apt-get -y upgrade
    apt-get install git -y

############# tensorflow.sh
driver_version=367.57
cuda_version=8.0.44_linux
cudnn_version=8.0-linux-x64-v5.1
sh /mnt/NVIDIA-Linux-x86_64-$driver_version.run -x
mv NVIDIA-Linux-x86_64-$driver_version /usr/local/
sh /mnt/links.sh $driver_version

sh /mnt/cuda_$cuda_version.run --toolkit --silent
tar xvf /mnt/cudnn-$cudnn_version.tgz -C /usr/local

driver_path=/usr/local/NVIDIA-Linux-x86_64-$driver_version
echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf
echo "$driver_path" >> /etc/ld.so.conf
# ldconfig doesn't list libcuda*/libcudnn* without explicit paths
ldconfig /usr/local/cuda/lib64
ldconfig $driver_path

echo " " >> /environment
echo "LD_LIBRARY_PATH=/usr/local/cuda/lib64:$driver_path:$LD_LIBRARY_PATH" >> /environment
echo "PATH=$driver_path:\$PATH" >> /environment
echo "export CUDA_HOME=/usr/local/cuda" >> /environment
echo "export PATH LD_LIBRARY_PATH" >> /environment

############# links.sh
#!/bin/bash

dir=/usr/local/NVIDIA-Linux-x86_64-$1

cd $dir

ln -s libcuda.so.$1 libcuda.so
ln -s libEGL.so.$1 libEGL.so
ln -s libGLESv1_CM.so.$1 libGLESv1_CM.so 
ln -s libGLESv2.so.$1 libGLESv2.so
ln -s libGL.so.$1 libGL.so
ln -s libglx.so.$1 libglx.so
ln -s libnvcuvid.so.$1 libnvcuvid.so
ln -s libnvidia-cfg.so.$1 libnvidia-cfg.so
ln -s libnvidia-compiler.so.$1 libnvidia-compiler.so
ln -s libnvidia-eglcore.so.$1 libnvidia-eglcore.so
ln -s libnvidia-encode.so.$1 libnvidia-encode.so
ln -s libnvidia-fbc.so.$1 libnvidia-fbc.so
ln -s libnvidia-glcore.so.$1 libnvidia-glcore.so
ln -s libnvidia-glsi.so.$1 libnvidia-glsi.so
ln -s libnvidia-gtk2.so.$1 libnvidia-gtk2.so
ln -s libnvidia-gtk3.so.$1 libnvidia-gtk3.so
ln -s libnvidia-ifr.so.$1 libnvidia-ifr.so
ln -s libnvidia-ml.so.$1 libnvidia-ml.so
ln -s libnvidia-ml.so.$1 libnvidia-ml.so.1
ln -s libnvidia-opencl.so.$1 libnvidia-opencl.so
ln -s libnvidia-tls.so.$1 libnvidia-tls.so
ln -s libnvidia-wfb.so.$1 libnvidia-wfb.so
ln -s libvdpau_nvidia.so.$1 libvdpau_nvidia.so
ln -s libvdpau.so.$1 libvdpau.so
ln -s libvdpau_trace.so.$1 libvdpau_trace.so
ln -s libcuda.so.$1 libcuda.so.1

vanessa s

Dec 25, 2016, 8:36:41 AM
to singu...@lbl.gov
Can you confirm that you are using the Python inside the container? You specify it in your runscript, but then in the example you call ipython:

/usr/local/lib/python2.7/dist-packages/IPython/paths.py:69: UserWarning: IPython parent '/home/rbharath' is not a writable location, using a temp directory.

I've run into the bug of accidentally using my local machine's Python from inside the image instead of the image's own, so confirming that you are using the container's /usr/bin/python would be a good place to start.
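One quick way to confirm which interpreter is actually running, using only the standard library (`interpreter_info` is just an illustrative helper, not an API):

```python
# Report the running interpreter's path and version; inside the container
# the path should be /usr/bin/python, not a host-machine location.
import sys

def interpreter_info():
    """Return the running interpreter's path and version string."""
    return {"executable": sys.executable, "version": sys.version.split()[0]}

info = interpreter_info()
print(info["executable"])
print(info["version"])
```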


--
Vanessa Villamia Sochat
Stanford University '16

Bharath Ramsundar

Dec 25, 2016, 3:21:01 PM
to singularity
Good sanity-check suggestion. I tried running with the Python from the runscript, but I'm still seeing the same error:

rbharath@tensorbr0:~/Tensorflow-singularity-container-with-GPU-support$ sudo singularity run ubuntu_tensorflow_GPU.img 
[sudo] password for rbharath: 
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.Session(config=tf.ConfigProto(log_device_placement=True))
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:

<tensorflow.python.client.session.Session object at 0x7f07bf8c1e90>

This isn't my local machine's Python, since its build date (Oct 26) doesn't match the date (Nov 19) of my system /usr/bin/python:

rbharath@tensorbr0:~/Tensorflow-singularity-container-with-GPU-support$ /usr/bin/python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Eliot Eshelman

Jan 2, 2017, 1:38:26 PM
to singu...@lbl.gov

I was hitting the same problem. This seems very odd, but it appears the Docker image being returned does not include a version of TensorFlow built for GPUs.

I tried the version being used on Sherlock (tensorflow/tensorflow:0.11.0rc2-gpu) as well as a newer version listed on Docker Hub (tensorflow/tensorflow:latest-gpu). Neither seems to include GPU support.


If you take the Singularity container from above and run the following commands, you should end up with a version that works:

pip uninstall tensorflow protobuf
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

To test, run the Singularity image and enter this into the Python interpreter:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))


The way to tell whether you've got a GPU-enabled TensorFlow is to see if any CUDA messages are printed when the tensorflow Python module is imported. A non-GPU version will look like this:

>>> import tensorflow as tf
>>>

A GPU-enabled version will look something like this:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
>>>
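That visual check can also be scripted. A rough sketch (the `looks_gpu_enabled` helper is hypothetical, just illustrating the idea): capture TensorFlow's import-time stderr and scan it for the dso_loader lines shown above.

```python
import re

# Hypothetical helper: decide from TensorFlow's import-time log text
# whether a GPU-enabled build opened the CUDA libraries.
def looks_gpu_enabled(log_text):
    """True if any 'successfully opened CUDA library' line appears."""
    return bool(re.search(r"successfully opened CUDA library lib[\w.]+ locally",
                          log_text))

gpu_line = ("I tensorflow/stream_executor/dso_loader.cc:111] "
            "successfully opened CUDA library libcublas.so locally")
print(looks_gpu_enabled(gpu_line))  # True
print(looks_gpu_enabled(""))        # False
```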


P.S. I was also hitting the issue of the nvidia-uvm Linux kernel module needing to be loaded before the GPUs could be accessed from within the container. The nvidia-smi utility does not need nvidia-uvm loaded, so it's not the best test that CUDA is operational; better to use some of the samples included with CUDA, since they'll surface CUDA errors. Once the host had loaded nvidia-uvm, everything ran fine.
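A minimal sketch of checking for that module from Python (the `uvm_loaded` helper is my own; note that the kernel reports module names with underscores, so look for nvidia_uvm, not nvidia-uvm):

```python
# Hypothetical helper: check a /proc/modules listing for nvidia_uvm.
def uvm_loaded(proc_modules_text):
    """True if the nvidia_uvm kernel module appears in the listing."""
    return any(line.split()[0] == "nvidia_uvm"
               for line in proc_modules_text.splitlines() if line.strip())

# Sample listing in the /proc/modules format (sizes/addresses are made up):
sample = ("nvidia_uvm 634880 0 - Live 0xffffffffc0a00000\n"
          "nvidia 11333632 49 nvidia_uvm, Live 0xffffffffc0000000\n")
print(uvm_loaded(sample))  # True

# On a real host: uvm_loaded(open("/proc/modules").read()),
# and `sudo modprobe nvidia-uvm` loads the module if it's missing.
```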

Raphael Townshend

Jan 23, 2017, 8:51:56 PM
to singularity
Eliot, thank you for figuring this out. I updated https://github.com/drorlab/tf-singularity to use pip to install tensorflow-gpu instead of the Docker-provided version, and it works out of the box again.

One more thing to note: the TensorFlow images on Docker Hub do not seem to respect version tags either. I was trying to get 0.11.0rc2-gpu again and it kept giving me 0.12.1! Not sure if this is an issue on Docker's end, or with Singularity's Docker bootstrap.

vanessa s

Jan 23, 2017, 9:08:03 PM
to singu...@lbl.gov
This is great! Do you want to try building the image via Singularity Hub? It's a weird one and would be a good test :) Let me know if you have questions. Basically, the build file should be a file called "Singularity" (akin to a Dockerfile) in the base of the repo that you connect to the hub; each time you push, the image will build. You can maintain different tags by creating new branches!
