TensorFlow Docker in Singularity Doesn't See GPU


Bharath Ramsundar

Dec 25, 2016, 2:39:42 AM
to singularity
Hi,

I'm attempting to run TensorFlow inside Singularity on an Ubuntu 16.04 machine, bootstrapping from the TensorFlow Docker image (I've been using https://github.com/drorlab/tf-singularity and https://github.com/jdongca2003/Tensorflow-singularity-container-with-GPU-support as guides). I can successfully build the Singularity image and run TensorFlow inside it, but only on CPU. I've checked that the NVIDIA driver and CUDA versions inside Singularity match those on the host machine, so I don't think that's the issue. I can run nvidia-smi from within the image and see the GPUs, but TensorFlow itself can't access them. I've listed my output below, along with the code I used to generate the image. Any advice would be much appreciated :-)

rbharath:~/Tensorflow-singularity-container-with-GPU-support$ sudo singularity shell -w ubuntu_tensorflow_GPU.img 
Singularity: Invoking an interactive shell within container...

Singularity.ubuntu_tensorflow_GPU.img> # nvidia-smi
Sun Dec 25 07:35:43 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 27%   37C    P8    12W / 180W |     62MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 27%   34C    P8    11W / 180W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3509    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+
Singularity.ubuntu_tensorflow_GPU.img> # ldconfig -p | grep libcuda
        libcudart.so.8.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.8.0
        libcudart.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so
        libcuda.so.1 (libc6,x86-64) => /usr/local/NVIDIA-Linux-x86_64-367.57/libcuda.so.1
        libcuda.so (libc6,x86-64) => /usr/local/NVIDIA-Linux-x86_64-367.57/libcuda.so
Singularity.ubuntu_tensorflow_GPU.img> # ipython
/usr/local/lib/python2.7/dist-packages/IPython/paths.py:69: UserWarning: IPython parent '/home/rbharath' is not a writable location, using a temp directory.
  " using a temp directory.".format(parent))
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import tensorflow as tf

In [2]: tf.Session(config=tf.ConfigProto(log_device_placement=True))
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:

Out[2]: <tensorflow.python.client.session.Session at 0x7f97c853b210>
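As an aside, the `ldconfig -p` check earlier in the transcript can be automated. This is a minimal sketch (the `parse_ldconfig` helper is my own illustration, not part of any tool) that maps each soname in the output to the path it resolves to:

```python
# Hypothetical helper: parse `ldconfig -p`-style output and report where
# each library (e.g. libcuda.so.1) actually resolves.
def parse_ldconfig(text):
    """Map each library soname to the path it resolves to."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # skip header or malformed lines
        name_part, _, path = line.partition("=>")
        # name_part looks like "libcuda.so.1 (libc6,x86-64) "
        soname = name_part.split()[0]
        entries[soname] = path.strip()
    return entries

# Sample lines in the format shown above:
sample = (
    "\tlibcudart.so.8.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.8.0\n"
    "\tlibcuda.so.1 (libc6,x86-64) => /usr/local/NVIDIA-Linux-x86_64-367.57/libcuda.so.1\n"
)
libs = parse_ldconfig(sample)
print(libs["libcuda.so.1"])
```

In practice you would feed it the real output, e.g. `parse_ldconfig(subprocess.check_output(["ldconfig", "-p"]).decode())`, and confirm `libcuda.so.1` points at the driver directory you expect.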

For completeness, I've listed below the code I use to build the image. The entrypoint is running 

sh build.sh

############# build.sh
echo "Removing old GPU image"
sudo rm -f ubuntu_tensorflow_GPU.img
echo "Creating GPU image"
sudo singularity create -s 5000 ubuntu_tensorflow_GPU.img
echo "Bootstrapping image"
sudo singularity bootstrap ubuntu_tensorflow_GPU.img tf-gpu.def
echo "Running tensorflow install script"
sudo singularity exec -B `pwd`:/mnt -w ubuntu_tensorflow_GPU.img sh /mnt/tensorflow.sh

############# tf-gpu.def
# Copyright (c) 2015-2016, Gregory M. Kurtzer. All rights reserved.
# "Singularity" Copyright (c) 2016, The Regents of the University of California,
# through Lawrence Berkeley National Laboratory (subject to receipt of any
# required approvals from the U.S. Dept. of Energy).  All rights reserved.

BootStrap: docker
From: tensorflow/tensorflow:latest-gpu 
IncludeCmd: yes

%runscript
    exec /usr/bin/python "$@"

%post
    apt-get update && apt-get -y upgrade
    apt-get install git -y

############# tensorflow.sh
driver_version=367.57
cuda_version=8.0.44_linux
cudnn_version=8.0-linux-x64-v5.1
sh /mnt/NVIDIA-Linux-x86_64-$driver_version.run -x
mv NVIDIA-Linux-x86_64-$driver_version /usr/local/
sh /mnt/links.sh $driver_version

sh /mnt/cuda_$cuda_version.run --toolkit --silent
tar xvf /mnt/cudnn-$cudnn_version.tgz -C /usr/local

driver_path=/usr/local/NVIDIA-Linux-x86_64-$driver_version
echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf
echo "$driver_path" >> /etc/ld.so.conf
# ldconfig doesn't list libcuda*/libcudnn* without explicit paths
ldconfig /usr/local/cuda/lib64
ldconfig $driver_path

echo " " >> /environment
echo "LD_LIBRARY_PATH=/usr/local/cuda/lib64:$driver_path:$LD_LIBRARY_PATH" >> /environment
echo "PATH=$driver_path:\$PATH" >> /environment
echo "export CUDA_HOME=/usr/local/cuda" >> /environment
echo "export PATH LD_LIBRARY_PATH" >> /environment

############# links.sh
#!/bin/bash

dir=/usr/local/NVIDIA-Linux-x86_64-$1

cd $dir

ln -s libcuda.so.$1 libcuda.so
ln -s libEGL.so.$1 libEGL.so
ln -s libGLESv1_CM.so.$1 libGLESv1_CM.so 
ln -s libGLESv2.so.$1 libGLESv2.so
ln -s libGL.so.$1 libGL.so
ln -s libglx.so.$1 libglx.so
ln -s libnvcuvid.so.$1 libnvcuvid.so
ln -s libnvidia-cfg.so.$1 libnvidia-cfg.so
ln -s libnvidia-compiler.so.$1 libnvidia-compiler.so
ln -s libnvidia-eglcore.so.$1 libnvidia-eglcore.so
ln -s libnvidia-encode.so.$1 libnvidia-encode.so
ln -s libnvidia-fbc.so.$1 libnvidia-fbc.so
ln -s libnvidia-glcore.so.$1 libnvidia-glcore.so
ln -s libnvidia-glsi.so.$1 libnvidia-glsi.so
ln -s libnvidia-gtk2.so.$1 libnvidia-gtk2.so
ln -s libnvidia-gtk3.so.$1 libnvidia-gtk3.so
ln -s libnvidia-ifr.so.$1 libnvidia-ifr.so
ln -s libnvidia-ml.so.$1 libnvidia-ml.so
ln -s libnvidia-ml.so.$1 libnvidia-ml.so.1
ln -s libnvidia-opencl.so.$1 libnvidia-opencl.so
ln -s libnvidia-tls.so.$1 libnvidia-tls.so
ln -s libnvidia-wfb.so.$1 libnvidia-wfb.so
ln -s libvdpau_nvidia.so.$1 libvdpau_nvidia.so
ln -s libvdpau.so.$1 libvdpau.so
ln -s libvdpau_trace.so.$1 libvdpau_trace.so
ln -s libcuda.so.$1 libcuda.so.1

vanessa s

Dec 25, 2016, 8:36:41 AM
to singu...@lbl.gov
Can you confirm that you are using the Python inside the container? You specify it in your runscript, but then in the example you call ipython:

/usr/local/lib/python2.7/dist-packages/IPython/paths.py:69: UserWarning: IPython parent '/home/rbharath' is not a writable location, using a temp directory.

I've run into the bug of accidentally using my local machine's Python from inside the image instead of the image's own, so confirming that you are using the container's /usr/bin/python would be a good place to start.
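One quick way to confirm which interpreter is actually running, using only the standard library (`interpreter_info` is just an illustrative helper, not an API):

```python
# Report the running interpreter's path and version; inside the container
# the path should be /usr/bin/python, not a host-machine location.
import sys

def interpreter_info():
    """Return the running interpreter's path and version string."""
    return {"executable": sys.executable, "version": sys.version.split()[0]}

info = interpreter_info()
print(info["executable"])
print(info["version"])
```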


--
Vanessa Villamia Sochat
Stanford University '16

Bharath Ramsundar

Dec 25, 2016, 3:21:01 PM
to singularity
Good sanity-check suggestion. I tried running with the Python from the runscript, but I'm still seeing the same error:

rbharath@tensorbr0:~/Tensorflow-singularity-container-with-GPU-support$ sudo singularity run ubuntu_tensorflow_GPU.img 
[sudo] password for rbharath: 
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.Session(config=tf.ConfigProto(log_device_placement=True))
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:

<tensorflow.python.client.session.Session object at 0x7f07bf8c1e90>

This isn't my local machine's Python, since its build date (Oct 26) doesn't match the date (Nov 19) of my system /usr/bin/python:

rbharath@tensorbr0:~/Tensorflow-singularity-container-with-GPU-support$ /usr/bin/python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Eliot Eshelman

Jan 2, 2017, 1:38:26 PM
to singu...@lbl.gov

I was hitting the same problem. This seems very odd, but it appears the Docker image being returned does not include a version of TensorFlow built for GPUs.

I tried the version being used on Sherlock (tensorflow/tensorflow:0.11.0rc2-gpu) as well as a newer version listed on Docker Hub (tensorflow/tensorflow:latest-gpu). Neither seems to include GPU support.


If you take the Singularity container from above and run the following commands, you should end up with a version that works:

pip uninstall tensorflow protobuf
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl

To test, run the Singularity image and enter this into the Python interpreter:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))


The way to tell whether you've got a GPU-enabled TensorFlow is to see if any CUDA messages are printed when the tensorflow Python module is imported. A non-GPU version will look like this:

>>> import tensorflow as tf
>>>

A GPU-enabled version will look something like this:

>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
>>>
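That visual check can also be scripted. A rough sketch (the `looks_gpu_enabled` helper is hypothetical, just illustrating the idea): capture TensorFlow's import-time stderr and scan it for the dso_loader lines shown above.

```python
import re

# Hypothetical helper: decide from TensorFlow's import-time log text
# whether a GPU-enabled build opened the CUDA libraries.
def looks_gpu_enabled(log_text):
    """True if any 'successfully opened CUDA library' line appears."""
    return bool(re.search(r"successfully opened CUDA library lib[\w.]+ locally",
                          log_text))

gpu_line = ("I tensorflow/stream_executor/dso_loader.cc:111] "
            "successfully opened CUDA library libcublas.so locally")
print(looks_gpu_enabled(gpu_line))  # True
print(looks_gpu_enabled(""))        # False
```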


P.S. I was also hitting the issue of the nvidia-uvm Linux kernel module needing to be loaded before the GPUs could be accessed from within the container. The nvidia-smi utility does not need nvidia-uvm loaded, so it's not the best test that CUDA is operational; better to use some of the samples included with CUDA, since they'll surface CUDA errors. Once the host had loaded nvidia-uvm, everything ran fine.
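A minimal sketch of checking for that module from Python (the `uvm_loaded` helper is my own; note that the kernel reports module names with underscores, so look for nvidia_uvm, not nvidia-uvm):

```python
# Hypothetical helper: check a /proc/modules listing for nvidia_uvm.
def uvm_loaded(proc_modules_text):
    """True if the nvidia_uvm kernel module appears in the listing."""
    return any(line.split()[0] == "nvidia_uvm"
               for line in proc_modules_text.splitlines() if line.strip())

# Sample listing in the /proc/modules format (sizes/addresses are made up):
sample = ("nvidia_uvm 634880 0 - Live 0xffffffffc0a00000\n"
          "nvidia 11333632 49 nvidia_uvm, Live 0xffffffffc0000000\n")
print(uvm_loaded(sample))  # True

# On a real host: uvm_loaded(open("/proc/modules").read()),
# and `sudo modprobe nvidia-uvm` loads the module if it's missing.
```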

Raphael Townshend

Jan 23, 2017, 8:51:56 PM
to singularity
Eliot, thank you for figuring this out. I updated https://github.com/drorlab/tf-singularity to use pip to install tensorflow-gpu instead of the Docker-provided version, and it works out of the box again.

One more thing to note: the TensorFlow images on Docker Hub do not seem to respect version tags either. I was trying to get 0.11.0rc2-gpu again and it kept giving me 0.12.1! Not sure if this is an issue on Docker's end, or with Singularity's Docker bootstrap.

vanessa s

Jan 23, 2017, 9:08:03 PM
to singu...@lbl.gov
This is great! Do you want to try building the image via Singularity Hub? It's a weird one and would be a good test :) Let me know if you have questions. Basically, the build file should be a file called "Singularity" (akin to a Dockerfile) in the base of the repo that you connect to the hub; each time you push, the image will build. You can maintain different tags by creating new branches!
