TensorFlow is unusable on Linux with Pascal GPUs


Florin Andrei

Sep 6, 2016, 4:53:04 AM
to Discuss
I've been trying for quite some time to make TF work on Linux with a fairly recent Pascal GPU, but nothing seems to work.

I started with TensorFlow 0.10.0rc0 installed from the binary via pip, on Ubuntu 14.04, with CUDA 7.5 and cuDNN 4. Installation was easy enough, and TF seemed to work.

But then I noticed it was "working" only for pretty simple things. As soon as I started testing more complex image processing like this...


...I started running into strange issues like this:


The code works fine on the CPU, it's just extremely slow (basically unusable). On the GPU all you get is black frames. Suspecting compatibility issues between the old CUDA and cuDNN libraries, and the new Pascal GPU chip, I asked on the Nvidia devtalk forum, and I was told CUDA 7.5 has issues with Pascal, so I should switch to CUDA 8, which is pretty much what I thought the issue was:


That means I need to compile TensorFlow from source. Sounds easy enough - except in practice this turned into a never-ending maze of failure.

I've installed Ubuntu 16.04, CUDA 8 (plus the compiler-related patch), cuDNN 5.1, nvidia-driver-370. Cloned the TF repo and tried to compile from source.

First bug I hit, it's not even clear what the problem is. Perhaps TF is at fault, perhaps Bazel. At least there is a workaround - you have to delete the repo and try again from scratch.


Okay, but once you're past that stage you hit another bug. Apparently TF can't detect the version of the CUDA library. This one also has a workaround: if you specify the CUDA version explicitly, it should start compiling.
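
To know which version string to specify for that workaround, one option is to read it from the toolkit itself. A minimal sketch in Python, assuming `nvcc` is on the PATH and prints a line of the form "release X.Y"; the helper names are made up for illustration and are not part of TF or CUDA.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the 'X.Y' release number from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


def installed_cuda_release():
    """Run `nvcc --version` and return its release string (hypothetical helper).

    Returns None if nvcc is missing, fails, or prints an unexpected format.
    """
    try:
        output = subprocess.check_output(["nvcc", "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    print("CUDA release reported by nvcc:", installed_cuda_release())
```

If the helper returns None, the toolkit is either not installed or not on the PATH, which is worth sorting out before starting a long build.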


But wait. If you think it would now compile successfully, you're wrong. After bypassing all these hurdles, compilation fails with an internal compiler error (that's something I haven't seen in a long time):


And this is where the road ends for Pascal users on Ubuntu. There are no workarounds here, no solution, it's not TF's fault, case is closed.

I'm not sure what's next. For what I want to do, running TF on the CPU is not an option, I'd have to wait for days or weeks for test runs to complete. Meanwhile I have a Pascal chip that I can't use with TF. At least it works pretty well for games if I boot Windows. :)

Is Linux support not a priority? Is TF focused on using only narrow combinations of hardware and software versions? What's the direction that the project is going?

Any help is greatly appreciated.

Thanks.

Thomas Quintana

Sep 6, 2016, 8:22:38 AM
to Florin Andrei, Discuss
https://github.com/ftlml/user-guides/wiki/Installing-TensorFlow-w-GPU-Support-on-Ubuntu-16.04-for-Pascal-architecture

--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/a8f7f8f0-9d9a-4db6-b952-b4259ddc9f8b%40tensorflow.org.

Florin Andrei

Sep 6, 2016, 2:15:31 PM
to Discuss
Thank you for the link. That process looks very familiar, except for one thing: it uses the TF binary release with CUDA 8.0. The TF Get Started document, when choosing the CUDA version, specifically states: "Install [CUDA] version 7.5 if using our binary releases" - so it contradicts what the wiki does.

I can certainly give the wiki a try. I am just wary of going through the process of rebuilding the whole OS from scratch (it's currently running Ubuntu 14.04 and the TF binary - I do the CUDA 8.0 TF compilation tests in a virtual machine), only to discover there are slight incompatibilities that prevent TF from doing any serious work on Pascal with Linux. I don't doubt that it passes some simple test runs.

What is the reason for the clear statement in the Get Started document to use CUDA 7.5 with the binary TF? Is that statement wrong? If it is, it should be removed, as it sent me (and presumably others) on a quest that only ended up wasting a lot of time. If it's not wrong, then has anyone tested TF installed like that on Pascal with some big, complex model, and discovered any issues?

Vijay Vasudevan

Sep 6, 2016, 2:29:42 PM
to Florin Andrei, Discuss
On Tue, Sep 6, 2016 at 11:15 AM, Florin Andrei <florin...@gmail.com> wrote:
> Thank you for the link. That process looks very familiar, except for one thing: it uses the TF binary release with CUDA 8.0. The TF Get Started document, when choosing the CUDA version, specifically states: "Install [CUDA] version 7.5 if using our binary releases" - so therefore it's in contradiction with what the wiki does.
>
> I can certainly give the wiki a try. I am just wary of going through the process of rebuilding the whole OS from scratch (it's currently running Ubuntu 14.04 and the TF binary - I do the CUDA 8.0 TF compilation tests in a virtual machine), only to discover there are slight incompatibilities that prevent TF from doing any serious work on Pascal with Linux. I don't doubt that it passes some simple test runs.
>
> What is the reason for the clear statement on the Get Started document to use CUDA 7.5 with the binary TF?

Because our binaries are built with CUDA 7.5, which is the latest non-release-candidate version of the CUDA runtime. CUDA 8.0 is still in release candidate mode. Unfortunately, Pascals don't work reliably with CUDA 7.5 -- it would have been nice for NVIDIA to backport fixes to 7.5 if 8.0 wasn't ready, but that's not something we can do anything about. When 8.0 is out of RC, we will upgrade our binary releases. We may provide a beta binary for 8.0 RC so that more people don't have to build from source, because it is painful right now.

We are trying to make building and installing easier, but it's clear the ecosystem is not yet mature, so you'll have to be patient as the community solidifies things.

Florin Andrei

Sep 6, 2016, 2:45:10 PM
to Discuss, v...@google.com
Providing a beta binary for 8.0RC would be AWESOME.

Right now, Pascal users are facing a catch-22 type of situation: can't use CUDA 7.5 because of CUDA/Pascal incompatibilities, but can't use CUDA 8 either, because TF binaries are built for CUDA 7.5 (so who knows if and how well it would work) and compilation from source fails in numerous, convoluted ways. It's frustrating because Pascal chips just have so much potential for being used with a framework like TF. When it does work (I've seen it with simple models) the whole system is very fast.

If someone more familiar with the TF code base could put together the magic sauce and stick a TF binary on the download site, made for CUDA 8, that would be fantastic. Thank you.

Martin Wicke

Sep 6, 2016, 4:05:00 PM
to Florin Andrei, Discuss, v...@google.com
I would welcome community-built binaries; that would be great. We won't be able to support more than one version of the dependency libraries at a time.

Thomas Quintana

Sep 6, 2016, 4:05:08 PM
to Florin Andrei, Discuss, v...@google.com
I typically put these up for members of our local meetup group. A few members and I followed the wiki instructions, then built TF from source using the instructions on the TF website, and didn't have any issues using CUDA 8 and cuDNN 5.

Florin Andrei

Sep 7, 2016, 12:41:08 AM
to Discuss
If I follow the wiki I get this:

$ python -m tensorflow.models.image.mnist.convolutional

/usr/bin/python: libcudart.so.7.5: cannot open shared object file: No such file or directory


Which makes sense, since the TF binary package was compiled for CUDA 7.5. I'm not even sure how the instructions on the wiki could ever work, unless you have CUDA 7.5 stashed away somewhere on the system - and then TF would actually use 7.5.
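
The error above comes from the dynamic loader, so it can be reproduced without TF. A minimal sketch, assuming Linux-style library names such as `libcudart.so.8.0`; `can_load` is a hypothetical helper, and note that `ctypes.CDLL` really does load the library into the current process when it succeeds.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # The CUDA runtime names checked here are assumptions based on the
    # versions discussed in this thread.
    for name in ("libcudart.so.7.5", "libcudart.so.8.0"):
        print(name, "loadable" if can_load(name) else "not found")
```

If only the 8.0 runtime is loadable, a binary linked against `libcudart.so.7.5` will fail exactly as shown, regardless of how the rest of the system is set up.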

Yaroslav Bulatov

Sep 7, 2016, 12:50:16 AM
to Florin Andrei, Discuss
It makes sense to build for CUDA 8.0 even if not using Pascal -- on a GTX 980 I saw a 50% improvement in matmul speed when building with 8.0 RC.
There were some rumors that stable 8.0 is coming out mid-September.



Thomas Quintana

Sep 7, 2016, 11:17:20 AM
to Yaroslav Bulatov, Florin Andrei, Discuss
@Yaroslav +1

@Florin I believe you specified CUDA SDK version 7.5 instead of 8.0 when building from source. Please make sure you specify 8.0 during the configuration stage.

Tom
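
For the configuration stage mentioned above, the versions can also be supplied through environment variables instead of the interactive prompts. A rough sketch under assumptions: the variable names (`TF_NEED_CUDA`, `TF_CUDA_VERSION`, `TF_CUDNN_VERSION`, `TF_CUDA_COMPUTE_CAPABILITIES`) are believed to be the ones read by TF's `./configure` script of this era, so check them against the script in your own checkout; `cuda_configure_env` is a hypothetical helper, and 6.1 is the compute capability of the GTX 1070/1080.

```python
import os


def cuda_configure_env(cuda_version="8.0", cudnn_version="5",
                       compute_capabilities="6.1", base_env=None):
    """Build an environment dict for running TF's ./configure with CUDA.

    The variable names are assumptions about what ./configure reads;
    verify them against the configure script in your source tree.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "TF_NEED_CUDA": "1",
        "TF_CUDA_VERSION": cuda_version,
        "TF_CUDNN_VERSION": cudnn_version,
        "TF_CUDA_COMPUTE_CAPABILITIES": compute_capabilities,
    })
    return env


if __name__ == "__main__":
    env = cuda_configure_env()
    for key in sorted(k for k in env if k.startswith("TF_")):
        print(key, "=", env[key])
    # From the root of the TF checkout, one could then run something like:
    #   subprocess.run(["./configure"], env=env, check=True)
```

Setting the versions explicitly avoids accidentally accepting a 7.5 default when the installed toolkit is 8.0.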


dter...@gmail.com

Oct 9, 2016, 6:50:46 PM
to Discuss, florin...@gmail.com, v...@google.com
For anybody using Docker, I've built an image containing TensorFlow 0.11 dev and Python 3 that works (only) on Pascal GPUs (GTX 1070, GTX 1080, etc.). The image name (on Docker Hub) is dterdina/projects:tf11_pascal