Support for cudnn, caffe, tensorflow


torch ed

Sep 21, 2016, 3:32:31 PM
to Spack
Hi,

Are there any plans for including cudnn, caffe and tensorflow into spack?

May I know the estimate for the next Spack release?

Thanks,
Jay

Gamblin, Todd

Sep 22, 2016, 7:26:44 AM
to torch ed, Spack

Hi Jay,

 

I believe William Myers at Utah (mwilliammyers on github) was working on some of these.

 

I am planning at least one release in October, before Supercomputing’16.  Possibly two.

 

-Todd

--
You received this message because you are subscribed to the Google Groups "Spack" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spack+un...@googlegroups.com.
To post to this group, send email to sp...@googlegroups.com.
Visit this group at https://groups.google.com/group/spack.
For more options, visit https://groups.google.com/d/optout.

Matt Belhorn

Sep 22, 2016, 8:40:12 AM
to Spack
Hi Jay,

A colleague at the OLCF is working on adding tensorflow to a local repo. When we get it going and generalized for other systems we will submit a PR upstream if it's not added already.

Cheers,
Matt

Jay

Sep 22, 2016, 2:50:02 PM
to Spack
Todd, Matt,
Thanks for getting back to me. I'll keep an eye on the PRs. Meanwhile, I managed to get cudnn working. I'm still working on caffe and tensorflow. If I finish soon, I'll definitely submit a PR. It should go to the develop branch, yes?

Thanks,
Jay

Gamblin, Todd

Sep 22, 2016, 2:51:22 PM
to Jay, Spack

Yes, please!

 



Jay

Sep 22, 2016, 4:19:18 PM
to Spack, jaya...@onutechnology.com
Todd, is there a way to install wheel or pip files in Spack? It looks like the only way to install TensorFlow is by using either pip or a wheel.

Gamblin, Todd

Sep 22, 2016, 5:03:00 PM
to Jay, Spack

Jay,

 

In Spack we typically don’t install via pip or wheel, mainly because if you do that, you lose control of the toolchain things get built with. Pip doesn’t model any dependencies below the python level, so you can’t, e.g., build with an intel/MKL or other stack underneath the Python level if you do that.

 

So porting to Spack would require you to look at the requirements pip uses and map those to Spack. Generally that is pretty straightforward: if you look at some of the existing Python packages (try typing `spack list py-`) and then `spack edit` a few of them, you can see how we typically handle Python installs.

 

You probably want to look at this:

 

https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#installing-from-sources

 

It looks like the main things you would need are

 

Not Python: bazel, coreutils, cuDNN, CUDA

Python: six, numpy, swig, dev, wheel, ipython

 

Most of those are in Spack; the exceptions are cuDNN, bazel, and dev. I was looking at the page above and it looks like it's possible to do a binary install of CUDA, at least in Homebrew. The Spack package currently requires users to put CUDA in a known location first and then install, but maybe we can do better now.

 

Can you try to get TensorFlow working without CUDA, and I’ll look at the CUDA install in the meantime?
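
As a rough illustration of the "map pip requirements to Spack" step above, here is a minimal sketch. It assumes the `py-` prefix convention implied by `spack list py-`; the helper names `spack_name` and `depends_on_lines` and the `NON_PYTHON` set are hypothetical and not part of Spack. The "dev" entry is left out because the thread says it is not in Spack.

```python
# Hypothetical helper, not part of Spack. It only illustrates the naming
# convention that Python libraries in Spack carry a "py-" prefix.

# Assumption: tools that are not Python libraries keep their plain name.
NON_PYTHON = {"bazel", "coreutils", "cudnn", "cuda", "swig"}


def spack_name(requirement):
    """Guess a Spack package name for a pip-style requirement name."""
    name = requirement.strip().lower()
    if name in NON_PYTHON or name.startswith("py-"):
        return name
    return "py-" + name


def depends_on_lines(requirements):
    """Render candidate depends_on(...) lines to paste into a package file."""
    return ["depends_on('{0}')".format(spack_name(r)) for r in requirements]
```

For example, `depends_on_lines(["six", "numpy", "swig"])` gives candidate lines to review by hand; each guessed name should still be checked against `spack list`.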

Jay

Sep 22, 2016, 6:18:07 PM
to Spack, jaya...@onutechnology.com
I'm using the "Installing from sources" section, but the end result of that section is a wheel file. I'm not sure how to install that. My guess is I can do something like this: `python('-m', 'wheel', 'install', '*.whl')` in the install function.

Yes, CUDA can be installed in a better way. NVIDIA provides raw URLs for CUDA downloads, for example for CUDA 7.5. Since I'm using the CUDA 8 Release Candidate, that won't work for me, because they don't provide raw URLs for release candidates.

Gamblin, Todd

Sep 23, 2016, 4:58:56 PM
to Jay, Spack

Ok – I’ll look into adding the URL for at least CUDA 7.5, and we could mark that as the preferred version.

 

AFAIK a wheel file is just a binary installation, right?  I think if the build produces a wheel, it has to have also generated what you need in a prefix.  For Spack you would want Tensorflow to exist in its own prefix…

 

I haven’t had a chance to look at how bazel does this, but it looks like they specify that it should build a wheel using //tensorflow/tools/pip_package:build_pip_package. Is there some other target you can use to install straight into a prefix?

Jay

Sep 23, 2016, 8:11:27 PM
to Spack, jaya...@onutechnology.com
I kinda figured out how to do it. I'm basically treating the .whl file like an .egg file. To test this, in a clean environment, I extracted the tensorflow whl file into site-packages and I could import tensorflow in python. Next up: doing this with spack.
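
A minimal sketch of that extraction step, assuming only that a wheel is a zip archive (which is how the wheel format is defined). The function names `find_wheel` and `unpack_wheel` are made up for illustration. This is not what pip does: it skips scripts, data directories and install records. It is only good enough for a quick import test.

```python
import glob
import os
import zipfile


def find_wheel(build_dir):
    """Return the single .whl file in build_dir.

    A literal '*.whl' argument is not expanded unless a shell is involved,
    so the pattern is resolved explicitly here.
    """
    wheels = sorted(glob.glob(os.path.join(build_dir, "*.whl")))
    if len(wheels) != 1:
        raise ValueError("expected exactly one wheel, found %d" % len(wheels))
    return wheels[0]


def unpack_wheel(wheel_path, site_packages):
    """Extract a wheel into site_packages and return its top-level names."""
    os.makedirs(site_packages, exist_ok=True)
    with zipfile.ZipFile(wheel_path) as whl:
        whl.extractall(site_packages)
        names = whl.namelist()
    return sorted({n.split("/")[0] for n in names})
```

Inside a package's install step this might be called with the build directory and a site-packages directory under the package's own prefix; those paths are left to the caller here because the thread does not show them.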

I have hit a roadblock with caffe. It depends on atlas, which is taking forever to build. I've already clocked more than 2 hours on a high-end system and it's still not done. Is there a way to make the process faster by disabling some build option? At least for testing purposes?

Gamblin, Todd

Sep 23, 2016, 8:15:16 PM
to Jay, Spack

Does it really need to depend on atlas, or would it be happy with just BLAS?

 

You could make it use any number of less-cumbersome-to-build alternatives like netlib-blas.  Or if it really needs Atlas, you can tweak your packages.yaml file to use a system atlas install as an external.

 

If you make it `depends_on('blas')` instead of `depends_on('atlas')`, the user can pick which implementation they want.  The default concretization setting is to use openblas.  See the default packages.yaml:

 

                https://github.com/LLNL/spack/blob/develop/etc/spack/defaults/packages.yaml

 

Users can override this either in etc/spack/packages.yaml or in ~/.spack/packages.yaml.
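
To make the idea of a virtual dependency concrete, here is a toy model of "first preferred provider wins, and user settings override the defaults". It is only an illustration: Spack's real concretizer is far more involved, and `pick_provider` and the dictionaries below are hypothetical stand-ins for what packages.yaml expresses.

```python
# Toy model only; this is not Spack code.

# Stand-in for the shipped defaults: blas resolves to openblas.
DEFAULTS = {"blas": ["openblas"]}


def pick_provider(virtual, defaults, user=None):
    """Return the first preferred provider for a virtual package.

    Entries in `user` replace the matching entries in `defaults`,
    mimicking a user-level override of the site-wide settings.
    """
    prefs = dict(defaults)
    prefs.update(user or {})
    candidates = prefs.get(virtual, [])
    if not candidates:
        raise KeyError("no provider configured for %r" % virtual)
    return candidates[0]
```

With no override, `pick_provider("blas", DEFAULTS)` yields `openblas`; passing `{"blas": ["atlas"]}` as the user preference yields `atlas` instead, which is the effect a package gets by depending on `blas` rather than on one implementation.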

Jay

Sep 23, 2016, 8:30:57 PM
to Spack, jaya...@onutechnology.com
Atlas is default for Caffe. Optionally, it also works with OpenBLAS and MKL. Building with MKL is giving me some library issues with  MPI. Not sure if MKL or HDF5 is causing this problem. I'll try disabling MPI in HDF5 to see if it helps. If that doesn't work, I'll try openblas.

Elizabeth F

Sep 24, 2016, 10:44:19 AM
to Jay, Spack
On Fri, Sep 23, 2016 at 8:30 PM, Jay <jaya...@onutechnology.com> wrote:
Atlas is default for Caffe. Optionally, it also works with OpenBLAS and MKL. Building with MKL is giving me some library issues with  MPI. Not sure if MKL or HDF5 is causing this problem. I'll try disabling MPI in HDF5 to see if it helps. If that doesn't work, I'll try openblas.

This sounds like some issues I've had in the past, often having to do with static libraries, dynamic libraries and -fPIC flags.  When Spack builds, it creates a `spack-build.out` file.  Please rename it to `spack-build.txt` and attach to your email, and I might be able to help.

-- Elizabeth
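
Before attaching a long build log, it can help to pull out the lines that usually point at the static/dynamic/-fPIC problems described above. The sketch below searches for two message fragments that GNU ld commonly prints; the function name `scan_build_log` and the choice of patterns are assumptions, and other linkers word their errors differently.

```python
import re

# Message fragments typical of GNU ld; extend as needed.
PATTERNS = [
    ("fpic", re.compile(r"recompile with -fPIC")),
    ("undefined", re.compile(r"undefined reference to")),
]


def scan_build_log(lines):
    """Return (line_number, label, text) for each suspicious log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS:
            if pattern.search(line):
                hits.append((number, label, line.rstrip()))
    return hits
```

It accepts any iterable of lines, so an open file handle on `spack-build.out` can be passed directly.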

Jay

Sep 24, 2016, 5:37:53 PM
to Spack, jaya...@onutechnology.com
Thanks, Elizabeth. I have attached the file.
spack-build.txt

Jay

Sep 26, 2016, 6:22:30 PM
to Spack, jaya...@onutechnology.com
So, caffe is working with spack now. There are some issues and some quickfixes that I need to take care of:
  1. Caffe needs the py-numpy includes if you enable the python option. Refer to this post. To fix this temporarily, I copied the stuff in setup_dependent_package() into install().
  2. I had to disable mpi support for hdf5 for the last error to disappear.
  3. A couple of patches and calls to a sed script that I have to refine.
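
For item 1, one way to locate the numpy headers is to ask numpy itself through `numpy.get_include()`. The wrapper name `numpy_include_dir` is hypothetical, and how the resulting path is handed to Caffe's build is not shown in this thread, so that part is left out of the sketch.

```python
def numpy_include_dir():
    """Return the directory holding numpy's C headers, or None.

    numpy.get_include() is numpy's own helper for this; None is returned
    when numpy is not importable by the interpreter running the build.
    """
    try:
        import numpy
    except ImportError:
        return None
    return numpy.get_include()
```

The interpreter that runs this should be the same Python that Caffe is built against, otherwise the headers may belong to a different numpy.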

I will install caffe in a clean environment to make sure I won't get those libboost warnings. Hopefully, the hdf5 issues will disappear with that.


- Jay

Jimmy Tang

Mar 24, 2017, 10:43:33 AM
to Spack, jaya...@onutechnology.com
Hi All,

Is there much progress on this? I'm interested in tensorflow (probably with cuDNN/CUDA support) as well. Is there a branch that I can take a look at to help progress this along?

Jimmy

Gamblin, Todd

Mar 24, 2017, 1:21:11 PM
to Jimmy Tang, Spack, jaya...@onutechnology.com
Jimmy,

We started on a branch at HackIllinois to build tensorflow without bazel, leveraging TF’s contributed CMake build.  We got pretty far through the Tensorflow build but hit an error with Eigen includes — I haven’t had time to go deeper into that.

But here is the PR (and branch).  Do you want to iterate on this? I’d be happy to work with you.


-Todd



Jimmy Tang

Mar 24, 2017, 1:39:17 PM
to Spack, jcf...@gmail.com, jaya...@onutechnology.com
Hi Todd,


On Friday, 24 March 2017 17:21:11 UTC, Todd Gamblin wrote:
Jimmy,

We started on a branch at HackIllinois to build tensorflow without bazel, leveraging TF’s contributed CMake build.  We got pretty far through the Tensorflow build but hit an error with Eigen includes — I haven’t had time to go deeper into that.

But here is the PR (and branch).  Do you want to iterate on this? I’d be happy to work with you.



I'll have to take a look at this PR, I had started trying to build tensorflow with bazel and hit a few problems. I'm going to see if I can commit some time to getting tensorflow into spack.

Jimmy
 

Gamblin, Todd

Mar 24, 2017, 2:34:13 PM
to Jimmy Tang, Spack, jaya...@onutechnology.com
Jimmy,

That would be awesome.  We’ve had problems at LLNL with bazel and the assumptions it makes about our machines, as well as with installing using bazel on air-gapped networks.  A native Spack build seems better to me for that reason.

I added you to the Spack repo on github so you can just push to the PR branch.

Some details you may care about:

1. The TF CMake build uses CMake's ExternalProject, but we install dependencies with Spack.  We had to write some code to fool CMake into thinking that an external project build was complete, so you should see that in the branch.  We build things with Spack and then symlink them into place in TF’s CMake build directory.  It would be super nice if Kitware actually provided an override feature for ExternalProject, as it would make things very easy for packagers.  I was unable to find a feature like that.

2. We had to patch a few things because the TF build assumes it can pull files *not only* from the install directory of cmake-built dependencies *but also* from their build directories.  I really don’t understand why they do that, but I guess we have to deal with it.  An example of this is that TF grabs a bunch of build headers from jpeg.  As a workaround, we patched Spack’s JPEG build to install those headers in a subdirectory of the jpeg lib’s prefix.

Once we got all the dependencies built and step 1 complete, most of the effort was in figuring out what things were needed in step 2.  I think we got most of that out of the way and we were just hitting an eigen compile issue — TF thought something needed to be a type but it wasn’t defined.  I am not sure if that was because of a missing header or because of the way we compiled Eigen.  It may be that the variants required in Eigen just need a little refining.
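
The symlink trick from step 1 could look roughly like the sketch below. This is not the code in the PR branch. CMake's ExternalProject tracks completed steps with stamp files, but the directory layout, the stamp file names and the function name `fake_external_project` used here are all assumptions for illustration; the real locations depend on how the TF CMake files configure each external project. It also assumes a POSIX system where symlinks are available.

```python
import os


def fake_external_project(build_dir, name, spack_prefix,
                          steps=("download", "configure", "build", "install")):
    """Link a Spack-installed prefix into a build tree and touch stamps.

    Hypothetical layout: <build_dir>/<name> becomes a symlink to the
    Spack prefix, and <build_dir>/stamps/<name>/<name>-<step> files mark
    each step as already done.
    """
    os.makedirs(build_dir, exist_ok=True)

    # Replace a stale link from an earlier attempt, but never a real dir.
    link = os.path.join(build_dir, name)
    if os.path.islink(link):
        os.remove(link)
    os.symlink(spack_prefix, link)

    stamp_dir = os.path.join(build_dir, "stamps", name)
    os.makedirs(stamp_dir, exist_ok=True)
    stamps = []
    for step in steps:
        stamp = os.path.join(stamp_dir, "{0}-{1}".format(name, step))
        with open(stamp, "a"):
            os.utime(stamp, None)  # create if missing, refresh timestamp
        stamps.append(stamp)
    return link, stamps
```

Headers that TF expects from a dependency's build directory (the jpeg case in step 2) would still need to be present under the linked prefix, which is why the jpeg package itself was patched.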

-Todd


Jimmy Tang

Mar 25, 2017, 1:27:11 PM
to Spack, jcf...@gmail.com, jaya...@onutechnology.com
Hi Todd,

I spent a few hours poking at that PR and tried updating to the latest tensorflow. It looks like they have tensorboard as part of the build now (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/cmake/CMakeLists.txt#L227), which causes the build to download a bunch of JavaScript libraries from the internet.

I'm scratching my head at how it's all assembled.

Jimmy

Pramod Kumbhar

Apr 6, 2017, 2:08:54 PM
to Spack, jcf...@gmail.com, jaya...@onutechnology.com
If someone has managed to install TensorFlow and its dependencies, I would be interested to know.

-Pramod