TensorFlow, PyTorch, and manylinux1


Philipp Moritz

Dec 15, 2018, 11:43:43 PM
to devel...@tensorflow.org, sou...@gmail.com, ray-dev, d...@arrow.apache.org
Dear all,

As some of you know, there is a standard in Python called manylinux (https://www.python.org/dev/peps/pep-0513/) to package binary executables and libraries into a “wheel” in a way that allows the code to be run on a wide variety of Linux distributions. This is very convenient for Python users, since such libraries can be easily installed via pip.

This standard is also important for a second reason: If many different wheels are used together in a single Python process, adhering to manylinux ensures that these libraries work together well and don’t step on each other’s toes (this could easily happen if different versions of libstdc++ are used, for example). Therefore, even if support for only a single distribution like Ubuntu is desired, it is important to be manylinux compatible to make sure everybody’s wheels work together well.
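
To make the failure mode concrete, here is a minimal sketch (an editorial illustration, assuming a Linux system with binutils' readelf on the PATH) that lists the GLIBCXX symbol versions a compiled extension module requires. If one wheel's extension demands a newer libstdc++ than the copy another wheel already loaded into the process, imports start failing:

import re
import subprocess
import sys

def required_glibcxx_versions(shared_object):
    # Undefined dynamic symbols carry version suffixes such as
    # _ZNSt6thread4joinEv@GLIBCXX_3.4.22
    out = subprocess.run(
        ["readelf", "--dyn-syms", "--wide", shared_object],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(re.findall(r"@GLIBCXX_([0-9.]+)", out))

if __name__ == "__main__":
    versions = required_glibcxx_versions(sys.argv[1])
    print(sorted(versions, key=lambda v: [int(x) for x in v.split(".")]))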

TensorFlow and PyTorch unfortunately don’t produce manylinux-compatible wheels. The challenge is due, at least in part, to the need to use nvidia-docker to build GPU binaries [10]. This causes various levels of pain for the rest of the Python community; see for example [1] [2] [3] [4] [5] [6] [7] [8].

The purpose of this e-mail is to get a discussion started on how we can make TensorFlow and PyTorch manylinux compliant. There is a new standard in the works [9], so hopefully we can discuss what would be necessary to make sure TensorFlow and PyTorch can adhere to this standard in the future.

It would make everybody’s lives just a little bit better! Any ideas are appreciated.

@soumith: Could you cc the relevant list? I couldn't find a pytorch dev mailing list.

Best,
Philipp.

Robert Nishihara

Dec 16, 2018, 12:25:18 AM
to Philipp Moritz, devel...@tensorflow.org, soumith, ray-dev, d...@arrow.apache.org, yi...@yifeifeng.com

soumith

Dec 16, 2018, 1:35:22 AM
to Robert Nishihara, pcmo...@gmail.com, devel...@tensorflow.org, ray...@googlegroups.com, d...@arrow.apache.org, yi...@yifeifeng.com
Hi Philipp,

Thanks a lot for getting a discussion started. I've sunk 100+ hours over the last two years into making PyTorch wheels play well with OpenCV, TensorFlow, and other wheels, so I'm glad to see this discussion happening.


On the PyTorch wheels, we have been shipping with the minimum glibc and libstdc++ versions we can possibly work with, while keeping two hard constraints:

1. CUDA support
2. C++11 support


1. CUDA support

manylinux1 is not an option, considering CUDA doesn't work on CentOS5. I explored this option [1] with no success.

manylinux2010 is an option at the moment wrt CUDA, but it's unclear when NVIDIA will pull CentOS6 support out from under us.
Additionally, CuDNN 7.0 (if I remember correctly) was compiled against Ubuntu 12.04 (meaning it required a newer glibc than CentOS6's), and binaries linked against CuDNN refused to run on CentOS6. I requested that this constraint be lifted, and the next dot release fixed it.

The reason PyTorch binaries are not manylinux2010 compatible at the moment is because of the next constraint: C++11.

2. C++11

We picked C++11 as the minimum supported dialect for PyTorch, primarily to serve the default compilers of older distros, i.e. Ubuntu 14.04 and CentOS7. The newer options were C++14 / C++17, but we decided to polyfill what we needed in order to support older distros better.

A fully fleshed out C++11 implementation landed in gcc in various stages, with gradual ABI changes [2]. Unfortunately, the libstdc++ that ships with CentOS6 (and hence manylinux2010) isn't sufficient to cover all of C++11. For example, the binaries we built with devtoolset3 (gcc 4.9.2) on CentOS6 didn't run with the default libstdc++ on CentOS6 either, due to ABI changes or because the minimum GLIBCXX version for some of the symbols was unavailable.
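
For reference, a quick way to check what a given libstdc++ actually provides (a minimal sketch; the library path varies by distro, e.g. /usr/lib/x86_64-linux-gnu/libstdc++.so.6 on Debian/Ubuntu):

import re

LIBSTDCXX = "/usr/lib64/libstdc++.so.6"  # RHEL/CentOS location; adjust per distro

data = open(LIBSTDCXX, "rb").read()
# Version definition strings like GLIBCXX_3.4.21 live in the dynamic string table.
provided = {m.decode() for m in re.findall(rb"GLIBCXX_[0-9.]+", data)}
for v in sorted(provided, key=lambda s: [int(x) for x in s.split("_")[1].split(".") if x]):
    print(v)

If a binary demands a GLIBCXX version that doesn't appear in this list, it will refuse to load on that system, which is exactly the failure described above.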

We tried our best to support our binaries running on CentOS6 and above with various static-linking hacks until 0.3.1 (January 2018), but at some point hacks upon hacks were only getting more fragile. Hence we moved to a CentOS7-based image in April 2018 [3], and relied only on dynamic linking to the system-shipped libstdc++.

As Wes mentions [4], one option is that hosting a modern C++ standard library via PyPI would put manylinux2010 on the table. There are, however, subtle consequences with this -- if this package gets installed into a conda environment, it'll clobber the anaconda-shipped libstdc++, possibly corrupting environments for thousands of anaconda users (this is actually similar to the issues with `mkl` shipped via PyPI and conda clobbering each other).
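
A small diagnostic sketch for this kind of clobbering (Linux-only, since it reads /proc/self/maps): check which libstdc++ a running Python process actually mapped, inside and outside a conda environment:

import ctypes

ctypes.CDLL("libstdc++.so.6")  # or simply import the extension module in question
paths = set()
for line in open("/proc/self/maps"):
    if "libstdc++" in line:
        paths.add(line.split()[-1])
print(sorted(paths))

This shows immediately whether the interpreter picked up the conda-shipped library or one dragged in via a wheel.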


References:

Wes McKinney

Dec 16, 2018, 2:54:03 PM
to d...@arrow.apache.org, Philipp Moritz, devel...@tensorflow.org, sou...@gmail.com, ray...@googlegroups.com, yi...@yifeifeng.com
In response to the non-conforming ABI in the TF and PyTorch wheels, we
have attempted to hack around the issue with some elaborate
workarounds [1] [2] that have ultimately proved to not work
universally. The bottom line is that this is burdening other projects
in the Python ecosystem and causing confusing application crashes.

First, to state what should hopefully be obvious to many of you, Python
wheels are not a robust way to deploy complex C++ projects, even
setting aside the compiler toolchain issue. If a project has
non-trivial third party dependencies, you either have to statically
link them or bundle shared libraries with the wheel (we do a bit of
both in Apache Arrow). Neither solution is foolproof in all cases.
There are other downsides to wheels when it comes to numerical
computing -- it is difficult to utilize things like the Intel MKL
which may be used by multiple projects. If two projects have the same
third party C++ dependency (e.g. let's use gRPC or libprotobuf as a
straw man example), it's hard to guarantee that versions or ABI will
not conflict with each other.

In packaging with conda, we pin all dependencies when building
projects that depend on them, then package and deploy the dependencies
as separate shared libraries instead of bundling. To resolve the need
for newer compilers or newer C++ standard library, libstdc++.so and
other system shared libraries are packaged and installed as
dependencies. In manylinux1, the RedHat devtoolset compiler toolchain
is used as it performs selective static linking of symbols to enable
C++11 libraries to be deployed on older Linuxes like RHEL5/6. A conda
environment functions as a sort of portable miniature Linux
distribution.

Given the current state of things, where using the TensorFlow and
PyTorch wheels in the same process as other conforming manylinux1
wheels is unsafe, it's hard to see how one can continue to recommend
pip as a preferred installation path until the ABI problems are resolved. For
example, "pip" is what is recommended for installing TensorFlow on
Linux [3]. It's unclear that non-compliant wheels should be allowed on
the package index at all (I'm aware that verifying policy compliance
was deemed not to be the responsibility of PyPI [4]).

A couple of possible paths forward (there may be others):

* Collaborate with the Python packaging authority to evolve the
manylinux ABI to be able to produce compliant wheels that support the
build and deployment requirements of these projects
* Create a new ABI tag for CUDA/C++11-enabled Python wheels so that
projects can ship packages that can be guaranteed to work properly
with TF/PyTorch. This might require vendoring libstdc++ in some kind
of "toolchain" wheel that projects using this new ABI can depend on

Note that these toolchain and deployment issues are absent when
building and deploying with conda packages, since build- and run-time
dependencies can be pinned and shared across all the projects that
depend on them, ensuring ABI cross-compatibility. It's great to have
the convenience of "pip install $PROJECT", but I believe that these
projects have outgrown the intended use for pip and wheel
distributions.

Until the ABI incompatibilities are resolved, I would encourage more
prominent user documentation about the non-portability and potential
for crashes with these Linux wheels.

Thanks,
Wes

[1]: https://github.com/apache/arrow/commit/537e7f7fd503dd920c0b9f0cef8a2de86bc69e3b
[2]: https://github.com/apache/arrow/commit/e7aaf7bf3d3e326b5fe58d20f8fc45b5cec01cac
[3]: https://www.tensorflow.org/install/
[4]: https://www.python.org/dev/peps/pep-0513/#id50

Wes McKinney

Dec 16, 2018, 2:57:52 PM
to d...@arrow.apache.org, Philipp Moritz, devel...@tensorflow.org, sou...@gmail.com, ray...@googlegroups.com, yi...@yifeifeng.com
Reposting since I wasn't subscribed to devel...@tensorflow.org. I
also didn't see Soumith's response, since it didn't come through to
d...@arrow.apache.org.

soumith

Dec 17, 2018, 1:32:14 AM
to wesm...@gmail.com, d...@arrow.apache.org, Philipp Moritz, devel...@tensorflow.org, ray...@googlegroups.com, yi...@yifeifeng.com
My original reply was filtered out because I wasn't subscribed to the relevant mailing lists; it appears in full earlier in this thread.

tl;dr: manylinux2010 looks pretty promising, because CUDA supports CentOS6 (for now).

In the meantime, I dug into what pyarrow does, and it looks like it links with `-static-libstdc++` along with a linker version script [1].

PyTorch did exactly that until January this year [2], except that our linker version script didn't cover the subtleties of statically linking stdc++ as well as Arrow's did. Because we weren't covering all of the stdc++ static-linking subtleties, we were facing huge issues that amplified wheel incompatibility (`import X; import torch` crashing for various X). Hence, we have since moved to linking with the system-shipped libstdc++, doing no static stdc++ linking.
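
For readers unfamiliar with the technique: the idea is to statically link libstdc++ but use a version script to hide its symbols, so they can't leak out of the extension and collide with another copy loaded in the same process. A minimal sketch of such a script, passed to the linker via -Wl,--version-script=symbols.map (illustrative only; see Arrow's symbols.map and PyTorch's old pytorch.version, linked below, for the real ones):

{
  global:
    /* keep the project's own C++ symbols visible */
    extern "C++" {
      arrow::*;
    };
    /* keep the Python module entry points visible */
    PyInit_*;
  local:
    /* hide everything else, including statically linked libstdc++ symbols */
    *;
};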

I'll revisit this in light of manylinux2010, and go down the path of static linkage of stdc++ again, though I'm wary of the subtleties around handling of weak symbols, std::string destruction across library boundaries [3] and std::string's ABI incompatibility issues.

I've opened a tracking issue here: https://github.com/pytorch/pytorch/issues/15294

I'm looking forward to hearing from the TensorFlow devs if manylinux2010 is sufficient for them, or what additional constraints they have.

As a personal thought, I find multiple libraries in the same process statically linking to stdc++ gross, but without a package manager like Anaconda that is actually willing to deal with the C++-side dependencies, there aren't many options on the table.

References:

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/symbols.map
[2] https://github.com/pytorch/pytorch/blob/v0.3.1/tools/pytorch.version
[3] https://github.com/pytorch/pytorch/issues/5400#issuecomment-369428125

Travis Oliphant

Dec 17, 2018, 9:54:49 AM
to d...@arrow.apache.org, Wes McKinney, Philipp Moritz, devel...@tensorflow.org, ray...@googlegroups.com, yi...@yifeifeng.com

Can PyTorch provide and maintain a conda-forge recipe? 

This would allow the large and growing conda forge ecosystem to easily install PyTorch in a community-supported way. 

Are there problems with using conda or another general package manager? 

I agree that the machine learning packages are trying to make a language-specific package manager do more than it was intended to do, and other open-source solutions already exist.

Thanks,

Travis 

Wes McKinney

Dec 17, 2018, 10:31:48 AM
to sou...@gmail.com, d...@arrow.apache.org, Philipp Moritz, devel...@tensorflow.org, ray...@googlegroups.com, yi...@yifeifeng.com
hi Soumith,

On Mon, Dec 17, 2018 at 12:32 AM soumith <sou...@gmail.com> wrote:
>
> I'm reposting my original reply below the current reply (below a dotted line). It was filtered out because I wasn't subscribed to the relevant mailing lists.
>
> tl;dr: manylinux2010 looks pretty promising, because CUDA supports CentOS6 (for now).
>
> In the meanwhile, I dug into what pyarrow does, and it looks like it links with `static-libstdc++` along with a linker version script [1].

We aren't passing -static-libstdc++. The static linking of certain
symbols (so that C++11 features work on older systems) is handled
automatically by devtoolset-2; we are modifying the visibility of some
of these linked symbols, though.

>
> PyTorch did exactly that until Jan this year [2], except that our linker version script didn't cover the subtleties of statically linking stdc++ as well as Arrow did. Because we weren't covering all of the stdc++ static linking subtleties, we were facing huge issues that amplified wheel incompatibility (import X; import torch crashing under various X). Hence, we moved since then to linking with system-shipped libstdc++, doing no static stdc++ linking.
>

Unless you were using the devtoolset-2 toolchain, you were doing
something different :) My understanding is that passing
-static-libstdc++ with stock gcc or clang is really only appropriate
when building dependency-free binary applications.

> I'll revisit this in light of manylinux2010, and go down the path of static linkage of stdc++ again, though I'm wary of the subtleties around handling of weak symbols, std::string destruction across library boundaries [3] and std::string's ABI incompatibility issues.
>
> I've opened a tracking issue here: https://github.com/pytorch/pytorch/issues/15294
>
> I'm looking forward to hearing from the TensorFlow devs if manylinux2010 is sufficient for them, or what additional constraints they have.
>
> As a personal thought, I find multiple libraries in the same process statically linking to stdc++ gross, but without a package manager like Anaconda that actually is willing to deal with the C++-side dependencies, there aren't many options on the table.

IIUC, the idea of the devtoolset-* toolchains is that if all libraries
use the same toolchain, then there are no issues. Having
multiple projects passing -static-libstdc++ when linking would indeed
be problematic. The problem we are having is that if any library is
using devtoolset-2, all libraries need to in order to be compatible.

>
> References:
>
> [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/symbols.map
> [2] https://github.com/pytorch/pytorch/blob/v0.3.1/tools/pytorch.version
> [3] https://github.com/pytorch/pytorch/issues/5400#issuecomment-369428125
> ............................................................................................................................................................
> Hi Philipp,
>
> Thanks a lot for getting a discussion started. I've sunk ~100+ hours over the last 2 years making PyTorch wheels play well with OpenCV, TensorFlow and other wheels, that I'm glad to see this discussion started.
>
>
> On the PyTorch wheels, we have been shipping with the minimum glibc and libstdc++ versions we can possibly work with, while keeping two hard constraints:
>
> 1. CUDA support
> 2. C++11 support
>
>
> 1. CUDA support
>
> manylinux1 is not an option, considering CUDA doesn't work out of CentOS5. I explored this option [1] to no success.
>
> manylinux2010 is an option at the moment wrt CUDA, but it's unclear when NVIDIA will lift support for CentOS6 under us.
> Additionally, CuDNN 7.0 (if I remember) was compiled against Ubuntu 12.04 (meaning the glibc version is newer than CentOS6), and binaries linked against CuDNN refused to run on CentOS6. I requested that this constraint be lifted, and the next dot release fixed it.
>
> The reason PyTorch binaries are not manylinux2010 compatible at the moment is because of the next constraint: C++11.

Do we need to involve NVIDIA in this discussion? Having problematic
GPU-enabled libraries in PyPI isn't too good for them either.

>
> 2. C++11
>
> We picked C++11 as the minimum supported dialect for PyTorch, primarily to serve the default compilers of older machines, i.e. Ubuntu 14.04 and CentOS7. The newer options were C++14 / C++17, but we decided to polyfill what we needed to support older distros better.
>
> A fully fleshed out C++11 implementation landed in gcc in various stages, with gradual ABI changes [2]. Unfortunately, the libstdc++ that ships with CentOS6 (and hence manylinux2010) isn't sufficient to cover all of C++11. For example, the binaries we built with devtoolset3 (gcc 4.9.2) on CentOS6 didn't run with the default libstdc++ on CentOS6 either, due to ABI changes or because the minimum GLIBCXX version for some of the symbols was unavailable.
>

Do you have a link to the paper trail about this? I had thought a
major raison d'être of the devtoolset compilers was to support C++11 on
older Linuxes. For example, we are using C++11 in Arrow but we're
limiting ourselves at present to what's available in gcc 4.8.x; our
binaries work fine on CentOS5 and 6.

> We tried our best to support our binaries running on CentOS6 and above with various ranges of static linking hacks until 0.3.1 (January 2018), but at some point hacks over hacks was only getting more fragile. Hence we moved to a CentOS7-based image in April 2018 [3], and relied only on dynamic linking to the system-shipped libstdc++.
>
> As Wes mentions [4], an option is to host a modern C++ standard library via PyPI would put manylinux2010 on the table. There are however subtle consequences with this -- if this package gets installed into a conda environment, it'll clobber anaconda-shipped libstdc++, possibly corrupting environments for thousands of anaconda users (this is actually similar to the issues with `mkl` shipped via PyPI and Conda clobbering each other).
>

More evidence that "pip" as a packaging tool may have already outlived
its usefulness to this community.

Somehow we need to arrange that the same compiler toolchain (with
consistent minimum glibc, libstdc++ version) is used to build all of
the binaries we are discussing here. Short of that, some system
configurations will continue to have problems.

- Wes

Michael Sarahan

Dec 17, 2018, 10:45:37 AM
to d...@arrow.apache.org, sou...@gmail.com, pcmo...@gmail.com, devel...@tensorflow.org, ray...@googlegroups.com, yi...@yifeifeng.com
> Somehow we need to arrange that the same compiler toolchain (with consistent minimum glibc, libstdc++ version) is used to build all of the binaries we are discussing here. Short of that some system configurations will continue to have problems.

This was exactly the purpose of Anaconda's crosstool-ng-based compiler toolchains.  We wrote up a bit at https://www.anaconda.com/blog/developer-blog/utilizing-the-new-compilers-in-anaconda-distribution-5/

It's not a free lunch, as it requires shipping libstdc++, as others have noted. The glibc lower bound has ultimately been determined by other software for us: many things require features in newer glibc, and we have found it infeasible to continue supporting back to glibc 2.5 (CentOS 5).

It's still painful on Mac because we can't distribute the old SDKs, and the new SDKs that people do have don't seem to provide the backwards-compatibility guarantees that Apple says they do.

On the bright side, Microsoft seems to have done very well with compatibility between VS 2015 and 2017.  Fingers crossed that that trend continues.

Martin Wicke

Dec 17, 2018, 12:31:17 PM
to soumith, Jean-Marc Ludwig, bu...@tensorflow.org, wesm...@gmail.com, d...@arrow.apache.org, pcmo...@gmail.com, TensorFlow Developers, ray...@googlegroups.com, yi...@yifeifeng.com, Edd Wilder-James
Thank you Philipp for getting this started. We've been trying to get in touch, including via Nick Coghlan and Nathaniel Smith, but we never got far.

I'm a little late to the party, but basically, what Soumith said. We have the exact same constraints (C++11, CUDA/cuDNN). These constraints would be extremely common for any computation-heavy package, and properly solving this issue would be a huge boon for the Python community.

Actual compliance with manylinux1 is out since it cannot fulfill those constraints. I'll also add that there is no way to build compliant wheels without using software beyond end-of-life (even beyond security updates).

manylinux2010 is indeed promising, and I saw that Nick merged support for it recently, though I don't think there has been a pip release including the support yet (maybe that has now changed?). 

However, manylinux2010 still has (possibly fatal) problems:

- CUDA10's minimum versions are higher than manylinux2010's maximum versions: specifically, GCC 4.4.7 > 4.3.0. 

- NVIDIA's license terms for CUDA/cuDNN are not standard, and redistribution can be problematic and may depend on agreements you may have with NVIDIA. The libraries are also large, and including them would make distribution via PyPI problematic. It would be much preferable if there were an approved way to distribute Python packages depending on external CUDA/cuDNN. I don't think this should be a problem; it is similar in spirit to the exception made for libGL.

I've added JM Ludwig to this thread, I think as was mentioned by someone else, having NVIDIA in the conversation is critical.

The group on this thread is a good start, maybe we can get together and make a proposal that meets the needs of the scientific computing community? I think that would probably involve updating the minimum requirements (possibly to CentOS 7; I heard there was talk of a manylinux2014), carving out NVIDIA libraries, and creating a smoother path for updating these requirements (maybe a manylinux-rolling, which automatically updates maximum versions based on age or support status without requiring new PEPs).

I'm very interested in solving this problem, I feel bad for abusing the manylinux1 tag.

Martin


soumith

Dec 17, 2018, 4:51:29 PM
to d...@arrow.apache.org, Wes McKinney, Philipp Moritz, devel...@tensorflow.org, ray...@googlegroups.com, yi...@yifeifeng.com
Hey Travis,

PyTorch and Anaconda actually work together smoothly. There are no issues with Anaconda, and we officially maintain conda packages (conda is also our recommended and default package manager).

Conda-forge recipes are currently not possible because conda-forge hasn't finalized their CUDA packaging mechanisms.

This thread is mostly focusing on unscrewing the PyPI situation.
--
S

soumith

Dec 17, 2018, 4:56:00 PM
to Martin Wicke, Jean-Marc Ludwig, bu...@tensorflow.org, Wes McKinney, d...@arrow.apache.org, Philipp Moritz, TensorFlow Developers, ray...@googlegroups.com, yi...@yifeifeng.com, Edd Wilder-James
> The group on this thread is a good start, maybe we can get together and make a proposal that meets the need of the scientific computing community? I think that would probably involve updating the minimum requirements (possibly to CentOS 7, I heard there was talk of a manylinux2014), carving out NVIDIA libraries, and creating a smoother path for updating these requirements (maybe a manylinux-rolling, which automatically updates maximum versions based on age or support status without requiring new PEPs). 

Martin, this sounds great. I'm really looking forward to the day when PyTorch package binary sizes aren't heavily bloated because we have to ship all of the CUDA / CuDNN / NCCL bits.

Is there a github issue or a private google doc that we can collaborate on, to distill our thoughts and requirements into a proposal? We can propose a manylinux2014 (or realize that manylinux2010 is somehow sufficient), as well as push NVIDIA to address the distribution situation of the CUDA stack.

--
S

Martin Wicke

Dec 17, 2018, 6:49:50 PM
to soumith, Jean-Marc Ludwig, bu...@tensorflow.org, Wes McKinney, d...@arrow.apache.org, Philipp Moritz, TensorFlow Developers, ray...@googlegroups.com, yi...@yifeifeng.com, Edd Wilder-James
I have created a fork of tensorflow/community and added a file: 

It's presently empty.

I've invited Soumith, Wes, and Philipp to collaborate on the repo, let's work on this there? If anybody else wants to join, just let me know.

Robert Nishihara

Dec 18, 2018, 8:54:12 PM
to Martin Wicke, soumith, Jean-Marc Ludwig, bu...@tensorflow.org, Wes McKinney, d...@arrow.apache.org, Philipp Moritz, TensorFlow Developers, ray-dev, yi...@yifeifeng.com, Edd Wilder-James
Thanks Soumith and Martin for the detailed thoughts.

Jean-Marc would you be able to chime in or perhaps cc the relevant people? It'd be really great to hear from someone at NVIDIA, since NVIDIA seems best positioned to make manylinux2010 work out and will probably need to be part of a plan for manylinux2014 or some sort of manylinux-rolling.

I didn't realize that manylinux1 doesn't fully support C++11. We've been using C++11 pretty extensively and compiling on manylinux1 without issues as far as I know, but maybe we just haven't hit the relevant missing symbols.

Martin, I agree that meeting up to hammer out a proposal (or perhaps doing a call if that's easier) would be helpful.

Jason M Furmanek

Dec 18, 2018, 11:07:48 PM
to wi...@google.com, bu...@tensorflow.org, d...@arrow.apache.org, devel...@tensorflow.org, e...@google.com, JLu...@nvidia.com, pcmo...@gmail.com, ray...@googlegroups.com, sou...@gmail.com, wesm...@gmail.com, yi...@yifeifeng.com
Hi Martin,
 
If the goal here is to propose a new manylinux standard, I'd love to be involved as well. The existing standards currently exclude alternative (non-Intel) CPU architectures and specify levels that predate the existence of ppc64le and arm64 as Linux architectures. I could lend some insight to make the proposal a little more acceptable to those arches.
 

Jason M. Furmanek
Power Systems and Open Power Innovation and Solutions
IBM Systems & Technology Group
Mobile: 1-512-638-9692
E-mail: furm...@us.ibm.com
 
 

Fridolín Pokorný

Dec 20, 2018, 4:42:08 AM
to Martin Wicke, soumith, Jean-Marc Ludwig, bu...@tensorflow.org, wesm...@gmail.com, d...@arrow.apache.org, pcmo...@gmail.com, TensorFlow Developers, ray...@googlegroups.com, yi...@yifeifeng.com, Edd Wilder-James, Christoph Goern, Subin Modeel
Hi Martin,


we at Red Hat have an automated system that can test and verify software stacks, even cross-ecosystem (Python, native packages, ...); it's called Dependency Monkey [1] (part of a bigger project called "Thoth" [2, 3]). We are primarily targeting TensorFlow software stacks for now, which we can install out of different Python package indexes and run through a build/verification/scoring pipeline. We use Dependency Monkey to benchmark our own TensorFlow configuration-specific builds available on our index [4].

If there is a part where we can combine efforts, let us know.

Please also keep us in the loop on manylinux2010.

Thanks,
Fridolin

 

Manuel Klimek

Jan 22, 2019, 4:48:18 AM
to TensorFlow Developers, wi...@google.com, bu...@tensorflow.org, d...@arrow.apache.org, e...@google.com, JLu...@nvidia.com, pcmo...@gmail.com, ray...@googlegroups.com, sou...@gmail.com, wesm...@gmail.com, yi...@yifeifeng.com
Sorry if I'm missing something fundamental, but it seems like a new manylinux standard would come with the same problem of basically being static and growing outdated.

I'd be interested in helping to provide a toolchain wheel, as mentioned in the initial post, at least for libc++ (potentially libstdc++); it seems like that could be updated on an ongoing basis, use standard dependency management, and if necessary be bootstrapped with a statically linked compiler.

What would the requirements for such a toolchain wheel be for it to have a chance to be widely used? (note that I come from a C++ background and don't have a lot of experience with Python outside of modules using C++ under the hood :)

Similarly, what would the downsides of such a toolchain wheel be?

Wes McKinney

Jan 29, 2019, 11:03:40 AM
to Manuel Klimek, TensorFlow Developers, wi...@google.com, bu...@tensorflow.org, d...@arrow.apache.org, e...@google.com, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Antoine Pitrou, Uwe Korn
hi Manuel,

Adding a couple more folks from Apache Arrow to the thread to make
sure they see this discussion.

On Tue, Jan 22, 2019 at 3:48 AM Manuel Klimek <kli...@google.com> wrote:
>
> Sorry if I'm missing something fundamental, but it seems like a new manylinux standard would come with the same problem of basically being static and growing outdated.
>
> I'd be interested in helping to provide a toolchain wheel, as mentioned in the initial post, at least for libc++ (potentially libstdc++) - it seems like that could be updated on an ongoing basis, use standard dependency management and if necessary be bootstrapped with a statically linked compiler.
>
> What would the requirements for such a toolchain wheel be for it to have a chance to be widely used? (note that I come from a C++ background and don't have a lot of experience with Python outside of modules using C++ under the hood :)

In principle I would think that the requirement would be that we
demonstrate that wheels built with the newer compiler toolchain and
libstdc++ dependency can coexist with manylinux1 / manylinux2010
packages. This is supposed to be the promise of devtoolset-produced
libraries anyhow. A potential problem might be projects that need to
pass std::* objects between shared libraries in their C++ API. For
example, the "turbodbc" package uses the "pyarrow" package's C++ API.
This would just mean that any wheel that needs to depend on a wheel in
the "TF/PyTorch-compatible toolchain" ecosystem would necessarily need
to use the alternative build toolchain instead of manylinux*

If I'm reading the room right, it seems that manylinux2010 is
effectively DOA for TensorFlow and PyTorch, is that right? If that's
the case then we shouldn't spend another year or more wringing our
hands in hopes that the PyPA solves the problem in the way that we
need. We've got to get busy shipping software and move on with our
lives.

- Wes

Manuel Klimek

Jan 30, 2019, 8:30:49 AM
to Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Antoine Pitrou, Uwe Korn

Fundamentally, the C++ dependency chain seems to be solvable with pip package deps down to the libstdc++/libc++ level.
I think we'd basically need to provide:
a) a toolchain pip package to depend on
b) a manylinux docker image with those libraries and a compiler toolchain targeting them installed, so packagers have an easy way to build these packages

Once we have that in a way that folks are happy with it, it sounds like we'd be good to go?
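
To make (a) concrete, here is a hypothetical sketch of what depending on such a toolchain wheel might look like from a package author's side. The package name "libcxx-runtime" and its install layout are invented for illustration; no such wheel exists today:

from setuptools import setup, Extension

setup(
    name="mypkg",  # hypothetical consumer package
    version="0.1",
    # pull in the (invented) wheel that ships the libc++ runtime
    install_requires=["libcxx-runtime>=8"],
    ext_modules=[
        Extension(
            "mypkg._core",
            sources=["src/core.cpp"],
            # $ORIGIN-relative rpath so the extension resolves the vendored
            # libc++ from the toolchain wheel at import time
            extra_link_args=["-Wl,-rpath,$ORIGIN/../libcxx_runtime/lib"],
        )
    ],
)

The questions below about updates and targets would then map directly onto how such a wheel is versioned.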

There are a couple of obvious questions:
- how to handle updates of that toolchain package / toolchain?
- what would we want to target as a first step?

My proposal for something that we could work on would be:
clang & libc++ @ llvm-8

Would that be something people could work with, or would folks expect this to be too much of an update to their current workflows to be useful? :)

Antoine Pitrou

Jan 30, 2019, 9:09:58 AM
to Manuel Klimek, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn

On 30/01/2019 at 14:30, Manuel Klimek wrote:
> >
> > What would the requirements for such a toolchain wheel be for it
> to have a chance to be widely used? (note that I come from a C++
> background and don't have a lot of experience with Python outside of
> modules using C++ under the hood :)
>
> In principle I would think that the requirement would be that we
> demonstrate that wheels built with the newer compiler toolchain and
> libstdc++ dependency can coexist with manylinux1 / manylinux2010
> packages. This is supposed to be the promise of devtoolset-produced
> libraries anyhow. A potential problem might be projects that need to
> pass std::* objects between shared libraries in their C++ API. For
> example, the "turbodbc" package uses the "pyarrow" package's C++ API.
> This would just mean that any wheel that needs to depend on a wheel in
> the "TF/PyTorch-compatible toolchain" ecosystem would necessarily need
> to use the alternative build toolchain instead of manylinux*
>
> Fundamentally, the C++ dependency chain seems to be solvable with pip
> package deps down to the libstdc++/libc++ level.
> I think we'd basically need to provide:
> a) a toolchain pip package to depend on
> b) a manylinux docker image with those libraries and a compiler
> toolchain targeting them installed so packagers have an easy way to
> build these packages

Am I reading you wrong, or are you actually proposing to package another
libstdc++ version as a Python wheel?

If so, are you going to claim that the given wheel is manylinux-compatible?

Regards

Antoine.

Manuel Klimek

Jan 30, 2019, 9:35:52 AM
to Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
That would be the idea.
 
If so, are you going to claim that the given wheel is manylinux-compatible?

That is my question :) Why wouldn't it be? (I'd link it against manylinux libc and other C-only system libs)
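
One way to verify that claim for any built wheel (a small sketch, assuming binutils' readelf) is to list the DT_NEEDED entries of its shared objects; a conforming extension should only need libc-family system libraries plus the vendored C++ runtime:

import re
import subprocess
import sys

out = subprocess.run(
    ["readelf", "-d", sys.argv[1]],
    capture_output=True, text=True, check=True,
).stdout
# Lines look like: 0x...01 (NEEDED)  Shared library: [libstdc++.so.6]
print(re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]", out))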
 


Antoine Pitrou

Jan 30, 2019, 9:51:52 AM
to Manuel Klimek, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn

On 30/01/2019 at 15:35, Manuel Klimek wrote:
>
> Am I reading you wrong, or are you actually proposing to package another
> libstdc++ version as a Python wheel?
>
>
> That would be the idea.
>  
>
> If so, are you going to claim that the given wheel is
> manylinux-compatible?
>
>
> That is my question :) Why wouldn't it be? (I'd link it against
> manylinux libc and other C-only system libs)

The problem is when you are loading two modules that link against
different libstdc++ versions in the same process. Incidentally, it's
the problem which prompted this discussion.

Regards

Antoine.

Jason M Furmanek

Jan 30, 2019, 9:57:31 AM
to kli...@google.com, ant...@python.org, bu...@tensorflow.org, d...@arrow.apache.org, devel...@tensorflow.org, e...@google.com, JLu...@nvidia.com, pcmo...@gmail.com, ray...@googlegroups.com, sou...@gmail.com, wesm...@gmail.com, wi...@google.com, xho...@gmail.com, yi...@yifeifeng.com
>>Fundamentally, the C++ dependency chain seems to be solvable with pip package deps down to the libstdc++/libc++ level.
>>I think we'd basically need to provide:
>>a) a toolchain pip package to depend on
Sounds like Anaconda :)
 
>>b) a manylinux docker image with those libraries and a compiler toolchain targeting them installed so packagers have an easy way to build these packages
Sounds like conda-forge :)
 
Not to speak for Jonathan and Anaconda, but I suspect they designed things the way they did for many of the same reasons we are discussing here. pip and the manylinux standards alone are not/were not good enough.
 
>>Once we have that in a way that folks are happy with it, it sounds like we'd be good to go?
 
>>There are a couple of obvious questions:
>>- how to handle updates of that toolchain package / toolchain?
>>- what would we want to target as a first step?
 
I'd also add:
   - libc is still a pain, so targeted versions there would need to be spec'd out
 
>>My proposal for something that we could work on would be:
>>clang & libc++ @ llvm-8
 
Not that I disagree, but why clang over gcc? Seems like gcc may have a bit better compatibility across the board, but that might just be momentum talking.
 
>>Would that be something people could work with, or would folks expect this to be too much of an update to their current workflows to be useful? :)
 
Something like this seems well overdue IMHO
 
 
-Jason F
 
 

Manuel Klimek

Jan 30, 2019, 10:09:50 AM
to Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
Sure, I'm aware :) I think that as long as we keep the requirement that all libraries wanting to exchange runtime-ABI-compatible objects are compiled with the same toolchain, we can provide a way to mangle the symbols differently. In the end, to me, the important part is that we have a toolchain provider doing that work, as opposed to every subsystem trying to roll their own solution :)


Manuel Klimek

Jan 30, 2019, 10:14:09 AM
to Jason M Furmanek, Antoine Pitrou, bu...@tensorflow.org, d...@arrow.apache.org, TensorFlow Developers, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, Wes McKinney, Martin Wicke, Uwe Korn, yi...@yifeifeng.com
On Wed, Jan 30, 2019 at 3:57 PM Jason M Furmanek <furm...@us.ibm.com> wrote:
>>Fundamentally, the C++ dependency chain seems to be solvable with pip package deps down to the libstdc++/libc++ level.
>>I think we'd basically need to provide:
>>a) a toolchain pip package to depend on
Sounds like Anaconda :)
 
>>b) a manylinux docker image with those libraries and a compiler toolchain targeting them installed so packagers have an easy way to build these packages
Sounds like conda-forge :)
 
Not to speak for Jonathan and Anaconda, but I suspect they designed things the way they did for many of the same reasons we are discussing here. pip and the manylinux standards alone are not/were not good enough.

I'll need to look at Anaconda, thx for the pointer!
 
 
>>Once we have that in a way that folks are happy with it, it sounds like we'd be good to go?
 
>>There are a couple of obvious questions:
>>- how to handle updates of that toolchain package / toolchain?
>>- what would we want to target as a first step?
 
I'd also add:
   - libc is still a pain, so targeted versions there would need to be spec'd out

I'd be curious to learn about examples; I can think of performance, but are there other gotchas?
 
>>My proposal for something that we could work on would be:
>>clang & libc++ @ llvm-8
 
Not that I disagree, but why clang over gcc? Seems like gcc may have a bit better compatibility across the board, but that might just be momentum talking.

Mainly because it would be an incremental thing for my team to deliver this for clang, but a completely new project to do it for gcc, so from an impact / effort point of view only clang would make sense for me to deliver.
I know I know, this is your typical "hey y'all, over here, I have a hammer, anybody has a fitting nail?" situation :(

Antoine Pitrou

Jan 30, 2019, 10:21:30 AM
to Manuel Klimek, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn

On 30/01/2019 at 16:09, Manuel Klimek wrote:
>
> On Wed, Jan 30, 2019 at 3:51 PM Antoine Pitrou <ant...@python.org> wrote:
>
>
> On 30/01/2019 at 15:35, Manuel Klimek wrote:
> >
> >     Am I reading you wrong, or are you actually proposing to
> package another
> >     libstdc++ version as a Python wheel?
> >
> >
> > That would be the idea.
> >  
> >
> >     If so, are you going to claim that the given wheel is
> >     manylinux-compatible?
> >
> >
> > That is my question :) Why wouldn't it be? (I'd link it against
> > manylinux libc and other C-only system libs)
>
> The problem is when you are loading two modules that link against
> different libstdc++ versions in the same process.  Incidentally, it's
> the problem which prompted this discussion.
>
>
> Sure, I'm aware :) I think as long as the requirement that all libraries
> that want to exchange runtime-ABI-compatible versions are compiled with
> the same toolchain, we can provide a way to mangle the symbols
> differently.

Ah, I see... Indeed, mangling the symbols may work for this.

That said, if you're looking to create a de facto standard, why can't it
be proposed as a manylinux iteration?

Regards

Antoine.

Manuel Klimek

Jan 30, 2019, 10:34:32 AM
to Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
I'd have thought because it doesn't change the system requirements, while manylinux seems to be all about system requirements.
The idea is that the toolchain would still work on any manylinux-compatible machine.


 


Jason Zaman

Feb 4, 2019, 12:00:06 AM
to Manuel Klimek, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
Hey all,

We're having the TensorFlow SIG-Build meeting on 5th Feb 3pm PST
(https://www.timeanddate.com/worldclock/fixedtime.html?iso=20190205T15&p1=224).
Agenda is linked from:
https://groups.google.com/a/tensorflow.org/forum/#!topic/build/akyPcGoBIy4

I'd like to invite everyone from this thread to join the call if at
all possible. The agenda for this meeting will spend most of the time
focusing on the manylinux issue and hopefully we can get together to
flesh out a decent plan on how to tackle this.

Thanks,
Jason

Manuel Klimek

Feb 4, 2019, 6:29:13 AM
to Jason Zaman, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
+Dmitri Gribenko 

Dmitri has experience with EasyBuild, which seems to be used by the HPC community to solve the bootstrap problem and could be used to build a toolchain image & pip package.

Unfortunately we'll not be able to join the meeting as it's at midnight CEST; looking forward to the conclusions from the meeting!

Jason Zaman

Feb 4, 2019, 10:34:23 AM
to Manuel Klimek, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
Yeah, that's expected. The timing is complicated with people spread all
over. We will post notes after the meeting on the SIG-Build mailing
list, and I'd also be up for organizing a separate call with Europe
folks if that would be of interest.


Lenore Mullin

Feb 4, 2019, 10:47:02 AM
to Jason Zaman, Manuel Klimek, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
Wonderful. It's all about uniting universal ideas and mathematics, and then, when the rubber hits the road: software, firmware, and hardware in a universal way. MoA is the mathematics that guides the design.




--
"Great spirits have always encountered violent opposition from mediocre minds" - Albert Einstein

Uwe L. Korn

Feb 4, 2019, 11:32:11 AM
to Jason Zaman, Manuel Klimek, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com
Just as a heads-up: I would like to join the meeting too, but am also located in Europe.

I have spent quite some time with the packaging of wheels for pyarrow and turbodbc, so I would like to give input on this as well. For Apache Arrow, I see the newer manylinux2014 standard as a possible way to go. I'm not so fond of rolling lib(std)c++ packages inside of pip. It's sadly the case that pip's features don't allow for good dependency resolution, especially when taking CUDA into account: a dependency resolution that differs between source and binary builds of a package. For exactly this case conda was developed, because it was considered out-of-scope for the core Python packaging system. I'm not sure whether we can actually fit all the requirements of the packages that take part in this mail thread into pip without simply reimplementing conda inside of pip.

Uwe

Manuel Klimek

Feb 4, 2019, 11:33:52 AM
to Uwe L. Korn, Jason Zaman, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com

One question is probably: what would that entail, and why would it be bad? :)

Uwe L. Korn

Feb 4, 2019, 11:36:56 AM
to Manuel Klimek, Jason Zaman, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com
I think the problem is whether this would get merged. Conda was created after a meeting with Guido van Rossum and other folks at a PyCon quite some years ago, where the final call was that this is not a problem of the core Python ecosystem and that the scientific Python community has to roll their own solution.

@Wes McKinney or someone else: Were you at this meeting and can outline why it was declined back then?

Uwe

Jason Zaman

unread,
Feb 4, 2019, 12:17:23 PM2/4/19
to Uwe L. Korn, Manuel Klimek, Dmitri Gribenko, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com
Hm, let's have this SIG-Build meeting as scheduled and then have
another follow-up later probably around 9am PST, 6pm Europe, 1am
Singapore. Does that time work for everyone? (Date TBD).

My take on this whole thing is that it sounds a lot like
re-implementing an entire distro complete with package manager inside
pip just because pip is not sufficient for what we need. My longer
term goal is to fix things up so TensorFlow can just be packaged
directly in distro package repos and most users would go that route.
This would definitely not be a universal solution and we'd still need
to have a pip package anyway. I think we should leave CUDA out of the
discussion initially and see if we can get the cpu-only wheel working
correctly. Hopefully cpu-only is viable on manylinux2014 then we can
tackle CUDA afterwards.
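
One way to sanity-check that experiment would be to run auditwheel
against the resulting wheel; a minimal sketch (assuming auditwheel is
installed, and the wheel filename here is only illustrative):

    import subprocess

    # Ask auditwheel which manylinux policy the wheel actually satisfies,
    # and which external shared libraries block a stricter tag.
    subprocess.run(
        ["auditwheel", "show", "tensorflow-1.13.0-cp36-cp36m-linux_x86_64.whl"],
        check=True,
    )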

Antoine Pitrou

unread,
Feb 4, 2019, 12:21:30 PM2/4/19
to Uwe L. Korn, Manuel Klimek, Jason Zaman, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com

Le 04/02/2019 à 17:36, Uwe L. Korn a écrit :
> I think that problem is whether this would get merged. Conda was created
> after a meeting with Guido van Rossum and other folks at a PyCon quite
> some years ago where the final call was that this is not a problem of
> the core Python ecosystem and that the scientific Python community has
> to roll their own solution.
>
> @Wes McKinney <mailto:wesm...@gmail.com> or someone else: Were you at
> this meeting and can outline why it was declined back then?

I'm not sure anyone in this CC list was at that meeting (I wasn't). If
it's important to have the precise answer, I can try to CC someone.

But I think the general answer is that it's a complex and difficult
endeavour, and the contribution structures inside the Python packaging
ecosystem, where most people are volunteers, didn't allow for it.
There's already enough lag maintaining the current software stack (pip
et al.).

Anaconda then came up, and the rest is history, so to speak.

Regards

Antoine.



soumith

unread,
Feb 4, 2019, 12:30:37 PM2/4/19
to Antoine Pitrou, Uwe L. Korn, Manuel Klimek, Jason Zaman, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray...@googlegroups.com, yi...@yifeifeng.com
Unfortunately I'll be on a long flight, and cannot make it to the SIGBuild meeting.
I'm definitely interested in the meeting notes and any follow-up meeting.

> I think we should leave CUDA out of the
> discussion initially and see if we can get the cpu-only wheel working
> correctly. Hopefully cpu-only is viable on manylinux2014 then we can
> tackle CUDA afterwards.

50% of the complexity is in the CUDA packaging.
The other 50% is in shipping a more modern libstdc++.so
I believe we'll make progress if we ignore CUDA, but we'll not address half of the issue.

--
S

Jason Zaman

unread,
Feb 4, 2019, 12:45:46 PM2/4/19
to soumith, Antoine Pitrou, Uwe L. Korn, Manuel Klimek, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray...@googlegroups.com, yi...@yifeifeng.com
On Tue, 5 Feb 2019 at 01:30, soumith <sou...@gmail.com> wrote:
> [...]
> 50% of the complexity is in the CUDA packaging.
> The other 50% is in shipping a more modern libstdc++.so
> I believe we'll make progress if we ignore CUDA, but we'll not address half of the issue.

Yeah, we'll definitely need both to solve it fully. My thinking is
that all packages need at least C++11 but only some need CUDA. Or
might we end up with a libstdc++.so that is incompatible with CUDA if we
don't work on everything together?

-- Jason

Robert Nishihara

unread,
Feb 4, 2019, 2:12:37 PM2/4/19
to d...@arrow.apache.org, ja...@perfinion.com, Antoine Pitrou, Uwe L. Korn, Manuel Klimek, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, yi...@yifeifeng.com, soumith
Replying to the thread because the last two messages got dropped.

On Mon, Feb 4, 2019 at 10:00 AM soumith <sou...@gmail.com> wrote:
> I think trying to package CUDA is the wrong way to think about it.
> Instead, perhaps you should try to make the package compatible with
> system CUDA installs.

I agree in principle.
The problem fundamentally stems from user expectation.

In my ~6+ years of supporting Torch and PyTorch, installing CUDA on a
system can take days; the mean for a user is approximately half a day. It
might be userland incompetence, or CUDA might be a magical snowflake, but
the reality is that installing CUDA is never great.
So a huge number of issues reported by users are side effects of broken
CUDA installs.
It doesn't help that the PyPI user expectation is "my package should just
work after a pip install".

If we could reliably install an up-to-date CUDA in a standardized way, and
NVIDIA didn't sidestep the userland issues by saying "use our docker" or
"our PPA is 100% reliable", we would be in a better state.

Until then, I think it's best that we find a solution for PyPI users that
can work out of box with PyPI.

On Mon, Feb 4, 2019 at 12:52 PM Antoine Pitrou <soli...@pitrou.net> wrote:


> [...]
>
> I think trying to package CUDA is the wrong way to think about it.
> Instead, perhaps you should try to make the package compatible with
> system CUDA installs.
>
> For example, the Numba pip wheel almost works out-of-the-box with a
> system CUDA install on Ubuntu 18.04.  I say "almost" because I had to
> set two environment variables:
> https://github.com/numba/numba/issues/3738
>
> Regards
>
> Antoine.
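
For illustration, the runtime half of "be compatible with the system CUDA
install" might look roughly like the sketch below. This is only a
best-effort probe: CUDA_HOME/CUDA_PATH are common conventions rather than a
standard, and the paths are assumptions.

    import ctypes.util
    import os

    def find_system_cuda_runtime():
        # Honour an explicit CUDA_HOME / CUDA_PATH if the user set one.
        for var in ("CUDA_HOME", "CUDA_PATH"):
            root = os.environ.get(var)
            if root:
                candidate = os.path.join(root, "lib64", "libcudart.so")
                if os.path.exists(candidate):
                    return candidate
        # Otherwise fall back to whatever the dynamic linker can see
        # (ldconfig cache, LD_LIBRARY_PATH, ...).
        return ctypes.util.find_library("cudart")

    print(find_system_cuda_runtime() or "no system CUDA runtime found")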

Manuel Klimek

unread,
Feb 5, 2019, 5:08:26 AM2/5/19
to Robert Nishihara, d...@arrow.apache.org, Jason Zaman, Antoine Pitrou, Uwe L. Korn, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, yi...@yifeifeng.com, soumith
On Mon, Feb 4, 2019 at 8:12 PM Robert Nishihara <robertn...@gmail.com> wrote:
Replying to the thread because the last two messages got dropped.

On Mon, Feb 4, 2019 at 10:00 AM soumith <sou...@gmail.com> wrote:
> [...]
> Until then, I think it's best that we find a solution for PyPI users that
> can work out of box with PyPI.

If CUDA is the main problem, I'd be happy to try to reach out to NVIDIA to see whether we can come to a collaboration. I think we'd still need consensus on the toolchain solution first, as that'd be needed to bootstrap the other things :)

Dmitri Gribenko

unread,
Feb 5, 2019, 6:19:17 AM2/5/19
to Manuel Klimek, Jason Zaman, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com, Uwe Korn
On Mon, Feb 4, 2019 at 12:29 PM Manuel Klimek <kli...@google.com> wrote:

Thanks for looping me in, Manuel.

So I wanted to go back to the requirements and enumerate possible solutions.

From soumith's email:
1. CUDA support
2. C++11 support

Neither the newest CUDA nor C++11 works on manylinux1 (CentOS 5.11).

The original email does not go into detail about why CUDA does not work, but I can imagine it is because of the old userspace libraries (libc, libstdc++, libpthread etc.).  C++11 does not work because of an old libstdc++ and old GCC.

So what can we do about old userspace libraries?

Option "Userspace-1": Pip package uses libraries installed on the system where the pip package runs.  (AKA the current manylinux approach.)

Advantages:
- Smaller download size.

Disadvantages:
- Pip packages have to be built against an old version of userspace libraries to be maximally-compatible.
- No nice upgrade path.  When we need a specific new feature for something (e.g., today it is modern CUDA and C++11), we have to bump the requirements for the host system.  We will always be extremely cautious about not bumping the requirements too much, and therefore we will always be stuck with the oldest possible libraries that can do the job.

Option "Userspace-2": When the pip package runs, ignore the system userspace libraries.  Use libraries from somewhere else.

Advantages:
- We control which versions of userspace libraries we use.  We can use libraries that are newer than system ones.
- Complete isolation from the userspace of the system where the pip package runs.  The only remaining point of contact with the user's system is the kernel.

Disadvantages:
- We need to figure out where to get these libraries from.
- Bigger download size for users.

So where do we get the userspace libraries from?

Option "Userspace-2a": Pip community owns all userspace libraries that binaries in a pip package can use.
All userspace components defined by manylinux are packaged into a pip package.  TensorFlow/PyTorch/... pip packages declare what version of the userspace pip package they depend on.

Advantages:
- Pip community owns all userspace components.

Disadvantages:
- Pip community owns way more stuff than before.

Option "Userpace-2b": Pip takes all userspace libraries from an existing packager.
Same as "Userspace-2a", but instead of owning the build process for the userspace libraries, we take them from an existing packager, for example, Debian, CentOS, Anaconda, Nix package manager, whatever we decide on.

Advantages:
- Pip community controls userspace components.

Disadvantages:
- Pip community owns more stuff than before.

What can we do about old toolchain?

Option "Toolchain-1": Use a toolchain from a certain old distribution, so that the output is maximally-compatible.
This option is compatible with any choice of userspace, as long as the libraries don't require a new compiler or language features.

Disadvantages:
- Ancient toolchain that does not support modern C++.

Option "Toolchain-2": Make a modern toolchain that produces maximally-compatible output.
This option is difficult to implement, since a modern toolchain using a modern C++ version will require using a contemporary C++ standard library (libc++ or libstdc++).

Option "Toolchain-3": Make a modern toolchain that requires a modern C++ library.
AKA what Manuel is proposing.  Package modern libc++ as a wheel, make a Docker container with the corresponding Clang for building binary packages like Tensorflow.
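
For illustration, the consumer side of such a "runtime libraries as a wheel" scheme might look roughly like the sketch below (the libcxx_runtime package and its layout are hypothetical):

    import ctypes
    import os

    import libcxx_runtime  # hypothetical wheel that vendors a newer C++ runtime

    # Load the vendored C++ runtime with RTLD_GLOBAL before any extension
    # modules are imported, so that libraries loaded later resolve their
    # std:: symbols against this copy instead of the system's.
    _libdir = os.path.join(os.path.dirname(libcxx_runtime.__file__), "lib")
    ctypes.CDLL(os.path.join(_libdir, "libstdc++.so.6"), mode=ctypes.RTLD_GLOBAL)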

Thoughts?

Dmitri

Uwe L. Korn

unread,
Feb 5, 2019, 8:01:31 AM2/5/19
to Dmitri Gribenko, Manuel Klimek, Jason Zaman, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, JLu...@nvidia.com, Philipp Moritz, ray...@googlegroups.com, Soumith Chintala, yi...@yifeifeng.com
Hello Dmitri,

Option "Userspace-2" sounds to me exactly like what conda does. There is already a community around conda-forge that takes care of packaging all native requirements in separate packages, including a modern toolchain that is separate from the host system. I still need to understand why conda is not an option here; we would just be replicating that setup.

As previously mentioned, getting conda functionality into pip would be a valid option, but we may face the same issues as when conda was created. I doubt that the PyPA is more open to this scope expansion than they were then; the personnel situation is still very limited in the packaging space. For the users of Arrow, we have definitely had a much better experience with users working in conda than with those using pip, mainly due to the package manager taking care of all the binary dependencies between packages like arrow, torch, and tensorflow.

Also, to reiterate a point raised earlier: C++11 with manylinux1 works smoothly. With gcc 4.8.5, everything we need in Arrow is supported. C++14 and beyond are out of scope and can only be used starting with manylinux{2010,2014}.

Uwe

Manuel Klimek

unread,
Feb 5, 2019, 10:22:25 AM2/5/19
to Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com
On Tue, Feb 5, 2019 at 2:01 PM Uwe L. Korn <xho...@gmail.com> wrote:
> Option "Userspace-2" sounds to me exactly like what conda does. There is already a community around conda-forge that takes care of packaging all native requirements in separate packages, including a modern toolchain that is separate from the host system. I still need to understand why conda is not an option here.
> [...]
> Also, to reiterate a point raised earlier: C++11 with manylinux1 works smoothly. With gcc 4.8.5, everything we need in Arrow is supported. C++14 and beyond are out of scope and can only be used starting with manylinux{2010,2014}.

From the requirements side (Martin will correct me if I'm getting these wrong):
- it seems like from the TF point of view, our users are on pip, so we need to deliver there
- LLVM is going to require C++14 ~in March as far as I can tell
- from trying to find info about manylinux2010 / 14, it seems like these have stalled? (but I'm happy to be proven wrong here :)
 

Uwe L. Korn

unread,
Feb 5, 2019, 10:28:03 AM2/5/19
to Manuel Klimek, Dmitri Gribenko, Jason Zaman, Antoine Pitrou, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com

> From the requirements side (Martin will correct me if I'm getting these wrong):
> - it seems like from the TF point of view, our users are on pip, so we need to deliver there
> - LLVM is going to require C++14 ~in March as far as I can tell
> - from trying to find info about manylinux2010 / 14, it seems like these have stalled? (but I'm happy to be proven wrong here :)

Can we start a shared Google Doc to collect all the requirements and constraints?

Antoine Pitrou

unread,
Feb 5, 2019, 10:28:22 AM2/5/19
to Manuel Klimek, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com


Le 05/02/2019 à 16:22, Manuel Klimek a écrit :
> [...]
> From the requirements side (Martin will correct me if I'm getting these
> wrong):
> - it seems like from the TF point of view, our users are on pip, so we
> need to deliver there
> - LLVM is going to require C++14 ~in March as far as I can tell
> - from trying to find info about manylinux2010 / 14, it seems like these
> have stalled? (but I'm happy to be proven wrong here :)

manylinux2010 hasn't stalled, it's been progressing slowly. Apparently
pip 19.0 is out which supports downloading and installing manylinux2010
packages. See status page here:
https://github.com/pypa/manylinux/issues/179#issuecomment-457002180
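
(As a quick check, assuming a reasonably recent release of the packaging
library, which pip vendors, one can list the tags an interpreter will
accept; manylinux2010 tags should show up on a supporting pip/glibc
combination. A sketch:)

    from packaging import tags

    # Print the first few platform tags this interpreter is compatible with.
    for tag in list(tags.sys_tags())[:10]:
        print(tag)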

manylinux2014 is an entirely different question. It needs interested
parties to gather and devise a spec and then get it accepted as a new PEP.

Regards

Antoine.

Manuel Klimek

unread,
Feb 5, 2019, 10:30:01 AM2/5/19
to Antoine Pitrou, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com
On Tue, Feb 5, 2019 at 4:28 PM Antoine Pitrou <ant...@python.org> wrote:


Le 05/02/2019 à 16:22, Manuel Klimek a écrit :
> [...]
> - from trying to find info about manylinux2010 / 14, it seems like these
> have stalled? (but I'm happy to be proven wrong here :)

manylinux2010 hasn't stalled, it's been progressing slowly.  Apparently
pip 19.0 is out which supports downloading and installing manylinux2010
packages.  See status page here:
https://github.com/pypa/manylinux/issues/179#issuecomment-457002180

Cool! The problem is that it doesn't solve the C++14 issue, right?

Antoine Pitrou

unread,
Feb 5, 2019, 10:37:47 AM2/5/19
to Manuel Klimek, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com

Le 05/02/2019 à 16:29, Manuel Klimek a écrit :
>
> manylinux2010 hasn't stalled, it's been progressing slowly.  Apparently
> pip 19.0 is out which supports downloading and installing manylinux2010
> packages.  See status page here:
> https://github.com/pypa/manylinux/issues/179#issuecomment-457002180
>
> Cool! The problem is that it doesn't solve the C++14 issue, right?

I'm not sure. But apparently this may be the case (due to C++ ABI
issues), if you read this comment and the subsequent ones here:
https://github.com/pypa/manylinux/pull/152#discussion_r167242743

Regards

Antoine.

Jonathan Helmus

unread,
Feb 5, 2019, 11:14:20 AM2/5/19
to Manuel Klimek, Antoine Pitrou, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com


On 2/5/19 9:29 AM, 'Manuel Klimek' via TensorFlow Developers wrote:
> [...]
>
> Cool! The problem is that it doesn't solve the C++14 issue, right?

Devtoolset-7 can be installed on RHEL6/CentOS 6, which is the reference distribution of manylinux2010.  Devtoolset-7 includes GCC 7.3.1, which has full support for C++14.  On RHEL6/CentOS 6 the devtoolset compilers target the older GCC C++ ABI (-D_GLIBCXX_USE_CXX11_ABI=0) and will not emit the newer ABI.  There is an open pull request to the manylinux repository to create a docker image containing this toolset, which may be of interest:

https://github.com/pypa/manylinux/pull/252
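
One way to check which of the two ABIs a given shared object was actually
built against is to look for the std::__cxx11 tag in its dynamic symbol
table. A rough sketch driving binutils' nm from Python (assuming nm is on
the PATH):

    import subprocess
    import sys

    def references_cxx11_abi(shared_object):
        # Mangled names that use the new ABI carry the std::__cxx11 namespace
        # tag (e.g. _ZNSt7__cxx1112basic_string...); libraries built with
        # -D_GLIBCXX_USE_CXX11_ABI=0 do not.
        nm = subprocess.run(["nm", "-D", shared_object],
                            capture_output=True, text=True, check=True)
        return "__cxx11" in nm.stdout

    if __name__ == "__main__":
        print(references_cxx11_abi(sys.argv[1]))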

Cheers,

    - Jonathan Helmus


manylinux2014 is an entirely different question.  It needs interested
parties to gather and devise a spec and then get it accepted as a new PEP.

Regards

Antoine.

Philipp Moritz

unread,
Feb 5, 2019, 7:06:34 PM2/5/19
to Jonathan Helmus, Manuel Klimek, Antoine Pitrou, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com
Thanks for the meeting! One question concerning a point that is still not super clear to me:

Say we define a new manylinux standard based on gcc >= 5 (with stable C++11 support). There will still be a lot of wheels from the manylinux1 days that were built against gcc 4.8 and might use C++11 features from before they became stable. How do we prevent bugs from that? Is the plan to convince everybody who uses these C++11 features to move to the new manylinux standard?

Jason Zaman

unread,
Feb 5, 2019, 10:52:03 PM2/5/19
to Philipp Moritz, Jonathan Helmus, Manuel Klimek, Antoine Pitrou, Uwe L. Korn, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com
Thanks everyone for attending the meeting! It was great to have people
from so many different groups so we can figure out how to solve this
best for everyone. :)

A lot was discussed, I split the notes from the wheel part of the
discussion out into a separate doc:
https://docs.google.com/document/d/1uYZK2jQtDUPpo3AHe18ZCH1jS9be9s8zR3axLR1SOG0/edit#
It is set to globally commentable, so please add anything that was
missed or is incorrect.
We should definitely have a follow-up call later on so the folks in
Europe can make it too. Does 19th Feb (Tuesday) 5pm UTC work for
everyone? (9am PST, noon EST, 1am Wednesday Singapore).

Thanks,
Jason

Antoine Pitrou

unread,
Feb 6, 2019, 6:38:50 AM2/6/19
to Philipp Moritz, Jonathan Helmus, Manuel Klimek, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com

Le 06/02/2019 à 01:06, Philipp Moritz a écrit :
> Thanks for the meeting! One question concerning a point that is still
> not super clear to me:
>
> Say we define a new manylinux standard based on gcc >=5 (with stable
> c++11 support). There will still be a lot of wheels from the manylinux1
> days that are built against gcc 4.8 that might use the c++11 features
> before they became stable. How do we prevent bugs from that? Is the plan
> to convince everybody who uses these c++11 features to use the new
> manylinux standard?

Yes, that's a bit of a problem.

This discussion arose from the incompatibility between Tensorflow
wheels (compiled with a later toolchain) and other Python wheels
(compiled with a manylinux1-compatible toolchain).

Intuitively, by using the new C++ ABI we may prevent such issues when
installing manylinux1 wheels and manylinux20XX wheels side-by-side. But
it's difficult to say for sure.

Regards

Antoine.




Manuel Klimek

unread,
Feb 6, 2019, 8:28:34 AM2/6/19
to Antoine Pitrou, Philipp Moritz, Jonathan Helmus, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com
On Wed, Feb 6, 2019 at 12:38 PM Antoine Pitrou <ant...@python.org> wrote:

Le 06/02/2019 à 01:06, Philipp Moritz a écrit :
> [...]

Yes, that's a bit of a problem.

This discussion arose from the incompatibility between Tensorflow
wheels (compiled with a later toolchain) and other Python wheels
(compiled with a manylinux1-compatible toolchain).

Do you know where these communicate with std:: types? (Due to ABI tagging, loading them into the same process should work, right?)

Manuel Klimek

unread,
Feb 6, 2019, 8:34:53 AM2/6/19
to Jonathan Helmus, Antoine Pitrou, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, Philipp Moritz, ray-dev, Soumith Chintala, yi...@yifeifeng.com
On Tue, Feb 5, 2019 at 5:14 PM Jonathan Helmus <jhe...@anaconda.com> wrote:


On 2/5/19 9:29 AM, 'Manuel Klimek' via TensorFlow Developers wrote:
> [...]
> Cool! The problem is that it doesn't solve the C++14 issue, right?

Devtoolset-7 can be installed on RHEL6/CentOS 6 which is the reference distribution of manylinux2010.  Devtoolset-7 includes GCC 7.3.1 which has full support for C++14.  On RHEL6/CentOS 6 the devtoolset compilers target the older GCC C++ ABI (-D_GLIBCXX_USE_CXX11_ABI=0) and will not emit the newer ABI.

I shouldn't read threads backwards; this seems like the answer to my last email, too :) Thanks!

Antoine Pitrou

unread,
Feb 6, 2019, 8:37:19 AM2/6/19
to Manuel Klimek, Philipp Moritz, Jonathan Helmus, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com

Le 06/02/2019 à 14:27, Manuel Klimek a écrit :
> [...]
>
> > This discussion arose from the incompatibility between Tensorflow
> > wheels (compiled with a later toolchain) and other Python wheels
> > (compiled with a manylinux1-compatible toolchain).
>
> Do you know where these communicate with std types? (due to ABI tagging
> loading them into the same process should work, right?)

They don't. I don't remember the specifics; Philipp Moritz might know
more about this.

Regards

Antoine.

Philipp Moritz

unread,
Feb 6, 2019, 12:14:50 PM2/6/19
to Antoine Pitrou, Manuel Klimek, Jonathan Helmus, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com
The problems arose when some functionality of C++11 <future> was used. It led to certain symbols being statically linked into the shared library, which clashed with other shared libraries in the same address space that had the same symbols but were linked against a different version of libstdc++ (specifically, tensorflow's). There is some discussion about this in https://github.com/apache/arrow/pull/3177.

This might happen again in the future if a pre-g++-5 stdlib is mixed with a post-g++-5 one. But with manylinux20xx we will be in a better situation if the major packages (TensorFlow, PyTorch, Ray, Arrow) standardize on g++ >= 5. Older manylinux1 packages from pip might still clash, but we can flag them as not manylinux20xx compatible and work towards getting them fixed.
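
To make the failure mode concrete, here is a rough way to look for such
duplicated statically linked symbols across two shared libraries, using
binutils' nm from Python (the library paths below are only illustrative):

    import subprocess

    def defined_dynamic_symbols(shared_object):
        # Defined (not undefined) entries in the dynamic symbol table.
        out = subprocess.run(["nm", "-D", "--defined-only", shared_object],
                             capture_output=True, text=True, check=True).stdout
        return {line.split()[-1] for line in out.splitlines() if line.strip()}

    # The same <future>-related helper (e.g. __once_proxy) being *defined* in
    # two libraries loaded into one process is the red flag:
    a = defined_dynamic_symbols("site-packages/pyarrow/libarrow.so")
    b = defined_dynamic_symbols("site-packages/tensorflow/libtensorflow_framework.so")
    print(sorted(s for s in a & b if "once" in s or "future" in s))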

Philipp Moritz

unread,
Feb 6, 2019, 1:45:50 PM2/6/19
to Antoine Pitrou, Manuel Klimek, Jonathan Helmus, Uwe L. Korn, Dmitri Gribenko, Jason Zaman, Wes McKinney, TensorFlow Developers, Martin Wicke, bu...@tensorflow.org, d...@arrow.apache.org, Edd Wilder-James, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com
Would building our manylinux2010 wheels against https://github.com/pypa/manylinux/pull/252 solve the C++11 problems? In that case we should just do that. Otherwise, let's propose a minimally modified manylinux2011 that fixes C++11 support, so we can move on rather than waiting nine more months for manylinux2014 or whatever will support C++14.

Jason Zaman

unread,
Feb 18, 2019, 1:04:07 PM2/18/19
to Philipp Moritz, Edd Wilder-James, Antoine Pitrou, Manuel Klimek, Jonathan Helmus, Uwe L. Korn, Dmitri Gribenko, Wes McKinney, TensorFlow Developers, Martin Wicke, SIG Build, d...@arrow.apache.org, Jean-Marc Ludwig, ray-dev, Soumith Chintala, yi...@yifeifeng.com
Hey all,
Just a quick reminder that we're gonna have the follow-up call tomorrow (Tuesday) at 5pm UTC, 9am PST, noon EST, 1am Wednesday Singapore (about 23 hours from this email), so the folks in Europe can make the call too.
It'll be a Hangouts call same as before, and we'll put the link and dial-in number in the Google Doc:

Thanks,
Jason