OSS-Fuzz links against libc++, so we cannot statically link libstdc++ directly without breaking that.
On Wednesday, 20 March 2019 21:10:54 UTC, Manuel Klimek wrote:
> Hi builders,
> one thing that came up in the manylinux discussions, but I don't think I fully understand yet, is the requirement to link to the system libstdc++. Thus, I wanted to ask what people think would be the main problems if TF didn't link to the system libstdc++ in the pip package, and e.g. statically linked it, as that would solve a ton of problems for us regarding toolchain flexibility. Given that we don't support things outside of our control calling our C++ API anyway, the question is what the downsides would be.
> Does anybody know of something that would break?

Yes.

I don't think this solves anything, does it?

If you statically link the pip package to an old (pre-gcc-5) libstdc++ and then try to use that pip package in an application that dynamically links to a newer (post-gcc-5) libstdc++.so, then you have a mix of old and new libstdc++ symbols in the same process. Calls that should resolve to the new symbols in the dynamic library might actually use old statically linked symbols, which do something different, or expect a type to have a different layout.
Trying to use versions of GCC before 5.1 to compile C++11 code is not going to give you portable packages that can be used on systems with libstdc++ from a newer GCC. Linking statically doesn't change that as far as I can see.
If you need to use C++11 then my advice is to either stick to versions of GCC where the C++11 support is finished and has a stable ABI, or always recompile everything from source and don't expect binaries to be portable to different systems.
On Thu, Mar 21, 2019 at 12:47 AM <jwa...@redhat.com> wrote:
> On Wednesday, 20 March 2019 21:10:54 UTC, Manuel Klimek wrote:
>> [...] Does anybody know of something that would break?
>
> Yes. I don't think this solves anything, does it?
>
> If you statically link the pip package to an old (pre-gcc-5) libstdc++ and then try to use that pip package in an application that dynamically links to a newer (post-gcc-5) libstdc++.so, then you have a mix of old and new libstdc++ symbols in the same process. Calls that should resolve to the new symbols in the dynamic library might actually use old statically linked symbols, which do something different, or expect a type to have a different layout.

The idea is to not expose the symbols of the statically linked libstdc++. Thus, TF code would always use the one it was statically linked with, and other code that links against the system one would use the system one. This would break if we had calls across the boundary of code that is linked with the system lib and code that is statically linked, but I'm not aware of any of this for tensorflow.

> Trying to use versions of GCC before 5.1 to compile C++11 code is not going to give you portable packages that can be used on systems with libstdc++ from a newer GCC. Linking statically doesn't change that as far as I can see.

I'm not sure what you mean by using versions of GCC before 5.1 - I want to be able to use *newer* GCC versions (or other compilers / C++ standard libraries), and deploy to systems with any libstdc++.
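To make the hide-the-symbols idea concrete, here is a minimal sketch (the object and library names are made up for illustration, not TF's actual build setup): statically link libstdc++ into the extension and keep its symbols out of the dynamic symbol table, so everything else in the process still resolves against the system libstdc++.so.6.

    # link libstdc++ statically and keep its symbols local;
    # --exclude-libs stops symbols from the named archive from
    # being exported by the shared object
    g++ -shared -fPIC -o _pywrap_example.so example.o \
        -static-libstdc++ -Wl,--exclude-libs,libstdc++.a

    # sanity check: no libstdc++ symbols should be exported
    nm -DC --defined-only _pywrap_example.so | grep 'std::'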
We have a dependency between `pyarrow` and tensorflow-io at the moment for https://github.com/tensorflow/io/tree/master/tensorflow_io/arrow and https://github.com/tensorflow/io/tree/master/tensorflow_io/parquet. There we are calling the C++ API of both.

I think that the Ray project also calls the C++ API of both Arrow and Tensorflow. This is how most of this discussion started.
On Thu, 21 Mar 2019, 11:39 Uwe L. Korn, <ma...@uwekorn.com> wrote:
> We have a dependency between `pyarrow` and tensorflow-io at the moment [...] There we are calling the C++ API of both.

Thanks! I thought that the only supported way to do this would be to build all of the C++ parts within the same bazel invocation, as afaiu TF doesn't make any guarantees about C++ ABI stability.
On Thu, Mar 21, 2019 at 2:25 PM Manuel Klimek <kli...@google.com> wrote:
> Thanks! I thought that the only supported way to do this would be to build all of the C++ parts within the same bazel invocation, as afaiu TF doesn't make any guarantees about C++ ABI stability.

And just to make sure my understanding is correct - as long as this cross-calling happens, a single package switching to devtoolset7 will also break ABI compatibility and lead to crashes, right?
On Thursday, 21 March 2019 13:31:55 UTC, Manuel Klimek wrote:
> And just to make sure my understanding is correct - as long as this cross-calling happens, a single package switching to devtoolset7 will also break ABI compatibility and lead to crashes, right?

If the other parts use a version of GCC older than 5 and use C++11, yes. In general, you cannot mix C++11 code compiled with GCC 4.x and C++11 code compiled with later versions. Using devtoolset doesn't change that.
Some specific uses of C++11 might work between GCC 4.x and later releases, if the relevant components didn't change between the 4.x version and the 5+ versions that officially support C++11, but other uses don't work. We don't have a complete list of what changed and what didn't, and what works and what doesn't (because the party line is just "it's not supported, don't do it") but https://gcc.gnu.org/wiki/Cxx11AbiCompatibility mentions a few specific incompatibilities.
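For anyone wanting to check a particular binary, one quick (if incomplete) heuristic is to look at which versioned libstdc++ symbols it references; a sketch, assuming GNU binutils and an illustrative file name:

    # list the libstdc++ symbol versions the object requires; versions
    # newer than what the target system's libstdc++.so.6 provides will
    # prevent it from loading there at all
    objdump -T some_extension.so | grep -oE '(GLIBCXX|CXXABI)_[0-9.]+' | sort -uV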
Are we sure that flipping the C++ standard wouldn't introduce any of those symbols in the C++ API?
On Thursday, 21 March 2019 14:05:10 UTC, Manuel Klimek wrote:
> On Thu, Mar 21, 2019 at 2:35 PM Uwe L. Korn <ma...@uwekorn.com> wrote:
>> That should work as it still is using the old CXXABI?
>
> I thought it was linker-script-patching symbols so that the common symbols with the old libstdc++ come from the .so, while the symbols not available are statically linked.

Right.

> Are we sure that flipping the C++ standard wouldn't introduce any of those symbols in the C++ API?

I'm not sure I understand exactly what you're asking here, but in devtoolset the new cxx11 ABI (i.e. new definitions of std::string and std::list, and a few other changes) is completely disabled.
Using -std=c++11 or -std=c++14 with devtoolset doesn't introduce the new cxx11 ABI types. In fact, it doesn't introduce them with non-devtoolset compilers either, because the choice of old or new ABI is orthogonal to the choice of -std, as https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html explains. The difference is that with non-devtoolset compilers you can explicitly choose the ABI via a macro, whereas that macro does nothing for devtoolset.
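To make the orthogonality concrete, here is a small sketch (for a stock, non-devtoolset GCC 5 or later; file names are illustrative) showing that the macro, not -std, selects the string ABI:

    cat > abi_check.cc <<'EOF'
    #include <string>
    std::string f() { return "x"; }
    EOF
    # same -std, different ABI macro
    g++ -std=c++14 -D_GLIBCXX_USE_CXX11_ABI=0 -c abi_check.cc -o old_abi.o
    g++ -std=c++14 -D_GLIBCXX_USE_CXX11_ABI=1 -c abi_check.cc -o new_abi.o
    # only the new-ABI object references std::__cxx11::basic_string
    nm -C old_abi.o | grep -c __cxx11   # prints 0
    nm -C new_abi.o | grep -c __cxx11   # prints a non-zero count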
On Thu, Mar 21, 2019 at 4:02 PM <jwa...@redhat.com> wrote:
> On Thursday, 21 March 2019 14:05:10 UTC, Manuel Klimek wrote:
>> Are we sure that flipping the C++ standard wouldn't introduce any of those symbols in the C++ API?
>
> I'm not sure I understand exactly what you're asking here, but in devtoolset the new cxx11 ABI (i.e. new definitions of std::string and std::list, and a few other changes) is completely disabled.

So if I have gcc 4 based C++11 enabled binaries that do not use the cxx11 ABI and use them with devtoolset7, you're saying the diffs in ABI you link above don't apply?
What you said above read to me like the opposite, but I'm probably thoroughly confused now :)

A different problem I see is that the TF C++ types at the call boundaries themselves might change if we flip the C++ standard to C++14.
The package in tensorflow-io is part of the effort for a modular tensorflow. Prior to TF 2.0, tensorflow bundles many functionalities in tf.contrib that require lots of third-party library linkages (e.g. Kafka, AWS, Apache Arrow, Apache Ignite, etc.). That makes tensorflow's build time very long (several hours), as many third-party libraries have a big C++/C code base.
Once TF 2.0 is released, tf.contrib will be removed completely, and many functionalities will shift to different SIGs' repos such as tensorflow-io, addons, or SIG networking. Tensorflow-io is not in high usage yet, as most of its features can still be used from tf.contrib (as part of the tensorflow pip package). However, once TF 2.0 is released, users will need to change from tf.contrib (part of the tensorflow pip package) to tensorflow-io. One example is the tensorflow-datasets package: at the moment it uses tf.contrib.lmdb for LMDB format support, and that will need to be re-pointed to tensorflow_io.lmdb in TF 2.0.
On a separate note, I think extension packages of tensorflow such as tensorflow-io (and, to an extent, tensorflow-addons and others) could always try to build with exactly the same C++ compiler used by tensorflow itself (we use gcc 4.8 on ubuntu 14.04 for the tensorflow-io pip package for exactly that reason). It might pose some challenges to require all repos (inside the tensorflow org or outside) that link against tensorflow to use exactly the same compiler version as the tensorflow core repo. A sketch of what matching the compiler looks like in practice follows below.
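As a sketch of what "matching the compiler" means in practice (the image and package names are the stock Ubuntu ones; the actual tensorflow-io CI setup may differ):

    # ubuntu 14.04 ships gcc/g++ 4.8 as its default toolchain
    docker run -it ubuntu:14.04 bash
    apt-get update && apt-get install -y g++-4.8
    g++-4.8 --version   # 4.8.x, matching the toolchain used for TF releases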
So, to conclude all this:

If I wanted to do something that has the chance of minimal breakage, I'd probably want to try devtoolset-7. Given that TF releases are currently ubuntu 14.04 based, and devtoolset-7 seems to be a centos thing, what's the best way to approach this? Will compiling with devtoolset-7 on a newer centos lead to backwards-compatible-enough binaries?
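For reference, installing devtoolset-7 on CentOS is typically done via Software Collections; a minimal sketch (centos:6 here, but centos:7 works the same way):

    # inside a centos:6 container or machine
    yum install -y centos-release-scl
    yum install -y devtoolset-7-gcc devtoolset-7-gcc-c++
    scl enable devtoolset-7 bash   # shell with gcc 7 on PATH
    gcc --version                  # gcc 7.x, still linking the system libstdc++.so.6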
Ah, ok, and that's why even with devtoolset7 I won't get a manylinux1 compliant wheel.

Why can't I install a dev version of an older glibc and link against that?

Thanks everyone, this is insightful :)
From the TF side, our goal is to use centos6 + devtoolset7, and then dlopen cuda if available, to satisfy all manylinux2010 constraints. I do not think we have a chance to satisfy manylinux1 constraints.
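One quick way to verify the dlopen approach from the outside (the .so name is illustrative) is to confirm the extension has no hard DT_NEEDED dependency on the CUDA libraries, so importing the wheel on a machine without CUDA doesn't fail at load time:

    # no libcuda/libcudart entries should show up here if CUDA
    # is only loaded lazily via dlopen
    objdump -p _pywrap_tensorflow_internal.so | grep NEEDED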
On Tue, Mar 26, 2019 at 10:18 PM 'Gunhan Gulsoy' via SIG Build <bu...@tensorflow.org> wrote:
> From the TF side, our goal is to use centos6 + devtoolset7, and then dlopen cuda if available, to satisfy all manylinux2010 constraints. I do not think we have a chance to satisfy manylinux1 constraints.

Ok, so from what I understand, to satisfy manylinux1 constraints we'd need a devtoolset-7 that has a libstdc++_nonshared that would work with a libstdc++ 4.2, which we don't have. My confusion came from the fact that the manylinux docker image ships with gcc 4.8, but that one already does the same trick devtoolset-7 does.
Trying to build a centos6 based image. Unfortunately this is not possible from my debian workstation so I have to jump through hoops to push it all onto some non-debian cloud instance :(
That looks like a wheel built off CentOS 7 and devtoolset-7. Are you sure you used CentOS 6?
Ok, the next problem that I should have foreseen is that we can't remote build with centos6, because the RBE machines run some form of debian :(
On Thu, Mar 28, 2019 at 8:33 PM Manuel Klimek <kli...@google.com> wrote:
> Ok, the next problem that I should have foreseen is that we can't remote build with centos6, because the RBE machines run some form of debian :(

That one might have been too early - the autoconfig doesn't work (that usually runs locally), but I can run it on a remote machine and copy stuff back. We'll see whether the actions actually all work remotely.
bash-4.1# auditwheel show /input/tensorflow-1.13.1-cp27-none-linux_x86_64.whl

tensorflow-1.13.1-cp27-none-linux_x86_64.whl is consistent with the following platform tag: "manylinux2010_x86_64".

The wheel references external versioned symbols in these system-provided shared libraries: libc.so.6 with versions {'GLIBC_2.2.5', 'GLIBC_2.4', 'GLIBC_2.10', 'GLIBC_2.6', 'GLIBC_2.3.2', 'GLIBC_2.9', 'GLIBC_2.11', 'GLIBC_2.3.4', 'GLIBC_2.7', 'GLIBC_2.3'}, libstdc++.so.6 with versions {'GLIBCXX_3.4.9', 'CXXABI_1.3.3', 'CXXABI_1.3.2', 'GLIBCXX_3.4.11', 'GLIBCXX_3.4.10', 'CXXABI_1.3', 'GLIBCXX_3.4'}, libpthread.so.0 with versions {'GLIBC_2.3.3', 'GLIBC_2.12', 'GLIBC_2.2.5', 'GLIBC_2.3.2'}, libm.so.6 with versions {'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.0', 'GCC_3.3'}, libdl.so.2 with versions {'GLIBC_2.2.5'}, librt.so.1 with versions {'GLIBC_2.2.5'}

This constrains the platform tag to "manylinux2010_x86_64". In order to achieve a more compatible tag, you would need to recompile a new wheel from source on a system with earlier versions of these libraries, such as a recent manylinux image.
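The same information can be spot-checked by hand; a small sketch (the .so name is illustrative), useful because manylinux2010 caps the allowed glibc symbol versions at GLIBC_2.12 on x86_64:

    # highest glibc symbol version the extension requires
    objdump -T _pywrap_tensorflow_internal.so | \
        grep -oE 'GLIBC_[0-9.]+' | sort -uV | tail -1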