Hello,
With changes in the CUDA 10.0 headers, VS 2019 headers and TensorFlow build, we’ve been able to build the TensorFlow 2.0 pip package (git tag: v2.0.0-rc0) with VS 2019.
We measured a 14x build time improvement in the CPU build between VS 2017 and VS 2019 16.4 with an additional compiler switch.
While there is a significant win going from VS 2017 to VS 2019 16.3, we found that some files in the TensorFlow build still took a long time to compile. In 16.4, we introduced a new compiler switch (/d2ReducedOptimizeHugeFunctions) that takes an alternate, faster optimization path for large functions. This further reduced the CPU build time to 1.6 hours. Our heuristic still doesn’t catch the tile_functor_cpu.cc file by default, so that file is the remaining long pole for MSVC. Other than that file, the remaining long poles are from the NVCC compiler toolchain.
The numbers:
CPU build, on a Cascade Lake machine with Eigen strong inlining enabled:
VS 2017: 23 hours
VS 2019 16.3: 5 hours
VS 2019 16.4: 4.7 hours
VS 2019 16.4 with /d2ReducedOptimizeHugeFunctions***: 1.6 hours
GPU build, on a Ryzen machine with Eigen strong inlining enabled against CUDA 10.0:
VS 2017: 10.5 hours
VS 2019 16.3: 4 hours
VS 2019 16.4: 3.66 hours
Note: It’s possible that the 23 hours we measured as a baseline for the VS 2017 TensorFlow CPU build was inflated by something unusual in our setup, but we still expect large build time wins. We also weren’t able to gather numbers for VS 2019 16.4 with /d2ReducedOptimizeHugeFunctions for the GPU build because, in our setup, all TensorFlow builds (with both VS 2017 and 2019) have suddenly started to take significantly longer in the past two weeks.
Apart from upgrading Bazel, the workarounds below are needed because NVIDIA doesn’t support CUDA 10.0 with VS 2019, so the NVCC toolchain will either emit errors or crash during compilation. Because of the various errors we encountered, it took us a while to work around these NVCC toolchain bugs. Here’s the list of workarounds:
Note that we used the Preview installation of Visual Studio so if you use the official enterprise version, replace the word “Preview” in the paths above with “Enterprise”.
We have a few follow up questions:
Thanks,
Victor
--
To unsubscribe from this group and stop receiving emails from it, send an email to build+un...@tensorflow.org.
+Sai Ganesh +Peter Mattson +Pankaj Kanwar
[ Please note non-Google employees in the cc list. ]
First of all, thank you for the improvements: they look fantastic! We have a relatively new (so far, internal) performance suite to measure end-to-end benchmark performance and ensure regression proofing. As Martin said, we do not currently have a Windows benchmarking/regression setup. The least we should do is make sure that this change does not regress whatever else we are measuring. And we should build a plan to broaden our coverage.
Sai, Peter, Pankaj: is there something we can point Victor to for some self-service performance regression proofing with this change?
Naveen.
Thanks Victor for the update, the 4.7 hours is a huge win for us.
We are using Bazel 0.29.1 currently. If NVIDIA does not support VS 2019, we need to see if the toolchain workarounds you mentioned here are something that can be done on our side.
Thanks,
Goldie
On Fri, Nov 8, 2019 at 12:57 PM Naveen Kumar <nave...@google.com> wrote:
On Fri, Nov 8, 2019 at 12:36 PM Martin Wicke <wi...@google.com> wrote:
Thank you so much for this work! The long build times have been a huge pain point for our release process.
Regarding your questions:
1. Is this issue isolated to CUDA 10.0? Does 10.1 work with VS 2019? We are switching to 10.1 anyway. If the issue persists we can bring this up with NVIDIA as well. +Sanjoy Das +Manuel Klimek FYI.
2. I don't believe we currently have a continuous benchmark set up for Windows. +Naveen Kumar what would be the best benchmark code to point to in order to check the performance implications of this kind of change?
3. +Goldie Gadde is there anything we need to be able to switch to VS 2019, other than a working build, of course?
The shorter build times would certainly be a strong incentive for us to switch, if possible.
Martin
T:\tmp\bigvaudl\execroot\org_tensorflow\external\eigen_archive\unsupported\Eigen\CXX11\src/Tensor/TensorExecutor.h(790): error: calling a __host__ function("std::conj<float> ") from a __device__ function("Eigen::internal::EigenMetaKernelEval< ::Eigen::TensorEvaluator<const ::Eigen::TensorAssignOp< ::Eigen::TensorMap< ::Eigen::Tensor< ::std::complex<float> , (int)1, (int)1, int> , (int)16, ::Eigen::MakePointer> , const ::Eigen::TensorCwiseUnaryOp< ::Eigen::internal::scalar_conjugate_op< ::std::complex<float> > , const ::Eigen::TensorMap< ::Eigen::Tensor<const ::std::complex<float> , (int)1, (int)1, int> , (int)16, ::Eigen::MakePointer> > > , ::Eigen::GpuDevice> , int, (bool)0> ::run") is not allowed
T:\tmp\bigvaudl\execroot\org_tensorflow\external\eigen_archive\unsupported\Eigen\CXX11\src/Tensor/TensorExecutor.h(790): error: identifier "std::conj<float> " is undefined in device code
Would it also be possible for you to share what your Bazel environment and call looks like when building? That would help us narrow things down.
Thanks again! This has made our Windows release process much easier to handle.
Austin
--
Hi Austin,
I’m glad to hear that you’re seeing major improvements to your build+test speed for the CPU build.
I did encounter that failure. At the time, I thought it was because of the CUDA 10.0 incompatibility but it looks like it’s affecting 10.1 as well. The temporary workaround I found was to do this:
- Add the "inline" keyword to declaration of conj() in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\complex (around line 1724)
Let me know if that ends up working for you and I can loop in more people to look into the cause. I’m not sure why the error is happening and what the right long term fix is. In case you need it, I believe I also hit a similar error around operator- and the workaround was this:
- Add "inline" keyword to declarations of "complex<_Ty> operator-" in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\complex (there are four instances of this required at lines 1065, 1073, 1080 and 1139)
Because of the workarounds above, you might also get some link errors about multiple definitions of the same function, so you may end up needing to add --linkopt=/FORCE:MULTIPLE to the bazel command.
The bazel environment I used was this:
set BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC
set BAZEL_VC_FULL_VERSION=14.23.28105
set BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview
python .\configure.py (with CUDA set to yes and disabling strong inlining set to no and the rest the defaults)
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true --linkopt=/FORCE:MULTIPLE //tensorflow/tools/pip_package:build_pip_package
Thanks,
Victor
--
You received this message because you are subscribed to the Google Groups "tensorflow-devinfra-team" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tensorflow-devinfr...@google.com.
To view this discussion on the web visit https://groups.google.com/a/google.com/d/msgid/tensorflow-devinfra-team/CABBcqdGjBgswMcFc_kRN4iXrqqCVm%3DrxxjX5GWNeP%3Dcx_NOceA%40mail.gmail.com.
I'm ok with a patch like that, if it is small (and hoping that VS19preview6 might include the fixes we require).
On Fri, Nov 22, 2019 at 1:50 PM Sanjoy Das <san...@google.com> wrote:
On Fri, Nov 22, 2019 at 1:35 PM Martin Wicke <wi...@google.com> wrote:
> That would take a while, and a bit longer to propagate.
Can we have the Eigen fix (if there is one) be part of a workspace.bzl patch file? I realize that's not a great solution, but if unblocking this is a P0 then maybe that's relatively ok?
Austin, could you provide more instructions on how to reproduce this (which branch, build environment/setup, build commands, etc.)?
I will need to follow up with our libraries team to see if these changes are acceptable or if we need an alternate solution.
Thanks,
Victor
Thanks for the instructions, Austin. I was able to reproduce the error. I also spoke with our libraries team and this is likely a bug in Eigen where a function marked with only __device__ is calling into a non-device function. I simulated a fix in Eigen by changing line 585 of the TensorExecutor.h downloaded locally as part of the TensorFlow build to also have the __host__ function attribute:
struct EigenMetaKernelEval {
static __host__ __device__ EIGEN_ALWAYS_INLINE
void run(Evaluator& eval, StorageIndex firstIdx, StorageIndex lastIdx, StorageIndex step_size) {
I did a full clean rebuild of the cwise_op_gpu build and confirmed that the build finishes successfully. I’m building the full pip package now to see if there are any other instances of this that need to be fixed in Eigen. Assuming we can get Eigen to approve the change, how long would it take TensorFlow to ingest a new version of Eigen? Also, is there a possibility for TensorFlow to avoid using EigenMetaKernelEval or implement its own if Eigen ends up not accepting this change?
Thanks for the SIG Build meeting invitation. We’ll be calling in so we can chat more about this. See you at 2 pm.
--Victor
I've been struggling with a slew of brittle internal tests. I got a working patch this morning and am trying to submit the Eigen update now.
On Fri, Dec 6, 2019 at 10:52 AM Victor Tong <vit...@microsoft.com> wrote:
Hi all,
I wanted to follow up to see if the Eigen fix was able to unblock the full GPU TensorFlow pip package build. If you’re able to build the full pip package, we’d be interested to know the build time improvements you’re seeing between VS 2017 and VS 2019.
Thanks,
Victor
From: Rasmus Munk Larsen <rmla...@google.com>
Sent: Tuesday, December 3, 2019 11:22 AM
To: Martin Wicke <wi...@google.com>
Cc: Victor Tong <vit...@microsoft.com>; Austin Anderson <ange...@google.com>; Goldie Gadde <gga...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Eric Brumer <eri...@microsoft.com>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>
Subject: Re: [EXTERNAL] Re: Building TensorFlow 2.0 with CUDA 10.0 and VS 2019
Hi Victor,
Thanks for the investigation. I will look into upstreaming your fix to Eigen today.
Rasmus
Hi folks, I'm glad to hear things are working. To reiterate what Victor said: don't hesitate to reach out if there are further issues.
I'd like to ask a favor: is there some public forum where you would be comfortable mentioning the wins you saw? It doesn't have to be a formal blog post or anything, even a tweet to @visualc would be really appreciated... we'd love to hear your thoughts on ease-of-upgrade of the C++ tools, and the compile-time wins.
Thanks,
Eric
From: Victor Tong <vit...@microsoft.com>
Sent: Monday, December 9, 2019 2:38 PM
To: Austin Anderson <ange...@google.com>; Goldie Gadde <gga...@google.com>
Cc: Rasmus Munk Larsen <rmla...@google.com>; Martin Wicke <wi...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Eric Brumer <eri...@microsoft.com>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>
Subject: RE: [EXTERNAL] Re: Building TensorFlow 2.0 with CUDA 10.0 and VS 2019
I’m glad to hear that TensorFlow is able to move to VS 2019. As Austin mentioned, VS 2019 16.4 is an official release (not Preview) that contains the /d2ReducedOptimizeHugeFunctions compiler flag. 16.4 is also the latest servicing baseline that will receive updates over time. See https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes for more details.
If you see any performance regressions in TensorFlow because of this (or in general), let us know and we’d be happy to look into it.
Hi, happy new year everyone!
I wanted to see how things were going with the release & ping you about an update regarding the tweet to @visualc.
Thanks,
Eric
From: Austin Anderson <ange...@google.com>
Sent: Thursday, December 12, 2019 2:48 PM
To: Goldie Gadde <gga...@google.com>
Cc: Eric Brumer <eri...@microsoft.com>; Victor Tong <vit...@microsoft.com>; Rasmus Munk Larsen <rmla...@google.com>; Martin Wicke <wi...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>
Thanks very much!
From: Rasmus Munk Larsen <rmla...@google.com>
Sent: Thursday, January 23, 2020 2:26 PM
To: Austin Anderson <ange...@google.com>
Cc: Eric Brumer <eri...@microsoft.com>; Goldie Gadde <gga...@google.com>; Victor Tong <vit...@microsoft.com>; Martin Wicke <wi...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>
Subject: Re: [EXTERNAL] Re: Building TensorFlow 2.0 with CUDA 10.0 and VS 2019
Woohoo!