Building TensorFlow 2.0 with CUDA 10.0 and VS 2019

Victor Tong

Nov 8, 2019, 2:48:47 PM

to SIG Build, eri...@microsoft.com, ki...@microsoft.com, vit...@microsoft.com, ruz...@microsoft.com, mag...@microsoft.com, dah...@microsoft.com

Hello,

With changes in the CUDA 10.0 headers, VS 2019 headers and TensorFlow build, we’ve been able to build the TensorFlow 2.0 pip package (git tag: v2.0.0-rc0) with VS 2019.

We measured a 14x build time improvement in the CPU build between VS 2017 and VS 2019 16.4 with an additional compiler switch.

While there is a significant win going from VS 2017 to VS 2019 16.3, we found that some files in the TensorFlow build still took a long time to compile. In 16.4, we introduced a new compiler switch (/d2ReducedOptimizeHugeFunctions) that takes an alternate, faster optimization path for large functions. This further reduced the CPU build time to 1.6 hours. Our heuristic still doesn’t catch the tile_functor_cpu.cc file by default, so that file is the remaining long pole for MSVC. Other than that file, the remaining long poles are from the NVCC compiler toolchain.

The numbers:

CPU build, on a Cascade Lake machine with Eigen strong inlining enabled:

VS 2017: 23 hours

VS 2019 16.3: 5 hours

VS 2019 16.4: 4.7 hours

VS 2019 16.4 with /d2ReducedOptimizeHugeFunctions: 1.6 hours

GPU build, on a Ryzen machine with Eigen strong inlining enabled against CUDA 10.0:

VS 2017: 10.5 hours

VS 2019 16.3: 4 hours

VS 2019 16.4: 3.66 hours
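
As a quick sanity check on the figures above, the speedup of each configuration relative to the VS 2017 baseline can be computed directly. This is only arithmetic on the reported hours; the dictionary and function names are illustrative and do not come from any TensorFlow or MSVC tool.

```python
# Build times in hours, copied from the numbers reported above.
CPU_HOURS = {
    "VS 2017": 23.0,
    "VS 2019 16.3": 5.0,
    "VS 2019 16.4": 4.7,
    "VS 2019 16.4 + /d2ReducedOptimizeHugeFunctions": 1.6,
}
GPU_HOURS = {
    "VS 2017": 10.5,
    "VS 2019 16.3": 4.0,
    "VS 2019 16.4": 3.66,
}


def speedups(hours, baseline="VS 2017"):
    """Return each configuration's speedup relative to the baseline."""
    base = hours[baseline]
    return {name: base / h for name, h in hours.items()}


if __name__ == "__main__":
    for label, table in (("CPU", CPU_HOURS), ("GPU", GPU_HOURS)):
        for name, factor in speedups(table).items():
            print(f"{label} {name}: {factor:.1f}x")
```

For the CPU build, 23 / 1.6 is about 14.4, which is where the 14x headline figure comes from.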

Note: It’s possible that the 23 hours we measured as a baseline for the VS 2017 TensorFlow CPU build was inflated by something unusual in our setup, but we still expect large build time wins. We also weren’t able to gather numbers for VS 2019 16.4 with /d2ReducedOptimizeHugeFunctions for the GPU build because, in our setup, all TensorFlow builds (with both VS 2017 and 2019) have suddenly started to take significantly longer in the past two weeks.

Apart from the Bazel upgrade, the workarounds below are needed because NVIDIA doesn’t support CUDA 10.0 with VS 2019, so the NVCC toolchain will either emit errors or crash during compilation. It took us a while to hack around these NVCC toolchain bugs because of the variety of errors we encountered. Here’s the list of workarounds:

Copy the cudafe++.exe from a CUDA 10.1 installation into the CUDA 10.0 installation folder.
Use Bazel 0.28.1 instead of 0.26.1. Otherwise you will hit a bunch of docker errors complaining about .bzl files.
Replace 1920 with 1930 in #error in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include\crt\host_config.h (Roughly line 141)
Remove the "inline" keyword from the _Throw_bad_array_new_length function in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\exception (Roughly line 323)
Add the "inline" keyword to declaration of conj() in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\complex (around line 1724)
Add "inline" keyword to declarations of "complex<_Ty> operator-" in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\complex (there are four instances of this required at lines 1065, 1073, 1080 and 1139)
Compile TensorFlow with --linkopt=/FORCE:MULTIPLE

Note that we used the Preview installation of Visual Studio so if you use the official enterprise version, replace the word “Preview” in the paths above with “Enterprise”.
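
To make the Preview/Enterprise substitution concrete, here is a small sketch that builds the header paths used in the workarounds for either edition. The helper name is hypothetical, and the toolset version directory (14.23.28105 here) may also differ between installations, so treat it as an assumption to check locally.

```python
from pathlib import PureWindowsPath

# Root of the VS 2019 installations, as used in the paths above.
VS_ROOT = PureWindowsPath(r"C:\Program Files (x86)\Microsoft Visual Studio\2019")


def msvc_header_path(edition, toolset_version, header):
    """Build the path to an MSVC standard library header for a VS edition."""
    return (VS_ROOT / edition / "VC" / "Tools" / "MSVC"
            / toolset_version / "include" / header)


if __name__ == "__main__":
    # Only the edition component changes between the two installations.
    for edition in ("Preview", "Enterprise"):
        print(msvc_header_path(edition, "14.23.28105", "complex"))
```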

We have a few follow up questions:

We've previously spoken to NVIDIA about supporting VS 2019 in CUDA 10.0 but they have pushed back. We would be interested in working with the TensorFlow team to reopen the conversation with NVIDIA. Would the TensorFlow team be interested in this?
Is there a way to measure TensorFlow correctness and performance? We’d like to measure the code quality impact of the /d2ReducedOptimizeHugeFunctions flag.
Apart from getting NVIDIA to support VS 2019 and CUDA 10.0, is there anything we can do to help TensorFlow migrate to VS 2019?

Thanks,

Victor

Martin Wicke

Nov 8, 2019, 3:36:24 PM

to Victor Tong, Naveen Kumar, Sanjoy Das, Manuel Klimek, Goldie Gadde, SIG Build, eri...@microsoft.com, ki...@microsoft.com, vit...@microsoft.com, ruz...@microsoft.com, mag...@microsoft.com, dah...@microsoft.com

Thank you so much for this work! The long build times have been a huge pain point for our release process.

Regarding your questions:

1. Is this issue isolated to CUDA 10.0? Does 10.1 work with VS 2019? We are switching to 10.1 anyway. If the issue persists we can bring this up with NVIDIA as well. +Sanjoy Das +Manuel Klimek FYI.

2. I don't believe we currently have a continuous benchmark set up for Windows. +Naveen Kumar what would be the best benchmark code to point to for checking the performance implications of this kind of change?

3. +Goldie Gadde is there anything we need to be able to switch to VS 2019, other than a working build, of course?

The shorter build times would certainly be a strong incentive for us to switch, if possible.

Martin

--
To unsubscribe from this group and stop receiving emails from it, send an email to build+un...@tensorflow.org.

Goldie Gadde

Nov 8, 2019, 4:18:32 PM

to Naveen Kumar, Martin Wicke, Sai Ganesh, Peter Mattson, Pankaj Kanwar, Victor Tong, Sanjoy Das, Manuel Klimek, SIG Build, eri...@microsoft.com, ki...@microsoft.com, vit...@microsoft.com, ruz...@microsoft.com, mag...@microsoft.com, dah...@microsoft.com

Thanks, Victor, for the update. The 4.7 hours is a huge win for us.

We are currently using Bazel 0.29.1. If NVIDIA does not support VS 2019, we need to see whether the toolchain workarounds you mentioned here are something that can be done on our side.

Thanks,

Goldie

On Fri, Nov 8, 2019 at 12:57 PM Naveen Kumar <nave...@google.com> wrote:

+Sai Ganesh +Peter Mattson +Pankaj Kanwar

[ Please note non-Google employees in the cc list. ]

First of all, thank you for the improvements: they look fantastic! We have a relatively new (so far, internal) performance suite to measure end-to-end benchmark performance and ensure regression proofing. As Martin said, we do not currently have Windows benchmarking/regressions set up. The least we should do is make sure that this change does not regress whatever else we are measuring. And we should build a plan to broaden our coverage.

Sai, Peter, Pankaj: is there something we can point Victor to for some self-service performance regression proofing with this change?

Naveen.

Victor Tong

Nov 8, 2019, 5:16:41 PM

to SIG Build

>>> Is this issue isolated to CUDA 10.0? Does 10.1 work with VS 2019? We are switching to 10.1 anyway. If the issue persists we can bring this up with NVIDIA as well.

I had been testing against CUDA 10.0 because, from our previous conversations, CUDA 10.1 required a driver update that could be problematic, so I didn't pursue that path. I'm excited to hear that TensorFlow is switching to 10.1 because CUDA 10.1 with VS 2019 is a supported scenario. See https://devblogs.microsoft.com/cppblog/cuda-10-1-available-now-with-support-for-latest-microsoft-visual-studio-2019-versions/ for more information.

>>> We are using Bazel 0.29.1 currently, if Nvidia does not support VS 2019, we need to see if the toolchain workarounds you mentioned here are something that can be done on our side.

With Bazel 0.29.1 and the move to CUDA 10.1, I'm expecting the workarounds in the original email to no longer be required to switch to VS 2019. If you encounter any problems migrating to CUDA 10.1 and VS 2019, feel free to let us (the MSVC team) know and we can loop in the right people (either on MSVC or from NVIDIA).

--Victor


Goldie Gadde

Nov 11, 2019, 2:03:15 PM

to Victor Tong, SIG Build

Thanks a lot, Victor, for confirming that Bazel 0.29.1 and CUDA 10.1 are officially supported. We will switch over to VS 2019 in the next couple of months, as the build time improvement is of huge value for us.

We will reach out to you if we have questions or run into any issues during the move to VS 2019.

- Goldie


Austin Anderson

Nov 21, 2019, 6:40:20 PM

to Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Victor Tong, Rui Zhang, mag...@microsoft.com, dah...@microsoft.com, tensorflow-devinfra-team, Günhan Gülsoy, Goldie Gadde

Hey Victor,

Thanks a lot for this. Your work has helped us out a lot -- we rolled out these changes to our internal CI images and we've already seen a massive improvement in build+test speed for CPU, down from 24+ hours to under 4.

GPU seems faster as well, but we've gotten stuck on some compilation errors. Has your team seen anything like this in connection to VS 2019 changes? This could be something that's new on our 2.1 branch, but I wanted to make sure it's not something you'd seen before:

T:\tmp\bigvaudl\execroot\org_tensorflow\external\eigen_archive\unsupported\Eigen\CXX11\src/Tensor/TensorExecutor.h(790): error: calling a __host__ function("std::conj<float> ") from a __device__ function("Eigen::internal::EigenMetaKernelEval< ::Eigen::TensorEvaluator<const  ::Eigen::TensorAssignOp< ::Eigen::TensorMap< ::Eigen::Tensor<    ::std::complex<float> , (int)1, (int)1, int> , (int)16,  ::Eigen::MakePointer> , const  ::Eigen::TensorCwiseUnaryOp< ::Eigen::internal::scalar_conjugate_op<    ::std::complex<float> > , const  ::Eigen::TensorMap< ::Eigen::Tensor<const     ::std::complex<float> , (int)1, (int)1, int> , (int)16,  ::Eigen::MakePointer> > > ,  ::Eigen::GpuDevice> , int, (bool)0> ::run") is not allowed
T:\tmp\bigvaudl\execroot\org_tensorflow\external\eigen_archive\unsupported\Eigen\CXX11\src/Tensor/TensorExecutor.h(790): error: identifier "std::conj<float> " is undefined in device code

Would it also be possible for you to share what your Bazel environment and call looks like when building? That would help us narrow things down.

Thanks again! This has made our Windows release process much easier to handle.
Austin


Günhan Gülsoy

Nov 22, 2019, 2:20:27 PM

to Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team, Goldie Gadde

Hi Victor,

We can certainly experiment this way, but I do not think I want to modify visual studio for the software we are distributing.

Especially if people cannot build TF with visual studio and CUDA out of the box, we should document that, too.

On Fri, Nov 22, 2019 at 10:52 AM Victor Tong <vit...@microsoft.com> wrote:

Hi Austin,

I’m glad to hear that you’re seeing major improvements to your build+test speed for the CPU build.

I did encounter that failure. At the time, I thought it was because of the CUDA 10.0 incompatibility but it looks like it’s affecting 10.1 as well. The temporary workaround I found was to do this:

Add the "inline" keyword to declaration of conj() in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\complex (around line 1724)

Let me know if that ends up working for you and I can loop in more people to look into the cause. I’m not sure why the error is happening and what the right long term fix is. In case you need it, I believe I also hit a similar error around operator- and the workaround was this:

Add "inline" keyword to declarations of "complex<_Ty> operator-" in C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.23.28105\include\complex (there are four instances of this required at lines 1065, 1073, 1080 and 1139)

Because of the workarounds above, you might also get some link errors about multiple definitions of the same function, so you may end up needing to add --linkopt=/FORCE:MULTIPLE to the bazel command.

The bazel environment I used was this:

set BAZEL_VC=C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC

set BAZEL_VC_FULL_VERSION=14.23.28105

set BAZEL_VS=C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview

python .\configure.py (with CUDA set to yes, the option to disable strong inlining set to no, and the defaults for everything else)

bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true --linkopt=/FORCE:MULTIPLE //tensorflow/tools/pip_package:build_pip_package
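
The environment and build invocation above can also be assembled programmatically, which may be convenient for CI scripts. The following is a minimal sketch under that assumption: the function names are illustrative, the paths and toolset version are the ones from this email and will differ on other machines, and nothing is executed unless the commented-out line is enabled.

```python
import os

# Paths and version copied from the environment described above.
VS_ROOT = r"C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview"


def bazel_env(vs_root=VS_ROOT, toolset="14.23.28105", base_env=None):
    """Return a copy of the environment with the Bazel MSVC variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["BAZEL_VS"] = vs_root
    env["BAZEL_VC"] = vs_root + r"\VC"
    env["BAZEL_VC_FULL_VERSION"] = toolset
    return env


def bazel_command(force_multiple=True):
    """Return the pip package build command as an argument list."""
    cmd = ["bazel", "build", "--config=opt", "--config=cuda",
           "--define=no_tensorflow_py_deps=true"]
    if force_multiple:
        # Only needed when the "inline" header workarounds are applied.
        cmd.append("--linkopt=/FORCE:MULTIPLE")
    cmd.append("//tensorflow/tools/pip_package:build_pip_package")
    return cmd


if __name__ == "__main__":
    env = bazel_env()
    print(" ".join(bazel_command()))
    # To run the build from a configured TensorFlow checkout:
    #   import subprocess; subprocess.run(bazel_command(), env=env, check=True)
```

Passing the command as a list avoids quoting problems with the spaces in the Visual Studio paths.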

Thanks,

Victor

Goldie Gadde

Nov 22, 2019, 3:53:03 PM

to Günhan Gülsoy, Sanjoy Das, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Adding +Sanjoy Das to the thread.

Sanjoy Das

Nov 22, 2019, 4:02:34 PM

to Goldie Gadde, Günhan Gülsoy, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Will it be easier to modify Eigen's source code (guarded by a macro
perhaps) instead of VS2019's standard library? E.g. maybe we could
just reimplement std::conj in Eigen?

-- Sanjoy

Martin Wicke

Nov 22, 2019, 4:35:40 PM

to Sanjoy Das, Goldie Gadde, Günhan Gülsoy, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

That would take a while, and a bit longer to propagate.

Can I have a short summary? Specifically, can we still build with older VS, even slowly? Or are we hard blocked on resolving this?

--
You received this message because you are subscribed to the Google Groups "tensorflow-devinfra-team" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tensorflow-devinfr...@google.com.
To view this discussion on the web visit https://groups.google.com/a/google.com/d/msgid/tensorflow-devinfra-team/CABBcqdGjBgswMcFc_kRN4iXrqqCVm%3DrxxjX5GWNeP%3Dcx_NOceA%40mail.gmail.com.

Sanjoy Das

Nov 22, 2019, 4:50:15 PM

to Martin Wicke, Goldie Gadde, Günhan Gülsoy, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

On Fri, Nov 22, 2019 at 1:35 PM Martin Wicke <wi...@google.com> wrote:
> That would take a while, and a bit longer to propagate.

Can we have the Eigen fix (if there is one) be part of a workspace.bzl patch file? I realize that's not a great solution, but if unblocking this is a P0 then maybe that's relatively ok?

> Can I have a short summary?

> Specifically, can we still build with older VS, even slowly? Or are we hard blocked on resolving this?

In another thread Goldie said that with VS2017 the build hasn't finished even after 48 hours.

Martin Wicke

Nov 22, 2019, 4:51:54 PM

to Sanjoy Das, Goldie Gadde, Günhan Gülsoy, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

I'm ok with a patch like that, if it is small (and hoping that VS19preview6 might include the fixes we require).

Sanjoy Das

Nov 22, 2019, 5:12:28 PM

to Martin Wicke, Goldie Gadde, Günhan Gülsoy, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

On Fri, Nov 22, 2019 at 1:51 PM Martin Wicke <wi...@google.com> wrote:

I'm ok with a patch like that, if it is small (and hoping that VS19preview6 might include the fixes we require).

Ok. Austin, Goldie, what do you think? We need to find the problematic place in Eigen that calls into std::conj and instead have it call a custom eigen_conj implementation.

On Fri, Nov 22, 2019 at 1:50 PM Sanjoy Das <san...@google.com> wrote:
On Fri, Nov 22, 2019 at 1:35 PM Martin Wicke <wi...@google.com> wrote:
> That would take a while, and a bit longer to propagate.

Can we have the Eigen fix (if there is one) be part of a workspace.bzl patch file? I realize that's not a great solution, but if unblocking this is a P0 then maybe that's relatively ok?

I didn't notice that we have a public mailing list on CC. I was referring to this (publicly visible link).

-- Sanjoy

Austin Anderson

Nov 22, 2019, 5:32:28 PM

to Sanjoy Das, Martin Wicke, Goldie Gadde, Günhan Gülsoy, Victor Tong, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

I can also confirm that applying Victor's suggested changes (five additions of "inline" within Visual Studio's "complex" file, as well as /FORCE:MULTIPLE) allowed the failing target to compile successfully on our build images. I'm building build_pip_package as well to verify that there are no other failures, but that will take a few hours at least.

Günhan Gülsoy

Nov 22, 2019, 5:33:10 PM

to Sanjoy Das, Martin Wicke, Goldie Gadde, Victor Tong, Austin Anderson, Victor Tong, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

I think the fastest solution right now is to build TF with visual studio 2017, with eigen_strong_inline turned off.

The produced library will be slightly slower, but that will require no code changes.

We should then look into Eigen changes, and a new visual studio.

Goldie Gadde

Nov 25, 2019, 12:41:44 PM

to Günhan Gülsoy, Sanjoy Das, Martin Wicke, Victor Tong, Austin Anderson, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Hi all,

We were able to proceed with the build by turning off `eigen_strong_inline`, but we need to fix this.

Austin, please let us know if you were able to build successfully with the suggested changes.

Victor,

The current solution of making changes to Visual Studio files is not something we can do. Is it possible to make those changes on your end and provide us with a new preview release, and also make sure these changes are incorporated into the final release?

Thanks,

Goldie

Austin Anderson

Dec 2, 2019, 6:19:02 PM

to Victor Tong, Goldie Gadde, Günhan Gülsoy, Sanjoy Das, Martin Wicke, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Victor,

Sure thing. Here's what I did:

Create a fresh VM on GCP with Windows Server 2019. I used this: gcloud compute instances create angerson-build-tf --image-project windows-cloud --image-family=windows-2019-for-containers --zone us-central1-a --machine-type n1-standard-64 --boot-disk-size 300
On the VM: (roughly following https://www.tensorflow.org/install/source_windows but with newer configs)

Install Visual Studio Community 2019 Preview 16.5 (the latest) with this .vsconfig.
Install CUDA 10.1 update2 for Windows 10
Install CuDNN 7.6.5.32 for CUDA 10.1 by unpacking it into the cuda directory.
Install Python 3.7.5 for Windows via the standard installer, including Pip and PATH settings
Install MSYS2 20190524 (the latest). Run "pacman -Syu" until it stops finding new packages, then "pacman -S git patch unzip". Add the usr/bin directory to the Path.
Install bazelisk 1.1.0 for Windows and add its directory to the Path
I did not need to set any Bazel VC variables because it found the single VS installation.

Reboot and fix paths if needed (I had to make sure my Windows system paths included

Run "pip install six numpy wheel"
Run "pip install keras_applications==1.0.6 keras_preprocessing==1.0.5 --no-deps"

Run bash from cmd
"git clone http://github.com/tensorflow/tensorflow" (my exact commit: 22abc2772ab2d399bda8b122dae2fef99b62c29c)
"cd tensorflow"
"python ./configure.py", enable cuda and don't override inlining, as your suggestion did (logs)
Add "build --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions" to .tf_configure.bazelrc (full .tf_configure.bazelrc)
"bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true --config=v2 //tensorflow/core/kernels:cwise_op_gpu" raises these operator errors. (full log) It only takes two minutes on a 64-core machine, though! I think this target demonstrates the only kinds of failures; to double-check we'd need to build build_pip_package.
From that point, I could resolve those errors by following your instructions to update the "complex" files -- at least in our full images. I don't have time this afternoon to make the same changes in the new image I created, but I can verify again if needed.
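
The step that adds the compiler switch to .tf_configure.bazelrc can be scripted so that re-running configure and re-applying the change does not duplicate the line. This is a sketch only: the helper name is hypothetical and the file content in the demo is a placeholder, not the real output of configure.py.

```python
from pathlib import Path

# The line added to .tf_configure.bazelrc in the steps above.
FLAG_LINE = ("build --copt=/d2ReducedOptimizeHugeFunctions "
             "--host_copt=/d2ReducedOptimizeHugeFunctions")


def add_flag_line(bazelrc_path, line=FLAG_LINE):
    """Append `line` to the bazelrc unless it is already present.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(bazelrc_path)
    text = path.read_text() if path.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    # Make sure the new line starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + separator + line + "\n")
    return True


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        rc = Path(tmp) / ".tf_configure.bazelrc"
        rc.write_text("build --config=v2")  # placeholder existing content
        print(add_flag_line(rc))  # first call modifies the file
        print(add_flag_line(rc))  # second call finds the line and does nothing
```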

Let me know if I left anything out.

Also, the monthly SIG Build meeting is coming up tomorrow (Tuesday) afternoon at 2pm PST, and it would be really helpful if you could be there to discuss more about this. Here's the rest of the details: bit.ly/tf-sig-build-notes

Austin

On Mon, Dec 2, 2019 at 11:39 AM Victor Tong <vit...@microsoft.com> wrote:

Austin, could you provide more instructions on how to reproduce this (which branch, build environment/setup, build commands, etc.)?

I will need to follow up with our libraries team to see if these changes are acceptable or if we need an alternate solution.

Thanks,

Victor

Martin Wicke

Dec 3, 2019, 2:09:54 PM

to Victor Tong, Rasmus Larsen, Austin Anderson, Goldie Gadde, Günhan Gülsoy, Sanjoy Das, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

+Rasmus Larsen

On Tue, Dec 3, 2019, 10:56 Victor Tong <vit...@microsoft.com> wrote:

Thanks for the instructions, Austin. I was able to reproduce the error. I also spoke with our libraries team and this is likely a bug in Eigen where a function marked with only __device__ is calling into a non-device function. I simulated a fix in Eigen by changing line 585 of the TensorExecutor.h downloaded locally as part of the TensorFlow build to also have the __host__ function attribute:

struct EigenMetaKernelEval {

static __host__ __device__ EIGEN_ALWAYS_INLINE

void run(Evaluator& eval, StorageIndex firstIdx, StorageIndex lastIdx, StorageIndex step_size) {

I did a full clean rebuild of the cwise_op_gpu build and confirmed that the build finishes successfully. I’m building the full pip package now to see if there are any other instances of this that need to be fixed in Eigen. Assuming we can get Eigen to approve the change, how long would it take TensorFlow to ingest a new version of Eigen? Also, if Eigen ends up not accepting this change, is it possible for TensorFlow to avoid using EigenMetaKernelEval or to implement your own version of it?

Thanks for the SIG Build meeting invitation. We’ll be calling in so we can chat more about this. See you at 2 pm.

--Victor

Goldie Gadde

Dec 9, 2019, 12:19:07 PM

to Rasmus Munk Larsen, Victor Tong, Martin Wicke, Austin Anderson, Günhan Gülsoy, Sanjoy Das, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Hi folks,

I wanted to provide an update on the Windows builds.

Thanks to Victor, Austin and Rasmus, we were able to get the Windows builds passing with MSVS 2019 (16.4) and the Eigen patch.

Victor,
thanks for making a new VS 2019 preview release and helping us get this to work.

Could you also make sure that these changes are part of the final release as well?

Thanks,

Goldie

On Fri, Dec 6, 2019 at 11:13 AM Rasmus Munk Larsen <rmla...@google.com> wrote:

I've been struggling with a slew of brittle internal tests. I got a working patch this morning and am trying to submit the Eigen update now.

On Fri, Dec 6, 2019 at 10:52 AM Victor Tong <vit...@microsoft.com> wrote:

Hi all,

I wanted to follow up to see if the Eigen fix was able to unblock the full GPU TensorFlow pip package build. If you’re able to build the full pip package, we’d be interested to know the build time improvements you’re seeing between VS 2017 and VS 2019.

Thanks,

Victor

From: Rasmus Munk Larsen <rmla...@google.com>
Sent: Tuesday, December 3, 2019 11:22 AM
To: Martin Wicke <wi...@google.com>
Cc: Victor Tong <vit...@microsoft.com>; Austin Anderson <ange...@google.com>; Goldie Gadde <gga...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Eric Brumer <eri...@microsoft.com>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>
Subject: Re: [EXTERNAL] Re: Building TensorFlow 2.0 with CUDA 10.0 and VS 2019

Hi Victor,

Thanks for the investigation. I will look into upstreaming your fix to Eigen today.

Rasmus

Austin Anderson

Dec 9, 2019, 3:40:29 PM

to Goldie Gadde, Rasmus Munk Larsen, Victor Tong, Martin Wicke, Günhan Gülsoy, Sanjoy Das, SIG Build, Eric Brumer, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

> Could you also make sure that these changes are part of the final release as well.

Just to be clear, the flag is in VS 2019 16.4, which is the latest official release as of last week. I upgraded our images to use that official release.

Victor and team: thanks a bunch for your help with this! These Windows builds were a major stress point on our releases. The complete build + test pipeline seems to have gone down from 48+ hours to 4.5 hours for GPU with Eigen strong inlining enabled. I don't think we added anything else to the build configuration aside from /d2ReducedOptimizeHugeFunctions; as for metrics analysis, I don't know enough to speak for it.

Goldie Gadde

Dec 10, 2019, 4:02:54 PM

to Eric Brumer, Victor Tong, Austin Anderson, Rasmus Munk Larsen, Martin Wicke, Günhan Gülsoy, Sanjoy Das, SIG Build, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Thanks Austin and Victor for clarifying that the release is official, that is great. We will definitely reach out to you if we run into any issues.

Eric,

Regarding the note/tweet, let me get back to you on that.

Thanks,

Goldie

On Tue, Dec 10, 2019 at 12:55 PM Eric Brumer <eri...@microsoft.com> wrote:

Hi folks, I'm glad to hear things are working. To reiterate what Victor said: don't hesitate to reach out if there are further issues.

I'd like to ask a favor: is there some public forum where you would be comfortable mentioning the wins you saw? It doesn't have to be a formal blog post or anything, even a tweet to @visualc would be really appreciated... we'd love to hear your thoughts on ease-of-upgrade of the C++ tools, and the compile-time wins.

Thanks,

Eric

From: Victor Tong <vit...@microsoft.com>
Sent: Monday, December 9, 2019 2:38 PM
To: Austin Anderson <ange...@google.com>; Goldie Gadde <gga...@google.com>
Cc: Rasmus Munk Larsen <rmla...@google.com>; Martin Wicke <wi...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Eric Brumer <eri...@microsoft.com>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>
Subject: RE: [EXTERNAL] Re: Building TensorFlow 2.0 with CUDA 10.0 and VS 2019

I’m glad to hear that TensorFlow is able to move to VS 2019. As Austin mentioned, VS 2019 16.4 is an official release (not Preview) that contains the /d2ReducedOptimizeHugeFunctions compiler flag. 16.4 is also the latest servicing baseline that will receive updates over time. See https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes for more details.

If you see any performance regressions in TensorFlow because of this (or in general), let us know and we’d be happy to look into it.

Austin Anderson

Dec 12, 2019, 5:49:33 PM

to Goldie Gadde, Eric Brumer, Victor Tong, Rasmus Munk Larsen, Martin Wicke, Günhan Gülsoy, Sanjoy Das, SIG Build, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Eric: I've drafted a tweet for this after discussing with the team, but I'm going to wait to confirm that there were no other issues with the upgrade. A few GitHub issues with 2.1.0-rc1, the first RC built with the upgraded images, have been popping up (most notably https://github.com/tensorflow/tensorflow/issues/35036). They might be related and we haven't tracked down the root cause quite yet.

Austin Anderson

Jan 21, 2020, 7:36:40 PM

to Eric Brumer, Goldie Gadde, Victor Tong, Rasmus Munk Larsen, Martin Wicke, Günhan Gülsoy, Sanjoy Das, SIG Build, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

Thanks for the ping. I just got back from vacation and I'm working on getting this through now that 2.1 has been released!

On Mon, Jan 6, 2020 at 12:51 PM Eric Brumer <eri...@microsoft.com> wrote:

Hi, happy new year everyone!

I wanted to see how things were going with the release & ping you about an update regarding the tweet to @visualc.

Thanks,

Eric


Austin Anderson

Jan 23, 2020, 5:20:29 PM

to Eric Brumer, Goldie Gadde, Victor Tong, Rasmus Munk Larsen, Martin Wicke, Günhan Gülsoy, Sanjoy Das, SIG Build, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team

The tweet is out, cool! https://twitter.com/TensorFlow/status/1220466266399498240

Thanks again for your help!

Goldie Gadde

Nov 3, 2020, 5:06:32 PM

to Victor Tong, Rasmus Munk Larsen, Austin Anderson, Martin Wicke, Sanjoy Das, SIG Build, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team, Eric Brumer, Mihai Maruseac, Thea Lamkin

Hey Victor,

Reaching out since we are again running into very long TF build times on Windows and wanted to see if this is still the right forum to follow up on.

+Mihai Maruseac and +Thea Lamkin as well fyi.

Thanks,

Goldie

On Thu, Jan 23, 2020 at 2:36 PM Eric Brumer <eri...@microsoft.com> wrote:

Thanks very much!

From: Rasmus Munk Larsen <rmla...@google.com>

Sent: Thursday, January 23, 2020 2:26 PM
To: Austin Anderson <ange...@google.com>
Cc: Eric Brumer <eri...@microsoft.com>; Goldie Gadde <gga...@google.com>; Victor Tong <vit...@microsoft.com>; Martin Wicke <wi...@google.com>; Günhan Gülsoy <gu...@google.com>; Sanjoy Das <san...@google.com>; SIG Build <bu...@tensorflow.org>; Kirsten Lee <ki...@microsoft.com>; Rui Zhang <ruz...@microsoft.com>; Matt Gardner <Matthew...@microsoft.com>; David Hartglass <dah...@microsoft.com>; tensorflow-devinfra-team <tensorflow-d...@google.com>

Subject: Re: [EXTERNAL] Re: Building TensorFlow 2.0 with CUDA 10.0 and VS 2019

Woohoo!

victort...@gmail.com

Nov 3, 2020, 6:02:36 PM

to SIG Build, Goldie Gadde, Rasmus Munk Larsen, Austin Anderson, Martin Wicke, Sanjoy Das, SIG Build, Kirsten Lee, Rui Zhang, Matt Gardner, David Hartglass, tensorflow-devinfra-team, Eric Brumer, mihaim...@google.com, Thea Lamkin, Victor Tong

Hi Goldie,

I'm sorry to hear that you're running into long build times. Could you follow the instructions at http://aka.ms/compilercrash to open a Developer Community ticket with details such as the compiler version, TensorFlow source branch, repro steps and an isolated repro (if possible)?

Once you create the ticket, feel free to send me a link to it. I probably won't be the person looking into it but I'll make sure it's routed to someone for investigation.