CUDA 11 Perf regression


Samuel Oshin

Oct 8, 2020, 12:31:24 AM
to bu...@tensorflow.org

Hi All, 

 

I spoke on Tuesday about a perf regression we are seeing when benchmarking TF 2.3 + CUDA 11.

Someone on the call suggested verifying that the cuDNN version is up to date. After updating from 8.0.2 to 8.0.4 there is a noticeable improvement, but throughput is still below what we see with CUDA 10.1.

 

The current setup is on AWS, using a p3.16xlarge (8 Voltas) and the TensorFlow benchmarks repo.

 

git clone --single-branch --branch cnn_tf_v2.1_compatible https://github.com/tensorflow/benchmarks.git
python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --batch_size=256 --model=resnet50_v1.5 --optimizer=momentum --variable_update=replicated --nodistortions --gradient_repacking=2 --num_gpus=8 --num_epochs=8 --weight_decay=1e-4 --use_fp16 --all_reduce_spec=nccl --save_summaries_steps=0 --summary_verbosity=1 --num_warmup_batches=0 --train_dir=$HOME/test00 --compute_lr_on_cpu=True --single_l2_loss_op=True --loss_type_to_report=base_loss --data_name=imagenet


TensorFlow pip version           CUDA version   cuDNN version   Images/sec
tf-nightly (2.4.0-dev20201007)   11.0.3         8.0.4           8635.73
tf-2.3                           11.0.3         8.0.4           8692.29
tf-2.3                           10.1           7.6.5           8871.96
tf-nightly + NGC variables       11.0.3         8.0.4           8950.92
tf-2.3 + NGC variables           11.0.3         8.0.4           8995.89
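
To put the gap in relative terms, the figures above can be compared against the tf-2.3 + CUDA 10.1 run. This is a minimal sketch: the numbers are copied from the results above, while the labels, the choice of baseline, and the helper name `percent_change` are assumptions made for illustration.

```python
# Images/sec figures copied from the results reported in this post.
results = {
    "tf-nightly, CUDA 11.0.3, cuDNN 8.0.4": 8635.73,
    "tf-2.3, CUDA 11.0.3, cuDNN 8.0.4": 8692.29,
    "tf-2.3, CUDA 10.1, cuDNN 7.6.5": 8871.96,
    "tf-nightly + NGC variables": 8950.92,
    "tf-2.3 + NGC variables": 8995.89,
}

# Assumption: treat the CUDA 10.1 run as the reference point.
BASELINE = "tf-2.3, CUDA 10.1, cuDNN 7.6.5"


def percent_change(value, baseline):
    """Relative difference of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


changes = {
    name: percent_change(value, results[BASELINE])
    for name, value in results.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

By this measure the CUDA 11 runs without the NGC variables come out roughly 2 to 3 percent below the CUDA 10.1 run, and the runs with the variables land slightly above it.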


export TF_ADJUST_HUE_FUSED=1 TF_AUTOTUNE_THRESHOLD=2 TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT=1 TF_ADJUST_SATURATION_FUSED=1 TF_ENABLE_WINOGRAD_NONFUSED=1 CUDA_CACHE_DISABLE=1

The environment variables were taken from the NGC containers: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

 

Should these be default values somewhere in TF?

 

Thanks, 

 

Samuel Oshin

Paige Bailey

Oct 8, 2020, 12:39:36 AM
to Samuel Oshin, Sanjoy Das, bu...@tensorflow.org
+Sanjoy Das, from the TF GPU team.


Sanjoy Das

Oct 8, 2020, 2:02:23 AM
to Paige Bailey, Nathan Luehr, Pankaj Kanwar, Zongwei Zhou, Samuel Oshin, SIG Build
Hi Samuel,

CC +Nathan Luehr +Pankaj Kanwar +Zongwei Zhou 

Can you try narrowing the speedup to one of these flags?  We did have a regression that could be worked around with TF_AUTOTUNE_THRESHOLD=2, but that should have been fixed since Oct 6 in the nightlies (i.e. we should not need TF_AUTOTUNE_THRESHOLD=2 after that PR).

Btw, a couple of the env vars are no longer applicable (though the NGC env vars could be catering to older TF versions): TF_ADJUST_HUE_FUSED (deleted), TF_ADJUST_SATURATION_FUSED (deleted), TF_ENABLE_WINOGRAD_NONFUSED (defaults to true).

Thanks!
-- Sanjoy

Samuel Oshin

Oct 8, 2020, 2:03:49 AM
to Sanjoy Das, Paige Bailey, Nathan Luehr, Pankaj Kanwar, Zongwei Zhou, SIG Build
Sounds good. I'll narrow down the env variables.

Jason Zaman

Oct 8, 2020, 2:29:45 AM
to Paige Bailey, Christian Sigg, Samuel Oshin, Sanjoy Das, SIG Build
Also +csigg. 

Christian & Sanjoy,
This came up at the SIG-Build meeting yesterday about some performance regressions with TF-2.3+cuda11.0 so looping y'all in. It seems enabling some env vars makes up for the performance difference so maybe we need to investigate the defaults for 2.4?

Samuel,
Did you see similar issues on T4s or Pascal too, or have you only tested Volta?

Also, it looks like CUDA 11.1 just came out (with support for the RTX 30 series). I haven't tried to build against that yet; maybe it fixes some of the regressions? Has anyone managed to get a 3090 to test with? I've been trying but they're sold out everywhere :(

-- Jason

Sanjoy Das

Oct 8, 2020, 2:39:03 AM
to Jason Zaman, Paige Bailey, Christian Sigg, Samuel Oshin, SIG Build
Hi Jason,

I've replied above; Samuel is helping us narrow down the issue.

Thanks!
-- Sanjoy

Samuel Oshin

Oct 8, 2020, 2:40:07 AM
to Jason Zaman, Paige Bailey, Christian Sigg, Sanjoy Das, SIG Build
Unfortunately, I have only tried with Voltas.


Samuel Oshin

Oct 9, 2020, 11:29:56 AM
to Jason Zaman, Paige Bailey, Christian Sigg, Sanjoy Das, SIG Build
Hi Sanjoy,

I ran the scan on those env vars.
As you mentioned, the only major benefit comes from TF_AUTOTUNE_THRESHOLD.
I ran this on tf-nightly 2.4.0-dev20201007, so is it possible the fix hasn't made it into that build?
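
For anyone who wants to repeat a scan like this, one way to script it is sketched below. This is not the script used for the numbers in this thread. The variable values come from the export line earlier in the thread; the helper names (`scan_configs`, `parse_images_per_sec`, `run_once`) are hypothetical, and the assumption that the benchmark reports a line of the form "total images/sec: <number>" should be checked against your own output.

```python
import os
import re
import subprocess

# Variables and values from the NGC export line quoted in this thread.
NGC_VARS = {
    "TF_ADJUST_HUE_FUSED": "1",
    "TF_AUTOTUNE_THRESHOLD": "2",
    "TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT": "1",
    "TF_ADJUST_SATURATION_FUSED": "1",
    "TF_ENABLE_WINOGRAD_NONFUSED": "1",
    "CUDA_CACHE_DISABLE": "1",
}


def scan_configs(variables):
    """Return (label, overrides) pairs: baseline, each variable alone, then all."""
    configs = [("baseline", {})]
    configs += [(name, {name: value}) for name, value in variables.items()]
    configs.append(("all", dict(variables)))
    return configs


def parse_images_per_sec(output):
    """Extract throughput from benchmark output.

    Assumption: the output contains a line like 'total images/sec: 8635.73'.
    Returns None if no such line is found.
    """
    match = re.search(r"total images/sec:\s*([0-9.]+)", output)
    return float(match.group(1)) if match else None


def run_once(command, overrides):
    """Run the benchmark command (a list of arguments) with the given overrides.

    The scanned variables are stripped from the inherited environment first,
    so each run only sees the overrides it is given.
    """
    env = {k: v for k, v in os.environ.items() if k not in NGC_VARS}
    env.update(overrides)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return parse_images_per_sec(result.stdout + result.stderr)


# Print the configurations the scan would cover; nothing is launched here.
for label, overrides in scan_configs(NGC_VARS):
    print(label, overrides)
```

Calling `run_once` with the tf_cnn_benchmarks command line from the first message, once per configuration, would give one images/sec figure per variable to compare against the baseline.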

Thanks,

Samuel Oshin