CUDA 11 Perf regression

Samuel Oshin

Oct 8, 2020, 12:31:24 AM

to bu...@tensorflow.org

Hi All,

I spoke on Tuesday about a perf regression we are seeing when benchmarking TF 2.3 + CUDA 11.

As someone suggested on the call, I verified that the cuDNN version is up to date. After updating from 8.0.2 to 8.0.4 there is a noticeable improvement, but performance is still lacking.

The current setup is on AWS, using a p3.16xlarge (8 Volta GPUs) and the TensorFlow benchmarks repo.

git clone --single-branch --branch cnn_tf_v2.1_compatible https://github.com/tensorflow/benchmarks.git
python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --batch_size=256 --model=resnet50_v1.5 --optimizer=momentum --variable_update=replicated --nodistortions --gradient_repacking=2 --num_gpus=8 --num_epochs=8 --weight_decay=1e-4 --use_fp16 --all_reduce_spec=nccl --save_summaries_steps=0 --summary_verbosity=1 --num_warmup_batches=0 --train_dir=$HOME/test00 --compute_lr_on_cpu=True --single_l2_loss_op=True --loss_type_to_report=base_loss --data_name=imagenet

TensorFlow pip version           CUDA version   cuDNN version   Images/sec
tf-nightly (2.4.0-dev20201007)   11.0.3         8.0.4           8635.73
tf-2.3                           11.0.3         8.0.4           8692.29
tf-2.3                           10.1           7.6.5           8871.96
tf-nightly + NGC variables       11.0.3         8.0.4           8950.92
tf-2.3 + NGC variables           11.0.3         8.0.4           8995.89
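
To put the table in relative terms, here is a minimal sketch that expresses each row as a percentage change against the tf-2.3 + CUDA 10.1 run. The row labels and the choice of baseline are assumptions made for illustration; the throughput figures are copied from the table.

```python
# Images/sec figures copied from the table above; labels are shortened for readability.
RESULTS = {
    "tf-nightly, CUDA 11.0.3": 8635.73,
    "tf-2.3, CUDA 11.0.3": 8692.29,
    "tf-2.3, CUDA 10.1": 8871.96,
    "tf-nightly + NGC variables, CUDA 11.0.3": 8950.92,
    "tf-2.3 + NGC variables, CUDA 11.0.3": 8995.89,
}

# Assumption: the CUDA 10.1 run is treated as the reference point.
BASELINE = "tf-2.3, CUDA 10.1"


def relative_change(results, baseline):
    """Return the percentage change in images/sec of each entry versus the baseline."""
    base = results[baseline]
    return {name: 100.0 * (value - base) / base for name, value in results.items()}


for name, pct in relative_change(RESULTS, BASELINE).items():
    print(f"{name:42s} {pct:+.2f}%")
```

On these numbers the CUDA 11 builds without the extra variables come out roughly 2 to 3 percent slower than the CUDA 10.1 run, and the builds with the NGC variables come out about 1 percent faster.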

export TF_ADJUST_HUE_FUSED=1 TF_AUTOTUNE_THRESHOLD=2 TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT=1 TF_ADJUST_SATURATION_FUSED=1 TF_ENABLE_WINOGRAD_NONFUSED=1 CUDA_CACHE_DISABLE=1

The environment variables were taken from the NGC containers: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

Should these be default values somewhere in TF?

Thanks,

Samuel Oshin

Paige Bailey

Oct 8, 2020, 12:39:36 AM

to Samuel Oshin, Sanjoy Das, bu...@tensorflow.org

+Sanjoy Das, from the TF GPU team.

Sanjoy Das

Oct 8, 2020, 2:02:23 AM

to Paige Bailey, Nathan Luehr, Pankaj Kanwar, Zongwei Zhou, Samuel Oshin, SIG Build

Hi Samuel,

CC +Nathan Luehr +Pankaj Kanwar +Zongwei Zhou

Can you try narrowing the speedup to one of these flags? We did have a regression that could be worked around with TF_AUTOTUNE_THRESHOLD=2, but that should have been fixed since Oct 6 in the nightlies (i.e. we should not need TF_AUTOTUNE_THRESHOLD=2 after that PR).

Btw, a couple of the env vars are no longer applicable (though the NGC env vars could be catering to older TF versions): TF_ADJUST_HUE_FUSED (deleted), TF_ADJUST_SATURATION_FUSED (deleted), TF_ENABLE_WINOGRAD_NONFUSED (defaults to true).
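
One way to narrow the speedup to a single flag is to run the benchmark once with no extra variables, once per variable, and once with all of them set. The sketch below is only an illustration, not the script anyone on this thread used: the helper names (`scan_configs`, `build_env`, `run_scan`) are hypothetical, the command is the one quoted earlier in the thread with a shorter flag list, and filtering the output for lines containing "images/sec" is an assumption about what the benchmark prints.

```python
import os
import subprocess

# Variables and values from the export line earlier in the thread.
# Some of these may be no-ops on recent TF versions; they are kept so the scan shows that.
NGC_VARS = {
    "TF_ADJUST_HUE_FUSED": "1",
    "TF_AUTOTUNE_THRESHOLD": "2",
    "TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT": "1",
    "TF_ADJUST_SATURATION_FUSED": "1",
    "TF_ENABLE_WINOGRAD_NONFUSED": "1",
    "CUDA_CACHE_DISABLE": "1",
}

# Shortened form of the benchmark command from the thread; add the remaining flags as needed.
BENCH_CMD = [
    "python", "benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py",
    "--batch_size=256", "--model=resnet50_v1.5", "--num_gpus=8",
    "--use_fp16", "--all_reduce_spec=nccl", "--data_name=imagenet",
]


def scan_configs(variables):
    """Return (label, overrides) pairs: baseline, each variable alone, then all together."""
    configs = [("baseline", {})]
    configs += [(name, {name: value}) for name, value in variables.items()]
    configs.append(("all", dict(variables)))
    return configs


def build_env(overrides, base=None):
    """Copy the base environment, drop every scanned variable, then apply the overrides."""
    base = os.environ if base is None else base
    env = {k: v for k, v in base.items() if k not in NGC_VARS}
    env.update(overrides)
    return env


def run_scan(cmd=BENCH_CMD):
    """Run the benchmark once per configuration and print the throughput lines."""
    for label, overrides in scan_configs(NGC_VARS):
        proc = subprocess.run(cmd, env=build_env(overrides),
                              capture_output=True, text=True, check=True)
        # Assumption: the benchmark reports throughput on lines containing "images/sec".
        lines = [l for l in proc.stdout.splitlines() if "images/sec" in l]
        print(label, lines[-1] if lines else "(no throughput line found)")
```

Calling `run_scan()` on the benchmark machine performs eight runs in total. Clearing the scanned variables from the inherited environment before each run keeps a previously exported value from leaking into the baseline.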

Thanks!
-- Sanjoy

Samuel Oshin

Oct 8, 2020, 2:03:49 AM

to Sanjoy Das, Paige Bailey, Nathan Luehr, Pankaj Kanwar, Zongwei Zhou, SIG Build

Sounds good. I'll narrow down the env variables.

Jason Zaman

Oct 8, 2020, 2:29:45 AM

to Paige Bailey, Christian Sigg, Samuel Oshin, Sanjoy Das, SIG Build

Also +csigg.

Christian & Sanjoy,

This came up at the SIG-Build meeting yesterday about some performance regressions with TF-2.3+cuda11.0 so looping y'all in. It seems enabling some env vars makes up for the performance difference so maybe we need to investigate the defaults for 2.4?

Samuel,

Did you see similar issues on T4s or Pascal too? Or have you only tested Volta?

Also, it looks like CUDA 11.1 just came out (with support for the RTX 30 series). I haven't tried to build against that yet; maybe that fixes some of the regressions? Has anyone managed to get a 3090 to test with? I've been trying but they're sold out everywhere :(

-- Jason

Sanjoy Das

Oct 8, 2020, 2:39:03 AM

to Jason Zaman, Paige Bailey, Christian Sigg, Samuel Oshin, SIG Build

Hi Jason,

I've replied above; Samuel is helping us narrow down the issue.

Thanks!

-- Sanjoy

Samuel Oshin

Oct 8, 2020, 2:40:07 AM

to Jason Zaman, Paige Bailey, Christian Sigg, Sanjoy Das, SIG Build

Unfortunately, I have only tried with Voltas.

Samuel Oshin

Oct 9, 2020, 11:29:56 AM

to Jason Zaman, Paige Bailey, Christian Sigg, Sanjoy Das, SIG Build

Hi Sanjoy,

I ran the scan on those env vars.

The only one with a major benefit is, as you mentioned, TF_AUTOTUNE_THRESHOLD.

I also ran this on tf-nightly. Is it possible that the fix hasn't made it into this version?

2.4.0-dev20201007

Thanks,

Samuel Oshin
