Hi All,
During Tuesday's call I mentioned a performance regression we are seeing when benchmarking TF 2.3 with CUDA 11.
Someone on the call suggested verifying that the cuDNN version was up to date. After updating cuDNN from 8.0.2 to 8.0.4 there is a noticeable improvement, but it is still slower than TF 2.3 with CUDA 10.1 (see the table below).
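To confirm which cuDNN a machine actually has installed, the version can be read from the cuDNN 8 header `cudnn_version.h`, which defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. This is a minimal sketch: the helper name `cudnn_version` and its default header path are assumptions, and the header may live elsewhere (for example under `/usr/local/cuda/include`) depending on how cuDNN was installed. It reports what the headers say, which is not necessarily the library TF loads at runtime.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cuDNN version recorded in a cuDNN 8 header.
# Assumption: the version macros live in cudnn_version.h; the default path
# below is a guess and may need to point at /usr/local/cuda/include instead.
cudnn_version() {
  local header="${1:-/usr/include/cudnn_version.h}"
  if [ ! -r "$header" ]; then
    echo "cudnn_version: cannot read $header" >&2
    return 1
  fi
  # Pick up the value of each "#define CUDNN_<FIELD> <n>" line, then join them.
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { printf "%s.%s.%s\n", major, minor, patch }
  ' "$header"
}
```

For example, `cudnn_version /usr/local/cuda/include/cudnn_version.h` should report 8.0.4 after the upgrade described above, if that is where the header was installed.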
The setup is an AWS p3.16xlarge instance (8 Volta V100 GPUs) running the TensorFlow benchmarks repo:
git clone --single-branch --branch cnn_tf_v2.1_compatible https://github.com/tensorflow/benchmarks.git
python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --batch_size=256 --model=resnet50_v1.5 --optimizer=momentum --variable_update=replicated --nodistortions --gradient_repacking=2 --num_gpus=8 --num_epochs=8 --weight_decay=1e-4 --use_fp16 --all_reduce_spec=nccl --save_summaries_steps=0 --summary_verbosity=1 --num_warmup_batches=0 --train_dir=$HOME/test00 --compute_lr_on_cpu=True --single_l2_loss_op=True --loss_type_to_report=base_loss --data_name=imagenet
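When collecting numbers for several configurations, it helps to pull the final throughput out of each run's log instead of copying it by hand. A minimal sketch, assuming the run's output contains a summary line of the form `total images/sec: <value>`; adjust the pattern if your version of the script prints something different. `images_per_sec` and `run.log` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last "total images/sec: N" value from a
# tf_cnn_benchmarks log file, or from stdin when no file is given.
# Assumption: the summary line starts with exactly that prefix.
# Exits non-zero if no such line is found.
images_per_sec() {
  awk -F': ' '
    /^total images\/sec:/ { value = $2 }
    END { if (value == "") exit 1; print value }
  ' "${1:--}"
}
```

For example, append `2>&1 | tee run.log` to the benchmark command and then run `images_per_sec run.log`.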
TensorFlow pip version         | CUDA version | cuDNN version | Images/sec |
tf-nightly (2.4.0-dev20201007) | 11.0.3       | 8.0.4         | 8635.73    |
tf-2.3                         | 11.0.3       | 8.0.4         | 8692.29    |
tf-2.3                         | 10.1         | 7.6.5         | 8871.96    |
tf-nightly + NGC variables     | 11.0.3       | 8.0.4         | 8950.92    |
tf-2.3 + NGC variables         | 11.0.3       | 8.0.4         | 8995.89    |
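To put the table in relative terms, each CUDA 11.0.3 row can be compared against the tf-2.3 / CUDA 10.1 row as (measured - baseline) / baseline * 100. A small sketch using only the images/sec figures from the table; `pct_change` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Percent change in images/sec relative to a baseline run:
# (measured - baseline) / baseline * 100. Negative means slower than baseline.
pct_change() {
  local baseline="$1" measured="$2"
  awk -v b="$baseline" -v m="$measured" \
    'BEGIN { printf "%+.2f%%\n", (m - b) / b * 100 }'
}

# Baseline: tf-2.3 with CUDA 10.1 / cuDNN 7.6.5, from the table above.
baseline=8871.96

# "label|images/sec" for the CUDA 11.0.3 / cuDNN 8.0.4 rows of the table.
while IFS='|' read -r label ips; do
  printf '%-32s %s\n' "$label" "$(pct_change "$baseline" "$ips")"
done <<'EOF'
tf-nightly (2.4.0-dev20201007)|8635.73
tf-2.3|8692.29
tf-nightly + NGC variables|8950.92
tf-2.3 + NGC variables|8995.89
EOF
```

With these figures the plain CUDA 11 rows come out about 2.0% (tf-2.3) and 2.7% (tf-nightly) slower than the CUDA 10.1 baseline, while the two NGC-variable rows come out about 0.9% and 1.4% faster. The CUDA 10.1 row is not labelled as using the NGC variables, so that last comparison is not fully like-for-like.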
export TF_ADJUST_HUE_FUSED=1 TF_AUTOTUNE_THRESHOLD=2 TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT=1 TF_ADJUST_SATURATION_FUSED=1 TF_ENABLE_WINOGRAD_NONFUSED=1 CUDA_CACHE_DISABLE=1
These environment variables, used for the "NGC variables" rows above, were taken from the NGC containers: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow
Should these be default values somewhere in TF?
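For A/B runs in one shell, the variables can be scoped to a single command instead of exported globally, so settings from the NGC-variable runs cannot leak into the baseline runs. A minimal sketch relying on standard shell behaviour (assignments written before a command apply only to that command's environment); `with_ngc_env` is a hypothetical name, and the variable list is copied from the export line above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one external command (e.g. python) with the
# NGC-derived variables set only in that command's environment; the calling
# shell's environment is left unchanged.
with_ngc_env() {
  TF_ADJUST_HUE_FUSED=1 \
  TF_AUTOTUNE_THRESHOLD=2 \
  TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT=1 \
  TF_ADJUST_SATURATION_FUSED=1 \
  TF_ENABLE_WINOGRAD_NONFUSED=1 \
  CUDA_CACHE_DISABLE=1 \
  "$@"
}
```

In a shell where the export line has not been run, prefix the benchmark command with `with_ngc_env` for the NGC-variable rows and run it unwrapped for the others.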
Thanks,
Samuel Oshin
--
To unsubscribe from this group and stop receiving emails from it, send an email to build+un...@tensorflow.org.