How is the official build using oneDNN/MKL?

Alexander Grund

Sep 20, 2021, 4:01:08 AM
to SIG Build
Dear SIG Build team,

we compile TensorFlow ourselves on various HPC clusters due to hardware
requirements (e.g. specific CUDA drivers) and have been using `--config=mkl`
to (supposedly) enable MKL and/or oneDNN to accelerate various CPU DNN ops.

However, we have been notified that our self-built package performs worse
on CPU than the pip package, even though we enable more aggressive
optimizations, e.g. -march=native to make use of AVX2 etc.
Further investigation revealed serious thread oversubscription, leading to
many involuntary context switches that severely impact performance.
This can be (mostly) mitigated by setting e.g. OMP_NUM_THREADS=1, but for
obvious reasons we can't set that by default for all users of our clusters.
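
For concreteness, the kind of per-job workaround we mean looks like this
(a sketch; KMP_BLOCKTIME is specific to Intel's OpenMP runtime, and
train.py is just a placeholder):

    # Pin the OpenMP runtime to one thread to avoid oversubscription
    export OMP_NUM_THREADS=1
    # Intel OpenMP only: keep idle worker threads from busy-waiting
    export KMP_BLOCKTIME=0
    python train.py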

Comparing our build with the official pip packages led us to the
aforementioned mkl config, which is a collective setting for these flags:
--define=build_with_mkl=true
--define=enable_mkl=true
--define=tensorflow_mkldnn_contraction_kernel=0
--define=build_with_openmp=true
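
In other words, `--config=mkl` expands to roughly this explicit invocation
(a sketch; the pip-package target is the usual TF 2.x one and is only
illustrative here):

    bazel build --config=opt \
        --define=build_with_mkl=true \
        --define=enable_mkl=true \
        --define=tensorflow_mkldnn_contraction_kernel=0 \
        --define=build_with_openmp=true \
        //tensorflow/tools/pip_package:build_pip_package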

Inspecting the binaries of the pip package for the effects of those flags
leads me to conclude that none of them is used, i.e. the official pip
packages are not built with `--config=mkl`. See
https://github.com/easybuilders/easybuild-easyblocks/issues/2577#issuecomment-919914929
for a detailed analysis.
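
The probes used there are roughly of this kind (a sketch; the library
names are from a typical TF 2.x install and may differ):

    # An OpenMP-based --config=mkl build should link an OpenMP runtime...
    ldd _pywrap_tensorflow_internal.so | grep -iE 'iomp|gomp'
    # ...and should define MKL/oneDNN-related symbols
    nm -D --defined-only libtensorflow_framework.so.2 | grep -i mkl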

However, disabling (i.e. not passing) `--config=mkl` makes at least one
test fail:
//tensorflow/core/kernels/mkl:mkl_fused_batch_norm_op_test

Disabling only the OpenMP part, i.e. passing `--define=build_with_mkl=true
--define=enable_mkl=true
--define=tensorflow_mkldnn_contraction_kernel=0` instead, makes many
tests fail (a repro sketch follows the list):
//tensorflow/c/eager:c_api_cluster_test
//tensorflow/c/eager:c_api_remote_function_test
//tensorflow/c/eager:c_api_remote_test
//tensorflow/c/eager:c_api_test
//tensorflow/core/kernels:matmul_op_test
//tensorflow/core/kernels/mkl:mkl_fused_batch_norm_op_test
//tensorflow/python:convert_to_constants_test
//tensorflow/python/keras/layers:kernelized_test
//tensorflow/python/keras/wrappers:scikit_learn_test
//tensorflow/python/kernel_tests:variables_test
//tensorflow/python/kernel_tests/distributions:dirichlet_test
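
Each of these can be re-run individually with the reduced flag set, e.g.:

    bazel test \
        --define=build_with_mkl=true \
        --define=enable_mkl=true \
        --define=tensorflow_mkldnn_contraction_kernel=0 \
        //tensorflow/core/kernels/mkl:mkl_fused_batch_norm_op_test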

Using the related `--config=mkl_threadpool` seems to be even worse, with
NaNs, segfaults, FPEs...

- So what exactly is the purpose of `build_with_mkl` and `enable_mkl`?
- How exactly are those flags related to oneDNN and MKL? I don't see the
actual MKL being used, hence the confusion.
- How are the official pip packages built? Are they tested with that
setting?

Thanks!
Alex


Penporn Koanantakool

Sep 20, 2021, 11:15:35 AM
to Alexander Grund, Ramesh, AG, SIG Build
Hi Alexander,


Further investigation revealed serious thread oversubscription, leading to
many involuntary context switches that severely impact performance.

OpenMP threading over-/under-subscription has been a major performance issue for --config=mkl for a long time. --config=mkl_threadpool was our attempt to fix this by making oneDNN primitives use TensorFlow's thread pool for threading instead of OpenMP. 

Using the related `--config=mkl_threadpool` seems to be even worse, with
NaNs, segfaults, FPEs...

The config is slightly outdated now, as we have included the custom TF-oneDNN ops (with TF thread pool) in the official TF build from TF 2.5 onwards. (You can build with just --config=opt.) These custom TF-oneDNN ops are disabled by default, but can be enabled by setting an environment variable TF_ENABLE_ONEDNN_OPTS=1. Please see this blog post for more details. We would appreciate it if you could give it a try and let us know if you have any feedback. If there are issues, please open an issue on TF github and tag @TensorFlow-MKL. 
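
Concretely, trying it out on an existing workload is just a matter of setting the variable before TensorFlow is loaded (your_script.py is a placeholder):

    # oneDNN custom ops are compiled in but off by default in TF 2.5+;
    # the variable must be set before the TF library is initialized
    TF_ENABLE_ONEDNN_OPTS=1 python your_script.py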

- So what exactly is the purpose of `build_with_mkl` and `enable_mkl`?

This is to distinguish between vanilla TF code (which now builds with custom TF-oneDNN ops) and --config=mkl code, which still uses OpenMP. All oneDNN-related code is guarded by `#ifdef INTEL_MKL`, and all OpenMP-related code is further guarded by `#ifdef ENABLE_MKL`. `build_with_mkl` defines `INTEL_MKL`, while `enable_mkl` defines `ENABLE_MKL`. We will improve the naming to make it more self-explanatory.

- How exactly are those flags related to oneDNN and MKL? I don't see the actual MKL being used, hence the confusion.

--config=mkl used to use MKL (Math Kernel Library) for BLAS (mainly sgemm). We completely removed MKL last year [1, 2], as oneDNN (the new name of MKL-DNN) already has everything we need. We plan to change the config name soon. Sorry about the confusing names.

- How are the official pip packages built? Are they tested with that setting?

This folder contains our pip package build scripts. TF-oneDNN ops are disabled by default in vanilla TF, so they are not tested by us. But Intel has community CI builds (and presubmit tests) that test vanilla TF with TF-oneDNN ops turned on (e.g., setting TF_ENABLE_ONEDNN_OPTS=1).
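
A quick way to check whether a given binary was built with oneDNN/MKL support is the following probe (note that IsMklEnabled is an internal helper and may change between releases):

    # Prints True if the installed TF was built with oneDNN/MKL support
    python -c "from tensorflow.python.framework import test_util; print(test_util.IsMklEnabled())"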

Best,
Penporn


Alexander Grund

Sep 28, 2021, 4:45:16 AM
to Penporn Koanantakool, Ramesh, AG, SIG Build

(You can build with just --config=opt.) These custom TF-oneDNN ops are disabled by default, but can be enabled by setting an environment variable TF_ENABLE_ONEDNN_OPTS=1. Please see this blog post for more details. We would appreciate it if you could give it a try and let us know if you have any feedback. If there are issues, please open an issue on TF github and tag @TensorFlow-MKL.
I tried it and it fails for TF 2.5+. See https://github.com/tensorflow/tensorflow/issues/52151


- How are the official pip packages built? Are they tested with that setting?

This folder contains our pip package build scripts. TF-oneDNN ops are disabled by default in vanilla TF, so they are not tested by us.
I see that with that test filter only Python tests are run, not the C++ tests, which would include the failing one I am seeing. This is unfortunate...