how to link to multi-threaded MKL library in gadgetron


Sen Jia

Jul 6, 2016, 9:10:19 AM
to Gadgetron
Hi,

Gadgetron and the Generic_Cartesian_Grappa.xml recon chain with two-step coil compression help me a lot. Thank you very much.

I have found a problem: my Gadgetron installation, compiled with MKL, runs in single-threaded mode during GenericEigenChannelGadget and during SolveLinearSystem_Tikhonov(A, B, x, thres) for GRAPPA/SPIRiT calibration, etc., just as if MKL were absent. If I run the coil compression gadget and the GRAPPA gadget as standalone apps, they do call multi-threaded MKL and speed up those computations considerably (e.g. upstream coil compression: from 90 s to 20 s; calibration: from 120 s to 30 s).

Running cmake ../ in the gadgetron folder (no modification to the source code) reported:
Armadillo is found to use long long for BLAS calls

MKL is found at /opt/intel/mkl
MKL is linked against ILP64 interface ... 
-- Found MKL libraries: mkl_intel_ilp64;mkl_intel_thread;iomp5;mkl_core
-- MKL_INCLUDE_DIR: /opt/intel/mkl/include
-- MKL_LIB_DIR: /opt/intel/mkl/lib/intel64
-- MKL_COMPILER_LIB_DIR: /opt/intel/compiler/lib/intel64;/opt/intel/lib/intel64
find MKL version : 11.3.0
-- A library with BLAS API found.
-- A library with BLAS API found.
-- A library with LAPACK API found.
LAPACK Found
MKL Found, enabling MKL for mri_core gadgets.

Armadillo was installed via "sudo apt-get install libarmadillo-dev" on Ubuntu 14.04.03. No modification was made to /usr/include/armadillo_bits/config.hpp except enabling the long long definition.

ldd libgadgetron_mricore.so also showed that MKL was linked successfully:
libmkl_intel_ilp64.so => /opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.so (0x00007f9f286a2000)
libmkl_intel_thread.so => /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so (0x00007f9f2734b000)
libmkl_core.so => /opt/intel/mkl/lib/intel64/libmkl_core.so (0x00007f9f25a66000)
libiomp5.so => /opt/intel/lib/intel64/libiomp5.so (0x00007f9f2c48c000)

These are the libraries required according to the MKL Link Line Advisor tool.
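For completeness, here is a quick sketch of how one can check whether anything in the service environment pins MKL/OpenMP to a single thread (the variable names are the standard MKL/OpenMP knobs; "unset" just means the library picks its own default):

```shell
# Print the threading-related variables; "unset" means the library default
# (normally one thread per core) applies. A stray value of 1 here would
# explain single-threaded behavior.
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS:-unset}"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
echo "MKL_DYNAMIC=${MKL_DYNAMIC:-unset}"
```

In my case none of these were set to 1, so the environment does not seem to be the cause.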


Could you please show me how to make sure Gadgetron runs with the multi-threaded MKL library?

Thank you.
Best wishes,
Jia Sen

Michael Hansen

Jul 6, 2016, 8:31:02 PM
to Sen Jia, Gadgetron
This is probably related to what we discussed before. When MKL calls are made inside OpenMP loops, multi-threading is disabled in the inner loops (i.e., in the MKL calls). It is not something you can simply "switch on". You can, however, try to remove any OpenMP loops that surround those MKL calls; that may make sense for your particular application.

Bear in mind, though, that while some operations may be sped up by such fine-tuning, it may come at the expense of other parts of the code. So the end-to-end time may not be any faster, or it may in fact be slower; it depends on the application.
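(For reference, my understanding is that the automatic thread reduction inside parallel regions is governed by MKL's dynamic adjustment. A sketch of the environment knobs involved; forcing them as below typically oversubscribes the cores, which is exactly why it rarely helps end to end:)

```shell
# Sketch only: these settings force MKL to keep its configured thread
# count even when called from inside an OpenMP parallel region.
# This usually oversubscribes cores and can make the whole chain slower.
export MKL_DYNAMIC=FALSE   # do not auto-reduce MKL threads in parallel regions
export MKL_NUM_THREADS=8   # example value, not a recommendation
echo "MKL_DYNAMIC=$MKL_DYNAMIC MKL_NUM_THREADS=$MKL_NUM_THREADS"
```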



Sen Jia

Jul 7, 2016, 1:35:38 AM
to Gadgetron, jiaca...@gmail.com
Thank you for your answer. I understand that calling multi-threaded MKL inside an OpenMP loop should be avoided. But my problem seems to have been caused by an incorrect installation:

I did two fresh installations of Gadgetron 3.12.0 with Intel MKL 11.3.0 on two workstations (same CPUs, E5-2660 v3) running Ubuntu 14.04. Then I ran a 3D data set (352x352x240x32) with GT_3DT_Cartesian.xml (coil compression to 8 channels). One workstation took 7 min, while the other took only 1.5 min.
Timing (fast vs slow):
Coil compression: 4 s vs 111 s
Calibration: 0.2 s vs 3.5 s (calibration data size: 10*24*30*8)
Unwrapping: 32 s vs 107 s
CSM estimation: 45 s vs 193 s

This comparison shows that MKL can speed up Gadgetron significantly. The slow installation was caused by Gadgetron not calling the multi-threaded MKL (it still used the reference LAPACK even though MKL was installed). But I could not figure out where I went wrong. I checked the environment variables and the linked libraries via the ldd command; both machines were identical, except for the running speed.

Even sillier, I had been using the slow installation for several months. I will keep looking for my mistake, and I am glad to know Gadgetron can run much faster than I thought.

Thank you.

Sen Jia

Jul 9, 2016, 11:54:30 PM
to Gadgetron, jiaca...@gmail.com
Hi,

The linear algebra functions in hoNDArray_linalg.cpp, such as matrix multiplication, can be linked against the reference LAPACK or against Intel's faster MKL. I found that Gadgetron might still use the reference LAPACK if liblapack.so and libblas.so were linked into the target library (e.g. libgadgetron_toolbox_cpucore_math.so) alongside libmkl_core.so and libmkl_intel_thread.so.
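A sketch of the check I mean (the library path is from my install; adjust it to yours). Seeing libblas/liblapack next to libmkl_* in the output is the symptom:

```shell
# Inspect which BLAS/LAPACK providers the library actually resolves to.
# Path is an example from my setup, not a universal location.
lib=/usr/local/lib/libgadgetron_toolbox_cpucore_math.so
if [ -e "$lib" ]; then
    ldd "$lib" | grep -E 'mkl|blas|lapack|iomp5'
else
    echo "not found: $lib (adjust the path to your install)"
fi
```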

To make the linear algebra functions call MKL instead of the reference LAPACK, I commented out #define ARMA_USE_WRAPPER, which stops Armadillo from using its own runtime wrapper for linking to BLAS and LAPACK. After that, Gadgetron calls MKL successfully. I am not sure whether this is the proper solution to the linking problem.
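For the record, here is the edit sketched on a scratch copy (the real file on my system is /usr/include/armadillo_bits/config.hpp; macro names may vary slightly between Armadillo versions, so verify against your copy):

```shell
# Demonstrate the config.hpp change on a temporary copy rather than the
# real header. Commenting out ARMA_USE_WRAPPER makes Armadillo call the
# linked BLAS/LAPACK (here, MKL) directly instead of via its wrapper.
cfg=$(mktemp)
printf '#define ARMA_USE_WRAPPER\n#define ARMA_USE_LAPACK\n' > "$cfg"
sed -i 's|^#define ARMA_USE_WRAPPER|// #define ARMA_USE_WRAPPER|' "$cfg"
cat "$cfg"
rm -f "$cfg"
```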

Regarding my earlier question about how much MKL can improve the speed of Gadgetron (especially Generic_Cartesian_Grappa.xml):
the KLT transform in GenericReconEigenChannelGadget and the SolveLinearSystem_Tikhonov() calibration in GenericReconGrappaGadget can be accelerated significantly by MKL. Even the CSM estimation, which is already parallelized via OpenMP, can be accelerated by MKL.

Thank you very much for helping me.
Best wishes,
Jia Sen