Getting mkl FFT to run faster than scipy.fftpack

secacpy...@gmail.com

Nov 21, 2015, 1:41:05 PM

to Anaconda - Public

Hi all, I am trying to take advantage of the FFT code in MKL, but it seems slower than scipy.fftpack.

My code is below. For a 2D 2048x2048 complex64 FFT, my timings (in seconds, averaged over the repeats) are:

mklfft: 0.330668347953
scipy: 0.186936112343
numpy: 0.331851455425
fftw fftpack-compat: 0.018929695108

MKL is slower than scipy and much, much slower than FFTW. Does anyone know how to get the MKL FFT chuffing as fast as, or faster than, scipy?

My benchmark code:

import timeit
import numpy as np

# mklfft, with MKL limited to 4 threads
setup = 'import mklfft; import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
setup += 'import mkl; mkl.set_num_threads(4)\n'
cmd = 'b = mklfft.fftpack.fft2(a)'
print('mklfft: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))

# scipy.fftpack (5 repeats here; the other cases use 15)
setup = 'import scipy.fftpack; import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
cmd = 'b = scipy.fftpack.fft2(a)'
print('scipy: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=5))))

# numpy.fft
setup = 'import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
cmd = 'b = np.fft.fft2(a)'
print('numpy: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))

# pyFFTW through its scipy.fftpack-compatible interface, with the plan cache enabled
setup = 'import pyfftw.interfaces.scipy_fftpack as fftpack; import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
setup += 'import pyfftw.interfaces.cache as cache; cache.enable()\n'
cmd = "b = fftpack.fft2(a, planner_effort='FFTW_MEASURE', threads=4)"
# Run the transform once in setup so that FFTW planning is not part of the timed call
setup += cmd
print('fftw fftpack-compat: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))
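A small harness along the following lines would keep the repeat count and the warm-up identical for every library, which makes the numbers easier to compare. It is only a sketch and not part of the original benchmark: bench and its parameters are names made up here, and it uses nothing beyond the standard library.

```python
import statistics
import timeit


def bench(stmt, setup="pass", repeat=15, warmup=1):
    """Time `stmt` once per repeat; return (min, mean, median) in seconds."""
    timer = timeit.Timer(stmt, setup=setup)
    # Untimed warm-up runs absorb imports and other process-wide one-time costs.
    for _ in range(warmup):
        timer.timeit(number=1)
    # Each repeat re-runs `setup`, then times a single execution of `stmt`.
    times = timer.repeat(repeat=repeat, number=1)
    return min(times), statistics.mean(times), statistics.median(times)


# Demo with a trivial statement so the sketch runs without NumPy installed.
fastest, mean, median = bench("sum(range(1000))", repeat=5)
print(fastest, mean, median)
```

For the case in this thread one would call, for example, bench('np.fft.fft2(a)', setup='import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)'), and likewise for the other libraries.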

evgueni....@gmail.com

Nov 23, 2015, 9:01:31 AM

to Anaconda - Public, secacpy...@gmail.com

Here is a smallish example of how to do it.

To get more stable FFT timings, it is better to separate setup from compute, and to pin the OpenMP threads by running export KMP_AFFINITY=compact in the shell before invoking Python.

And of course, I would recommend reading https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl (updated in Sep 2015).
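If exporting the variable in the shell is inconvenient, one alternative is to launch the benchmark as a child process with KMP_AFFINITY already set in its environment. The sketch below assumes that approach; run_with_affinity is a hypothetical helper name, and the child here only echoes the variable to show that it arrived.

```python
import os
import subprocess
import sys


def run_with_affinity(script_args, affinity="compact"):
    """Run a Python child process with KMP_AFFINITY set in its environment."""
    # Copy the current environment and add (or override) KMP_AFFINITY.
    env = dict(os.environ, KMP_AFFINITY=affinity)
    return subprocess.run(
        [sys.executable, *script_args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Demo: the child prints the value it inherited.
result = run_with_affinity(
    ["-c", "import os; print(os.environ['KMP_AFFINITY'])"]
)
print(result.stdout.strip())
```

In practice the argument list would name the benchmark script instead of the -c one-liner.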

import numpy as _np
import ctypes as _ctypes
import timeit

setup = 'import numpy as _np\n'
setup += 'import ctypes as _ctypes\n'
setup += 'a = _np.ones((2048, 2048), dtype=_np.complex64)\n'
setup += 'mkl = _ctypes.cdll.LoadLibrary(\'/path/to/my/mkl/lib/intel64/libmkl_rt.so\')\n'
setup += 'mkl.MKL_Set_Num_Threads(_ctypes.c_int(4))\n'
setup += 'desc_handle = _ctypes.c_void_p(0)\n'
setup += 'dims = (_ctypes.c_long*2)(*a.shape)\n'
setup += '_DFTI_SINGLE = _ctypes.c_int(35)\n'
setup += '_DFTI_COMPLEX = _ctypes.c_int(32)\n'
setup += 'mkl.DftiCreateDescriptor(_ctypes.byref(desc_handle), _DFTI_SINGLE, _DFTI_COMPLEX, _ctypes.c_long(2), dims )\n'
setup += 'mkl.DftiCommitDescriptor(desc_handle)\n'
#setup += 'print("MKL FFT setup done\\n")\n'
cmd = 'mkl.DftiComputeForward(desc_handle, a.ctypes.data_as(_ctypes.c_void_p) )\n'
#cmd += 'print("MKL FFT compute done\\n")\n'

print('mklfft: ' + str(_np.min(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))
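Outside a benchmark, the same calls could be wrapped so that the returned status codes are checked and the descriptor is released afterwards. This is a hedged sketch rather than a tested wrapper: check_status and committed_descriptor are names invented here, the two constants are the values from the code above, it assumes a 2D single-precision complex transform as in that code, and it assumes the library exports DftiFreeDescriptor taking a pointer to the handle, as declared in mkl_dfti.h.

```python
import contextlib
import ctypes

# Values used in the post above.
DFTI_SINGLE = 35
DFTI_COMPLEX = 32


def check_status(status, what):
    """Raise if a DFTI call returned a non-zero status (0 means success)."""
    if status != 0:
        raise RuntimeError(f"{what} failed with DFTI status {status}")


@contextlib.contextmanager
def committed_descriptor(mkl, shape):
    """Yield a committed 2D complex64 descriptor and free it on exit.

    `mkl` is the object returned by ctypes.cdll.LoadLibrary for libmkl_rt.
    """
    handle = ctypes.c_void_p(0)
    dims = (ctypes.c_long * 2)(*shape)
    check_status(
        mkl.DftiCreateDescriptor(
            ctypes.byref(handle),
            ctypes.c_int(DFTI_SINGLE),
            ctypes.c_int(DFTI_COMPLEX),
            ctypes.c_long(2),
            dims,
        ),
        "DftiCreateDescriptor",
    )
    try:
        check_status(mkl.DftiCommitDescriptor(handle), "DftiCommitDescriptor")
        yield handle
    finally:
        # Release the descriptor even if commit or the caller's code failed.
        mkl.DftiFreeDescriptor(ctypes.byref(handle))
```

Usage would look like: with committed_descriptor(mkl, a.shape) as h: check_status(mkl.DftiComputeForward(h, a.ctypes.data_as(ctypes.c_void_p)), 'DftiComputeForward'). Creation and commit stay outside the timed call, in keeping with the advice to separate setup from compute.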

secacpy...@gmail.com

Nov 30, 2015, 7:26:18 PM

to Anaconda - Public, secacpy...@gmail.com, evgueni....@gmail.com

That is brilliant. Apparently the mklfft module is not really MKL. Using the recommended code I get:

mklfft: 0.0575509690182

versus

fake mklfft: 0.534160655215
scipy: 0.533176057333
numpy: 0.546839504205
fftw fftpack-compat in MEASURE mode: 0.0824766033456

It seems the real MKL FFT takes about 30% less time than FFTW in MEASURE mode on my hardware. This makes a lot more sense.
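The "about 30%" figure can be checked directly from the two timings quoted in this post. The short calculation below does that; the variable names are just for illustration.

```python
mkl_s = 0.0575509690182   # ctypes MKL timing from this post, in seconds
fftw_s = 0.0824766033456  # FFTW in MEASURE mode from this post, in seconds

# Fraction of FFTW's run time that MKL saves.
time_saved = 1.0 - mkl_s / fftw_s
# How many times faster MKL is, expressed as a ratio of run times.
speedup = fftw_s / mkl_s

print(f"MKL takes {time_saved:.1%} less time than FFTW")
print(f"MKL is {speedup:.2f}x as fast as FFTW")
```

So "30% faster" here means 30% less run time, which corresponds to roughly 1.4 times the throughput.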
