Getting mkl FFT to run faster than scipy.fftpack


secacpy...@gmail.com

Nov 21, 2015, 1:41:05 PM
to Anaconda - Public
Hi all, I am trying to exploit the goodness of the fft code in the MKL.

It seems slower than the fftpack in scipy though.

My code is below. For a 2D 2048x2048 FFT my benchmarks (mean time per call, in seconds) are:

mklfft: 0.330668347953
scipy: 0.186936112343
numpy: 0.331851455425
fftw fftpack-compat: 0.018929695108

MKL is slower than scipy and much, much slower than FFTW. Does anyone
know how to get the MKL FFT running as fast as or faster than scipy?



My benchmark code:

import timeit
import numpy as np

setup = 'import mklfft; import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
setup += 'import mkl; mkl.set_num_threads(4)\n'
cmd = 'b = mklfft.fftpack.fft2(a)'

print('mklfft: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))

setup = 'import scipy.fftpack; import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
cmd = 'b = scipy.fftpack.fft2(a)'

print('scipy: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=5))))

setup = 'import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
cmd = 'b = np.fft.fft2(a)'

print('numpy: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))

setup = 'import pyfftw.interfaces.scipy_fftpack as fftpack; import numpy as np; a = np.ones((2048, 2048), dtype=np.complex64)\n'
setup += 'import pyfftw.interfaces.cache as cache; cache.enable()\n'
cmd = "b = fftpack.fft2(a, planner_effort='FFTW_MEASURE', threads=4)"
setup += cmd

print('fftw fftpack-compat: ' + str(np.mean(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))

evgueni....@gmail.com

Nov 23, 2015, 9:01:31 AM
to Anaconda - Public, secacpy...@gmail.com
Here is a smallish example of how to do it.
To get more stable FFT timings, it is better to separate setup from compute, and to pin the OpenMP threads by running 'export KMP_AFFINITY=compact' before invoking python.
I would also recommend reading https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl, which was updated in Sep 2015.

import numpy as _np
import ctypes as _ctypes
import timeit

# Everything in `setup` runs once per repeat and is excluded from the timing.
setup = 'import numpy as _np\n'
setup += 'import ctypes as _ctypes\n'
setup += 'a = _np.ones((2048, 2048), dtype=_np.complex64)\n'
# Load the MKL runtime library; adjust the path to your installation.
setup += 'mkl = _ctypes.cdll.LoadLibrary(\'/path/to/my/mkl/lib/intel64/libmkl_rt.so\')\n'
setup += 'mkl.MKL_Set_Num_Threads(_ctypes.c_int(4))\n'
# Create and commit a descriptor for a 2D single-precision complex transform.
setup += 'desc_handle = _ctypes.c_void_p(0)\n'
setup += 'dims = (_ctypes.c_long*2)(*a.shape)\n'
# Enum values taken from mkl_dfti.h.
setup += '_DFTI_SINGLE = _ctypes.c_int(35)\n'
setup += '_DFTI_COMPLEX = _ctypes.c_int(32)\n'
setup += 'mkl.DftiCreateDescriptor(_ctypes.byref(desc_handle), _DFTI_SINGLE, _DFTI_COMPLEX, _ctypes.c_long(2), dims )\n'
setup += 'mkl.DftiCommitDescriptor(desc_handle)\n'
#setup += 'print("MKL FFT setup done\\n")\n'
# Only the transform is timed. With the default descriptor settings it is computed in place, overwriting a.
cmd = 'mkl.DftiComputeForward(desc_handle, a.ctypes.data_as(_ctypes.c_void_p) )\n'
#cmd += 'print("MKL FFT compute done\\n")\n'
# Report the fastest of 15 single runs.
print('mklfft: ' + str(_np.min(timeit.repeat(cmd, setup=setup, number=1, repeat=15))))
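
The same timing pattern (setup kept out of the timed statement, one warm-up call, fastest run reported) can be tried without MKL installed. The following is only a rough sketch: it uses np.fft.fft2 as a stand-in, a smaller array so it finishes quickly, and a helper named best_time that is made up for illustration.

```python
import timeit

def best_time(stmt, setup, repeat=15):
    """Return the fastest single-run time in seconds; setup is not timed.

    best_time is a hypothetical helper name used only for this sketch.
    """
    return min(timeit.repeat(stmt, setup=setup, number=1, repeat=repeat))

# Array creation and a warm-up call live in setup, so only the FFT is timed.
setup = (
    "import numpy as np\n"
    "a = np.ones((256, 256), dtype=np.complex64)\n"
    "np.fft.fft2(a)\n"
)

t = best_time("np.fft.fft2(a)", setup)
print("numpy fft2, best of 15: %.6f s" % t)
```

Taking the minimum rather than the mean makes the number less sensitive to other activity on the machine, so it tends to be more repeatable between runs.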

secacpy...@gmail.com

Nov 30, 2015, 7:26:18 PM
to Anaconda - Public, secacpy...@gmail.com, evgueni....@gmail.com
That is brilliant. Apparently the mklfft module is not really using the MKL. Using the recommended code I get:
mklfft: 0.0575509690182
versus
fake mklfft: 0.534160655215
scipy: 0.533176057333
numpy: 0.546839504205
fftw fftpack-compat in MEASURE mode: 0.0824766033456

It seems the real MKL FFT is about 30% faster than FFTW in MEASURE mode on my hardware. This
makes a lot more sense.
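
For anyone checking the arithmetic, the "about 30%" figure is the reduction in run time relative to FFTW, worked out from the two timings quoted in this message. A quick sketch (the variable names are just for illustration):

```python
# Timings in seconds, copied from the results quoted in this message.
mkl_time = 0.0575509690182
fftw_time = 0.0824766033456

# Fraction of the FFTW run time that MKL saves.
time_saved = 1.0 - mkl_time / fftw_time

# Equivalent speedup factor (how many times faster MKL is).
speedup = fftw_time / mkl_time

print("time saved vs FFTW: %.1f%%" % (100.0 * time_saved))
print("speedup vs FFTW: %.2fx" % speedup)
```

The two numbers describe the same gap in different ways, so it is worth stating which one is meant when quoting a percentage.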