I will have to compute several FFTs, always with the same shape and dtype.
So I would like to know which function calls can be made once, up front, to save computation time.
I am also wondering what the best way is to chain operations in Reikna (e.g. fftshift(fft(X))).
Here is my code:
from reikna.cluda import dtypes, cuda_api
import reikna.fft as cudaFFT
from reikna.core import Annotation, Type, Transformation, Parameter
import time
import scipy.misc as misc
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')
# Pick the CUDA GPGPU API (OpenCL not available on this computer)
# and make a Thread on it.
api = cuda_api()
thr = api.Thread.create()
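# host input (misc.lena and misc.imresize were removed from later SciPy releases;
# any 2048x2048 array can be substituted here)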
input_im = misc.imresize(misc.lena(), size=(2048, 2048)).astype(np.complex64)
t = time.time()
# device input
input_dev = thr.to_device(input_im)
# prepare output array
output_complex = thr.array(input_im.shape, dtype=np.complex64)
fft = cudaFFT.FFT(input_dev)
fft_compiled = fft.compile(thr)
fft_compiled(output_complex, input_dev, inverse=0)
result = output_complex.get()
print('%f' % (time.time() - t) )
t = time.time()
reference = np.fft.fft2(input_im)
print('%f' % (time.time() - t) )
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(np.log(np.absolute(result)))
ax2.imshow(np.log(np.absolute(reference)))
# release the thread to be able to launch another computation
thr.release()
assert np.linalg.norm(result - reference) / np.linalg.norm(reference) < 1e-6
Thanks!
Florian
# Later calls reuse fft_compiled: only the transfer and the kernel launch are repeated.
t = time.time()
input_dev = thr.to_device(input_im)
fft_compiled(output_complex, input_dev, inverse=0)
result = output_complex.get()
print('%f' % (time.time() - t) )
thr.synchronize() # finish any pending operations before starting the timer
t = time.time()
fft_compiled(output_complex, input_dev, inverse=0)
thr.synchronize() # wait for `fft_compiled` to finish
print('%f' % (time.time() - t) )
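The synchronize-before-and-after pattern above can be wrapped in a small helper. This is only a sketch: `time_call` and its `sync` parameter are hypothetical names, not part of Reikna, and NumPy's FFT stands in for the compiled computation so that the snippet runs without a GPU. With Reikna one would presumably pass `thr.synchronize` as `sync` and a lambda calling `fft_compiled(output_complex, input_dev, inverse=0)` as `fn`.

```python
import time
import numpy as np

def time_call(fn, sync=lambda: None, repeats=5):
    """Return the best wall-clock time (in seconds) of `fn` over `repeats` runs.

    `sync` (hypothetical hook) is called before starting and before stopping
    the timer, so that asynchronous work such as GPU kernels is fully counted.
    """
    best = float('inf')
    for _ in range(repeats):
        sync()                      # drain pending work before timing
        t = time.perf_counter()
        fn()
        sync()                      # wait for `fn` to actually finish
        best = min(best, time.perf_counter() - t)
    return best

# CPU stand-in for the compiled FFT; no synchronization is needed here.
x = np.random.rand(256, 256).astype(np.complex64)
elapsed = time_call(lambda: np.fft.fft2(x))
print('%f' % elapsed)
```

Taking the best of several runs keeps one-time costs (compilation, first-call setup) out of the reported figure.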
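# Chain two computations through an intermediate device array
# (fft and fftshift stand for already compiled computations).
temp = thr.array(...)
fft(temp, input)       # temp <- FFT(input)
fftshift(output, temp) # output <- fftshift(temp)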
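For even-sized axes the intermediate array may be avoidable: shifting the spectrum is equivalent to multiplying the input by a checkerboard of +1/-1 before the transform. The NumPy sketch below only checks that identity on the CPU. Whether it is worth expressing as a Reikna input transformation is an assumption left open here, and `checkerboard` is a name introduced purely for illustration.

```python
import numpy as np

def checkerboard(shape):
    """Return an array of (-1)**(i + j) for a 2D shape (hypothetical helper)."""
    i, j = np.indices(shape)
    return (1 - 2 * ((i + j) % 2)).astype(np.float64)

# Small complex test image; both dimensions must be even for the identity to hold.
rng = np.random.RandomState(0)
x = rng.rand(8, 6) + 1j * rng.rand(8, 6)

shifted = np.fft.fftshift(np.fft.fft2(x))            # FFT, then shift
modulated = np.fft.fft2(x * checkerboard(x.shape))   # modulate, then FFT

print(np.allclose(shifted, modulated))
```

The identity fails for odd-sized axes, so a separate shift step is still needed in that case.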
Indeed, misc.lena() returns a grayscale image.