I have an application where I need to do many FFTs under tight time constraints. When I was only using one size, everything ran well and very quickly. When I started changing sizes between iterations, things would run well for several iterations, but then the time would ramp up. Sometimes it increased by a factor of 10, sometimes it doubled or tripled, depending on the sizes and the number of operations I had it complete. I moved to separate arrays for each size, but still had the same issue.
After some tinkering, I noticed that this only happens with the in-place transforms (af::fftInPlace, af::ifftInPlace). It also happens with out-of-place transforms if I store the transformed data back into the original array (i.e. data = af::fft(data)). Is this a bug, or is there something I'm missing that is causing this?
Here is some simple code to demonstrate the issue.
Output of af::info:
ArrayFire v3.1.3 (CUDA, 64-bit Linux, build 35c89f5)
Platform: CUDA Toolkit 7.5, Driver: 352.55
[0] GeForce GTX 680, 2043 MB, CUDA Compute 3.0
Source code:
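#include <arrayfire.h>

int main()
{
    af::info();

    // const so that stop/stop2 below have a compile-time size
    // (standard C++ instead of variable-length arrays)
    const size_t numIter = 1000;
    af::timer start, start2;
    float stop[numIter], stop2[numIter];   // per-iteration times in seconds
    af::array complexArray, complexArray2;

    for (size_t i = 0; i < numIter; i++)
    {
        // First size: 256 x 256 x 16, complex single precision
        complexArray = af::randu(256, 256, 16, 1, c32);
        start = af::timer::start();
        af::fftInPlace(complexArray);
        stop[i] = af::timer::stop(start);

        // Second size: 128 x 64 x 8, complex single precision
        complexArray2 = af::randu(128, 64, 8, 1, c32);
        start2 = af::timer::start();
        af::fftInPlace(complexArray2);
        stop2[i] = af::timer::stop(start2);
    }

    // Copy the host-side timings into ArrayFire arrays for printing
    af::array timing1 = af::array(numIter, 1, stop);
    af::array timing2 = af::array(numIter, 1, stop2);
    af::print("", timing1, 6);
    af::print("", timing2, 6);
}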
If you don't want to filter through the times yourself, you can replace the last two af::print lines with min and median summaries to see the discrepancy:
af::print("First fft times min value:", af::min(timing1), 6);
af::print("First fft times median value:", af::median(timing1), 6);
af::print("Second fft times min value:", af::min(timing2), 6);
af::print("Second fft times median value:", af::median(timing2), 6);
The output of that for me is as follows:
First fft times min value:
[1 1 1 1]
0.000024
First fft times median value:
[1 1 1 1]
0.000026
Second fft times min value:
[1 1 1 1]
0.000025
Second fft times median value:
[1 1 1 1]
0.000121
In this case, the first set of times stayed fairly consistent, but the second set ran at around 24 µs for a while and then jumped up to about 123 µs.
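To see when the slowdown begins, and not only how large it is, the host-side arrays stop and stop2 can also be scanned directly before they are copied into ArrayFire arrays. This is a minimal sketch in plain C++; the helper summarizeTimings, the TimingSummary struct, and the default threshold of 2x the minimum are hypothetical choices for illustration, not part of ArrayFire.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of one series of per-iteration timings (seconds).
struct TimingSummary {
    double minSeconds;
    double medianSeconds;
    // Index of the first iteration slower than factor * minSeconds,
    // or times.size() if no iteration is that slow.
    std::size_t firstSlowIteration;
};

// Hypothetical helper: summarizes timings such as stop[] or stop2[].
// factor = 2.0 is an arbitrary threshold, not a tuned value.
TimingSummary summarizeTimings(const std::vector<double>& times,
                               double factor = 2.0) {
    assert(!times.empty());  // min/median are undefined for no samples

    TimingSummary s{};
    s.minSeconds = *std::min_element(times.begin(), times.end());

    // Median: sort a copy; for an even count, average the two middle values.
    std::vector<double> sorted(times);
    std::sort(sorted.begin(), sorted.end());
    const std::size_t n = sorted.size();
    s.medianSeconds = (n % 2 == 1)
        ? sorted[n / 2]
        : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);

    // Scan in iteration order for the first clearly slow iteration.
    s.firstSlowIteration = n;
    for (std::size_t i = 0; i < n; ++i) {
        if (times[i] > factor * s.minSeconds) {
            s.firstSlowIteration = i;
            break;
        }
    }
    return s;
}
```

In the reproduction code, it could be called as summarizeTimings(std::vector<double>(stop2, stop2 + numIter)) after adding #include <vector>; the vector's iterator constructor widens the float values to double.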
This is not specific to one GPU (I also see it on a Tesla) or to one OS (I have tried both Debian and Red Hat).