inplace FFT and documentation

digg...@gmail.com

Sep 28, 2015, 4:29:51 PM
to reikna
Hi,

I was wondering: if I pass the same array as both input and output to the compiled FFT, like this:
fft(arr,arr,inverse=False)
does Reikna create a temporary array for the calculation, or is this an inplace FFT?

Also, I noticed a little mistake in the documentation of the inverse flag:
'inverse – a scalar value castable to integer. If 1, output contains the forward FFT of input, if 0 the inverse one.'
Should be the other way around, right?

Bogdan Opanchuk

Sep 28, 2015, 8:16:41 PM
to reikna
Hi,

> I was wondering: if I pass the same array as both input and output to the compiled FFT, like this:
> fft(arr,arr,inverse=False)
> does Reikna create a temporary array for the calculation, or is this an inplace FFT?

It depends on the number of axes you're transforming and on their sizes. As a general rule, small 1D FFTs (problem size <= 1024) over the rightmost axis are performed inplace; the rest need some temporary buffers.
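For reference, the call from the question could be embedded in a complete script roughly as follows. This is a sketch, assuming numpy, reikna and a working CUDA or OpenCL backend are installed; the function name `fft_same_array` is made up for illustration, and whether a temporary buffer is used behind the scenes depends on the problem size and axes as described above.

```python
def fft_same_array(shape=(1024,), dtype="complex64"):
    """Forward FFT with one device array passed as both output and input.

    Hypothetical helper for illustration. The imports are deferred so that
    defining the function does not require reikna or a GPU to be present.
    """
    import numpy
    from reikna.cluda import any_api
    from reikna.fft import FFT

    api = any_api()              # picks CUDA or OpenCL, whichever is available
    thr = api.Thread.create()

    data = (numpy.random.normal(size=shape)
            + 1j * numpy.random.normal(size=shape)).astype(dtype)
    arr = thr.to_device(data)

    fft = FFT(arr)               # with no axes given, all axes are transformed
    fftc = fft.compile(thr)

    # Compiled signature is (output, input, inverse); 0 requests the forward FFT.
    fftc(arr, arr, inverse=0)

    return data, arr.get()
```

Comparing the second return value against `numpy.fft.fftn` of the first is a quick way to check that passing the same array for both arguments gives the expected result on a particular device.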

> Also, I noticed a little mistake in the documentation of the inverse flag:
> 'inverse – a scalar value castable to integer. If 1, output contains the forward FFT of input, if 0 the inverse one.'
> Should be the other way around, right?

Yes, you're right, good catch. No idea how it survived that long. Fixed.
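To spell out the corrected reading of the flag (0 gives the forward transform, 1 the inverse), here is a naive pure-Python DFT that follows the same convention. It is only an illustration: the function `dft` and the 1/N scaling of the inverse are assumptions of this sketch, not Reikna code.

```python
import cmath


def dft(values, inverse=0):
    """Naive O(N^2) DFT: inverse=0 is the forward transform, inverse=1 the inverse."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * j / n)
                  for j, v in enumerate(values))
        # The inverse is scaled by 1/N so that a round trip restores the input.
        out.append(acc / n if inverse else acc)
    return out


signal = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
spectrum = dft(signal, inverse=0)    # forward
restored = dft(spectrum, inverse=1)  # inverse, recovers the signal
```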

digg...@gmail.com

Sep 29, 2015, 10:00:08 AM
to reikna
Okay, thanks for the reply.
My arrays are 4096x4096 complex128 and larger. Is there a way to force inplace calculation? It would save me a lot of memory.
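For a sense of scale, the memory involved can be worked out directly. The sketch below assumes that a temporary buffer, if one is needed, has the same size as the array itself; the helper name `array_bytes` is made up for this example.

```python
def array_bytes(shape, itemsize):
    """Size in bytes of a dense array with the given shape and element size."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


COMPLEX128_ITEMSIZE = 16  # two 64-bit floats per element

one_array = array_bytes((4096, 4096), COMPLEX128_ITEMSIZE)
with_temp = 2 * one_array  # assumption: one temporary of the same size

print(f"array alone:    {one_array / 2**20:.0f} MiB")   # 256 MiB
print(f"with temporary: {with_temp / 2**20:.0f} MiB")   # 512 MiB
```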

Bogdan Opanchuk

Sep 29, 2015, 8:48:48 PM
to reikna
It's rather a limitation of the current algorithm. You cannot do an FFT over non-rightmost axis unless you completely ignore memory coalescing (and then you'll get abysmal performance). An inplace FFT of size 4096 over the rightmost axis is possible if the device has enough registers and local memory (I'm not sure how the most recent cards fare; on my fairly old Tesla I think I can only go as far as 2048 in single precision). But even then, you will still need a transpose to do the FFT over another axis, and that requires a temporary buffer.
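The transpose idea can be illustrated on the CPU with a small pure-Python sketch: a transform over axis 0 is carried out as transpose, transform over the rightmost axis, transpose back. The function names are made up for this example, a naive DFT stands in for the FFT, and the out-of-place transpose plays the role of the temporary buffer. It says nothing about how the actual GPU kernels are written.

```python
import cmath


def dft_row(row):
    """Naive forward DFT of a single row (stand-in for a rightmost-axis FFT)."""
    n = len(row)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * j / n) for j, v in enumerate(row))
            for k in range(n)]


def transpose(matrix):
    """Out-of-place transpose: the result lives in a newly allocated buffer."""
    return [list(col) for col in zip(*matrix)]


def dft_axis0_via_transpose(matrix):
    temp = transpose(matrix)                 # temporary buffer
    temp = [dft_row(row) for row in temp]    # transform over the rightmost axis
    return transpose(temp)                   # back to the original layout


def dft_axis0_direct(matrix):
    """Reference: transform over axis 0 with strided access down the columns."""
    rows, cols = len(matrix), len(matrix[0])
    return [[sum(matrix[j][c] * cmath.exp(-2j * cmath.pi * k * j / rows)
                 for j in range(rows))
             for c in range(cols)]
            for k in range(rows)]


data = [[complex(r + c, r - c) for c in range(3)] for r in range(4)]
via_transpose = dft_axis0_via_transpose(data)
direct = dft_axis0_direct(data)
```

Both routes give the same numbers; the difference is only in the memory access pattern and in the extra buffer the transpose needs.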

By the way, if it helps, Reikna tries to pack the temporary buffers used by all the computations compiled for a single Thread (some info on that at http://reikna.publicfields.net/en/latest/api/cluda.html#temporary-arrays), so you probably won't need several 4096x4096 buffers, just one.
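The effect of packing can be sketched with a toy greedy allocator: temporaries that are never needed at the same time may share one slot of physical memory, while those that conflict get separate slots. Everything here (the function `pack_temporaries`, the buffer names, the sizes) is hypothetical and only illustrates the idea; it is not Reikna's actual allocator.

```python
def pack_temporaries(sizes, conflicts):
    """Greedily assign temporaries to shared slots.

    sizes: dict mapping a temporary's name to its size in bytes.
    conflicts: set of frozenset({a, b}) pairs that must not share memory.
    Returns a list of slots, each a dict with 'capacity' and 'names'.
    """
    slots = []
    # Largest first, so a slot's capacity is set by its first tenant.
    for name in sorted(sizes, key=sizes.get, reverse=True):
        for slot in slots:
            if all(frozenset((name, other)) not in conflicts for other in slot["names"]):
                slot["names"].append(name)
                break
        else:
            slots.append({"capacity": sizes[name], "names": [name]})
    return slots


MIB = 2**20
sizes = {"fft_a_temp": 256 * MIB, "fft_b_temp": 256 * MIB, "small_temp": 64 * MIB}

# No two temporaries are in use at the same time: one slot is enough.
shared = pack_temporaries(sizes, conflicts=set())
shared_total = sum(slot["capacity"] for slot in shared)

# The two large temporaries are needed simultaneously: two slots.
clashing = pack_temporaries(sizes, {frozenset(("fft_a_temp", "fft_b_temp"))})
clashing_total = sum(slot["capacity"] for slot in clashing)
```

In the first case the total is the size of the largest temporary rather than the sum of all of them, which is the saving described above.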

Bogdan Opanchuk

Sep 29, 2015, 8:49:34 PM
to reikna
> You cannot do an FFT over non-rightmost axis ...

*You cannot do an _inplace_ FFT over non-rightmost axis

digg...@gmail.com

Sep 30, 2015, 7:35:40 AM
to reikna
I see. I'm using pyopencl's ImmediateAllocator memory pool when creating arrays on the device, so I guess it also manages allocating and releasing the temporary buffers for Reikna computations.
Well, time to get bigger hardware...

Thanks for your help.
