Is it possible to have multiple strides, dists in fftw3

Kamaraju Kusumanchi

unread,

Aug 4, 2004, 2:06:53 PM8/4/04

to

Hi all
I am using fftw3 library to evaluate the fourier transform and I am
using fortran 90 as my programming language.

Let's say I have a 3d array declared by arr(0:63, 0:127, 0:127).

let the indices be denoted by i,j,k.

I want to take FFT of arrays formed by arr(i,0:127,k) [Each array
consists of 128 elements. The total number of such arrays are 64 * 128]

What is most efficient way to create the plans using fftw3?

I tried using the advanced interface with a stride of 128 and dist of 1.
But this will work only for k=0 and will not work for further values of k?

Another possibility is to have array of plans holding all the 128 plans
(for k=0 to 127). But that seems to be rediculous way of doing it.

Another way is to create temporary variable whose size is (0:64,0:127)
and create fftw plan on this and copy each plane of data to the
temporary variable. But this will be less efficient.

Any other ideas? I am more concerned about the efficiency than amount of
coding involved.

thanks in advance
raju

James Van Buskirk

unread,

Aug 4, 2004, 5:22:54 PM8/4/04

to

"Kamaraju Kusumanchi" <kk...@cornell.edu> wrote in message
news:cer8g1$8pa$1...@news01.cit.cornell.edu...

> I want to take FFT of arrays formed by arr(i,0:127,k) [Each array
> consists of 128 elements. The total number of such arrays are 64 * 128]

> What is most efficient way to create the plans using fftw3?

> I tried using the advanced interface with a stride of 128 and dist of 1.
> But this will work only for k=0 and will not work for further values of k?

If you passed arr(i,0,k), the stride should be 64 because Fortran is
column major. If you passed arr(i,:,k), the stride should be 1 because
FFTW is hardly going to supply an assumed-shape dummy.

If you want efficiency, you are probably best off to declare

sub_arr(0:127,0:3)

and then copy:

forall(j = 0:127, L = 0:3) sub_arr(j,L) = arr(4*i+L,j,k)

and then do an FFT on each sub_arr(:,L). This would do the
difficult copy stuff in such a way that whole cache lines
could be copied at a time and set up easy FFTs on contiguous
data. BTW, if you do reduce the problem to 1-d FFTs like this
you might want to try using the code generator on my website:
http://home.comcast.net/~kmbtib/index.html to generate an
order-128 DFT. The generator is called scaled2g.f90. At
least doing everything in Fortran permits one to avoid
questions of C++ interoperability.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end

Kamaraju Kusumanchi

unread,

Aug 4, 2004, 6:08:18 PM8/4/04

to

James Van Buskirk wrote:
> "Kamaraju Kusumanchi" <kk...@cornell.edu> wrote in message
> news:cer8g1$8pa$1...@news01.cit.cornell.edu...
>
>
>>I want to take FFT of arrays formed by arr(i,0:127,k) [Each array
>>consists of 128 elements. The total number of such arrays are 64 * 128]
>
>
>>What is most efficient way to create the plans using fftw3?
>
>
>>I tried using the advanced interface with a stride of 128 and dist of 1.
>>But this will work only for k=0 and will not work for further values of k?
>
>
> If you passed arr(i,0,k), the stride should be 64 because Fortran is
> column major. If you passed arr(i,:,k), the stride should be 1 because
> FFTW is hardly going to supply an assumed-shape dummy.
>

Sorry My mistake. I meant 64 to be the stride instead of 128. If the
advanced interface is used with stride = 64, dist = 1 alone then it
works only for k=0 plane and does not work for other planes.

Instead if the advanced interface comes with an option to specify
multiple strides or multiple dist variables it would be much easy.

Moreover, I am observing a performance improvement of a factor of 3 if
advance interface is used instead of calling the simple interface
multiple times. So I would like to use the advanced interface as much as
possible.

thanks
raju

Steven G. Johnson

unread,

Aug 5, 2004, 11:44:10 PM8/5/04

to

Kamaraju Kusumanchi <kk...@cornell.edu> wrote...

> Let's say I have a 3d array declared by arr(0:63, 0:127, 0:127).
>
> let the indices be denoted by i,j,k.
>
> I want to take FFT of arrays formed by arr(i,0:127,k) [Each array
> consists of 128 elements. The total number of such arrays are 64 * 128]
>
> What is most efficient way to create the plans using fftw3?

You don't want to use the advanced interface for this, you really want
to use the FFTW "guru" interface. The reason is that you want *two*
nested loops of 1d DFTs, and only the guru interface in FFTW allows
you to plan multiple loops of transforms.

That is, you create a plan with:

transform rank = 1: n=128, stride=64

howmany rank = 2: (i.e. 2 nested loops of transforms)
n = 64, stride = 1
n = 128, stride = 64 * 128

Cordially,
Steven G. Johnson

PS. With regards to James' suggestion in this thread, FFTW should
automatically copy a few size-128 arrays at a time to a contiguous
buffer before transforming if it is advantageous to do so.

Kamaraju Kusumanchi

unread,

Aug 6, 2004, 1:47:46 PM8/6/04

to

Steven G. Johnson wrote:
> Kamaraju Kusumanchi <kk...@cornell.edu> wrote...
>
>>Let's say I have a 3d array declared by arr(0:63, 0:127, 0:127).
>>
>>let the indices be denoted by i,j,k.
>>
>>I want to take FFT of arrays formed by arr(i,0:127,k) [Each array
>>consists of 128 elements. The total number of such arrays are 64 * 128]
>>
>>What is most efficient way to create the plans using fftw3?
>
>
> You don't want to use the advanced interface for this, you really want
> to use the FFTW "guru" interface. The reason is that you want *two*
> nested loops of 1d DFTs, and only the guru interface in FFTW allows
> you to plan multiple loops of transforms.
>
> That is, you create a plan with:
>
> transform rank = 1: n=128, stride=64
>
> howmany rank = 2: (i.e. 2 nested loops of transforms)
> n = 64, stride = 1
> n = 128, stride = 64 * 128
>

I presume the last two lines corresponded to *howmany_dims (the 4th
argument in the call to fftw_plan_guru_dft function). Following your
suggestion, howmany_dims should be an array of two fftw_iodim elements.

But the manual says it should be an array with 'rank' number of elements
(pg33, section 4.5.2, 2nd line from top). How is that possible? Where am
I going wrong? Is there any typo that I am unaware of?

thanks
raju

Kamaraju Kusumanchi

unread,

Aug 6, 2004, 4:23:53 PM8/6/04

to

Steven G. Johnson wrote:
> Kamaraju Kusumanchi <kk...@cornell.edu> wrote...
>
>>Let's say I have a 3d array declared by arr(0:63, 0:127, 0:127).
>>
>>let the indices be denoted by i,j,k.
>>
>>I want to take FFT of arrays formed by arr(i,0:127,k) [Each array
>>consists of 128 elements. The total number of such arrays are 64 * 128]
>>
>>What is most efficient way to create the plans using fftw3?
>
>

> That is, you create a plan with:
>
> transform rank = 1: n=128, stride=64
>
> howmany rank = 2: (i.e. 2 nested loops of transforms)
> n = 64, stride = 1
> n = 128, stride = 64 * 128

Thanks for the advice. Following the above suggestion, I wrote the
following code to solve the above problems. The code gives a
segmentation fault while creating the plan. Could someone enlighten me
as to where I am going wrong.

thanks in advance
raju

call_guru.f90

Steven G. Johnson

unread,

Aug 7, 2004, 1:27:47 PM8/7/04

to

Kamaraju Kusumanchi <kk...@cornell.edu> wrote...

> I presume the last two lines corresponded to *howmany_dims (the 4th
> argument in the call to fftw_plan_guru_dft function). Following your
> suggestion, howmany_dims should be an array of two fftw_iodim elements.
>
> But the manual says it should be an array with 'rank' number of elements
> (pg33, section 4.5.2, 2nd line from top). How is that possible? Where am
> I going wrong? Is there any typo that I am unaware of?

Whoops! That was a typo, howmany_dims should have howmany_rank
elements, of course. It was actually corrected in the HTML version of
the manual (http://fftw.org/fftw3_doc/Guru-vector-and-transform-sizes.html),
but I forgot to post an updated PDF as well.

I'll update the PDF manual shortly, thanks.

Cordially,
Steven G. Johnson

Steven G. Johnson

unread,

Aug 7, 2004, 1:37:12 PM8/7/04

to

Kamaraju Kusumanchi <kk...@cornell.edu> wrote...
> !fftw_plan fftw_plan_guru_dft(
> ! int rank, const fftw_iodim *dims,
> ! int howmany_rank, const fftw_iodim *howmany_dims,
> ! fftw_complex *in, fftw_complex *out,
> ! int sign, unsigned flags);
>
> call dfftw_plan_guru_dft__( plan_pencils, 1, NY, NX, NX, &
> 2, NX, 1, 1, NZ, NX*NY, NX*NY, &
> cmp_input, cmp_input, FFTW_FORWARD, FFTW_PATIENT)

This is not correct. As described in the "Calling FFTW from Fortran"
section of the FFTW manual (section 6.1,
http://fftw.org/fftw3_doc/Fortran-interface-routines.html):

"Since Fortran 77 does not have data structures, the fftw_iodim
structure from the guru interface (see Guru vector and transform
sizes) must be split into separate arguments. In particular, any
fftw_iodim array arguments in the C guru interface become three
integer array arguments (n, is, and os) in the Fortran guru interface,
all of whose length should be equal to the corresponding rank
argument."

So, you'll need something like:

integer howmany_n(2), howmany_stride(2)

howmany_n(1) = NX
howmany_stride(1) = 1
howmany_n(2) = NZ
howmany_stride(2) = NX * NY

call dfftw_plan_guru_dft(plan_pencils, 1, NY,NX,NX, 2,
howmany_n,howmany_stride,howmany_stride, cmp_input, cmp_output,
FFTW_FORWARD, FFTW_PATIENT)

Steven

Kamaraju Kusumanchi

unread,

Aug 10, 2004, 12:03:07 PM8/10/04

to

That's it. Thanks for explaining it so clearly.Now it works like a charm.

raju