Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

FFTW 3 and / or another fast alternative

216 views
Skip to first unread message

Stefan Hinterstoisser

unread,
May 10, 2005, 7:15:55 PM5/10/05
to
Hi !

I am looking for a very fast 2Dimensional DFT (or FFT). Important: the
length of the image (height and width) should also be allowed to be not
of power of two! I already tried out FFTW 3.0.1 but for an image of
256x256 it needs 120 ms! That is really slow! Already although I used
the FFTW_ESTIMATE flag - does anyone of you has any experiences with
FFTW and windows (use VC++ 6.0)? What alternatives exist? I try the IPP
and the MKL of Intel - but there are still some bugs - therefore I dont
know if they are faster.
Thanx in advance,
Stefan

Steve Eddins

unread,
May 10, 2005, 8:58:32 PM5/10/05
to
----------------------------------------
Stefan Hinterstoisser <supers...@gmx.de> writes:
> Path: news.mathworks.com!newsfeed-00.mathworks.com!irazu.switch.ch!switch.ch!newsfeed00.sul.t-online.de!t-online.de!inka.de!rz.uni-karlsruhe.de!news.belwue.de!informatik.tu-muenchen.de!lrz.de!not-for-mail
> Newsgroups: sci.image.processing
> Subject: FFTW 3 and / or another fast alternative
> Date: Wed, 11 May 2005 09:15:55 +1000
> Organization: [posted via] Leibniz-Rechenzentrum, Muenchen (Germany)

MATLAB uses FFTW 3.0.1 for its FFT computations. On my laptop, a 1.5GHz
IBM Thinkpad T40, MATLAB computes a 256-by-256 FFT in 16 ms.

--
Steve Eddins

Development Manager, Image Processing Group
The MathWorks, Inc.

Stefan Hinterstoisser

unread,
May 11, 2005, 9:54:26 AM5/11/05
to
>>Hi !
>>
>>I am looking for a very fast 2Dimensional DFT (or FFT). Important: the
>>length of the image (height and width) should also be allowed to be not of
>>power of two! I already tried out FFTW 3.0.1 but for an image of 256x256 it
>>needs 120 ms! That is really slow! Already although I used the
>>FFTW_ESTIMATE flag - does anyone of you has any experiences with FFTW and
>>windows (use VC++ 6.0)? What alternatives exist? I try the IPP and the MKL
>>of Intel - but there are still some bugs - therefore I dont know if they
>>are faster.
>>Thanx in advance,
>>Stefan
>
>
> MATLAB uses FFTW 3.0.1 for its FFT computations. On my laptop, a 1.5GHz
> IBM Thinkpad T40, MATLAB computes a 256-by-256 FFT in 16 ms.

Do you mean it needed 16ms for Forward transformation only or did it
need 16ms for Forward and Backward transformation (so the FFT and the
IFFT)....?

Steve Eddins

unread,
May 11, 2005, 4:26:10 PM5/11/05
to
----------------------------------------
Stefan Hinterstoisser <supers...@gmx.de> writes:
> Path: news.mathworks.com!newsfeed-00.mathworks.com!irazu.switch.ch!switch.ch!news.belwue.de!informatik.tu-muenchen.de!lrz.de!not-for-mail
> Newsgroups: sci.image.processing
> Subject: Re: FFTW 3 and / or another fast alternative
> Date: Wed, 11 May 2005 23:54:26 +1000

> Organization: [posted via] Leibniz-Rechenzentrum, Muenchen (Germany)
>

Forward.

ste...@alum.mit.edu

unread,
May 11, 2005, 8:28:04 PM5/11/05
to
Steve, your numbers are still a bit high...

On my 2.2GHz Pentium-IV, calling FFTW directly (double precision, gcc,
SSE2), a single 256x256 real-input DFT takes only 2.1 ms.

The original poster should see questions 3.3 and 3.4 in the FFTW FAQ
(fftw.org/faq).

Steven G. Johnson

Steve Eddins

unread,
May 12, 2005, 8:56:54 AM5/12/05
to
----------------------------------------

Yes, I know. I'm still struggling to solve the preliminary problem of finite
time versus an infinite list of things to do. :-)

Martin Leese

unread,
May 12, 2005, 4:58:33 PM5/12/05
to supers...@gmx.de
ste...@alum.mit.edu wrote:
...

> The original poster should see questions 3.3 and 3.4 in the FFTW FAQ
> (fftw.org/faq).

Here they are (plus 3.8). The original poster said
they were using FFTW_ESTIMATE, so questions 3.3 and
3.8 would seen particularly appropriate.

Question 3.3. FFTW seems really slow.
You are probably recreating the plan before every transform, rather than
creating it once and reusing it for all transforms of the same size.
FFTW is designed to be used in the following way:
* First, you create a plan. This will take several seconds.
* Then, you reuse the plan many times to perform FFTs. These are fast.

If you don't need to compute many transforms and the time for the
planner is significant, you have two options. First, you can use the
FFTW_ESTIMATE option in the planner, which uses heuristics instead of
runtime measurements and produces a good plan in a short time. Second,
you can use the wisdom feature to precompute the plan; see Q3.8 `Can I
save FFTW's plans?'

Question 3.4. FFTW slows down after repeated calls.
Probably, NaNs or similar are creeping into your data, and the slowdown
is due to the resulting floating-point exceptions. For example, be aware
that repeatedly FFTing the same array is a diverging process (because
FFTW computes the unnormalized transform).

Question 3.8. Can I save FFTW's plans?
Yes. Starting with version 1.2, FFTW provides the wisdom mechanism for
saving plans; see the FFTW manual.

--
Regards,
Martin Leese
E-mail: ple...@see.Web.for.e-mail.INVALID
Web: http://members.tripod.com/martin_leese/

Steve Eddins

unread,
May 12, 2005, 5:17:18 PM5/12/05
to
----------------------------------------

Steven,

My big challenge in designing the MATLAB - FFTW interface is directly related to
FFTW FAQ answer 3.3. "FFTW is designed to be used in the following way: First,
you create a plan. This will take several seconds. Then, you reuse the plan many


times to perform FFTs. These are fast."

This model simply doesn't match very well what many MATLAB users typically do
and expect. This mismatch presents me with a few puzzles. I'm working through
them a (reversed) bit at a time.

ste...@alum.mit.edu

unread,
May 12, 2005, 8:12:53 PM5/12/05
to
Hi Steve,

I'm afraid that neither your numbers nor the OPs are explained (to me)
by including the planning time. In FFTW_ESTIMATE mode, a 256x256
real-data FFT on my machine (2.2GHz P-IV, double precision, SSE2, gcc,
FFTW 3.0.1), still takes only 2.6ms (or 2.9ms out of place), and the
planning time in estimate mode is only 0.3ms.

In Matlab, I understand that you pay some penalty because you
internally store the data as separate real/imag arrays rather than
interleaved. However, even with real input and interleaved real/imag
output it takes only 3.15ms on my machine (in estimate mode; 2.8ms in
patient mode). (You have some additional overhead because you copy the
output to a full, redundant, 256x256 complex array, but that shouldn't
be slower than the FFT itself.)

If I disable SIMD (SSE2) instructions, the time is still 3.8ms,
although I don't know why Matlab wouldn't use these unless your CPU
isn't a P-IV.

Granted that you have a 30% slower clock speed on your machine, but I'm
just not seeing how you can get the time up to 16ms.

(As for the original poster, he is clearly doing something grossly
wrong. Perhaps he is benchmarking by repeatedly FFTing the same array,
a diverging process that leads to slow fp exceptions.)

Steven

PS. We are aware that many users, not just Matlab, are unwilling to
wait for the optimal plan; improving our heuristic algorithms is a high
priority.

Steve Eddins

unread,
May 13, 2005, 10:10:41 AM5/13/05
to
----------------------------------------

ste...@alum.mit.edu writes:
> Hi Steve,
>
> I'm afraid that neither your numbers nor the OPs are explained (to me)
> by including the planning time.

I didn't say that, nor did I mean to imply it. I simply haven't had the time so
far to pin down precisely what caused certain FFT-related performance changes
between MATLAB 6.5 and MATLAB 7. Your reported times encourage me to find the
time to track down the problems and fix them.

Thanks,

Steve

0 new messages