fft and power of 2 for speed

Adam Chapman

unread,

Jan 26, 2009, 2:54:21 PM1/26/09

to

I heard that fft() is faster if you pad out your data with zeros so
that the length of your data is a power of 2. However, my experiment
with the code below does not seem to agree

a=rand(1,300);
t1(1:10000)=0;
t2(1:10000)=0;
for i=1:10000
tic; fft(a); t1(i)=toc;
tic; fft(a,2^nextpow2(length(a))); t2(i)=toc;
end
mean(t1)
mean(t2)

I always get mean(t2) > mean(t1)

is there a reason why?

Rune Allnor

unread,

Jan 26, 2009, 3:10:54 PM1/26/09

to

First of all don't use TIC/TOC for performance purposes.
Use CPUTIMe or better, the profiler.

Second, use one loop for each case:

N = 100000;
ts1=cputime;
for n=1:N
fft(a);
end
te1=cputime - ts1;

ts2=cputime;
for n=1:N
fft(a2^nextpow2(length(a)));
end
te2=cputime - ts2;

tm1 = te1/N;
tm2 = te2/N;

Rune

Matt

unread,

Jan 26, 2009, 3:34:01 PM1/26/09

to

Adam Chapman <adamcha...@hotmail.co.uk> wrote in message <38fab93d-e16a-4755...@i20g2000prf.googlegroups.com>...

In addition to what Rune said, it may be that the fft() operation itself is faster, but it still costs you CPU time to zero pad the array. This requires memory allocation operations.

Accelerating fft's by zero padding is ideally something you would do when you have to do multiple fft jobs in sequence. You would then pre-allocate the zero-padded array and reload the array with new data for each job.

Matt Fig

unread,

Jan 26, 2009, 3:47:01 PM1/26/09

to

Rune Allnor <all...@tele.ntnu.no> wrote in message

>
> First of all don't use TIC/TOC for performance purposes.
> Use CPUTIMe or better, the profiler.

Of course the matlab documentation says to use tic/toc instead of cputime

doc tic

The results are substantially the same in a relative sense:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
a=rand(1,300);
N = 100000;
% Use cputime:

ts1=cputime;
for n=1:N
fft(a);
end
te1=cputime - ts1;

ts2=cputime;
for n=1:N
fft(a,2^nextpow2(length(a)));
end
te2=cputime - ts2;

times = [te1 te2];
reltime = times/min(times)

% Use tic toc:
tic

for n=1:N
fft(a);
end

te1 = toc;

tic
for n=1:N
fft(a,2^nextpow2(length(a)));
end
te2 = toc;

times = [te1 te2];
reltime = times/min(times)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

reltime =

1 3.0488

reltime =

1 3.096

7KxK-p195K951R<3;4;-K.@K-%;:-/AA@kY;-R1;;/EK4B8.-<K8@-:4e1K

Matt Fig

unread,

Jan 26, 2009, 3:50:24 PM1/26/09

to

The reference to use tic/toc instead of cputime is in

doc cputime

not tic as I previously posted. Sorry.

{eZdiWW5ZVtttdtVV:^i^\e/]t#VjVbbt`iddj{ncXkVd]tZc]VdaNBZaXt

Matt

unread,

Jan 26, 2009, 3:52:01 PM1/26/09

to

"Matt " <mjacobson....@xorantech.com> wrote in message <gll6np$8fo$1...@fred.mathworks.com>...

> Adam Chapman <adamcha...@hotmail.co.uk> wrote in message <38fab93d-e16a-4755...@i20g2000prf.googlegroups.com>...
> > I heard that fft() is faster if you pad out your data with zeros so
> > that the length of your data is a power of 2. However, my experiment
> > with the code below does not seem to agree
> >
> > a=rand(1,300);
> > t1(1:10000)=0;
> > t2(1:10000)=0;
> > for i=1:10000
> > tic; fft(a); t1(i)=toc;
> > tic; fft(a,2^nextpow2(length(a))); t2(i)=toc;
> > end
> > mean(t1)
> > mean(t2)
> >
> > I always get mean(t2) > mean(t1)
> >
> > is there a reason why?

One other point. Zero-padding is favorable mainly when your initial number of points (before zero padding) is only slightly less than the next higher power of 2.

Therefore, you should really use length(a)=511 instead of length(a)=300 to see a favorable effect.

Steve Eddins

unread,

Jan 26, 2009, 4:10:38 PM1/26/09

to

Rune Allnor wrote:
> On 26 Jan, 20:54, Adam Chapman <adamchapman1...@hotmail.co.uk> wrote:
>> I heard that fft() is faster if you pad out your data with zeros so
>> that the length of your data is a power of 2. However, my experiment
>> with the code below does not seem to agree
>>
>> a=rand(1,300);
>> t1(1:10000)=0;
>> t2(1:10000)=0;
>> for i=1:10000
>> tic; fft(a); t1(i)=toc;
>> tic; fft(a,2^nextpow2(length(a))); t2(i)=toc;
>> end
>> mean(t1)
>> mean(t2)
>>
>> I always get mean(t2) > mean(t1)
>>
>> is there a reason why?
>
> First of all don't use TIC/TOC for performance purposes.
> Use CPUTIMe or better, the profiler.

Rune,

If I recall correctly, you have stated in previous posts that you are
using a version of MATLAB that is some years out of date. Our current
recommendation to users is to use tic and toc. These functions have
changed quite a bit over the last few years.

To the original poster, give my timeit function a try:

http://www.mathworks.com/matlabcentral/fileexchange/18798

---
Steve Eddins
http://blogs.mathworks.com/steve/

NZTideMan

unread,

Jan 26, 2009, 4:11:45 PM1/26/09

to

On Jan 27, 9:52 am, "Matt " <mjacobson.removet...@xorantech.com>
wrote:
> "Matt " <mjacobson.removet...@xorantech.com> wrote in message <gll6np$8f...@fred.mathworks.com>...
> > Adam Chapman <adamchapman1...@hotmail.co.uk> wrote in message <38fab93d-e16a-4755-9c95-edea98035...@i20g2000prf.googlegroups.com>...

> > > I heard that fft() is faster if you pad out your data with zeros so
> > > that the length of your data is a power of 2. However, my experiment
> > > with the code below does not seem to agree
>
> > > a=rand(1,300);
> > > t1(1:10000)=0;
> > > t2(1:10000)=0;
> > > for i=1:10000
> > > tic; fft(a); t1(i)=toc;
> > > tic; fft(a,2^nextpow2(length(a))); t2(i)=toc;
> > > end
> > > mean(t1)
> > > mean(t2)
>
> > > I always get mean(t2) > mean(t1)
>
> > > is there a reason why?
>
> One other point. Zero-padding is favorable mainly when your initial number of points (before zero padding) is only slightly less than the next higher power of 2.
>
> Therefore, you should really use length(a)=511 instead of length(a)=300 to see a favorable effect.

Also, blindly zero-padding a signal may introduce errors into the
Fourier transform. For example, if the signal has a mean of say 100,
then zero padding will introduce a step of 100 and this will result in
red noise in the frequency domain.

Steven Lord

unread,

Jan 26, 2009, 4:37:06 PM1/26/09

to

"Matt Fig" <spam...@yahoo.com> wrote in message
news:gll7g5$2a5$1...@fred.mathworks.com...

> Rune Allnor <all...@tele.ntnu.no> wrote in message
>>
>> First of all don't use TIC/TOC for performance purposes.
>> Use CPUTIMe or better, the profiler.
>
>
> Of course the matlab documentation says to use tic/toc instead of cputime
>
> doc tic
>
> The results are substantially the same in a relative sense:
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> a=rand(1,300);
> N = 100000;
> % Use cputime:
> ts1=cputime;
> for n=1:N
> fft(a);
> end
> te1=cputime - ts1;
>
> ts2=cputime;
> for n=1:N
> fft(a,2^nextpow2(length(a)));

It's a minor thing, but you probably want to compute 2^nextpow2(length(a))
once outside the loop being timed and use that value inside the loop. That
saves (N-1) [assuming you time the one call to NEXTPOW2 that you do outside
the loop] instances of function call overhead, error checking inside
NEXTPOW2, exponentiation, etc.

*snip*

--
Steve Lord
sl...@mathworks.com

Matt Fig

unread,

Jan 26, 2009, 4:50:18 PM1/26/09

to

"Steven Lord" <sl...@mathworks.com> wrote in message

> It's a minor thing, but you probably want to compute 2^nextpow2(length(a))
> once outside the loop being timed and use that value inside the loop. That
> saves (N-1) [assuming you time the one call to NEXTPOW2 that you do outside
> the loop] instances of function call overhead, error checking inside
> NEXTPOW2, exponentiation, etc.

True if the O.P. is calculating the fft of many variables that are the same length in a loop. But for comparing the raw speed of getting the fft one way versus another, I think it is fair to leave any steps that must be taken inside the timer!

JggIKggugnWA\WXMaQI\IP"TgVTXWP5W\WKIWIIJMg]M(PUVnUQS-IOgM^]

Peter Boettcher

unread,

Jan 26, 2009, 4:54:50 PM1/26/09

to

Adam Chapman <adamcha...@hotmail.co.uk> writes:

FFTW (the FFT engine used by MATLAB) knows how to do efficient
non-power-of-two FFTs. If you use 293 (a prime number) for the length,
you see the desired effect. It is still a small effect (1.5x), mostly because
these lengths are so small. The overhead is substantial. Trying this
again with larger numbers will show a larger difference. Using 293*293,
for example, gives me an almost 5x speedup to use the next largest power
of 2.

-Peter

Steven G. Johnson

unread,

Feb 1, 2009, 2:05:21 PM2/1/09

to

On Jan 26, 2:54 pm, Adam Chapman <adamchapman1...@hotmail.co.uk>
wrote:

> I heard that fft() is faster if you pad out your data with zeros so
> that the length of your data is a power of 2. However, my experiment
> with the code below does not seem to agree

The correct statement is that FFTs are faster for sizes that factorize
into small prime factors, ideally 2,3,5. Powers of 2 have a
historical advantage in that they are the most heavily optimized in
many FFT implementations, but the FFT implementation used by Matlab
(FFTW) is almost as optimized for other small prime factors.

If you have a size that is factorizable into powers of 2,3,5 (like 300
in your case), then going to the next power of 2 (512 in your case) is
likely to be slower.

If you have a size that has large prime factors, then it is usually
beneficial from a performance perspective to go to the next size that
is factorizable into {2,3,5}, not the next power of 2. (Note,
however, that even large prime sizes are handled by an N log N
algorithm in the FFT code used by Matlab.) Of course, depending upon
your application, zero-padding may cause other problems by changing
what you are computing.

Regards,
Steven G. Johnson