Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Message from discussion zero padding for convolution?
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
robert bristow-johnson  
View profile  
 More options Nov 20 1999, 3:00 am
Newsgroups: comp.dsp
From: robert bristow-johnson <pbj...@viconet.com>
Date: 1999/11/20
Subject: Re: zero padding for convolution?

Bill Simpson wrote:

> I want to convolve x[] and h[] using the FFT method. I want the aperiodic
> convolution.

> I have read Oppenheim Willsky & Young, p. 331, also Numerical Recipes. The
> convolution is:
> IFFT( FFT(x[]) * FFT(h[]) )
> where * is multiplication and IFFT is inverse FFT

> However I am not clear on the zero padding necessary in order to get an
> aperiodic convolution from what is essentially a periodic convolution
> method. Could someone please explain how many zeros should be tacked onto
> the end of x[] and/or h[] (assuming that their original lengths are nx and
> nh respectively)?

> I looked in the FAQ and did a deja news search for "pad" and neither
> showed anything!

that's hard to believe!  try typing in "zero padding" into deja and
limit the search to comp.dsp and you'll get 247 hits going all the way
back to the infamous "? on zero padding" thread where Eric Jacobson and
i mixed it up a little on whether or not the DFT inherently applies a
rectagular window.  there has been, over the years, _plenty_ of
discussion on zero-padding, more for it's effect on spectral display and
interpolation and resolution and not so much for fast convolution.

a few points i would make regarding fast convolution.  there are
essentially two ways (that i know of) to do it:  Overlap-Add (where you
have to zero-pad) and Overlap-Save (where you extend the signal rather
than zero-pad, saving the output samples that are not damaged by the
convolution wrapping around the "splice" and simply discarding the
output samples that do get messed up).  the reference i'm looking at is
O&S (gray book) section 8.9, pp. 548-560.

using their language, the signal x[n] is broken up into blocks of length
L, the FIR impulse response has length P, and linearly convolving a
block of x[n] with h[n] will get you a result of length L+P-1 which is
set to the length of the FFT.

        N = L + P - 1

(life will be easier for you if N is a power of 2.)

if you're doing overlap-add, each block of L samples is padded with P-1
zeros and h[n] is padded with L-1 zeros making both circular sequences N
samples long.

if you're doing overlap-save, each block of L samples is padded with P-1
samples from the next block.  i.e. it's a block of N consequetive
samples but the _hop_ is still L samples (P-1 samples are reused in the
next block).  h[n] is still zero-padded with L-1 zeros.

now, usually the length of h[n] (O&S call it 'P' in this section) is
determined by outside criteria related to the performance and design of
the FIR filter.  for computational reasons, you want it as short as
possible whether you be doin "fast convolution" or the old-fashioned
kind but, in order to get your job done well (sharpness of cutoff, or
keeping those ripples at bay), it has to be at least some known length
(playing around with the Parks-McClellan algorithm will help you decide
what P will be).  what there is left to determine is the optimal FFT
length, N.  from that you can easily compute L = N-P+1 which is your hop
size (how many samples get processed per block or frame).

the way to determine the optimal N involves first knowing accurately the
cost of your FFT, IFFT, and N complex multiplications in the frequency
domain.  i remember working on this back in my grad school days (when i
wasn't picking my nose) but many assumptions were different back then.
back in those CISC days, multiplications were so costly we virtually
ignored the cost of additions and moving numbers around.  so we
considered the cost of the FFT to be about 4*(N/2)*log2(N/2) real
multiplications whereas, now when i look at some 56K code (where it is
just about as costly to move and add numbers as it is to multiply them)
the number of instructions is
log2(N)*(3*N + 20) + 8*N + 15.  

i think, in general, when you toss in the cost of doing the freq-domain
multiplies and the overlap-adding (if that's the method you choose), you
will have a cost function (per block of processing) that looks like
log2(N)*(A*N + B) + C*N + D.  you will have to figure out what A, B, C,
and D are for your implementation of the FFT, IFFT, and the rest of the
block processing.  then you divide that cost by the number of samples
being processed per block (that number is 'L') and then choose N = 2^q,
so that that cost per sample is minimized:

        given: A, B, C, D, and P

        minimize { (log2(N)*(A*N + B) + C*N + D)/(N-P+1) }
           N
or

        minimize { (q*(A*2^q + B) + C*2^q + D)/(2^q - P + 1) }
           q

(q = log2(N) = integer)

there are tricks to make this more efficient.  like using special real
FFT (with one-sided complex result in the freq. domain).  i found it
easier instead to use the regular old complex FFT (and IFFT) to process
_two_ real blocks at a time.  one block goes into the real part and the
other goes into the imag part of the FFT input.  another trick that will
save some messing around is to use a "decimation-in-frequency" FFT
algorithm (that takes normal ordered data going in and returns
bit-reversed data coming out) and a "decimation-in-time" IFFT algorithm
(that takes bit-reversed going in and returns normal ordered output) and
do the freq domain multiplication in bit reversed order (do the FFT of
h[n] in advance leaving the result in bit reversed order).  

back in the olden days, when i threw every trick i knew at the problem,
it came out that fast-convolution was more efficient for FIRs longer
than 8 samples.  nowadays, i would expect the trade-off to be at a much
longer FIR, perhaps 40 or more samples.  this is because the cost of
multiplication, relative to the cost of addition and data movement, is
much lower now than during my young, spry daze (some of us are getting
so damn old).

--

r b-j
pbj...@viconet.com   a.k.a.   rob...@audioheads.com
                     a.k.a.   rob...@wavemechanics.com

"Don't give in to the Dark Side.  Boycott intel and microsoft."

Either that or convict the hell out of them!

Gates -> Guilty! Guilty! Guilty!


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2010 Google