A faster conv2(A,B)

Jim

unread,

Sep 27, 2012, 1:19:11 PM9/27/12

to juli...@googlegroups.com

For those who must compute large 2d convolutions try the following : Data=ifft2(fft2(Data1).*fft2(Data2)).

Data1 & Data2 are 2d arrays of 1000x1000 float64 data points. This implementation is about 10 times faster than conv2(Data1,Data2). In my test case I get 0.202 Seconds vs 2.59 seconds on a iMac quad I5 using a single core.

I do not know why this is true, since i thought conv2 is using a FFT2d algorithm.

Elliot Saba

unread,

Sep 27, 2012, 1:42:37 PM9/27/12

to juli...@googlegroups.com

Please note that this is doing a circular convolution, not a linear convolution. You can fake a linear convolution using a circular convolution by zero-padding the sequences to be long enough for the "tail" that is created by a convolution to not wrap around to the beginning of the sequence. This zero padding is exactly what conv2 does, and may be why Jim is seeing a slowdown using conv2.

As a general rule of thumb, linear convolutions should increase the length of your data. If they're not, you're either truncating or wrapping around via a circular convolution. :)

-E

--

Stefan Karpinski

unread,

Sep 27, 2012, 2:59:25 PM9/27/12

to juli...@googlegroups.com

Does that mean that Jim's faster version isn't strictly correct?

--

Tim Holy

unread,

Sep 27, 2012, 4:25:08 PM9/27/12

to juli...@googlegroups.com

On Thursday, September 27, 2012 02:59:25 PM Stefan Karpinski wrote:
> Does that mean that Jim's faster version isn't strictly correct?

Depends on what you want. If you want your finite domain to be a chunk out of
infinite space, then yes, Jim's version will give you the wrong answer. If
instead you intend your finite domain to represent the "pacman" world where
going off the screen on the right immediately takes you to the left side, then
Jim's version is exactly what you want.

--Tim

>
> On Thu, Sep 27, 2012 at 1:42 PM, Elliot Saba <stati...@gmail.com> wrote:
> > Please note that this is doing a circular

> > convolution<http://en.wikipedia.org/wiki/Circular_convolution>, not a
> > linear convolution <http://en.wikipedia.org/wiki/Linear_convolution>.>

> > You can fake a linear convolution using a circular convolution by
> >
> > zero-padding the sequences to be long enough for the "tail" that is
> > created
> > by a convolution to not wrap around to the beginning of the sequence.
> > This
> > zero padding is exactly what conv2

> > does<https://github.com/JuliaLang/julia/blob/master/base/DSP.jl#L86>, and

Elliot Saba

unread,

Sep 27, 2012, 5:06:12 PM9/27/12

to juli...@googlegroups.com

The performance difference is because a convolution with two rather large matrices needs to happen on a much larger playing field for there to be no wrapping around. The matrices need to be extended to be almost 4 times larger than they are in the circular case, so in the end with the two fft's and the one ifft all operating on matrices 4 times larger, (and knowing that the fft routine's runtime is O(n*log(n))) the runtime should be somewhere around 4*log(2) == 8 times longer. Taking a look at the runtimes on my machine:

Circular version: 0.3444340229034424 seconds

Linear version: 2.947329044342041 seconds

This looks just about right. In closing, I would say you really need to be sure that you want circular convolution, as this is one of those things like aliasing where you may not realize how it invalidates a lot of assumptions later on, and in practice where you don't want to deal with the "tail" of the convolution, it's just as common to truncate the tail rather than have it wrap back around to the beginning of the signal.

-E

--

Jim

unread,

Sep 28, 2012, 9:20:19 AM9/28/12

to juli...@googlegroups.com

I agree, however the same effect can be achieved by increasing the sample rate. This results in a trade-off of speed vs fidelity.

Tim Holy

unread,

Sep 28, 2012, 12:24:54 PM9/28/12

to juli...@googlegroups.com

On Friday, September 28, 2012 06:20:19 AM Jim wrote:
> I agree, however the same effect can be achieved by increasing the sample
> rate. This results in a trade-off of speed vs fidelity.

Maybe we're talking about different things, but to clarify possible confusion:
increasing the sample rate won't solve the boundary condition issue that
Elliot first raised. (It will change the amount of data you have to process,
and increasing the rate will generally slow down the computation.) There's a
nice illustration on the page about circular convolution that Elliot linked
to: in the 4th and 5th panels of the figure, note that the red bit on the right
edge wraps around to the left, and the summation (6th panel) naturally
includes both. If you don't want that to happen, you have to pad.

However, if you're really interested in speed, AND you know that the support
of one of the two items that you're convolving with is limited (perhaps to a
small percentage of the total area), THEN you don't actually need full two-
fold padding. All you need to pad by is the size of the support.

Finally, a treat for those interested in speed, FFTs, and padding: FFT
algorithms are most efficient---sometimes dramatically so---when the domain size
is a product of small primes. I wrote "nextprod" (Julia bonus! Not present in
most, or perhaps any, other languages!) to compute the smallest integer
expressible as a product of given primes, greater than or equal to a given
size. For example:

julia> n = 1973 # a not-small prime number, FFTs will be slow
1973

julia> z = randn(n);

julia> @time for i = 1:1000; fft(z); end
elapsed time: 0.4927511215209961 seconds

julia> @time for i = 1:1000; fft(z); end
elapsed time: 0.503350019454956 seconds

julia> p = nextprod([2,3,5,7], n) # next int of form 2^a*3^b*5^c*7^d
2000

julia> z = randn(p);

julia> @time for i = 1:1000; fft(z); end
elapsed time: 0.13319087028503418 seconds

julia> @time for i = 1:1000; fft(z); end
elapsed time: 0.12734603881835938 seconds

If you can be flexible about your padding size, I highly recommend using it.
With [2,3,5,7] the ratio of p/n never gets more than a few percent above 1, so
it barely increases your storage requirements. That's why this strategy beats
the old way of just using powers of 2, which can have really bad consequences
for 3-dimensional FFTs.

Perhaps conv2 should be rewritten to use it...

--Tim

Elliot Saba

unread,

Sep 28, 2012, 12:39:11 PM9/28/12

to juli...@googlegroups.com

That's a great idea, Tim. Best part is that it can be completely transparent to the user, as output size can be truncated back to normal size.

--Tim

--

Reply all

Reply to author

Forward