The suggestion is to do two basic things that can help a lot:
* Use DFTs and Hadamard (element-wise) products for the convolutions
* Pack pairs of samples into complex samples: e.g., given two samples A and
B, you'd produce a new sample C = A + iB
In the latter case, a single filter is convolved with two samples at
once, which cuts the total computation by a fair amount. Although it
isn't strictly necessary, you can separate the two samples' spectra
directly in the frequency domain, without an inverse DFT, thanks to the
conjugate symmetry of real-signal DFTs.
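Untested, but roughly the unpacking looks like this in numpy
(packed_fft is just an illustrative name; the trick is the conjugate
symmetry of real-signal DFTs):

import numpy as np

def packed_fft(a, b):
    # FFT two real signals for the price of one complex FFT.
    # Pack c = a + i*b, then unpack using conjugate symmetry:
    #   A[k] = (C[k] + conj(C[-k])) / 2
    #   B[k] = (C[k] - conj(C[-k])) / (2i)
    c = a + 1j * b
    C = np.fft.fft(c)
    Cr = np.conj(np.roll(C[::-1], 1))   # C conjugated at index (-k) mod N
    return 0.5 * (C + Cr), (C - Cr) / 2j

# sanity check against two separate FFTs
a, b = np.random.randn(8), np.random.randn(8)
A, B = packed_fft(a, b)
assert np.allclose(A, np.fft.fft(a)) and np.allclose(B, np.fft.fft(b))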
If you use direct convolutions instead of the frequency domain, the
number of computations you need to perform on the down passes is
immense. In the frequency domain, because the DFT is linear, you can
instead sum the results of the convolutions on the down pass and apply
a single inverse DFT to the total. A similar saving shows up when
calculating the gradient.
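A sketch of the idea in plain numpy (fft_conv_sum is just an
illustrative name):

import numpy as np

def fft_conv_sum(xs, ws, n_out):
    # Convolve each xs[j] with ws[j] and sum the results, accumulating
    # in the frequency domain so only ONE inverse DFT is needed.
    # n_out = len(x) + len(w) - 1 gives the full linear convolution.
    acc = np.zeros(n_out, dtype=complex)
    for x, w in zip(xs, ws):
        acc += np.fft.fft(x, n_out) * np.fft.fft(w, n_out)  # implicit zero-pad
    return np.fft.ifft(acc).real

xs = [np.random.randn(16) for _ in range(4)]
ws = [np.random.randn(5) for _ in range(4)]
y = fft_conv_sum(xs, ws, 16 + 5 - 1)
assert np.allclose(y, sum(np.convolve(x, w) for x, w in zip(xs, ws)))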
It's a little more involved to get the dimensions right, and you have
to zero-pad the samples to avoid circular convolution (though some DFT
libraries can pad a sample implicitly) ... but I think it'll help a lot.
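For example, numpy's fft takes an explicit transform length, which does
the padding implicitly:

import numpy as np

x, w = np.random.randn(16), np.random.randn(5)

# length-16 DFTs give a CIRCULAR convolution (wrap-around artifacts)
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, 16)).real

# padding both to len(x) + len(w) - 1 gives the LINEAR convolution
n = len(x) + len(w) - 1
lin = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(w, n)).real
assert np.allclose(lin, np.convolve(x, w))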
Before you ask, I don't have numbers to share. I've done the
analysis, but haven't had the time to implement it.
-Brian
And by the way, thanks to both of you for sharing such tricks. At some
point we should write all these kinds of tricks down somewhere for new
users to see. This is starting to feel like a tutorial on convolutional
RBMs ... though quite a bit more work would be needed to turn it into
one. Anyone who wants to dedicate time to writing a tutorial is welcome
(I think we can easily accept third-party tutorials :) - giving credit
to the author, of course).
Razvan
Can you do this with Theano right now? That would be cool.

On Thu, Apr 21, 2011 at 11:47 AM, Brian Vandenberg <phan...@gmail.com> wrote:
> I can't comment on this specific issue (using conv2d for a 1d
> convolution), but I do have some suggestions for improving performance
> on a convolutional RBM.
>
> The suggestion is to do two basic things that can help a lot:
>
> * Use DFTs and hadamard products for the convolutions
> * Mix pairs of samples as complex samples. eg, given two samples A &
> B, you'd produce a new sample C = A + iB
>
A big part of the cost savings is being able to do the summations prior
to computing inverse DFTs. If it's straightforward to write it in a way
that is aware of, and can exploit, the properties of DFTs, then
awesome; otherwise it might be more hassle than it's worth. Even if
getting all the tricks in there is too much hassle, a general version
that handles most users' needs would still be a good thing to have.
-Brian
I was also wondering if there is perhaps a better way to do zero-padding than this:

zero_padding = T.zeros_like(h_shuffled)[:,:,:, 0:self.filter_width-1]
h_padded = T.concatenate([zero_padding, h_shuffled, zero_padding],
axis=3)

... seeing as this seems to be the bottleneck now, rather than the
convolution :)
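One alternative that might be worth profiling (an untested sketch; it
assumes T.zeros and T.set_subtensor accept the symbolic shapes and
slices used here) is to allocate the padded tensor once and write
h_shuffled into the middle, instead of building two zero tensors and
concatenating:

import theano.tensor as T

pad = self.filter_width - 1
s = h_shuffled.shape
h_padded = T.zeros((s[0], s[1], s[2], s[3] + 2 * pad))
h_padded = T.set_subtensor(h_padded[:, :, :, pad:pad + s[3]], h_shuffled)

Whether this beats the concatenate version will depend on what the
graph optimizer makes of each.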
Thanks for your feedback everyone.
Sander
Just to be clear, are the dimensions of your input tensor h the
following (tell me if I'm wrong)?
0 : number of inputs
1 : number of feature maps (in this case = 1)
2 : number of features at each time step (same as the filter shape)
3 : time steps
Unless you're doing convolutions with DFTs instead of directly
computing the convolution, I don't think you need to do zero padding.
-Brian
Philippe
The product of DFTs produces a circular convolution; to simulate linear
convolution you need padding. A good introduction is "Digital Signal
Processing" by Proakis.
If you are convolving a short sequence with a long one (i.e.,
filtering), look at the overlap-save (or overlap-add) technique. I find
overlap-save conceptually easier.
If sequence1 has length N1 and sequence2 has length N2 > N1, use a DFT
of length N > N1; a convenient choice is N = 2*N1. After each inverse
DFT, discard the first N1 outputs and keep the rest, then shift the
inputs by the number of samples kept (N1 when N = 2*N1) to compute the
next batch.
(Correlation is basically the same, except you discard the second half
of each block instead of the first.)
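In case it helps, a minimal numpy sketch of overlap-save (this uses the
more common variant that discards only the M - 1 corrupted samples per
block rather than a full half-block; overlap_save is just an
illustrative name):

import numpy as np

def overlap_save(x, h, N=None):
    # Filter a long signal x with a short filter h using length-N DFTs.
    # The first M-1 outputs of each length-N circular convolution are
    # corrupted by wrap-around and are discarded.
    M = len(h)
    if N is None:
        N = 2 * M                     # the convenient choice from above
    step = N - M + 1                  # valid outputs per block
    H = np.fft.fft(h, N)
    xp = np.concatenate([np.zeros(M - 1), x])  # dummies for the first block
    out = []
    for start in range(0, len(x) + M - 1, step):
        block = xp[start:start + N]
        if len(block) < N:            # zero-pad the final, short block
            block = np.concatenate([block, np.zeros(N - len(block))])
        yb = np.fft.ifft(np.fft.fft(block) * H).real
        out.append(yb[M - 1:])        # keep only the uncorrupted tail
    return np.concatenate(out)[:len(x) + M - 1]

# sanity check against direct linear convolution
x, h = np.random.randn(100), np.random.randn(7)
assert np.allclose(overlap_save(x, h), np.convolve(x, h))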