DilatedConv2DLayer much slower than Conv2DLayer


André Vidal

Jan 17, 2017, 7:59:31 AM
to lasagn...@googlegroups.com
Hello,

This is my first time posting here, so apologies if I'm not fully
adhering to best practices :)
Lately, I've been experimenting a lot with DilatedConv2DLayer
convolutions, and while the results are awesome, I noticed it is rather
slow compared to "normal" Conv2DLayer convolutions. In particular, take
two networks each consisting of a single layer, both with the same
receptive field, for example:
net1: DilatedConv2DLayer with filter_size=3 and dilation=2,
net2: Conv2DLayer with filter_size=5

I noticed that even though net2 has many more parameters, it is
significantly faster than net1.
I wrote a small script to test this (please see the attachment) and ran
it on a GTX 980. The dilated conv network is almost 2x slower than the
conv network, even though the latter has almost 3x the number of
parameters.
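The parameter counts and the matched receptive field can be checked with a little arithmetic. A plain-Python sketch (the 30 input channels and 64 filters are inferred from the counts printed below, not stated explicitly in the post):

```python
# Sanity-check the numbers in the thread, in plain Python.
# Assumed from the reported counts: 30 input channels, 64 filters,
# one bias per filter.
def conv_params(in_ch, n_filters, filter_size):
    """Weights (n_filters * in_ch * k * k) plus one bias per filter."""
    return n_filters * in_ch * filter_size * filter_size + n_filters

def effective_receptive_field(filter_size, dilation):
    """A k x k kernel with dilation d spans d*(k-1)+1 input positions."""
    return dilation * (filter_size - 1) + 1

print(conv_params(30, 64, 5))           # 48064, matches net2
print(conv_params(30, 64, 3))           # 17344, matches net1
print(effective_receptive_field(3, 2))  # 5, same receptive field as net2
```

So the two single-layer networks really do cover the same 5x5 input window, with net1 holding roughly a third of net2's parameters.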

Running the script I get the following output:

$ python test_conv_vs_dilatedconv.py
Using gpu device 0: GeForce GTX 980 (CNMeM is disabled, cuDNN 5005)
Number of parameters of Conv network 48064
Number of parameters of DilatedConv network 17344
total_time_conv : 4.3996155262s
total_time_dilated_conv : 8.18437623978s

Here's my setup:
>>> import theano
Using gpu device 0: GeForce GTX 980 (CNMeM is disabled, cuDNN 5005)
>>> import lasagne
>>> theano.__version__
'0.9.0dev4.dev-1f0f9126c4f1c01630c16c64c48ea9379df1bde0'
>>> lasagne.__version__
'0.2.dev1'

(I don't have any local modifications of either Lasagne or Theano)

This is what my .theanorc looks like:
[global]
floatX = float32
device = gpu
allow_gc = True

So the question really is:
- Does this make sense? Intuitively, I'd expect that even if the
implementation is more complex, the dilated network would be faster (or
at least comparable) because it has far fewer parameters.
- If not, is there something I'm missing (e.g. a local config issue)?

Thanks in advance for your help!
André
test_conv_vs_dilatedconv.py

Jan Schlüter

Jan 19, 2017, 1:35:43 PM
to lasagne-users
Hey André!


I wrote a small script to test this (please see the attachment) and ran
it on a GTX 980. The dilated conv network is almost 2x slower than the
conv network, even though the latter has almost 3x the number of
parameters.
[...]
So the question really is:
- Does this make sense? Intuitively, I'd expect that even if the
implementation is more complex, the dilated network would be faster (or
at least comparable) because it has far fewer parameters.
- If not, is there something I'm missing (e.g. a local config issue)?

This may be an effect of how I implemented dilated convolution to get it into Lasagne before Theano natively supported it. Dilated convolution can be seen as the backward pass of a strided convolution with respect to the weights, and that's what DilatedConv2DLayer currently does: https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/conv.py#L937-L953
This allowed us to a) quickly support dilated convolution, and b) use cuDNN for it even though cuDNN does not explicitly support dilation.
In the meantime, Theano has adopted another implementation of dilated convolution from caffe, and exposes it via the standard T.nnet.conv2d() interface.
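For intuition, independent of either implementation: a dilated convolution is equivalent to an ordinary convolution whose kernel has d-1 zeros inserted between its taps. A single-channel numpy sketch (illustrative only; neither Lasagne nor Theano actually computes it this way):

```python
import numpy as np

def dilate_kernel(k, d):
    """Insert d-1 zeros between taps: a 3x3 kernel with d=2 becomes 5x5."""
    kh, kw = k.shape
    out = np.zeros((d * (kh - 1) + 1, d * (kw - 1) + 1), dtype=k.dtype)
    out[::d, ::d] = k
    return out

def correlate2d_valid(x, k):
    """Plain 'valid' cross-correlation (what conv layers compute)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.RandomState(0)
x = rng.randn(8, 8)
k = rng.randn(3, 3)
# Direct dilated correlation: sample the input window with step = dilation.
direct = np.empty((4, 4))
for i in range(4):
    for j in range(4):
        direct[i, j] = np.sum(x[i:i + 5:2, j:j + 5:2] * k)
via_inflation = correlate2d_valid(x, dilate_kernel(k, 2))
print(np.allclose(direct, via_inflation))  # True
```

The zero-inflated kernel also makes it visible why a dilated 3x3 touches only 9 input values per output, while a dense 5x5 touches 25.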
It would actually be very helpful if you could compare these implementations. If you have the time, you could try to:
- copy the DilatedConv2DLayer from Lasagne into your test script
- delete the "get_W_shape()" method
- replace the "convolve()" method with the default one: https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/conv.py#L602-L609
- add "filter_dilation=self.dilation" after the existing "filter_flip=" argument
Then run your benchmark again and see if this is faster. It's expected to still be slower than cuDNN, because the caffe convolution is slower than the cuDNN one, but it may be faster than the cuDNN backward pass we currently use. Let me know if you have the time to try and if you find out anything!
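(The attached script isn't reproduced in this thread.) A minimal timing harness of the general shape such a benchmark needs, illustrative only: the callable being timed here is a trivial stand-in, not a compiled Theano function. For GPU runs, make sure the compiled function returns its result to the host, so the measurement includes the actual GPU work rather than just kernel launches:

```python
import time

def benchmark(fn, n_iter=100, warmup=10):
    """Time repeated calls to a callable (e.g. a compiled Theano function).
    Warmup runs absorb one-time compilation/allocation overhead."""
    for _ in range(warmup):
        fn()
    start = time.time()
    for _ in range(n_iter):
        fn()
    return time.time() - start

# Stand-in workload; in the real script this would be e.g.
# benchmark(dilated_conv_fn) for each of the compiled networks.
elapsed = benchmark(lambda: sum(range(1000)), n_iter=50)
print(elapsed >= 0.0)  # True
```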

Best, Jan

Erfan Noury

Jan 20, 2017, 7:35:49 AM
to lasagne-users
Hello,

I changed the code as you described, and on my GPU (Quadro K1100M), using T.nnet.conv2d is faster than the other two.
Here is the result:
Using gpu device 0: Quadro K1100M (CNMeM is disabled, cuDNN 5105)

Number of parameters of Conv network 48064
Number of parameters of DilatedConv network 17344
Number of parameters of CustomDilatedConv network 17344
total_time_conv : 47.3799986839s
total_time_dilated_conv : 71.5720071793s
total_time_custom_dilated_conv : 31.1269910336s
By the way, in addition to the changes you described, I also had to remove the stride argument when calling T.nnet.conv2d.

Andre Vidal

Jan 20, 2017, 10:40:42 AM
to lasagne-users
Hello Jan!

Thank you for your quick reply and insights.

I ran the same test as Erfan, and in line with what he reported I also see a small improvement over the "normal" Conv network - so, as expected, it's slightly faster now :)

Now, because you asked me to delete the get_W_shape() method of DilatedConv2DLayer, the parameter shapes are now different (they have the same layout as the "normal" Conv layer).

They look like this:


Number of parameters of Conv network 48064
Net params:
0 : (64, 30, 5, 5)
1 : (64,)

Number of parameters of DilatedConv network 17344
Net params:
0 : (30, 64, 3, 3)
1 : (64,)

Number of parameters of CustomDilatedConv network 17344
Net params:
0 : (64, 30, 3, 3)
1 : (64,)

This is a bit of a problem because I have a trained network whose weights were saved in the "default" DilatedConv2DLayer format, but I would like to use them in the same network with the "default" layers replaced by this CustomDilatedConv class, to get the higher processing speed. This is probably easy to address, but honestly I don't know much about the Lasagne/Theano internals; I'm just, well, a lasagne user :)

So what would be the best approach?
 1) Adapt the weights to the new format?
or
 2) Customize get_W_shape()?


Thanks again for your advice,
André

Andre Vidal

Jan 26, 2017, 4:58:49 AM
to lasagne-users
I implemented it by swapping the weights, and it works perfectly (and is much faster :) ).
Just wondering if there's a better way to do it; otherwise, I'm happy with it :)

Jan Schlüter

Jan 26, 2017, 5:25:35 PM
to lasagne-users
Hey Erfan and Andre!

Thank you for the benchmarks, this is good to know! It's also nice to see that the 5x5 convolution with cuDNN is slower than the dilated 3x3 caffe-based convolution.

Now, because you asked me to delete the get_W_shape() method of DilatedConv2DLayer, the parameter shapes are now different (they have the same layout as the "normal" Conv layer). [...]

This is a bit of a problem because I have a trained network whose weights were saved in the "default" DilatedConv2DLayer format, but I would like to use them in the same network with the "default" layers replaced by this CustomDilatedConv class, to get the higher processing speed.

Yes, that's something we unfortunately have to address somehow if we change the implementation to use T.nnet.conv2d(). We should probably provide an option to use swapped weights, for compatibility with the current implementation. The fastest way is to swap the weights when loading an old model (i.e., swap the numpy arrays you pass to set_all_param_values).
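A minimal numpy sketch of that weight swap, assuming the old model stored W as (num_input_channels, num_filters, rows, cols) and the new layer expects (num_filters, num_input_channels, rows, cols), matching the shapes printed earlier in the thread. Depending on the flip_filters setting of the two layers, a spatial flip (W[:, :, ::-1, ::-1]) may also be needed, so it's worth verifying by comparing the two networks' outputs numerically:

```python
import numpy as np

def swap_dilated_weights(params):
    """Transpose each 4-D weight array from (in_ch, n_filters, h, w)
    to (n_filters, in_ch, h, w); leave biases and other params alone."""
    out = []
    for p in params:
        if p.ndim == 4:
            out.append(np.ascontiguousarray(p.transpose(1, 0, 2, 3)))
        else:
            out.append(p)
    return out

# Shapes as reported in the thread: dilated W (30, 64, 3, 3) plus 64 biases.
old = [np.zeros((30, 64, 3, 3), dtype=np.float32),
       np.zeros(64, dtype=np.float32)]
new = swap_dilated_weights(old)
print([p.shape for p in new])  # [(64, 30, 3, 3), (64,)]
# then: lasagne.layers.set_all_param_values(output_layer, new)
```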

Best, Jan