Loading Caffe Model


davidma...@gmail.com

unread,
Jun 1, 2016, 1:52:58 PM6/1/16
to lasagne-users
I'm trying to load the network from Yu & Koltun (http://arxiv.org/abs/1511.07122) using the model available at https://github.com/fyu/dilation and am running into some problems. I'm trying to follow along the recipe here to load the weights. Here's what I've got:

import caffe

net_caffe = caffe.Net('yu-koltun-net.prototxt', 'yu-koltun-net.caffemodel', caffe.TEST)

import lasagne
from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer
from lasagne.layers import InputLayer, DropoutLayer
from lasagne.layers import Pool2DLayer as PoolLayer
from lasagne.utils import floatX
from lasagne.nonlinearities import rectify as relu
from lasagne.nonlinearities import softmax
from lasagne.layers import DilatedConv2DLayer as DilatedConvLayer
from lasagne.layers import DenseLayer
from lasagne.layers import NonlinearityLayer

nnet = {}
nnet['input'] = InputLayer((None, 3, None, None))
nnet['conv1_1'] = ConvLayer(nnet['input'], num_filters=64, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv1_2'] = ConvLayer(nnet['conv1_1'], num_filters=64, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['pool1'] = PoolLayer(nnet['conv1_2'], pool_size=2, stride=2, mode='max', ignore_border=False)
nnet['conv2_1'] = ConvLayer(nnet['pool1'], num_filters=128, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv2_2'] = ConvLayer(nnet['conv2_1'], num_filters=128, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['pool2'] = PoolLayer(nnet['conv2_2'], pool_size=2, stride=2, mode='max', ignore_border=False)
nnet['conv3_1'] = ConvLayer(nnet['pool2'], num_filters=256, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv3_2'] = ConvLayer(nnet['conv3_1'], num_filters=256, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv3_3'] = ConvLayer(nnet['conv3_2'], num_filters=256, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['pool3'] = PoolLayer(nnet['conv3_3'], pool_size=2, stride=2, mode='max', ignore_border=False)
nnet['conv4_1'] = ConvLayer(nnet['pool3'], num_filters=512, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv4_2'] = ConvLayer(nnet['conv4_1'], num_filters=512, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv4_3'] = ConvLayer(nnet['conv4_2'], num_filters=512, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv5_1'] = DilatedConvLayer(nnet['conv4_3'], num_filters=512, dilation=(2,2), filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv5_2'] = DilatedConvLayer(nnet['conv5_1'], num_filters=512, dilation=(2,2), filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv5_3'] = DilatedConvLayer(nnet['conv5_2'], num_filters=512, dilation=(2,2), filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['fc6'] = DilatedConvLayer(nnet['conv5_3'], num_filters=4096, dilation=(4,4), filter_size=7, pad=0, flip_filters=False, nonlinearity=relu)
nnet['drop6'] = DropoutLayer(nnet['fc6'], p=0.5)
nnet['fc7'] = ConvLayer(nnet['drop6'], num_filters=4096, filter_size=1, pad=0, flip_filters=False, nonlinearity=relu)
nnet['drop7'] = DropoutLayer(nnet['fc6'], p=0.5)
nnet['fc-final'] = ConvLayer(nnet['drop7'], num_filters=21, filter_size=1, pad=0, flip_filters=False, nonlinearity=lasagne.nonlinearities.linear)

# begin context network
nnet['ct_conv1_1'] = ConvLayer(nnet['fc-final'], num_filters=42, filter_size=3, pad=33, flip_filters=False, nonlinearity=relu) # has lr_mult and decay_mult
nnet['ct_conv1_2'] = ConvLayer(nnet['ct_conv1_1'], num_filters=42, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu) # has lr_mult, decay_mult
nnet['ct_conv2_1'] = DilatedConvLayer(nnet['ct_conv1_2'], num_filters=84, filter_size=3, dilation=(2,2), pad=0, flip_filters=False, nonlinearity=relu)
nnet['ct_conv3_1'] = DilatedConvLayer(nnet['ct_conv2_1'], num_filters=168, filter_size=3, dilation=(4,4), pad=0, flip_filters=False, nonlinearity=relu)
nnet['ct_conv4_1'] = DilatedConvLayer(nnet['ct_conv3_1'], num_filters=336, filter_size=3, dilation=(8,8), pad=0, flip_filters=False, nonlinearity=relu)
nnet['ct_conv5_1'] = DilatedConvLayer(nnet['ct_conv4_1'], num_filters=672, filter_size=3, dilation=(16,16), pad=0, flip_filters=False, nonlinearity=relu)
nnet['ct_fc1'] = ConvLayer(nnet['ct_conv5_1'], num_filters=672, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['ct_final'] = ConvLayer(nnet['ct_fc1'], num_filters=21, filter_size=1, pad=0, flip_filters=False, nonlinearity=lasagne.nonlinearities.linear)
#nnet['prob'] = DenseLayer(nnet['ct_final'], num_units=21, nonlinearity=lasagne.nonlinearities.softmax)
nnet['output'] = lasagne.layers.FlattenLayer(nnet['ct_final'])
nnet['prob'] = NonlinearityLayer(nnet['output'], nonlinearity=softmax)

# Copy parameters from Caffe to Lasagne
layers_caffe = dict(zip(list(net_caffe._layer_names), net_caffe.layers))

for name, layer in nnet.items():
    try:
        layer.W.set_value(layers_caffe[name].blobs[0].data)
        layer.b.set_value(layers_caffe[name].blobs[1].data)
    except AttributeError:
        continue

When I load an image to try to do the classification, I get a runtime error:

prob = lasagne.layers.get_output(nnet['prob'], the_image_to_classify, deterministic=True).eval()

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-42-4bb404431448> in <module>()
----> 1 prob = lasagne.layers.get_output(nnet['prob'], im, deterministic=True).eval()

/data/deep-learning-env/anaconda2/lib/python2.7/site-packages/theano/gof/graph.pyc in eval(self, inputs_to_values)
    521         args = [inputs_to_values[param] for param in inputs]
    522
--> 523         rval = self._fn_cache[inputs](*args)
    524
    525         return rval

/data/deep-learning-env/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    869                     node=self.fn.nodes[self.fn.position_of_error],
    870                     thunk=thunk,
--> 871                     storage_map=getattr(self.fn, 'storage_map', None))
    872             else:
    873                 # old-style linkers raise their own exceptions

/data/deep-learning-env/anaconda2/lib/python2.7/site-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    312         # extra long error message in that case.
    313         pass
--> 314     reraise(exc_type, exc_value, exc_trace)
    315
    316

/data/deep-learning-env/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    857         t0_fn = time.time()
    858         try:
--> 859             outputs = self.fn()
    860         except Exception:
    861             if hasattr(self.fn, 'position_of_error'):

RuntimeError: GpuDnnConvGradW: error getting worksize: CUDNN_STATUS_BAD_PARAM
Apply node that caused the error: GpuDnnConvGradW{algo='none', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(4, 4), conv_mode='cross', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 292
Inputs types: [CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (False, True, False, False)), <theano.gof.type.CDataType object at 0x7fbe43ea2350>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(512, 1, 90, 90), (4096, 512, 7, 7), (512, 1, 66, 66), 'No shapes', (), ()]
Inputs strides: [(8100, 0, 90, 1), (25088, 49, 7, 1), (4356, 0, 66, 1), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <PyCObject object at 0x7fbd8b0cbfa8>, 1.0, 0.0]
Inputs name: ('image', 'grad', 'output', 'descriptor', 'alpha', 'beta')

Outputs clients: [[GpuDimShuffle{1,0,2,3}(GpuDnnConvGradW{algo='none', inplace=True}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Any help on debugging this? Is there something clearly wrong with the way I have the network set up?

Jan Schlüter

unread,
Jun 2, 2016, 9:00:31 AM6/2/16
to lasagne-users, davidma...@gmail.com
I'm trying to load the network from Yu & Koltun (http://arxiv.org/abs/1511.07122) using the model available at https://github.com/fyu/dilation and am running into some problems. I'm trying to follow along the recipe here to load the weights. Here's what I've got:

Would be cool to add this to Lasagne/Recipes/modelzoo once it works!


RuntimeError: GpuDnnConvGradW: error getting worksize: CUDNN_STATUS_BAD_PARAM
Apply node that caused the error: GpuDnnConvGradW{algo='none', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(4, 4), conv_mode='cross', precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 292
Inputs types: [CudaNdarrayType(float32, (False, True, False, False)), CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, (False, True, False, False)), <theano.gof.type.CDataType object at 0x7fbe43ea2350>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(512, 1, 90, 90), (4096, 512, 7, 7), (512, 1, 66, 66), 'No shapes', (), ()]
Inputs strides: [(8100, 0, 90, 1), (25088, 49, 7, 1), (4356, 0, 66, 1), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <PyCObject object at 0x7fbd8b0cbfa8>, 1.0, 0.0]
Inputs name: ('image', 'grad', 'output', 'descriptor', 'alpha', 'beta')

The dilated convolution is implemented via the gradient wrt. weights. Let's see, here it's trying to compute the gradient wrt. weights of the following convolution:
input: 512, 1, 90, 90
kernel: 512, 1, 66, 66
output: 4096, 512, 7, 7
stride: 4
pad: 0

The kernel (i.e., the output of the dilated convolution) should have been (4096, 1, 66, 66). Looks like the output shape computation is wrong. Can you please print net['fc6'].output_shape, and net['fc6'].get_output_shape_for((512, 1, 90, 90))?
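
(For reference, assuming the network dict from the first post is called nnet, those two checks are just:)

print nnet['fc6'].output_shape
print nnet['fc6'].get_output_shape_for((512, 1, 90, 90))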

davidma...@gmail.com

unread,
Jun 2, 2016, 9:07:07 AM6/2/16
to lasagne-users, davidma...@gmail.com
I'll definitely put this up in the recipes once I get it going. I'm sure it would be useful for other people as well.

For fc6 output shape I get

(None, 4096, 66, 66)

and for the output_shape_for I get

(512, 4096, 66, 66)

Jan Schlüter

unread,
Jun 2, 2016, 9:08:03 AM6/2/16
to lasagne-users
The kernel (i.e., the output of the dilated convolution) should have been (4096, 1, 66, 66). Looks like the output shape computation is wrong.

No, wait, it's the kernel, that's why the output dimensions are wrong. You're setting the weights to (4096, 512, 7, 7), but the first two dimensions have to be swapped to give (512, 4096, 7, 7) for the DilatedConv2DLayer (this is contrary to the weight layout in standard convolutional layers, but it's more performant this way). Note that this is wrong for the other dilated convolutions as well, but you didn't notice (yet) because they didn't change the number of channels.

Do you have a way of comparing the predictions of your Lasagne model with the original one? If so, it would be cool to add this to the model zoo, including the pickled weights (Eben Olson can help you upload it).

Best, Jan

davidma...@gmail.com

unread,
Jun 2, 2016, 9:20:58 AM6/2/16
to lasagne-users
How do I actually swap those in the dilated layers? I'm very new to Lasagne and not at all familiar yet.

I've got the original classifier that I can run on the same images to compare at least. I'm hoping to be able to make direct comparisons to the original.

Jan Schlüter

unread,
Jun 2, 2016, 11:27:42 AM6/2/16
to lasagne-users
How do I actually swap those in the dilated layers? I'm very new to Lasagne and not at all familiar yet.

Where you have:
layer.W.set_value(layers_caffe[name].blobs[0].data)


Do something like:
W = layers_caffe[name].blobs[0].data
if isinstance(layer, DilatedConvLayer):
    W = W.transpose(1, 0, 2, 3)
layer.W.set_value(W)

So the first two dimensions of W are swapped for the dilated convolution layers. Before setting the value you could also add an "assert W.shape == layer.W.get_value().shape" to make sure you're not replacing something with a wrong shape somewhere else.
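
Putting the transpose and the assert together, the whole copy loop could look roughly like this (just a sketch along those lines, using the nnet/layers_caffe names from the first post; the extra KeyError catch is an assumption for dict entries that have no matching Caffe layer):

for name, layer in nnet.items():
    try:
        W = layers_caffe[name].blobs[0].data
        if isinstance(layer, DilatedConvLayer):
            # dilated layers store W as (in_channels, num_filters, rows, cols)
            W = W.transpose(1, 0, 2, 3)
        assert W.shape == layer.W.get_value().shape
        layer.W.set_value(W)
        layer.b.set_value(layers_caffe[name].blobs[1].data)
    except (AttributeError, KeyError):
        continue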

davidma...@gmail.com

unread,
Jun 2, 2016, 12:05:53 PM6/2/16
to lasagne-users
Fantastic, thank you! As soon as I get the semantic segmentation working nicely I'll put this up in the recipes.

davidma...@gmail.com

unread,
Jun 3, 2016, 11:30:22 AM6/3/16
to lasagne-users, davidma...@gmail.com
I'm running into another problem with weight initialization. My Lasagne layers have a different shape from the Caffe layers. I was getting all background for my predictions so I checked to see if the weights were initialized properly. Comparing the parameter values in Lasagne and Caffe:

names = ['input','conv1_1','conv1_2','pool1','conv2_1','conv2_2',
         'pool2','conv3_1','conv3_2','conv3_3','pool3','conv4_1','conv4_2','conv4_3',
         'conv5_1','conv5_2','conv5_3','fc6','drop6','fc6','drop7','fc-final',
         'ct_conv1_1','ct_conv1_2','ct_conv2_1','ct_conv3_1','ct_conv4_1','ct_conv5_1','ct_fc1','ct_final']

for name in names:
    if len(lasagne.layers.get_all_param_values(net[name])) > 0 and not "pool" in name and not "drop" in name:
        lasagne_params = lasagne.layers.get_all_param_values(net[name])[0]
        caffe_params = caffe_net.params[name][0].data

        print name, np.array_equal(caffe_params, lasagne_params)

Yields:

conv1_1 True
conv1_2 False
conv2_1 False
conv2_2 False
conv3_1 False
conv3_2 False
conv3_3 False
conv4_1 False
conv4_2 False
conv4_3 False
conv5_1 False
conv5_2 False
conv5_3 False
fc6 False
fc6 False
fc-final False
ct_conv1_1 False
ct_conv1_2 False
ct_conv2_1 False
ct_conv3_1 False
ct_conv4_1 False
ct_conv5_1 False
ct_fc1 False
ct_final False


So then I checked on the shapes:

print np.shape(lasagne.layers.get_all_param_values(net['conv1_2'])[0])
print np.shape(caffe_net.params['conv1_2'][0].data)

# output is
# (64, 3, 3, 3)
# (64, 64, 3, 3)


print np.shape(lasagne.layers.get_all_param_values(net['conv2_1'])[0])
print np.shape(caffe_net.params['conv2_1'][0].data)

# output is
# (64, 3, 3, 3)
# (128, 64, 3, 3)


For just the first few, why would these not get updated properly? Here's how I create the network (just the first layers):

nnet = {}
nnet['input'] = InputLayer((1, 3, 900, 900))

nnet['conv1_1'] = ConvLayer(nnet['input'], num_filters=64, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['conv1_2'] = ConvLayer(nnet['conv1_1'], num_filters=64, filter_size=3, pad=0, flip_filters=False, nonlinearity=relu)
nnet['pool1'] = PoolLayer(nnet['conv1_2'], pool_size=2, stride=2, mode='max', ignore_border=False)

Why are the batch size and number of channels not updating?

davidma...@gmail.com

unread,
Jun 3, 2016, 12:10:19 PM6/3/16
to lasagne-users, davidma...@gmail.com
Ah, it looks like I misunderstood my indices. When I use the index of the name minus 2, I get the correct layer. However, this shows that all layers except the DilatedConvLayers are initialized properly. The dilated convolution layers still fail. I'm looking at where the differences are. I checked conv5_1 and the first few weights are the same. Not sure why numpy says false. I'll update when I figure out more information.
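
(The offset comes from get_all_param_values returning the parameters of every layer up to and including the given one, so index [0] is always conv1_1's W. A sketch of a less index-sensitive check, assuming the net/caffe_net handles from the snippet above, is to read each layer's own parameters directly:)

layer = net['conv1_2']
print np.array_equal(layer.W.get_value(), caffe_net.params['conv1_2'][0].data)
print np.array_equal(layer.b.get_value(), caffe_net.params['conv1_2'][1].data)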

davidma...@gmail.com

unread,
Jun 3, 2016, 12:18:41 PM6/3/16
to lasagne-users, davidma...@gmail.com
Okay, so the weights are indeed initialized incorrectly in the dilated layers.

print 'Lasagne layers', lasagne.layers.get_all_param_values(net['conv5_1'])[20][0]
print 'Caffe layers', caffe_net.params['conv5_1'][0].data[0]

The printout is:

Lasagne layers [[[  1.79773569e-03  -3.86781082e-03   1.44778879e-03]
  [  5.62629709e-03  -3.78476828e-03  -3.50363902e-03]
  [ -9.70753608e-04  -1.55057211e-03   2.05189921e-03]]

 [[ -4.44258876e-05  -2.85256910e-03  -7.51745538e-04]
  [ -5.39085083e-03  -8.67442507e-03  -6.84367260e-03]
  [ -1.69579440e-03  -2.00542971e-03  -2.40873150e-03]]

 [[  3.54722259e-03   7.02135731e-04   5.49884560e-03]
  [  1.63380158e-04  -2.63992278e-03   4.20169148e-04]
  [  1.86512922e-03  -1.90990162e-03   1.24418832e-04]]

 ...,

 [[ -3.51253781e-03  -1.02917319e-02  -4.89753392e-03]
  [ -3.92112508e-03  -7.78933940e-03  -7.75852590e-04]
  [ -5.20925329e-04   8.36972799e-03   7.79767148e-03]]

 [[ -1.18238491e-03  -8.94870237e-03  -5.45267761e-03]
  [ -5.66318957e-03  -1.76325385e-02  -1.12886960e-02]
  [ -8.11124220e-03  -1.42779266e-02  -8.83196760e-03]]

 [[ -3.31869884e-03  -4.01778612e-03  -3.69090540e-03]
  [ -3.45548079e-03  -3.44294333e-03  -2.09339289e-03]
  [ -1.11368170e-03  -1.81419810e-03  -4.53128887e-04]]]
Caffe layers [[[  1.79773569e-03  -3.86781082e-03   1.44778879e-03]
  [  5.62629709e-03  -3.78476828e-03  -3.50363902e-03]
  [ -9.70753608e-04  -1.55057211e-03   2.05189921e-03]]

 [[ -2.17772159e-03  -2.09413143e-03   2.61314330e-03]
  [ -1.32299084e-02  -4.97253053e-03   6.14153594e-03]
  [ -1.26133068e-02  -5.66283567e-03   1.61643873e-03]]

 [[  4.73600952e-03  -1.91315322e-03   4.68450657e-04]
  [  4.06997791e-03  -5.24714636e-03   6.18363556e-04]
  [  1.02492245e-02  -7.65518111e-04   1.23429857e-03]]

 ...,

 [[ -1.94619840e-03  -2.44452502e-03  -5.26150083e-03]
  [ -2.70234491e-03  -3.81989777e-03  -5.41628478e-03]
  [  6.36245997e-04   9.92348650e-04  -2.53115897e-03]]

 [[  6.45608827e-03  -1.09352032e-02  -2.37631015e-02]
  [ -3.14260926e-03   4.69462713e-03  -1.03884107e-02]
  [ -6.74673310e-03  -4.62803198e-03  -6.75826194e-03]]

 [[  9.95074442e-05   3.66775156e-03   2.45841919e-04]
  [ -3.43237282e-03  -4.81389143e-04  -4.60039213e-04]
  [ -2.06783577e-03   4.17942210e-05   5.18482819e-04]]]

Clearly these aren't the same, even though the first few numbers are. Is this something weird with the transpose in the weight copy?

layers_caffe = dict(zip(list(net_caffe._layer_names), net_caffe.layers))

for name, layer in nnet.items():
    if "pool" in name or "drop" in name or isinstance(layer, DenseLayer) or isinstance(layer, InputLayer):
        continue
    try:
        name = layers_caffe[name]
        W = name.blobs[0].data
        if isinstance(layer, DilatedConvLayer):
            W = W.transpose(1, 0, 2, 3)
        assert W.shape == layer.W.get_value().shape
        layer.W.set_value(W)
        layer.b.set_value(name.blobs[1].data)
    except AttributeError:
        continue




Jan Schlüter

unread,
Jun 3, 2016, 1:24:17 PM6/3/16
to lasagne-users
Clearly these aren't the same, even though the first few numbers are. Is this something weird with the transpose in the weight copy?

Well, if initialized correctly, one should be the transpose of the other. You will need to compare the Lasagne weights with the caffe weights transposed (*.transpose(1,0,2,3)) for the dilated convolution layers. Check the shapes to verify.
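
(Something along these lines, a sketch for fc6 assuming the net/caffe_net handles from earlier; fc6 is the layer where the first two dimensions actually differ in size:)

W_lasagne = net['fc6'].W.get_value()                              # (512, 4096, 7, 7)
W_caffe = caffe_net.params['fc6'][0].data.transpose(1, 0, 2, 3)   # (4096, 512, 7, 7) -> (512, 4096, 7, 7)
print W_lasagne.shape, W_caffe.shape
print np.allclose(W_lasagne, W_caffe)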

If you don't get the same predictions, you may try comparing the intermediate outputs. For Lasagne, you get all intermediate outputs by:
outputs = lasagne.layers.get_output(lasagne.layers.get_all_layers(output_layer))
fn = theano.function([input_var], outputs)
fn(your_input_data)

If all outputs up to the first dilated convolution match, we know where to look further!

davidma...@gmail.com

unread,
Jun 3, 2016, 1:54:48 PM6/3/16
to lasagne-users
Right, thanks. All the layers initialize correctly. There's still an issue with the outputs.

I got the outputs from Lasagne and from Caffe and I ran

for i in range(len(lasagne_outputs)):
    print i, np.array_equal(caffe_outputs[i], lasagne_outputs[i])

For layer 0 through layer 17 I get true. From layer 18 through layer 27 they're all false. The shapes match up at (1, 4096, 66, 66) for both. This is not at a dilated convolutional layer. The first 4 of those match up. Printing out one of the output arrays I get

print 'lasagne', lasagne_outputs[18][0][0][0]
print 'caffe', caffe_outputs[18][0][0][0]


# gives
lasagne [ 0.05386817  0.0505662   0.05117193  0.04846499  0.04983476  0.04654144
  0.03697921  0.01228332  0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.        ]
caffe [ 0.          0.10113241  0.10234386  0.09692999  0.          0.09308289
  0.07395843  0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.        ]

davidma...@gmail.com

unread,
Jun 3, 2016, 3:10:30 PM6/3/16
to lasagne-users, davidma...@gmail.com
Mistake in my code. I was returning the wrong thing. With the correct outputs for lasagne and caffe I'm getting a simple failure. The output of the first layer (conv1_1) is different.

# get caffe output
net.blobs['data'].data[...] = caffe_in
caffe_out = net.forward(start='conv1_1', end='conv1_1')
# get lasagne output
lasagne_out = fn(caffe_in)

# compare lasagne and caffe
print np.array_equal(lasagne_out[1], caffe_out['conv1_1'])
# prints False

# check shapes
print np.shape(caffe_out['conv1_1'])
print np.shape(lasagne_out[1])

# prints
# (1, 64, 898, 898)
# (1, 64, 898, 898)

# print out the first 5 elements for comparison
print 'lasagne', lasagne_out[1][0][0][0][:5]
print 'caffe', caffe_out['conv1_1'][0][0][0][:5]

# prints
# lasagne [ 0.  0.  0.  0.  0.]
# caffe [ 0.08922774 -1.54430449 -0.95565462 -0.32640123 -0.32640123]

David Mascharka

unread,
Jun 3, 2016, 6:14:12 PM6/3/16
to lasagne-users
Ok, so I'm narrowing down the problem: there was an issue with the bias not being applied in the first layer. Now it looks like the first convolution layer is correct. I'm going to map it out layer by layer and try to figure this out. I'll update when I've got something concrete.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Sander Dieleman

unread,
Jun 4, 2016, 6:49:22 AM6/4/16
to lasagne-users
Just wanted to say thanks to both of you for spending the time to figure this out, I think people will find it super useful to have access to this model!

Sander


On Friday, June 3, 2016 at 11:14:12 PM UTC+1, David Mascharka wrote:
Ok, so I'm narrowing down the problem: there was an issue with the bias not being applied in the first layer. Now it looks like the first convolution layer is correct. I'm going to map it out layer by layer and try to figure this out. I'll update when I've got something concrete.

davidma...@gmail.com

unread,
Jun 6, 2016, 9:11:50 AM6/6/16
to lasagne-users, davidma...@gmail.com
Okay, I've gotten to comparing outputs in each layer. I took an image and ran it through both the Caffe and Lasagne models and I'm comparing their intermediate outputs. In conv1_1, the first layer, np.allclose(caffe_conv1_1, lasagne_conv1_1) is true. However, the next layer fails. I looked at conv1_2 to see where it was going wrong and there are a lot of small differences (around 1e-05 to 2e-05). The maximum difference I found (I didn't wait to loop through all 64*898*898 numbers) was 0.000113726. Presumably these errors get amplified in the later layers and blow up; I'll investigate whether that's the case. For now, why might these differences exist? Is it a Python thing? A lot of the time the Lasagne values look truncated compared to the Caffe values, with Caffe keeping an extra decimal place or two, but in a few places there are noticeable differences in the actual numbers.

davidma...@gmail.com

unread,
Jun 6, 2016, 9:13:30 AM6/6/16
to lasagne-users, davidma...@gmail.com
I'll add that even stronger than getting allclose on the first layer, I also get array_equal as true.

davidma...@gmail.com

unread,
Jun 6, 2016, 9:34:52 AM6/6/16
to lasagne-users, davidma...@gmail.com
It looks like I'm wrong about the errors exploding. In fc6 (layer 17), the maximum difference between Lasagne and Caffe is 1.1441e-05.

davidma...@gmail.com

unread,
Jun 6, 2016, 9:45:54 AM6/6/16
to lasagne-users, davidma...@gmail.com
Maybe I'm not wrong, though. In fc-final, which is the end of the front-end module, the maximum difference is 17.3796. This is over 21 feature maps of size 66x66. Then at the end of the context network this front-end feeds into, the maximum difference is 24.7394. The last dilated convolution layer, ct_conv5_1, has a maximum error of 0.96.

Jan Schlüter

unread,
Jun 6, 2016, 10:41:55 AM6/6/16
to lasagne-users
Good investigation so far!
 
Maybe I'm not wrong, though. In fc-final, which is the end of the front-end module the maximum difference is 17.3796. This is in 21 66x66 feature maps. Then at the end of the context network this front-end feeds into, the maximum difference is 24.7394. The last dilated convolution layer ct_conv5_1 has a maximum error of 0.96.

What is the maximum relative difference, though? Can you compute
reldiff = np.abs(out_caffe - out_lasagne) / np.abs(out_caffe)
print reldiff.max()

Maybe the values are so large in that layer that a difference of 17 or 24 is not a problem?


I'll add that even stronger than getting allclose on the first layer, I also get array_equal as true.

That's an important check, it also shows that you've got the inputs correct (sometimes it's a bit involved to reproduce the preprocessing).

Best, Jan

davidma...@gmail.com

unread,
Jun 6, 2016, 10:49:07 AM6/6/16
to lasagne-users
In the final layer caffe_ct_final/lasagne_ct_final I get reldiff = 235966.0 with that code. For the fc_final layer I get 126659.0. And in the conv5_1 layer I get nan, so I'm not sure exactly what's going on there.

davidma...@gmail.com

unread,
Jun 6, 2016, 11:08:13 AM6/6/16
to lasagne-users, davidma...@gmail.com
Running

def max_rel_error(x, y):
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

on these same layers I get 1.0 for all the layers.
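
(An aside: that metric hits exactly 1.0 whenever the two values differ in sign, or one of them is zero while the other is not, since |x - y| / (|x| + |y|) = 1 in both cases; with ReLU outputs full of zeros that's very likely. A sketch of a version that only looks at positions where both outputs are clearly non-zero might be more informative; max_rel_error_nonzero and the eps threshold below are just illustrative names, not thread code:)

def max_rel_error_nonzero(x, y, eps=1e-3):
    # only compare positions where both outputs are clearly non-zero
    mask = (np.abs(x) > eps) & (np.abs(y) > eps)
    if not mask.any():
        return 0.0
    return np.max(np.abs(x - y)[mask] / np.abs(x)[mask])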

davidma...@gmail.com

unread,
Jun 7, 2016, 10:08:36 AM6/7/16
to lasagne-users, davidma...@gmail.com
I figured I'd check out all the weights and the biases again, just to make sure they're identical. I know the image is the same for each network and the output of the first layer is identical:

caffe_conv1_1 = caffe_outputs[0]
lasagne_conv1_1 = lasagne_outputs[1]

print np.array_equal(caffe_conv1_1, lasagne_conv1_1) # True

So I'm looking for any differences in the second layer:

caffe_conv1_2 = caffe_outputs[1]
lasagne_conv1_2 = lasagne_outputs[2]
print np.max(np.abs(caffe_conv1_2 - lasagne_conv1_2)) # 0.00109863

# Check to see if the weights are the same
lasagne_conv1_2_weights = lasagne.layers.get_all_param_values(nnet['conv1_2'])[2]
caffe_conv1_2_weights = net_caffe.params['conv1_2'][0].data
print np.array_equal(lasagne_conv1_2_weights, caffe_conv1_2_weights) # True

# Now check biases
lasagne_conv1_2_biases = lasagne.layers.get_all_param_values(nnet['conv1_2'])[3]
caffe_conv1_2_biases = net_caffe.params['conv1_2'][1].data
print np.array_equal(lasagne_conv1_2_biases, caffe_conv1_2_biases) # True

So as far as I can tell, somehow the input is identical, the weights are identical, the biases are identical, and the outputs are different. But it couldn't be an issue with the convolution operation, because the first layer works, right? I also checked and there doesn't appear to be any sort of normalization going on with the frontend (VGG-16 based) between layers 1 and 2 like there is in VGG-A-LRN. I may take the weights and try doing the convolution with numpy to see whether it produces the output of the Caffe or Lasagne model. I'm open to any other suggestions.
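
(The numpy cross-check could be as small as recomputing a single output value of conv1_2 by hand. A sketch, assuming the caffe_conv1_1/caffe_conv1_2/lasagne_conv1_2 arrays and net_caffe handle from the snippets above, that pad=0 and flip_filters=False so it's a plain cross-correlation, and that the stored Caffe blob is the post-ReLU output:)

W = net_caffe.params['conv1_2'][0].data   # (64, 64, 3, 3)
b = net_caffe.params['conv1_2'][1].data   # (64,)

f, i, j = 0, 0, 0                         # filter index and output position to check
patch = caffe_conv1_1[0, :, i:i+3, j:j+3] # (64, 3, 3) input window
val = max(0.0, np.sum(W[f] * patch) + b[f])   # cross-correlation + bias, then ReLU

print val, caffe_conv1_2[0, f, i, j], lasagne_conv1_2[0, f, i, j]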

Jan Schlüter

unread,
Jun 7, 2016, 11:57:55 AM6/7/16
to lasagne-users
So as far as I can tell, somehow the input is identical, the weights are identical, the biases are identical, and the outputs are different. But it couldn't be an issue with the convolution operation, because the first layer works, right?

That's really puzzling. Can you have a more detailed look at the outputs? E.g., try plotting some feature maps?
import matplotlib.pyplot as plt
plt.matshow(caffe_conv1_2[0, 0])
plt.colorbar()
plt.matshow(lasagne_conv1_2[0, 0])
plt.colorbar()
plt.show()

davidma...@gmail.com

unread,
Jun 7, 2016, 12:56:10 PM6/7/16
to lasagne-users

The feature maps in this layer look basically the same. The max difference is 0.0018, so it's not noticeable, really. The second layer is on the left here. Here's that particular feature map (Caffe left, Lasagne right)

In fc-final it's a significant difference. That's the right set of images.

davidma...@gmail.com

unread,
Jun 7, 2016, 1:07:23 PM6/7/16
to lasagne-users, davidma...@gmail.com

Here's also a plot of the flattened Lasagne output against the flattened Caffe output for that entire layer, and then the two outputs plotted individually (Caffe left, Lasagne right).

davidma...@gmail.com

unread,
Jun 7, 2016, 1:37:37 PM6/7/16
to lasagne-users, davidma...@gmail.com
Alright, I think I've got this narrowed down. I've been plotting the flattened Caffe and Lasagne outputs and their differences layer by layer. Errors tend to be low (on the order of 1e-6) throughout the network, and the max errors are pretty low for most of the network. At layer 13 the max error is 1.9e-5. At layer 14, the max error jumps to 12.435 and the differences cluster around 2. This corresponds to the first dilated convolution layer. Has that been tested well? Is there something I need to do since the weight initialization required a transposition?

davidma...@gmail.com

unread,
Jun 13, 2016, 3:03:24 PM6/13/16
to lasagne-users, davidma...@gmail.com
Just wanted to bring this up once more. I haven't come across a solution yet. Should I open an issue up on Github about this? Thanks!

davidma...@gmail.com

unread,
Jun 13, 2016, 3:40:00 PM6/13/16
to lasagne-users, davidma...@gmail.com
Looking through the Lasagne, Theano, Torch, and Caffe issues/pull requests/source for convolutions, I'm not sure the Lasagne dilated convolution is doing exactly what it should be (see the pull request for dilated convolution in Lasagne). It's setting `subsample` to `dilation`, but I believe `subsample` there refers to `stride` instead. This PR in Theano modifies im2col and col2im to get the dilation and requires an additional `dilation` parameter. In Theano, `subsample` is documented as "The subsampling used in the forward pass. Also called strides elsewhere." (see here). The Caffe implementation by Yu also requires modifying im2col. Any insight here? I could be reading the source incorrectly, but dilated convolutions appear a bit more involved.

davidma...@gmail.com

unread,
Jun 13, 2016, 4:40:04 PM6/13/16
to lasagne-users, davidma...@gmail.com
Sorry for so many updates. I think I understand what the Lasagne implementation is doing now. So it's using a backward pass with the weights as output, subsampling the output. From what I understand, this effectively skips some of the output weights (a factor of `dilation` is skipped), which performs the dilated convolution. This works in the special case where the stride is 1 and there is no padding. So it looks to me like the dilated convolution should be working. I'll try to investigate further.
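
(For intuition, a tiny 1-D numpy sketch, not Lasagne code: sampling the kernel taps `dilation` apart is the same as correlating with a kernel that has zeros inserted between the taps, which is what a dilated convolution computes.)

import numpy as np

x = np.arange(10, dtype='float32')           # toy 1-D input
w = np.array([1., 2., 3.], dtype='float32')  # toy 3-tap kernel
d = 2                                        # dilation factor

# dilated correlation: the kernel taps are spaced d apart on the input
out_dilated = np.array([np.dot(w, x[i:i + len(w) * d:d])
                        for i in range(len(x) - (len(w) - 1) * d)])

# same thing, done by correlating with an explicitly dilated kernel (zeros inserted)
w_dil = np.zeros((len(w) - 1) * d + 1, dtype='float32')
w_dil[::d] = w
out_explicit = np.array([np.dot(w_dil, x[i:i + len(w_dil)])
                         for i in range(len(x) - len(w_dil) + 1)])

print np.allclose(out_dilated, out_explicit)  # True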

Florian Bordes

unread,
Jun 16, 2016, 3:54:57 PM6/16/16
to lasagne-users, davidma...@gmail.com
Your model works: I got exactly the same results on the ct_final layer by dropping the dropout layers (you don't need those at inference time). Even though I used deterministic=True on my Lasagne output, it seems something still went wrong with the dropout.
I am not sure about your way of performing the softmax. In our case, we should apply the softmax across the 21 channels, so before applying it we must reorder the dimensions of the output (the ct_final layer) to get something with shape (size, 21) and then apply the softmax on the last dimension.

davidma...@gmail.com

unread,
Jun 17, 2016, 8:40:55 AM6/17/16
to lasagne-users, davidma...@gmail.com, thef...@gmail.com
Hmm. Would you mind posting the code you used so I can compare directly?

Florian Bordes

unread,
Jun 17, 2016, 3:50:20 PM6/17/16
to lasagne-users, davidma...@gmail.com, thef...@gmail.com
Sure. You can find my code here: https://github.com/bordesf/dilation (The model is in dilated_cnn.py, and you can find an example of how to use it in predict.py)
I will write some code to create a .pkl file of the Lasagne model, so we will not need Caffe anymore.

The problem with the dropout in your model came from these lines:

nnet['drop6'] = DropoutLayer(nnet['fc6'], p=0.5)
nnet['fc7'] = ConvLayer(nnet['drop6'], num_filters=4096, filter_size=1, pad=0, flip_filters=False, nonlinearity=relu)
nnet['drop7'] = DropoutLayer(nnet['fc6'], p=0.5)
nnet['fc-final'] = ConvLayer(nnet['drop7'], num_filters=21, filter_size=1, pad=0, flip_filters=False, nonlinearity=lasagne.nonlinearities.linear)

You were using nnet['fc6'] in nnet['drop7'] instead of nnet['fc7'].
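
(So the corrected lines would be, only the drop7 input changing:)

nnet['drop7'] = DropoutLayer(nnet['fc7'], p=0.5)
nnet['fc-final'] = ConvLayer(nnet['drop7'], num_filters=21, filter_size=1, pad=0, flip_filters=False, nonlinearity=lasagne.nonlinearities.linear)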

Jan Schlüter

unread,
Jun 22, 2016, 9:27:44 AM6/22/16
to lasagne-users, davidma...@gmail.com, thef...@gmail.com
Sorry for the late reply.
 
I think I understand what the Lasagne implementation is doing now. So it's using a backward pass with the weights as output, subsampling the output.

Exactly. This allows us to use existing convolution implementations rather than modifying them. In particular, it allows us to use cuDNN for dilated convolution. The modified caffe convolution (hopefully to be merged in Theano soon) might actually still be faster, though.


Has that been tested out well?

Yes: https://github.com/Lasagne/Lasagne/commit/1dfad0867a5661b95429f0c49a85f78da1c6904c
It actually compares the layer against a straightforward convolution with an explicitly dilated kernel.
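(For reference, the "explicitly dilated kernel" used for such a check can be built by inserting zeros between the taps; a rough numpy sketch, not the actual test code, with dilate_kernel being an illustrative name:)

def dilate_kernel(W, d):
    # insert d-1 zeros between taps along the last two axes
    k_r, k_c = W.shape[-2], W.shape[-1]
    out_shape = W.shape[:-2] + ((k_r - 1) * d + 1, (k_c - 1) * d + 1)
    W_dilated = np.zeros(out_shape, dtype=W.dtype)
    W_dilated[..., ::d, ::d] = W
    return W_dilated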

Is there something I need to do since the weight initialization required a transposition?
 
Another pitfall may be that flip_filters=False for DilatedConv2DLayer. You may need to do W=W[:,:,::-1,::-1] if they're meant to be flipped. But it seems everything else was correct, since Florian can reproduce results?

 In our case, we should perform the Softmax across all the 21 channels. So before applying the softmax, we must switch the dimension of the output (ct_final layer) to get something with the shape (size, 21) and then applying the softmax on the last dimension.

We should revive https://github.com/Lasagne/Lasagne/issues/626, it's a pain to do it manually.

It would be something like:
nnet['probs'] = ExpressionLayer(net['fc-final'], lambda X: T.nnet.softmax(X.transpose(0, 2, 3, 1).reshape(-1, X.shape[1])).reshape(X.shape[0], X.shape[2], X.shape[3], X.shape[1]).transpose(0, 3, 1, 2))

i.e.: move the channels to the last dimension, flatten all dimensions before that, apply the softmax, unflatten, and move the channels back where they belong.
It's not possible to express this via DimshuffleLayer, ReshapeLayer and NonlinearityLayer because we lack the symbolic shape required to unflatten. We could only use an InverseLayer. nonlinearity=lasagne.nonlinearities.softmax_per_location would make it easier.
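
(With that layer in place, getting per-pixel class labels would be something like the following sketch, mirroring the get_output call from the first post:)

probs = lasagne.layers.get_output(nnet['probs'], im, deterministic=True).eval()
pred = probs.argmax(axis=1)  # (batch, height, width) label map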

If you get this to work, we'd highly appreciate a PR to Lasagne/Recipes/modelzoo!

Best, Jan

Alex

unread,
Jul 24, 2016, 2:14:18 AM7/24/16
to lasagne-users, davidma...@gmail.com, thef...@gmail.com
I came across this thread. Are there any updates? I guess everyone in the community would love to see this implemented. Sorry for being pushy :)

davidma...@gmail.com

unread,
Jul 25, 2016, 8:35:15 AM7/25/16
to lasagne-users, davidma...@gmail.com, thef...@gmail.com
https://github.com/Theano/Theano/pull/4587 was merged into Theano and implements general dilated convolutions.