Unpooling using max-indices from encoder feature maps (SegNet implementation)


Kristofer Krus

May 31, 2016, 6:00:55 AM
to lasagne-users
Hi!

I'm trying to implement the SegNet architecture, but I don't know how to implement the pooling and the unpooling layers. I have started from the VGG16 architecture, which looks like this:

from lasagne.layers import InputLayer, DenseLayer, NonlinearityLayer
from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer
from lasagne.layers import Pool2DLayer as PoolLayer
from lasagne.nonlinearities import softmax
from lasagne.utils import floatX


def build_vgg16():
    net = {}
    net['input'] = InputLayer((None, 3, 224, 224))
    net['conv1_1'] = ConvLayer(net['input'], 64, 3, pad=1)
    net['conv1_2'] = ConvLayer(net['conv1_1'], 64, 3, pad=1)
    net['pool1'] = PoolLayer(net['conv1_2'], 2)
    net['conv2_1'] = ConvLayer(net['pool1'], 128, 3, pad=1)
    net['conv2_2'] = ConvLayer(net['conv2_1'], 128, 3, pad=1)
    net['pool2'] = PoolLayer(net['conv2_2'], 2)
    net['conv3_1'] = ConvLayer(net['pool2'], 256, 3, pad=1)
    net['conv3_2'] = ConvLayer(net['conv3_1'], 256, 3, pad=1)
    net['conv3_3'] = ConvLayer(net['conv3_2'], 256, 3, pad=1)
    net['pool3'] = PoolLayer(net['conv3_3'], 2)
    net['conv4_1'] = ConvLayer(net['pool3'], 512, 3, pad=1)
    net['conv4_2'] = ConvLayer(net['conv4_1'], 512, 3, pad=1)
    net['conv4_3'] = ConvLayer(net['conv4_2'], 512, 3, pad=1)
    net['pool4'] = PoolLayer(net['conv4_3'], 2)
    net['conv5_1'] = ConvLayer(net['pool4'], 512, 3, pad=1)
    net['conv5_2'] = ConvLayer(net['conv5_1'], 512, 3, pad=1)
    net['conv5_3'] = ConvLayer(net['conv5_2'], 512, 3, pad=1)
    net['pool5'] = PoolLayer(net['conv5_3'], 2)
    net['fc6'] = DenseLayer(net['pool5'], num_units=4096)
    net['fc7'] = DenseLayer(net['fc6'], num_units=4096)
    net['fc8'] = DenseLayer(net['fc7'], num_units=1000, nonlinearity=None)
    net['prob'] = NonlinearityLayer(net['fc8'], softmax)

    return net

This network – except for the fully connected layers at the end – is used in the SegNet architecture, where it is referred to as the encoder network. SegNet also uses a corresponding, but mirrored, decoder network, which unpools instead of pools in order to grow the feature maps back to the input resolution. The unpooling layers use the indices of the max elements from the corresponding max pooling layer to upsample each feature map.
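To make concrete what I mean by storing and reusing the indices, here is a plain numpy sketch of the operation (not Lasagne code; the function names are mine): max pooling remembers, for each window, the flat position its maximum came from, and unpooling scatters each pooled value back to exactly that position, with zeros everywhere else.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Max pooling over a 2D map that also returns, for every window,
    the flat index (into x) of the element that was the maximum."""
    h, w = x.shape
    pooled = np.zeros((h // size, w // size), dtype=x.dtype)
    indices = np.zeros((h // size, w // size), dtype=np.int64)
    for i in range(h // size):
        for j in range(w // size):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            k = int(np.argmax(window))            # position within the window
            pooled[i, j] = window.flat[k]
            indices[i, j] = (i * size + k // size) * w + (j * size + k % size)
    return pooled, indices

def unpool_with_indices(pooled, indices, shape):
    """Scatter every pooled value back to the position its max came from."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 2., 0., 4.],
              [3., 0., 1., 0.],
              [0., 5., 2., 1.],
              [6., 0., 0., 3.]])
p, idx = max_pool_with_indices(x)     # p = [[3., 4.], [6., 3.]]
u = unpool_with_indices(p, idx, x.shape)
```

After the round trip, u is zero everywhere except at the four positions the maxima were taken from.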

However, I don't know how to implement the unpooling layers. What is the simplest way to do that in Lasagne? Is there an existing class I could use, or do I have to create my own? Also, can I somehow tell the max pooling layers to store the indices of the max values?

Regards

Kristofer

Jan Schlüter

Jun 1, 2016, 1:30:41 PM
to lasagne-users
> The unpooling layers use the indices of the max elements in the corresponding max pooling layer to scale up each feature map.
>
> However, I don't know how to implement the unpooling layers. What is the simplest way to do that in Lasagne?

Unpooling using the indices of the pooling step is exactly what the gradient of the pooling layer does. Lasagne provides a layer for that: lasagne.layers.InverseLayer. It allows you to compute the gradient of some previous layer as part of the forward pass through the network: http://lasagne.readthedocs.io/en/latest/modules/layers/special.html#lasagne.layers.InverseLayer
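To see why these coincide: at a given input, max pooling is locally linear, y = S x with S a 0/1 selection matrix that picks each window's argmax, so the Jacobian is S, and backprop multiplies by S transposed, which scatters values back to the argmax positions. A small numpy illustration (the values and names are made up):

```python
import numpy as np

# At a given input, 2x2 max pooling is locally linear: y = S @ x.ravel(),
# where S has one row per pooling window, with a single 1 at the flat index
# of that window's maximum. InverseLayer multiplies the incoming vector by S.T.
x = np.array([[1., 2., 0., 4.],
              [3., 0., 1., 0.],
              [0., 5., 2., 1.],
              [6., 0., 0., 3.]])

S = np.zeros((4, 16))                 # the local Jacobian dy/dx of the pooling
for row, (i, j) in enumerate([(0, 0), (0, 2), (2, 0), (2, 2)]):
    k = int(np.argmax(x[i:i+2, j:j+2]))          # argmax within the window
    S[row, (i + k // 2) * 4 + (j + k % 2)] = 1.  # its flat index into x

y = S @ x.ravel()                     # the pooled maxima: [3., 4., 6., 3.]

g = np.array([10., 20., 30., 40.])    # stand-in for the decoder feature map
unpooled = (S.T @ g).reshape(4, 4)    # each value lands where its max came from
```

Each entry of g ends up exactly at the position the corresponding maximum was taken from, which is the SegNet unpooling behaviour.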

Best, Jan

Kristofer Krus

Jun 14, 2016, 10:49:34 AM
to lasagne-users
Hi Jan,

Thanks for your answer! I'm now using the InverseLayer class, and the results look fairly promising so far, even though I'm doing pixelwise regression to a grayscale image to which the different classes have been color-mapped, and not actual pixelwise softmax classification (I expect the result to look better with softmax classification). I had to look at a few examples (e.g. this and this) and guess from them how to use the class, so I'm not sure I have completely understood what it does.

According to the documentation, the InverseLayer class "performs inverse operations for a single layer of a neural network by applying the partial derivative of the layer to be inverted with respect to its input". To me, this sounds like it is calculating the Jacobian matrix. But what does it do with the matrix then? The output of the layer is a vector and not a matrix, right? I would expect the transpose of the Jacobian matrix to be multiplied by the input to the InverseLayer. Is that what happens?

Anyway, does this look like correct usage of the InverseLayer class to you?

from lasagne.layers import InputLayer, DenseLayer, NonlinearityLayer, InverseLayer, Upscale2DLayer
from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer
from lasagne.layers.dnn import Conv2DDNNLayer as DeconvLayer
from lasagne.layers import Pool2DLayer as PoolLayer
from lasagne.nonlinearities import softmax


def build_SegNet():
    net = {}

    # Build encoder (downsampling) part
    net['input'] = InputLayer((None, 3, destH, destW))
    net['conv1_1'] = ConvLayer(net['input'], 64, hyperParams.encFilterSize, pad=1)
    net['conv1_2'] = ConvLayer(net['conv1_1'], 64, hyperParams.encFilterSize, pad=1)
    net['pool1'] = PoolLayer(net['conv1_2'], 2, ignore_border=ignoreBorder)
    net['conv2_1'] = ConvLayer(net['pool1'], 128, hyperParams.encFilterSize, pad=1)
    net['conv2_2'] = ConvLayer(net['conv2_1'], 128, hyperParams.encFilterSize, pad=1)
    net['pool2'] = PoolLayer(net['conv2_2'], 2, ignore_border=ignoreBorder)
    net['conv3_1'] = ConvLayer(net['pool2'], 256, hyperParams.encFilterSize, pad=1)
    net['conv3_2'] = ConvLayer(net['conv3_1'], 256, hyperParams.encFilterSize, pad=1)
    net['conv3_3'] = ConvLayer(net['conv3_2'], 256, hyperParams.encFilterSize, pad=1)
    net['pool3'] = PoolLayer(net['conv3_3'], 2, ignore_border=ignoreBorder)
    net['conv4_1'] = ConvLayer(net['pool3'], 512, hyperParams.encFilterSize, pad=1)
    net['conv4_2'] = ConvLayer(net['conv4_1'], 512, hyperParams.encFilterSize, pad=1)
    net['conv4_3'] = ConvLayer(net['conv4_2'], 512, hyperParams.encFilterSize, pad=1)
    net['pool4'] = PoolLayer(net['conv4_3'], 2, ignore_border=ignoreBorder)
    net['conv5_1'] = ConvLayer(net['pool4'], 512, hyperParams.encFilterSize, pad=1)
    net['conv5_2'] = ConvLayer(net['conv5_1'], 512, hyperParams.encFilterSize, pad=1)
    net['conv5_3'] = ConvLayer(net['conv5_2'], 512, hyperParams.encFilterSize, pad=1)
    net['pool5'] = PoolLayer(net['conv5_3'], 2, ignore_border=ignoreBorder)

    net['fc6'] = DenseLayer(net['pool5'], num_units=4096)
    net['fc7'] = DenseLayer(net['fc6'], num_units=4096)
    net['fc8'] = DenseLayer(net['fc7'], num_units=1000, nonlinearity=None)
    net['prob'] = NonlinearityLayer(net['fc8'], softmax)

    # Build decoder (upsampling) part
    net['unpool5'] = InverseLayer(net['pool5'], net['pool5'])
    net['deconv5_3'] = DeconvLayer(net['unpool5'], 512, hyperParams.decFilterSize, pad=1)
    net['deconv5_2'] = DeconvLayer(net['deconv5_3'], 512, hyperParams.decFilterSize, pad=1)
    net['deconv5_1'] = DeconvLayer(net['deconv5_2'], 512, hyperParams.decFilterSize, pad=1)
    net['unpool4'] = InverseLayer(net['deconv5_1'], net['pool4'])
    net['deconv4_3'] = DeconvLayer(net['unpool4'], 512, hyperParams.decFilterSize, pad=1)
    net['deconv4_2'] = DeconvLayer(net['deconv4_3'], 512, hyperParams.decFilterSize, pad=1)
    net['deconv4_1'] = DeconvLayer(net['deconv4_2'], 256, hyperParams.decFilterSize, pad=1)
    net['unpool3'] = InverseLayer(net['deconv4_1'], net['pool3'])
    net['deconv3_3'] = DeconvLayer(net['unpool3'], 256, hyperParams.decFilterSize, pad=1)
    net['deconv3_2'] = DeconvLayer(net['deconv3_3'], 256, hyperParams.decFilterSize, pad=1)
    net['deconv3_1'] = DeconvLayer(net['deconv3_2'], 128, hyperParams.decFilterSize, pad=1)
    net['unpool2'] = InverseLayer(net['deconv3_1'], net['pool2'])
    net['deconv2_2'] = DeconvLayer(net['unpool2'], 128, hyperParams.decFilterSize, pad=1)
    net['deconv2_1'] = DeconvLayer(net['deconv2_2'], 64, hyperParams.decFilterSize, pad=1)
    net['unpool1'] = InverseLayer(net['deconv2_1'], net['pool1'])
    net['deconv1_2'] = DeconvLayer(net['unpool1'], 64, hyperParams.decFilterSize, pad=1)
    net['deconv1_1'] = DeconvLayer(net['deconv1_2'], 1, hyperParams.decFilterSize, pad=1,
                                   nonlinearity=None)

    return net

In the paper, they have five pooling layers and five unpooling layers, but since there don't seem to be any layers between the last pooling layer and the first unpooling layer, I don't really see the benefit of those two layers, so I haven't included them. But I feel like I have probably missed some detail in the paper.

Currently, the last layer is just a one-channel convolutional layer with a linear activation function, since the ground-truth segmentation for an input image is stored as a one-channel png image. I'm going to increase the number of channels in the last convolutional layer to the number of classes in the ground-truth data, and then I want to do pixelwise softmax classification with pixelwise categorical crossentropy as the loss, but I haven't found a way to do that. How do I do pixelwise softmax classification? And how do I calculate the pixelwise categorical crossentropy and sum it over all pixels?
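One idea I have is to flatten the spatial dimensions so that every pixel becomes one row of an (N*H*W, num_classes) matrix, and then apply an ordinary softmax and categorical crossentropy row-wise and sum. In numpy, the computation I have in mind would look like this (just a sketch of the math, not Lasagne code; the function name is mine):

```python
import numpy as np

def pixelwise_softmax_crossentropy(scores, targets):
    """scores: (N, C, H, W) raw network outputs; targets: (N, H, W) int labels.
    Flattens the spatial dimensions so each pixel is one row of an (N*H*W, C)
    matrix, applies a row-wise softmax and sums the categorical crossentropy."""
    n, c, h, w = scores.shape
    flat = scores.transpose(0, 2, 3, 1).reshape(-1, c)   # (N*H*W, C)
    flat = flat - flat.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    rows = np.arange(flat.shape[0])
    return -np.log(probs[rows, targets.ravel()]).sum()

# Sanity check: with uniform scores over 2 classes, every pixel contributes log(2).
scores = np.zeros((1, 2, 2, 2))
targets = np.zeros((1, 2, 2), dtype=int)
loss = pixelwise_softmax_crossentropy(scores, targets)   # = 4 * log(2)
```

Is this the right way to think about it?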

Regards

Kristofer

Kristofer Krus

Jun 21, 2016, 8:35:56 AM
to lasagne-users
Is there anyone who can tell me whether the behavior of InverseLayer really is the behavior I described above? That is, computing the matrix–vector product of (1) the transpose of the Jacobian matrix of the output of layer with respect to its input (i.e. "the partial derivative of the layer to be inverted with respect to its input", I guess) and (2) the output of incoming? It is a bit ambiguous what "applying" the partial derivative (which is what happens according to the documentation) really means. Does "applying" in this case mean performing a multiplication?

Regards

Kristofer

Jan Schlüter

Jun 21, 2016, 11:21:14 AM
to lasagne-users
Hey,

sorry for the late reply and the ambiguous formulation. It's the same operation that would happen when backpropagating through the inverted layer, i.e., computing the gradient wrt. the inverted layer's input given the gradient wrt. its output (where "inverted layer" is the layer to be inverted by the InverseLayer instance), and yes, this is an efficient way of multiplying by the transposed Jacobian (without explicitly computing the Jacobian).
What would be a better formulation for the InverseLayer docstring?
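As a sanity check of the equivalence, here is a tiny numpy example (the layer and all values are made up): for a dense layer followed by a ReLU, backprop computes the vector-Jacobian product directly, and the result matches explicitly forming the Jacobian (here by finite differences) and multiplying by its transpose.

```python
import numpy as np

def layer(x, W):
    """A toy layer to invert: dense weights followed by a ReLU."""
    return np.maximum(W @ x, 0.0)

W = np.array([[1.0, -2.0, 0.5],
              [0.3,  0.7, -1.0]])
x = np.array([1.0, -2.0, 3.0])       # input to the layer being inverted
g = np.array([0.5, -1.0])            # the InverseLayer's incoming vector

# What backprop does: the vector-Jacobian product, no Jacobian ever formed.
mask = (W @ x > 0).astype(float)     # ReLU gate at this input
vjp = W.T @ (mask * g)               # gradient wrt the layer's input

# The same result via the explicit Jacobian, built by finite differences.
eps = 1e-6
J = np.zeros((2, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    J[:, j] = (layer(x + e, W) - layer(x - e, W)) / (2 * eps)
explicit = J.T @ g                   # matches vjp
```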


> Anyway, does this look like correct usage of the InverseLayer class to you?

Yes!

Best, Jan

Kristofer Krus

Jun 21, 2016, 12:23:29 PM
to lasagne-users
Hi Jan,

Okay, great! From a backpropagation perspective, I guess you could write that if incoming is treated as the gradient of a scalar function f (which could be the loss function but doesn't have to be) with respect to layer, then the InverseLayer instance gives the gradient of f with respect to layer's input (i.e. layer.input_layer).

Maybe you could also write that this is equivalent to multiplying the transposed Jacobian of the layer to be inverted (with respect to its input) by the input to the InverseLayer instance. That would at least be clear to me :)

I hope this should be correct! Might be worth double checking.

Regards

Kristofer

Kristofer Krus

Jun 21, 2016, 12:38:11 PM
to lasagne-users
Although I just assumed that f has to be scalar-valued and not vector-valued, or tensor-valued for that matter, but maybe that doesn't have to be the case? If f were tensor-valued with shape f_shape, this would correspond to a situation where input.output_shape equals layer.output_shape + f_shape. Maybe InverseLayer handles such situations?

Regards

Kristofer

Jan Schlüter

Jun 22, 2016, 6:20:58 AM
to lasagne-users
> Maybe InverseLayer handles such situations?

No, theano.grad() (which is used to create the graph in question) only supports scalar functions.


> I hope this should be correct!
Would you like to submit a PR? To make it easier, you can even do so via the web interface: Just click the "edit" icon near the top right of https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/special.py.

Best, Jan