argmax-pooling and unpooling

832 views

danst...@gmail.com

unread,
Jan 8, 2016, 1:39:26 PM
to lasagne-users
Hi,

I'm interested in reversing maxpooling while using knowledge of the max locations. In other words, as well as the max operation in maxpooling, I would need the argmax operation, and separately a layer that could use the max and argmax to make an upsampled tensor using info from max and from argmax. An example of this in use is shown in the following paper:

Learning Deconvolution Network for Semantic Segmentation
http://arxiv.org/abs/1505.04366
See Fig 3 and Fig 4.

Is there an elegant way to do this in Lasagne? Either an existing way, or a good way you would suggest implementing it.

(The existing Pool*DLayer classes only offer the modes that downsample.max_pool_2d offers, so they don't seem to be a fruitful template here. Similarly, the Upscale*DLayer classes only do upsampling by flood-fill rather than by putting points back in their original places.)
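For concreteness, the two operations I'm after can be sketched in plain numpy for non-overlapping windows (the helper names here are just illustrative, not anything in Lasagne):

```python
import numpy as np

def maxpool_with_argmax(x, p=2):
    """Max-pool a 2D array over non-overlapping p x p windows, returning
    the pooled values and the flat within-window argmax 'switches'."""
    h, w = x.shape
    # carve the array into (h//p, w//p) windows of p*p elements each
    blocks = x[:h - h % p, :w - w % p].reshape(h // p, p, w // p, p)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // p, w // p, p * p)
    switches = blocks.argmax(axis=-1)   # winner's index inside each window
    pooled = blocks.max(axis=-1)
    return pooled, switches

def unpool(pooled, switches, p=2):
    """Scatter each pooled value back to its argmax position, zeros elsewhere."""
    ph, pw = pooled.shape
    out = np.zeros((ph * p, pw * p), dtype=pooled.dtype)
    rows = np.arange(ph)[:, None] * p + switches // p
    cols = np.arange(pw)[None, :] * p + switches % p
    out[rows, cols] = pooled
    return out
```

So the question is whether Lasagne already has layers that carry the switches from the pooling step over to the unpooling step.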

Thanks
Dan

Sander Dieleman

unread,
Jan 8, 2016, 4:49:14 PM
to lasagne-users
I'm not 100% sure, but I think you can use InverseLayer for this. If I recall correctly it will end up using the 'switches' from the pooling layer you've inverted.
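Something like this, if I remember the API correctly (untested sketch):

```python
from lasagne.layers import InputLayer, MaxPool2DLayer, InverseLayer

l_in = InputLayer((None, 1, 128, 128))
l_pool = MaxPool2DLayer(l_in, pool_size=(2, 2))
# ... any intermediate layers ...
# InverseLayer(incoming, layer) inverts `layer`; for a pooling layer the
# backward pass should reuse the max locations ('switches') from the forward pass
l_unpool = InverseLayer(l_pool, l_pool)
```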

Sander

Søren Sønderby

unread,
Jan 8, 2016, 5:28:11 PM
to lasagn...@googlegroups.com
Not sure if this fits your needs, but there is max_pool_2d_same_size in Theano.
I think max_pool_2d_same_size > 0 would give you the argmax? 
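Roughly what it computes, sketched in numpy (the real Theano op is of course symbolic; note the `> 0` trick only works if the maxima are positive):

```python
import numpy as np

def max_pool_same_size(x, p=2):
    """Conceptual version of theano's max_pool_2d_same_size: keep each
    non-overlapping p x p window's maximum in place, zero out the rest."""
    h, w = x.shape
    out = np.zeros_like(x)
    for i in range(0, h - h % p, p):
        for j in range(0, w - w % p, p):
            win = x[i:i+p, j:j+p]
            a, b = np.unravel_index(np.argmax(win), win.shape)
            out[i + a, j + b] = win[a, b]
    return out

x = np.array([[1., 2.],
              [4., 3.]])
mask = max_pool_same_size(x) > 0   # True only at each window's argmax
```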
-- 
You received this message because you are subscribed to the Google Groups "lasagne-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lasagne-user...@googlegroups.com.
To post to this group, send email to lasagn...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lasagne-users/172bbc0e-f2e3-422d-bea9-a71e862eced0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Stowell

unread,
Jan 11, 2016, 6:22:53 AM
to lasagn...@googlegroups.com
Hi all,

Thanks both for the suggestions.

Well I'm kinda amazed but Sander's suggestion is correct, InverseLayer
can automagically do it, and if that's not justification for using the
word "automagically" then I don't know what is.
https://gist.github.com/danstowell/192ad65527965086693d

Best
Dan
--
http://www.mcld.co.uk

Dan Stowell

unread,
Jan 11, 2016, 7:02:33 AM
to lasagn...@googlegroups.com
Hi,

Oh dear, I'm getting some weird shape behaviour when I try to actually
use this trick. Here's modified code that shows the issue:
https://gist.github.com/danstowell/91a2303e9362933bba72
And the result of get_output_shape() is not the same as the result of
get_output().shape:

% python maxpool_undo.py
Using gpu device 0: GeForce GT 730M (CNMeM is disabled)
input (shape (16, 1, 132, 160))
shape of latents is claimed to be (16, 1, 132, 79)
latents (shape (16, 1, 132, 1))
output (shape (16, 1, 132, 160))

How can that be?

Best
Dan
--
http://www.mcld.co.uk

goo...@jan-schlueter.de

unread,
Jan 11, 2016, 7:48:30 AM
to lasagne-users
> How can that be?

You're using a MaxPool1DLayer on top of a 4D tensor, while it should only be applicable to a 3D tensor. downsample.max_pool_2d() doesn't crash because it just pools over the last two dimensions regardless of the input dimensionality. Anyway, both `get_output_shape_for()` and `get_output_for()` assume the input was a 3D tensor; that's why both output shapes go wrong.

We should check the input dimensionality, similarly to how it's done for the convolutional layers.
You can get correct results by going to an input of shape (16, 132, 160), or by using a MaxPool2DLayer.

Best, Jan

Dan Stowell

unread,
Jan 11, 2016, 8:15:25 AM
to lasagn...@googlegroups.com
Thanks Jan. If I change it to a 2D layer then you're right, the shape
is correct. I did that because I want to use 1D maxpooling on
spectrograms (I'll do it via pool_size=(1,mypoolsize) in future).
I filed an issue for the check you suggested:
https://github.com/Lasagne/Lasagne/issues/568
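For the record, the pool_size=(1, n) trick just means pooling only over the last axis of the 4D (batch, channels, freq, time) tensor, e.g. in numpy terms (illustrative helper name):

```python
import numpy as np

def pool_time_axis(x, n):
    """Max-pool a 4D (batch, channels, freq, time) array along time only,
    i.e. the effect of pool_size=(1, n): window the last axis and take the max."""
    b, c, f, t = x.shape
    return x[..., :t - t % n].reshape(b, c, f, t // n, n).max(axis=-1)

x = np.arange(2 * 1 * 3 * 4, dtype=float).reshape(2, 1, 3, 4)
y = pool_time_axis(x, 2)   # shape (2, 1, 3, 2)
```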

One thing I notice is that overlapping patches can give unexpected
results. I think this is just a "lucky" example, but notice how the
output creates a value of 18, which must be due to overlapping patches
both having the same winner, and thus both separately adding a 9 back
on to the reconstruction:

input:
[[[[ 0. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 9. 0. 0.]
[ 0. 0. 0. 9. 0.]]

[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 7. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]

[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 0. 0.]]]


[[[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 3. 0. 0. 0. 0.]]

[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1.]]

[[ 0. 9. 0. 0. 0.]
[ 0. 0. 5. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]]]
latents (shape (2, 3, 1, 2))
output:
[[[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 18. 0. 0.]
[ 0. 0. 0. 0. 0.]]

[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 7. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]

[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 0. 0.]]]


[[[ 0. 0. 0. 0. 1.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]

[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]

[[ 0. 9. 0. 0. 0.]
[ 0. 0. 5. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]]]

(This is using https://gist.github.com/danstowell/192ad65527965086693d
- I've pushed a change to make it pool 2D.)

I would have wished the semantics to be overwrite rather than add.

Best
Dan

goo...@jan-schlueter.de

unread,
Jan 11, 2016, 1:03:00 PM
to lasagne-users
> One thing I notice is that overlapping patches can give unexpected
> results. I think this is just a "lucky" example, but notice how the
> output creates a value of 18, which must be due to overlapping patches
> both having the same winner, and thus both separately adding a 9 back
> on to the reconstruction: [...]
>
> I would have wished the semantics to be overwrite rather than add.

Adding is the correct thing to do, though. Note that this one-hot reconstruction is not actually "automagic"; it's just the standard backpropagation pass through a max-pooling layer. You can think of a max-pooling layer as an ordinary fully-connected layer that has zero weights almost everywhere and ones connecting the winners to their outputs. During backpropagation, the weight matrix stays as it was for the forward pass, so the gradient flows back to every winner through every connection it had to the output -- if a particular winner was chosen twice (for overlapping pooling), it will receive gradients from both outputs it was connected to. There's unfortunately nothing to wish for regarding the semantics; it's got to be this way for the gradients to be correct! If you need different semantics, then you will need to replace the InverseLayer with something else.
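Here's a tiny numpy illustration of that view: two overlapping size-2 windows over three inputs, with max-pooling written as a fully-connected layer whose weights are 0/1:

```python
import numpy as np

# 1-D input of 3 values; two overlapping windows of size 2, stride 1:
# window 0 = x[0:2], window 1 = x[1:3].
x = np.array([1.0, 9.0, 2.0])

# W[i, j] = 1 iff x[j] is the winner of window i.
W = np.zeros((2, 3))
W[0, np.argmax(x[0:2])] = 1.0        # x[1] wins window 0
W[1, 1 + np.argmax(x[1:3])] = 1.0    # x[1] wins window 1 too

pooled = W @ x                        # forward pass: [9., 9.]
grad_out = pooled                     # unpooling-style: feed the values back
grad_in = W.T @ grad_out              # backward pass: each winner SUMS its grads
# grad_in == [0., 18., 0.] -- the shared winner x[1] receives 9 + 9,
# exactly the doubled value seen in the reconstruction above
```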

Best, Jan

Dan Stowell

unread,
Jan 11, 2016, 2:48:19 PM
to lasagn...@googlegroups.com
Point taken - yes, I agree that Theano is calculating the partial
derivative correctly. However, that's not the same as saying it's doing
what we might want if we designed a max-unpooling layer of our own
choosing! Thanks for the detail though -

Dan