I'm wondering if there are any tips/tricks for using spatial transformers that I should be aware of. I've spent a while now trying them out, but they are very hit-or-miss from my experience.
I tried it out on the distorted MNIST dataset, like the one in the examples, and it worked well. However, when I attempt to use it on colored images from real photographs, I start to run into issues.
Typically what happens is one of two things:
1. In the first few iterations the spatial transformer layer zooms *way too much*, and the convolutional network below it is then stuck trying to learn from incomplete images (e.g. with a 200px x 200px input image, the network below is doomed to learn from the same 10px x 10px corner of every image). When this happens, no learning occurs even if I let it run for hundreds of epochs to see if it gets unstuck (it never does).
2. The spatial transformer doesn't zoom/rotate/skew at all, and you're left with a fancy downsampling layer in your network.
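For context on outcome 2, here's a minimal numpy sketch of what the transformer's grid generator and bilinear sampler compute (assuming a single-channel 2D image and the usual 2x3 affine parameterization; function names are mine, not from any library). When the predicted affine parameters `theta` are the identity, the layer just resamples the input unchanged, which is exactly the "fancy downsampling layer" behavior when the output resolution is lower than the input:

```python
import numpy as np

def affine_grid(theta, H, W):
    # Target-pixel coordinates, normalized to [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # 3 x (H*W)
    src = theta @ coords  # 2 x (H*W): source (x, y) for each target pixel
    return src[0].reshape(H, W), src[1].reshape(H, W)

def bilinear_sample(img, sx, sy):
    H, W = img.shape
    # Map normalized source coordinates back to pixel indices.
    x = (sx + 1) * (W - 1) / 2
    y = (sy + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    # Weighted average of the four surrounding pixels.
    return (img[y0, x0] * (1 - wx) * (1 - wy)
          + img[y0, x0 + 1] * wx * (1 - wy)
          + img[y0 + 1, x0] * (1 - wx) * wy
          + img[y0 + 1, x0 + 1] * wx * wy)

# Identity theta: the transformer is a no-op (apart from any resolution change).
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
img = np.arange(16, dtype=float).reshape(4, 4)
sx, sy = affine_grid(theta, 4, 4)
print(np.allclose(bilinear_sample(img, sx, sy), img))  # → True
```

Scaling the diagonal of `theta` below 1 samples only the central region (a zoom-in), which is how a runaway localization network ends up cropping everything down to one tiny corner, as in outcome 1.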
I assumed that maybe my learning rate was too high or too low, but no matter what my initial rate is (0.01 down to 0.00001), the same two outcomes occur. I've also noticed that the Adam update rule seems to be the only way to even reach outcome 2; otherwise it always ends up stuck in outcome 1.
I've proven to myself that it *can* work, since I've gotten it to zoom/rotate/skew well on the distorted MNIST set, but outside of black-and-white images I'm stuck. I'm willing to try whatever tricks/diagnostics others may have, as spatial transformers could be terrifically useful for a few datasets I have.