Preactivation ResNet


Florian Muellerklein

May 11, 2016, 10:07:28 AM
to lasagne-users
Hello everyone,

I was wondering if anyone has tried to recreate the preactivation ResNet from 'Identity Mappings in Deep Residual Networks' (https://arxiv.org/pdf/1603.05027.pdf). I am able to reproduce the results for ResNet-110 with the stack of two 3x3 conv layers. But I cannot reproduce their performance with the bottleneck architecture; I may have built it incorrectly. Are there any other examples of this architecture aside from the Facebook Torch code and Kaiming He's 1k layers repo?
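For reference, here is my understanding of the full pre-activation bottleneck unit from the paper, as a rough numpy sketch rather than actual Lasagne code (the naive convolution, the per-channel standardization standing in for batch norm, and the channel counts are all just illustrative):

```python
import numpy as np

def conv2d(x, w):
    # naive 'same'-padded, stride-1 convolution
    # x: (C_in, H, W); w: (C_out, C_in, kH, kW)
    C_out, C_in, kH, kW = w.shape
    ph, pw = kH // 2, kW // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    H, W = x.shape[1:]
    out = np.zeros((C_out, H, W))
    for o in range(C_out):
        for i in range(C_in):
            for dh in range(kH):
                for dw in range(kW):
                    out[o] += w[o, i, dh, dw] * xp[i, dh:dh + H, dw:dw + W]
    return out

def bn(x):
    # stand-in for batch norm: per-channel standardization
    mu = x.mean(axis=(1, 2), keepdims=True)
    sd = x.std(axis=(1, 2), keepdims=True) + 1e-5
    return (x - mu) / sd

def relu(x):
    return np.maximum(x, 0)

def preact_bottleneck(x, w1, w2, w3):
    # full pre-activation ordering: BN -> ReLU -> conv, repeated,
    # with nothing applied on the identity path after the addition
    h = conv2d(relu(bn(x)), w1)   # 1x1, reduce channels (e.g. 64 -> 16)
    h = conv2d(relu(bn(h)), w2)   # 3x3 at the reduced width
    h = conv2d(relu(bn(h)), w3)   # 1x1, restore channels (16 -> 64)
    return x + h                  # clean identity shortcut
```

The key point I took from the paper is that ordering: both BN and ReLU come *before* each convolution, and the shortcut is left untouched.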


goo...@jan-schlueter.de

May 11, 2016, 1:02:17 PM
to lasagne-users
I was wondering if anyone has tried to recreate the preactivation ResNet from 'Identity Mappings in Deep Residual Networks' (https://arxiv.org/pdf/1603.05027.pdf).

I only used the non-bottleneck preactivation architecture in some of my own (non-computer-vision) experiments and it works well, but I did not try recreating their results.

I am able to reproduce the results for ResNet-110 with the stack of two 3x3 conv layers.

Cool! Would be nice to have this in Lasagne/Recipes as well.

But I can not reproduce their performance with the bottleneck architecture, I may have created it wrong.

Your Readme says "I am using a smaller batch size because of hardware constraints, 32 instead of 128.". This could make quite a difference, as observed at https://github.com/KaimingHe/resnet-1k-layers/#notes. How long is the training time? Maybe I can run it on one of our GPUs with batch size 64 or 128.

Best, Jan

Florian Muellerklein

May 11, 2016, 2:03:26 PM
to lasagne-users, goo...@jan-schlueter.de
Cool! Would be nice to have this in Lasagne/Recipes as well.

That would be great, I'd love to see it added! I was hoping to reproduce the bottleneck results first and get some input from other users as to whether everything looks correct.

Your Readme says "I am using a smaller batch size because of hardware constraints, 32 instead of 128.". This could make quite a difference, as observed at https://github.com/KaimingHe/resnet-1k-layers/#notes. How long is the training time? Maybe I can run it on one of our GPUs with batch size 64 or 128.

I saw that as well. I'm able to run batches of 64 on my GTX 980; training takes about 11 hours, with no real difference in performance between 64 and 32.

Looking at the Facebook code, they have a max-pooling layer after the first convolution in the bottleneck network. There is no mention of that max-pooling layer in 'Identity Mappings ...', and Kaiming He's 1k-layers code defines it at the beginning but never uses it. I'm wondering whether it is actually a necessary component.
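My guess is the maxpool only matters for the ImageNet-sized stem; working through the spatial sizes with a small helper (the function name and the stem configurations below are just my reading of the two codebases, not anything from the paper):

```python
def out_size(size, kernel, stride, pad):
    # output spatial size of a conv or pooling layer
    return (size + 2 * pad - kernel) // stride + 1

# ImageNet-style stem: 7x7/2 conv, then 3x3/2 maxpool
s = out_size(224, 7, 2, 3)   # 224 -> 112
s = out_size(s, 3, 2, 1)     # 112 -> 56

# CIFAR-10 stem: a single 3x3/1 conv, no pooling at all
c = out_size(32, 3, 1, 1)    # 32 -> 32
```

On 32x32 CIFAR inputs there is no room for that extra 4x downsampling, which would explain why the 1k-layers CIFAR code defines the pooling layer but never wires it in.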

Florian Muellerklein

May 24, 2016, 11:06:40 AM
to lasagne-users, goo...@jan-schlueter.de
I just made some tweaks to make it consistent with 'Wide Residual Networks' (http://arxiv.org/pdf/1605.07146v1.pdf). I can add this to my Preactivation ResNet repo and see if I can reproduce the CIFAR-10 results. If so, maybe we could add the 3x3 stacked PreResNet and WideResNet to the Lasagne Model Zoo? I'm ignoring bottleneck until I find time to debug it. I think the best way to add it would be to swap my model code for PreResNet and WideResNet into the existing ResNet example, so everything stays consistent.
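The tweaks are small because, as I read the Wide ResNet paper, the widening factor k just scales the per-stage filter counts of the CIFAR network, and the depth for the basic (3x3, 3x3) block is 6n+4 for n blocks per stage. Two tiny helpers (names mine) make the relationship explicit:

```python
def wrn_filters(k):
    # per-stage filter counts for a CIFAR Wide ResNet, widening factor k
    return [16, 16 * k, 32 * k, 64 * k]

def wrn_depth(n):
    # total depth with n basic (3x3, 3x3) blocks per stage
    return 6 * n + 4

# k = 1 recovers the plain preactivation ResNet widths: [16, 16, 32, 64]
# k = 10, n = 4 gives WRN-28-10: widths [16, 160, 320, 640], depth 28
```

So the existing PreResNet code mostly just needs its filter counts parameterized by k.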
