Weight copying with parameter "group" != 1 (only in copy, not original)


suk

Oct 10, 2016, 5:00:15 PM
to Caffe Users
Hi all,

Is it possible to "multiply" a layer in the channel direction with the parameter 'group' not equal to 1 (e.g. 3 times the number of channels, and group = 3), and then finetune from weights trained on the original net? The weights in each group should all start from the original weights.
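To make this concrete, here's a rough pycaffe/NetSpec sketch of what I mean (layer names and sizes are made up; the input is concatenated 3 times in the channel direction so each group sees its own copy):

    import caffe
    from caffe import layers as L

    n = caffe.NetSpec()
    n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
    # original layer: 15 filters over the 3 input channels, group = 1
    n.conv1 = L.Convolution(n.data, num_output=15, kernel_size=3)
    # "multiplied" variant: stack the input 3 times in the channel
    # direction, then one convolution with 3x the filters split into
    # 3 groups, so each group convolves its own copy of the input
    n.data_x3 = L.Concat(n.data, n.data, n.data, axis=1)
    n.conv1_x3 = L.Convolution(n.data_x3, num_output=45, kernel_size=3, group=3)
    print(n.to_proto())  # dumps the corresponding prototxt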

And what happens if you deploy it with the original weights?
Will Caffe load the original weights into each of the groups, will it just throw an error (probably not), or will the last two groups keep their random/initial weights while the first group gets the correct ones?

I was just wondering, because this would save me some copy-pasting of the whole net and make the net.prototxt smaller.


A more general question would be: what happens when you load a net with weights that don't exactly fit the layer shapes? Does any broadcasting happen?

Would be very cool if anyone has experience with or knowledge of this :)

Przemek D

Oct 11, 2016, 3:09:32 AM
to Caffe Users
I can't answer your first question, but your "more general one" has a simple answer: no, you cannot initialize a layer with weights of a different shape than the layer itself. It doesn't matter whether there is room to load the data - if it doesn't exactly match the target shape, Caffe will throw a shape-mismatch error.
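If you really need to initialize a differently-shaped layer from existing weights, the usual workaround is manual "net surgery" in pycaffe: load both nets and copy or tile the arrays yourself. A minimal sketch, assuming made-up file and layer names and the 3-group setup from your question:

    import numpy as np
    import caffe

    # pretrained net and the widened, group = 3 net (hypothetical files)
    src = caffe.Net('original.prototxt', 'original.caffemodel', caffe.TEST)
    dst = caffe.Net('tripled.prototxt', caffe.TEST)

    w = src.params['conv1'][0].data  # filters, shape (15, C, kh, kw)
    b = src.params['conv1'][1].data  # biases, shape (15,)
    # with 3x the outputs and group = 3 over a 3x-concatenated input,
    # the target blob is just the original filters stacked on axis 0
    dst.params['conv1_x3'][0].data[...] = np.concatenate([w, w, w], axis=0)
    dst.params['conv1_x3'][1].data[...] = np.tile(b, 3)
    dst.save('tripled_init.caffemodel')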

suk

Oct 11, 2016, 5:59:00 AM
to Caffe Users
Thanks for your answer, but how do you know this? Have you tried it? I believe I've already done this (without noticing, at first): the channel size was 16 instead of 15, and Caffe didn't mind at all.

Przemek D

Oct 11, 2016, 8:02:26 AM
to Caffe Users
I have tried. Perhaps you misnamed your layers and Caffe did not even try to load those filters? It's an unfortunate design choice, but Caffe stores filter weights under the layer name, not under an optionally specifiable param name (as it does for weight sharing), so you have to be careful to name your layers exactly the same in the pretrained and finetuned models.
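An easy way to see what will actually get copied is to compare the learnable-layer names of the two nets in pycaffe (file names here are made up):

    import caffe

    pretrained = caffe.Net('pretrained.prototxt', 'pretrained.caffemodel', caffe.TEST)
    target = caffe.Net('finetune.prototxt', caffe.TEST)

    # a name match is necessary for copying (the shapes must match too)
    copied = set(target.params.keys()) & set(pretrained.params.keys())
    fresh = set(target.params.keys()) - set(pretrained.params.keys())
    print('matched by name:', sorted(copied))
    print('left at random init:', sorted(fresh))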



suk

Oct 11, 2016, 10:19:27 AM
to Caffe Users
Thanks! :) My mistake, sorry. I looked up the same thing this morning (which names need to match when copying weights versus sharing weights), but only double-checked now - you were right, the layers with different sizes had different names. I never noticed...

Then for the other questions, it's probably enough to check whether Caffe complains or not. Thanks, man! That explains other things as well ^.^'
