Derivatives w.r.t. data: size of layers vs size of input


Alex Ter-Sarkisov

Jul 26, 2017, 8:37:58 AM
to Caffe Users
So after some extensive hacking I found out that net.blobs[...].diff, i.e. the derivatives w.r.t. the data, depend on the size of the input. For example, if the input is 3x250x250 and one of the layers is 64x350x350, then all (or most) entries outside the 250x250 subset will be zeros, so essentially only part of the layer will be learning anything.
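For reference, here's roughly how I checked the diffs in pycaffe (the prototxt/caffemodel names and the setup are just placeholders, not my actual net):

import numpy as np
import caffe

net = caffe.Net('net.prototxt', 'weights.caffemodel', caffe.TRAIN)
net.forward()
net.backward()

# count how many diff entries are actually non-zero in each blob
for name, blob in net.blobs.items():
    d = blob.diff
    print('%s: shape %s, %d / %d non-zero diff entries'
          % (name, d.shape, np.count_nonzero(d), d.size))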

My problem is that I'm using features as inputs, 4096x8x8 (fc7 from FCN8s), for instance segmentation, i.e. the layers grow quite quickly to 250x250, so most entries are 0s, which is bad. I only realized this today, and the only two solutions I could think of are either flattening the input to 4x250x250 or using the image as a second input, neither of which sounds too appealing (I haven't tried either yet, though).
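Roughly, the flattening idea would look something like this (blob names are placeholders, and I haven't tried it; note that 4096x8x8 is 262144 values, which reshapes exactly to 4x256x256 rather than 4x250x250):

import numpy as np

fc7_feat = np.random.randn(4096, 8, 8).astype(np.float32)  # stand-in for the fc7 features
spatial_input = fc7_feat.reshape(4, 256, 256)               # fold channels into a larger spatial map

# feeding it to the net would then be something like:
# net.blobs['data'].reshape(1, *spatial_input.shape)
# net.blobs['data'].data[...] = spatial_input
# net.forward()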

Any suggestions on what to do in this situation would be appreciated. Perhaps there's some clever trick for extending diffs to entries outside the input size. Since I'm not fully clear on how diffs are calculated for every neuron and how they are used for the weight update, I'm a bit stuck here.