This is a really weird error
However I initialize the Deconv layers (bilinear or Gaussian), I get the same situation:
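For reference, by "bilinear" init I mean the standard FCN-style bilinear-interpolation filler. A minimal numpy sketch (the function names `bilinear_kernel`/`fill_bilinear` are mine, not from my prototxt; I assume the first two axes of the (2,2,8,8) blob index the output/input channel pairs):

```python
import numpy as np

def bilinear_kernel(size):
    """2-D weights of a bilinear-interpolation upsampling filter."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def fill_bilinear(shape):
    """Fill a Deconv weight blob, e.g. (2, 2, 8, 8): one bilinear
    kernel per matching in/out channel pair, zeros elsewhere
    (i.e. no channel mixing)."""
    out_c, in_c, kh, kw = shape
    assert kh == kw, "expects square kernels"
    weights = np.zeros(shape, dtype=np.float32)
    k = bilinear_kernel(kh)
    for c in range(min(out_c, in_c)):
        weights[c, c] = k
    return weights
```

In pycaffe the result would then be copied into the layer's weight blob (`net.params[name][0].data[...]`) before training.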
1) Weights are updated; I checked this over multiple iterations. All deconvolution/upsampling layers have the same weight shape: (2,2,8,8).
First of all, `net_mcn.layers[idx].blobs[0].diff` returns matrices of floats. The last Deconv layer (`upscore5`) produces two arrays with the same numbers but opposite signs, i.e. the weights should be moving at the same rate in opposite directions, but the resulting weights are in fact almost identical!
Quite surprisingly, the remaining four deconv layers do not have this problem, so when I compare models, for example at `iter=5000` and `iter=55000`, the deconv layer weights are very different.
Even more surprisingly, the other (convolutional) layers change much less!
2) Blob diffs are all zeros for deconvolution layers
Data-stream diffs (see "Finding gradient of a Caffe conv-filter with regards to input") for almost ALL deconv layers are all zeros for the full duration of training, with a few exceptions (which are also near zero, like `-2.28945263e-09`).
Convolution layer diffs look OK.
I see this as a paradox: the weights in the deconv layers are updated, but the diffs wrt the neurons are all 0's (constant?)
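To make the paradox concrete, here is a minimal 1-D numpy sketch (not Caffe's actual code; `x`, `w`, `dout` are made-up values) of the two gradients such a layer computes. It shows one way both observations can coexist: the weight diff depends on the input, while the bottom-blob diff depends on the weights, so near-zero weights give a zero blob diff even while the weight diff is non-zero:

```python
import numpy as np

# 1-D correlation-style layer: out[i] = sum_j w[j] * x[i + j]
# Backward pass:
#   weight diff  dW[j] = sum_i x[i + j] * dout[i]   (depends on the INPUT)
#   bottom diff  dx[k] = sum_j w[j] * dout[k - j]   (depends on the WEIGHTS)
x = np.array([1.0, 2.0, -1.0, 3.0])   # bottom blob data
w = np.zeros(2)                       # weights ~ 0 (e.g. tiny-std Gaussian init)
dout = np.array([0.5, -1.0, 0.25])    # top blob diff arriving from above

dW = np.array([np.dot(x[j:j + len(dout)], dout) for j in range(len(w))])
dx = np.zeros_like(x)
for i, g in enumerate(dout):
    dx[i:i + len(w)] += g * w         # dx[i + j] += dout[i] * w[j]

print(dW)   # non-zero: the weights WILL be updated
print(dx)   # all zeros: the bottom diff vanishes because w is ~0
```

A deconvolution's backward pass has the same structure with the roles of input and output swapped, so the same effect applies.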
3) Deconv features grow really large quickly
Far larger than in FCNs and CRFasRNN, up to 5.4e+03; at the same time, nearby pixels can have wildly different values (e.g. 5e+02 and -300) for the same class.
4) Training and validation error go down, often very quickly
So, putting it all together, I don't understand what to make of it. If it is overfitting, then why does the validation error decrease too?
The architecture of the network is:
`fc7 -> relu1 -> dropout -> conv2048 -> conv1024 -> conv512 -> deconv1 -> deconv2 -> deconv3 -> deconv4 -> deconv5 -> crop -> softmax_with_loss`