Data derivatives in Deconvolution layers all zeroes

Alex Ter-Sarkisov

Jul 24, 2017, 2:47:57 AM
to Caffe Users

This is a really weird error.


However I initialize the Deconv layers (bilinear or Gaussian), I get the same situation:


1) Weights are updated; I checked this over multiple iterations. The weight blobs of all deconvolution/upsample layers have the same shape: (2, 2, 8, 8).

First of all, net_mcn.layers[idx].blobs[0].diff returns matrices of floats. For the last Deconv layer (upscore5), the two filters' diffs contain the same numbers with opposite signs, i.e. the weights should be moving at the same rate in opposite directions, yet the resulting weights are in fact almost identical!
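A minimal pycaffe sketch of this check (the prototxt/snapshot paths and the single forward/backward pass are placeholders, not my exact training script):

import numpy as np
import caffe

# placeholder paths for the train prototxt and a snapshot
net_mcn = caffe.Net('train_val.prototxt', 'snapshot_iter_55000.caffemodel', caffe.TRAIN)

net_mcn.forward()
net_mcn.backward()   # softmax_with_loss is the last layer, so this populates the diffs

# last deconv layer (upscore5): weight blob and its gradient, both shaped (2, 2, 8, 8)
idx = list(net_mcn._layer_names).index('upscore5')
w_diff = net_mcn.layers[idx].blobs[0].diff
w_data = net_mcn.layers[idx].blobs[0].data

# the two filters' gradients are (near) negatives of each other ...
print(np.allclose(w_diff[0], -w_diff[1], atol=1e-6))
# ... yet the learned filters stay almost identical
print(np.abs(w_data[0] - w_data[1]).max())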


Quite surprisingly, the remaining four deconv layers do not have this issue, so when I compare models, e.g. the snapshots at iter=5000 and iter=55000, the deconv layer weights are very different.
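The snapshot comparison, as a sketch (snapshot file names are assumed; layer names are taken from the architecture listed at the end of the post):

# compare deconv weights between two snapshots (file names assumed)
net_5k  = caffe.Net('train_val.prototxt', 'snapshot_iter_5000.caffemodel',  caffe.TEST)
net_55k = caffe.Net('train_val.prototxt', 'snapshot_iter_55000.caffemodel', caffe.TEST)

for name in ['deconv1', 'deconv2', 'deconv3', 'deconv4', 'deconv5']:
    w_old = net_5k.params[name][0].data
    w_new = net_55k.params[name][0].data
    print(name, np.abs(w_new - w_old).mean())   # mean absolute change is large for every deconv layer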


Even more surprisingly, other layers (convolutional) change much less!


2) Blob diffs are all zeros for deconvolution layers


The data diffs (as in the question "Finding gradient of a Caffe conv-filter with regards to input") for almost ALL deconv layers are all zeroes for the entire duration of training, with a few exceptions (also near 0, e.g. -2.28945263e-09).
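Continuing from the sketch in point 1 (and assuming each layer's top blob carries the layer's name), this is the check:

# data derivatives at the deconv outputs after forward() + backward()
for name in ['deconv1', 'deconv2', 'deconv3', 'deconv4', 'deconv5']:
    d = net_mcn.blobs[name].diff
    print(name, np.abs(d).max())   # prints 0.0 for almost every layer, occasionally ~1e-9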


Convolution layer diffs look OK.


I see this as a paradox: the weights in the deconv layers are updated, but the diffs w.r.t. the neurons are all 0's (constant?).


3) Deconv features grow really large quickly


Far larger than in FCNs and CRFasRNN, up to 5.4e+03, and at the same time nearby pixels can have wildly different values (e.g. 5e+02 and -300) for the same class.
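Checked roughly like this, again continuing from the sketch in point 1 (blob names assumed):

# activation range at each deconv output after a forward pass
for name in ['deconv1', 'deconv2', 'deconv3', 'deconv4', 'deconv5']:
    a = net_mcn.blobs[name].data
    print(name, a.min(), a.max())   # maxima up to ~5.4e+03, with nearby pixels of opposite sign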


4) Training and validation error go down, often very quickly


So, putting it all together, I don't understand what to make of it. If it is overfitting, then why does the validation error decrease too?

The architecture of the network is


fc7->relu1->dropout->conv2048->conv1024->conv512->deconv1->deconv2->deconv3->deconv4->deconv5->crop->softmax_with_loss

Developer

Jul 24, 2017, 6:02:42 AM
to Caffe Users
I have a problem training my networks: the loss stays fixed and I can't find a good way to reduce it.
Can you help me please?
(I am working with a model to localize text in images.)