Cannot copy param 0 weights from layer 'fc6'; shape mismatch.


Carlo Alessi

Jan 9, 2017, 12:08:34 PM
to Caffe Users
I fine-tuned the network used in examples/imagenet. Training is OK, but when I try to use the classification example with the following command:

./build/examples/cpp_classification/classification.bin \
  models/my_network/deploy.prototxt \
  models/my_network/mynet_train_012_iter_1000.caffemodel \
  data/mydata/imagenet_mean_012.binaryproto \
  data/mydata/synset_words.txt \
  data/mydata/testing/0/Chimpanzee/chimpanzee_38.png

I get this error:

Cannot copy param 0 weights from layer 'fc6'; shape mismatch.  Source param shape is 4096 9216 (37748736); target param shape is 4096 12544 (51380224). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

Any suggestions? Attached are my deploy.prototxt and train_val.prototxt.

Best regards



train_val.prototxt
deploy.prototxt

Przemek D

Jan 10, 2017, 2:20:29 AM
to Caffe Users
Take a look at your inputs. In train_val you crop the images to 227x227, while the deploy network starts with a 256x256 input. Since convolutional layers change their output shape automatically to accommodate any input (as opposed to FC layers, which have an explicitly shaped output), your convnet has different data blob shapes in the two networks, resulting in a different number of inputs to layer fc6.
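Concretely, the input shape declared at the top of the deploy prototxt should match the crop_size in train_val. A sketch of the relevant fragment (the values follow Carlo's setup, not his actual attached file):

```
input: "data"
input_shape {
  dim: 1    # batch size
  dim: 3    # channels
  dim: 227  # height -- must match crop_size in train_val.prototxt
  dim: 227  # width
}
```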

Carlo Alessi

Jan 10, 2017, 4:32:09 PM
to Caffe Users


I see, it works if I change 256 to 227, thank you!
I thought crop_size was only used to augment training; what am I missing?
So, if I want to use 256 as the image size (I use --resize_height=256 --resize_width=256 when creating the databases), should I restart the training from scratch?

Best regards,

Carlo 

Przemek D

Jan 11, 2017, 2:51:55 AM
to Caffe Users
Indeed, crop is used to augment data on the fly (I think Caffe takes random crops of each image in the batch). By defining crop_size, you fix the input shape of your network to that value. Changing the input shape changes your entire convolutional structure: e.g. AlexNet's conv1 outputs a 55x55 blob for a 227x227 input, but for 256x256 it will output 62x62. Those differences propagate all the way up to the last conv layer: to continue the AlexNet example, previously you had a 256x6x6 output blob (pool5), while for a 256 input you'll have 256x7x7. FC layers expect a constant-size input: 256x6x6=9216 while 256x7x7=12544 (hence your original error: you trained an fc6 layer with 9216 inputs, but tried to make an instance with 12544 inputs and copy weights from the old one, causing a shape mismatch).
Summing up: if you want to go with a 256x256 input then yes, you have to retrain your network without the crop_size param.
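The shape arithmetic above can be checked with a small sketch. This walks an input size through AlexNet's conv/pool stack using the output-size formulas Caffe uses (an assumption worth noting: convolution floors the division, pooling ceils it), and reproduces the 9216 vs 12544 figures from the error message:

```python
import math

def conv(size, kernel, stride=1, pad=0):
    # Caffe convolution output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool(size, kernel, stride):
    # Caffe pooling output size: ceil((size - kernel) / stride) + 1
    return int(math.ceil((size - kernel) / float(stride))) + 1

def alexnet_fc6_inputs(input_size):
    s = conv(input_size, 11, stride=4)  # conv1: 11x11, stride 4
    s = pool(s, 3, 2)                   # pool1: 3x3, stride 2
    s = conv(s, 5, pad=2)               # conv2: 5x5, pad 2
    s = pool(s, 3, 2)                   # pool2
    s = conv(s, 3, pad=1)               # conv3
    s = conv(s, 3, pad=1)               # conv4
    s = conv(s, 3, pad=1)               # conv5
    s = pool(s, 3, 2)                   # pool5 (256 channels)
    return 256 * s * s                  # flattened input size seen by fc6

print(alexnet_fc6_inputs(227))  # 9216  (256 * 6 * 6)
print(alexnet_fc6_inputs(256))  # 12544 (256 * 7 * 7)
```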

husi...@gmail.com

Mar 31, 2017, 11:52:22 AM
to Caffe Users
Hello, Przemek.

You gave a great illustration of why this error occurs, which helped me a lot to understand it. Could you say more about how to solve it? The input images I use are small (45x45x3) and I do want to keep that size. I tried renaming fc6 in the prototxt file and training from the command line.

But when I follow the fine-tuning notebook example below:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/02-fine-tuning.ipynb

it is not just a matter of renaming fc6 in the .prototxt file. I saw the crop_size parameter you mentioned and deleted it in the code, but the error still exists. I think I may need to write code to rename fc6; I tried, but it does not seem to work. Could you give me some suggestions?

Thanks a lot.


On Wednesday, January 11, 2017 at 2:51:55 AM UTC-5, Przemek D wrote:

Przemek D

Apr 4, 2017, 5:33:30 AM
to Caffe Users
Your problem is different. Carlo's images were larger than the default network shape, so his fc6 size would end up larger than default (12k vs 9k). It was only a matter of cropping them (with an option available in DataLayer by default) to match the desired size.
Your images are smaller, and your fc6 is less than 9k. Therefore you would need to add something (instead of cutting) - I see two solutions here:
* pad the data with zeros so their size matches the default AlexNet size (227,227,3) - unfortunately you have to do that on the LMDB level, as the DataLayer does not have (to the best of my knowledge) a "pad_size" param, similar to crop_size,
OR
* add a DummyData layer filled with zeros and use Concat to connect it with conv5 output (might be easier to add in a Reshape too), so that you pad the data right before the fc6.
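The first option (padding at the LMDB level) can be sketched with NumPy before the images are written into the database. The 45x45x3 input and 227x227 target follow the sizes discussed above; the centering choice is mine, not something Caffe requires:

```python
import numpy as np

def pad_to(img, target=227):
    """Zero-pad an HxWxC image to target x target, centering the original."""
    h, w = img.shape[:2]
    top = (target - h) // 2
    left = (target - w) // 2
    return np.pad(
        img,
        ((top, target - h - top), (left, target - w - left), (0, 0)),
        mode='constant')  # fills with zeros

# Stand-in for a real 45x45x3 image loaded from disk.
img = np.zeros((45, 45, 3), dtype=np.uint8)
padded = pad_to(img)
print(padded.shape)  # (227, 227, 3)
```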

However, in my opinion the best way to go is to not transfer FC weights at all, just retrain them. Personally I did not notice a large performance improvement due to pretrained FCs, the convolutions are most important.
Additionally, you could make totally new, smaller FC layers (as in: smaller num_output) - the advantage of this approach is a lower chance of overfitting. Sure, you can extract 4096 features from a 227x227 image - but will a 45x45 image contain the same amount of information? It's over 25 times less input data, so even more reason against transferring such large FC layers.
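As a sketch of that last suggestion, a renamed, smaller FC layer in the train prototxt might look like the fragment below (the name fc6_small and num_output of 512 are illustrative choices, not taken from any attached file; the new name is what stops Caffe from trying to copy the old weights):

```
layer {
  name: "fc6_small"        # new name => trained from scratch, no weight copy
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6_small"
  inner_product_param {
    num_output: 512        # much smaller than AlexNet's 4096, to limit overfitting
  }
}
```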

Daile Osorio

Jul 7, 2017, 9:57:57 AM
to Caffe Users
Hi Carlo and Przemek D,

I have the same problem "Cannot copy param 0 weights from layer 'upscore2_2class'; shape mismatch.  Source param shape is 2 1 4 4 (32); target param shape is 2 2 4 4 (64). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer."  but in the deconvolution layer. In my case I am using FCN8 and I am doing finetune in this net, and I started the training with the weights "fcn8s-heavy-pascal.caffemodel".

In my case the images are 400x300, while the deploy.prototxt of the FCN has dim: 500 dim: 500. I don't understand why I get this error in the deconvolution layer when I run infer.py. Please, I need your help.

Here I attached my train.prototxt and the deploy.prototxt.

Przemek D, I already posted this problem to Caffe Users under the title "[caffe-users] U-Net Image segmentation won't converge, loss doesn't change significantly", where I describe it in more detail. Please, I need your help.
train.prototxt
deploy.prototxt

Usman Muhammad

Jul 10, 2017, 1:14:05 AM
to Caffe Users

Hello,

I am doing image classification and want to plot the mini-batch loss curve. I've trained the model and already have the accuracy on both the training and test sets. I have the table but don't know how to get the curve. Help would be highly appreciated. Thanks.

regards,
Usman