Unexpected results after finetuning bvlc_reference_model

Carlo Alessi

Feb 25, 2017, 4:52:00 PM
to Caffe Users
Hi,

I finetuned the bvlc_reference_model following this tutorial: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html, training it on 3 classes (two of which were already present in the reference model).

I changed the name of the last fc layer to "fc_finetuned" and set the lr_mult parameter of all the other layers to 0. I've also added one class, so num_output is 1001.
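For illustration, the renamed layer looks roughly like this (a sketch with placeholder lr_mult values; see the attached train_val.prototxt for the exact definition):

layer {
  name: "fc_finetuned"  # renamed from the original fc8
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc_finetuned"
  param { lr_mult: 10 }
  param { lr_mult: 20 }
  inner_product_param {
    num_output: 1001  # 1000 original classes + 1 new one
  }
}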
The problem is that the network now achieves great performance on these 3 classes but has forgotten all the others it previously learned.

1) What happened?
2) What should I do to continue down this path?

I've attached my solver.prototxt and train_val.prototxt:
train_val.prototxt
solver.prototxt

Przemek D

Mar 2, 2017, 7:56:55 AM
to Caffe Users
1. The last fc layer performs the final classification, i.e. it contains the empirical knowledge about the objects you want to distinguish. This knowledge comes from pretraining the network, and if you want to preserve it you must transfer the weights from the pretrained layer. However, you renamed that layer (which prevented the transfer), effectively overwriting the knowledge with new, random values. You then trained the net again on a very limited dataset, showing it examples of only 3 classes. Naturally, the network will not be able to recognize anything else.

2. I've only just read your answer in the other thread. What you want to do is create a new layer with 1001 outputs but copy the weights for the first 1000 outputs from the pretrained model. This can't be done with prototxt alone - Caffe will complain about the mismatched shapes. You need to edit the weights manually; the tutorial I linked shows the basics of how to do that. In short, you want to 1) load the pretrained model, 2) load a blank (randomly initialized) network of your desired shape, i.e. with a 1001-output last layer, 3) copy the weights from the first model into the second, and finally 4) save the new model and use it for fine-tuning.
At least, this is how I would attempt this. I won't give you my word that this is going to let you easily train the new class (you might still need to give the network at least some examples of the other classes), but it's a start. Let us know if this works.
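In pycaffe that surgery could look roughly like this (a minimal sketch, untested; I'm assuming the pretrained layer is named fc8 as in CaffeNet, and that deploy_1000.prototxt / deploy_1001.prototxt are deploy definitions of the original net and of your 1001-output net):

import caffe

# 1) load the pretrained model together with its learned weights
old_net = caffe.Net('deploy_1000.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)

# 2) load a blank, randomly initialized net with the 1001-output last layer
new_net = caffe.Net('deploy_1001.prototxt', caffe.TEST)

# 3a) copy every layer that exists in both nets under the same name
for name, blobs in old_net.params.items():
    if name in new_net.params:
        for i, blob in enumerate(blobs):
            new_net.params[name][i].data[...] = blob.data

# 3b) the renamed last layer: copy weights and biases for the first 1000
#     outputs, leaving the 1001st row randomly initialized
new_net.params['fc_finetuned'][0].data[:1000] = old_net.params['fc8'][0].data
new_net.params['fc_finetuned'][1].data[:1000] = old_net.params['fc8'][1].data

# 4) save the stitched model
new_net.save('caffenet_1001_init.caffemodel')

You would then pass caffenet_1001_init.caffemodel as the weights when launching fine-tuning.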