Fine-tune with a smaller number of output categories


Khalid Ashraf

Sep 10, 2014, 7:45:05 PM9/10/14
to caffe...@googlegroups.com
I want to train a model with, let's say, 40 categories and then fine-tune the net with new data but with 20 output categories.
Is there an automated way to do this? If not, how would I do it in the code?

Thanks!
-Khalid 

Jason Yosinski

Sep 10, 2014, 7:54:22 PM9/10/14
to Khalid Ashraf, caffe...@googlegroups.com
Are the 20 categories a subset of the 40?

jason


---------------------------
Jason Yosinski, Cornell Computer Science Ph.D. student
http://yosinski.com/ +1.719.440.1357

Khalid Ashraf

Sep 10, 2014, 7:58:55 PM9/10/14
to caffe...@googlegroups.com, kal.a...@gmail.com
I have both cases, i.e. they could be a subset of the 40, or new categories as well.

Jason Yosinski

Sep 10, 2014, 8:49:19 PM9/10/14
to Khalid Ashraf, caffe...@googlegroups.com
If the 20 classes are a subset of the 40, the simplest approach would
be to fine-tune the original 40-class network on the 20-class data,
assuming the class IDs match up (e.g. class 7 of the 20-class
dataset is also called class 7 in the 40-class dataset).

If the 20 classes are not a subset of the 40, then you would need to
create a new layer that outputs 20 values instead of 40. This can be
done in the *_train_val.prototxt file like so:

Section from old file:
layers {
  layer {
    name: "fc8"
    type: "innerproduct"
    num_output: 40
    ...
  }
  bottom: "fc7"
  top: "fc8"
}

Section from new file:
layers {
  layer {
    name: "fc8-new"
    type: "innerproduct"
    num_output: 20
    ...
  }
  bottom: "fc7"
  top: "fc8-new"
}

Then use the following command to create and train a new network where
the first N-1 layers are copied from the 40-class network (because the
layer names match) and the last layer is randomly initialized (because
there is no layer named fc8-new in the old net):

./caffe.bin train --weights=trained_40_class_weights_iter_xxxxx \
    --solver=solver.prototxt
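
For completeness, a minimal solver.prototxt for this kind of fine-tuning might look like the sketch below. All values here are illustrative assumptions, not settings from this thread; fine-tuning usually starts from a smaller base_lr than from-scratch training:

```
# Illustrative solver.prototxt sketch (all values are assumptions)
net: "train_val.prototxt"   # the net definition containing the renamed fc8-new layer
base_lr: 0.001              # smaller than typical from-scratch learning rates
lr_policy: "step"
gamma: 0.1
stepsize: 20000
momentum: 0.9
weight_decay: 0.0005
display: 20
max_iter: 100000
snapshot: 10000
snapshot_prefix: "finetune_20_class"
solver_mode: GPU
```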

You can set the learning rates of the lower N-1 layers to 0 to train
only the last 20-output layer (faster training, more likely to
underfit), or you can set them to a non-zero value to allow all layers
to be fine-tuned (slower, more likely to overfit).
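
As a sketch of the freezing option, in the V1 prototxt syntax of the era the per-blob learning-rate multipliers could be zeroed with repeated blobs_lr fields (the layer names here are assumed from an AlexNet-style net, not taken from this thread):

```
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  blobs_lr: 0        # weight learning-rate multiplier: frozen
  blobs_lr: 0        # bias learning-rate multiplier: frozen
  inner_product_param {
    num_output: 4096
  }
}
```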

Hope that helps!
jason

---------------------------
Jason Yosinski, Cornell Computer Science Ph.D. student
http://yosinski.com/ +1.719.440.1357



Khalid Ashraf

Sep 10, 2014, 9:46:01 PM9/10/14
to caffe...@googlegroups.com, kal.a...@gmail.com
Thanks very much Jason, that was helpful. 

zhe wang

Oct 8, 2014, 12:30:41 PM10/8/14
to caffe...@googlegroups.com, kal.a...@gmail.com
I think you can also directly change the 40 to 20:

Section from old file:
layers {
  layer {
    name: "fc8"
    type: "innerproduct"
    num_output: 40
    ...
  }
  bottom: "fc7"
  top: "fc8"
}

Section from new file:
layers {
  layer {
    name: "fc8"
    type: "innerproduct"
    num_output: 20
    ...
  }
  bottom: "fc7"
  top: "fc8"
}

since it gets some kind of random initialization anyway

On Thursday, September 11, 2014 at 8:49:19 AM UTC+8, Jason Yosinski wrote:

Evan Shelhamer

Oct 8, 2014, 12:52:21 PM10/8/14
to zhe wang, caffe...@googlegroups.com, Khalid Ashraf
You need to change the layer name too, so that the weights are not transferred from the old model and random initialization is triggered, as described in the Flickr fine-tuning tutorial.

Evan Shelhamer

Shravani Rao

Apr 29, 2015, 5:24:33 PM4/29/15
to caffe...@googlegroups.com, ja...@yosinski.com, kal.a...@gmail.com, buptwan...@gmail.com, shel...@eecs.berkeley.edu
Hi,

I am trying to fine-tune the pretrained BVLC ImageNet model to classify images into 11 categories (not related to the original 1000 categories). For this I am using around 60 training images for each category (a total of 700 images).
I followed the Flickr style tutorial: I just renamed the last layer from 'fc8' to 'fc8_new' and set num_output to 11.

When I test the prediction with any image (even those used for training), it always results in class no. 4. I understand that the number of training images is small, but in that case I'd expect wrong outputs. What intrigues me is why the output is always the same.

While training, the test net output accuracy every 1000 iterations changed like this:

0.116
0.0936
0.0927
0.3028
0.093
0.0933
0.0927
0.0928
0.0935
...
(remains around 0.09)
...
0.092799
I have also attached my train_val.prototxt file. 

I have been trying to understand where I went wrong, but no luck so far. Could any of you please guide me on how I should approach this? Any help is highly appreciated.

Thank you,
Shravani
train_val.prototxt.txt

npit

Apr 30, 2015, 4:15:18 AM4/30/15
to caffe...@googlegroups.com, kal.a...@gmail.com, shel...@eecs.berkeley.edu, buptwan...@gmail.com, ja...@yosinski.com
Are you certain that you are using memory_data_param correctly?
Try using a data or image_data input instead and see if you get the same weird results.
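
For reference, an ImageData layer (in the newer prototxt syntax) reads a plain text file with one "<image path> <integer label>" pair per line; the file names below are placeholders, not paths from this thread:

```
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  image_data_param {
    source: "train_list.txt"   # each line: <image path> <integer label>
    batch_size: 32
    new_height: 256
    new_width: 256
  }
  transform_param {
    crop_size: 227
    mean_file: "imagenet_mean.binaryproto"
    mirror: true
  }
}
```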