How to do fine-tuning properly? (weights vs model vs snapshot!)

Hossein Hasanpour

Aug 2, 2016, 1:35:07 PM

to Caffe Users

Hello everyone. I have seen people do fine-tuning in several different ways.

One example is the Flickr Style fine-tuning example, which runs the command below.

1) Here -weights is used along with a solver to do the fine-tuning:

./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0

2) another example is

./build/tools/caffe train -solver=models/finetune_flickr_style/solver.prototxt -model=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel.h5 -gpu 0

3) and

./build/tools/caffe train -solver=models/finetune_flickr_style/solver.prototxt -model=models/bvlc_reference_caffenet/bvlc_reference_caffenet.solverstate.h5 -gpu 0

4) and finally

./build/tools/caffe train -solver=models/finetune_flickr_style/solver.prototxt -snapshot=models/bvlc_reference_caffenet/bvlc_reference_caffenet.solverstate.h5 -gpu 0

5) and in Stanford's video lecture:

./build/tools/caffe train -gpu 0 -model models/finetune_flickr_style/trainval.prototxt -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

I have been using options 3 and 4 myself, for fine-tuning (3) and resuming (4), but that got me curious about what the difference is among these methods.

Any explanation in this regard is appreciated.

Jeremy Rutman

Aug 3, 2016, 4:41:01 AM

to Caffe Users

I second this question! What's the difference between resuming (caffe train -snapshot x.solverstate) and fine-tuning (caffe train -weights x.caffemodel)?

I think -model (caffe test -model x.caffemodel, caffe time -model x.caffemodel) is used for testing/timing, as per the examples here.

Hossein Hasanpour

Aug 3, 2016, 7:04:16 AM

to Caffe Users

I know about the test and time switches; that's fine. The odd part is this: if the -model switch is meant to take the model prototxt in train, and the caffemodel in test and time, why does it still work with a caffemodel in train?

Maybe some backward-compatibility handling is going on here?

Misa

Aug 4, 2016, 4:28:25 AM

to Caffe Users

Hm, yes, this is a really interesting question.
In my opinion, the difference between fine-tuning and resuming is this: when resuming, you continue from a caffemodel you trained yourself (so you use your last solverstate); when fine-tuning, you take someone else's net and trained caffemodel, modify the net as you wish, and then train the weights from that caffemodel for your own classification task (that is, you use someone else's caffemodel weights to fit your modified net).

Hossein Hasanpour

Aug 4, 2016, 5:28:53 AM

to Caffe Users

My question was not about resuming vs. fine-tuning.

The notion of resuming is fine: you save a snapshot, stop training for some reason, and some time later you resume training.

Fine-tuning, however, means you take a pretrained model, make your changes (to the network architecture, with or without changes to the optimization parameters), and start training.
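
To make the fine-tuning case concrete: as far as I understand it, Caffe copies pretrained parameters into the new net by matching layer names, so a renamed layer (fc8 becomes fc8_flickr in the Flickr Style example) keeps its fresh initialization, while a layer that keeps its name but changes shape is rejected. Below is a minimal pure-Python model of that idea, not Caffe's actual code. The helper name `plan_weight_copy` and the toy shapes are made up for illustration.

```python
def plan_weight_copy(pretrained, target):
    """Decide, per layer of `target`, whether pretrained weights would be reused.

    Both arguments map layer name -> weight shape (a tuple).
    Returns (copied, fresh): layer names that take pretrained weights,
    and layer names that keep their new initialization.
    """
    copied, fresh = [], []
    for name, shape in target.items():
        if name not in pretrained:
            # No layer with this name in the pretrained model: nothing to copy.
            fresh.append(name)
        elif pretrained[name] != shape:
            # Same name but different shape: treated as an error, not skipped.
            raise ValueError("shape mismatch for layer %r" % name)
        else:
            copied.append(name)
    return copied, fresh


# Toy shapes (illustrative only).
pretrained = {"fc7": (4096, 4096), "fc8": (1000, 4096)}
finetune_net = {"fc7": (4096, 4096), "fc8_flickr": (20, 4096)}

copied, fresh = plan_weight_copy(pretrained, finetune_net)
print("copied:", copied)
print("fresh:", fresh)
```

In this toy run, fc7 is reused and fc8_flickr starts from its own initializer, which is the usual reason the last layer is renamed when the number of classes changes.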

I noticed that when someone resumes from a snapshot and also changes the network architecture, the new layers are initialized and training continues from where it stopped. The user cannot change the optimization settings, however. For example, if you trained your model with RMSProp, saved a snapshot, and then tried to resume with a different solver, say AdaDelta instead of RMSProp, it crashes. Introducing new layers, however, is fine.
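
One plausible reason for that crash, assuming a solverstate stores the solver's history buffers and that restoring checks their count against what the new solver expects: RMSProp keeps one history buffer per learnable parameter blob, while AdaDelta keeps two. The toy model below only illustrates that mismatch. The function names and the per-solver table are my own simplification, not Caffe's API, and it does not try to model what happens when layers are added.

```python
# Assumed number of history buffers kept per learnable parameter blob.
HISTORY_BUFFERS_PER_PARAM = {"RMSProp": 1, "AdaDelta": 2}


def expected_history_len(solver_type, num_param_blobs):
    """How many history buffers a solver of this type would hold."""
    return HISTORY_BUFFERS_PER_PARAM[solver_type] * num_param_blobs


def can_restore(saved_type, new_type, num_param_blobs):
    """True if the saved history count matches what the new solver expects."""
    saved = expected_history_len(saved_type, num_param_blobs)
    wanted = expected_history_len(new_type, num_param_blobs)
    return saved == wanted


# Toy net with 16 learnable parameter blobs (illustrative number).
same_solver_ok = can_restore("RMSProp", "RMSProp", 16)
switched_solver_ok = can_restore("RMSProp", "AdaDelta", 16)
print(same_solver_ok, switched_solver_ok)
```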

This problem does not happen when you use the -model switch with either a caffemodel or a solverstate. With -model, training starts from the very beginning (iteration 0), and adding or removing layers and changing solver settings are all fine.

I guess the -weights switch behaves exactly like the -model switch.

So the question is: if -model seems to act exactly like -weights, and -snapshot does the same thing (except that solver settings cannot change), what is the difference between all of these switches, and what is the significance of each against the others?
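
For what it's worth, here is how I read the flag handling in `caffe train` (tools/caffe.cpp) around the time of this thread, written as a small Python model rather than the real C++. Under that reading, -snapshot restores the full solver state and resumes, -weights only copies layer parameters and starts at iteration 0, giving both is an error, and -model is not consulted by train at all because the net comes from the solver prototxt. Treat this as a sketch under those assumptions; `describe_train` is a made-up helper, and the file names are shortened from the five commands in the first post.

```python
def describe_train(solver, weights="", snapshot="", model=""):
    """Model of how `caffe train` is assumed to pick its starting point.

    Returns (mode, model_flag_ignored).
    """
    if not solver:
        raise ValueError("need a solver definition to train")
    if snapshot and weights:
        raise ValueError("give a snapshot to resume or weights to finetune, not both")
    if snapshot:
        mode = "resume"      # weights, iteration count and solver history restored
    elif weights:
        mode = "finetune"    # matching layers copied, iteration starts at 0
    else:
        mode = "scratch"     # fresh initialization, iteration starts at 0
    # Assumption: train never reads -model, so passing it has no effect.
    return mode, bool(model)


# The five commands from the first post, reduced to their flags.
commands = {
    1: dict(weights="bvlc_reference_caffenet.caffemodel"),
    2: dict(model="bvlc_reference_caffenet.caffemodel.h5"),
    3: dict(model="bvlc_reference_caffenet.solverstate.h5"),
    4: dict(snapshot="bvlc_reference_caffenet.solverstate.h5"),
    5: dict(model="trainval.prototxt",
            weights="bvlc_reference_caffenet.caffemodel"),
}

results = {n: describe_train("solver.prototxt", **flags)
           for n, flags in commands.items()}
for n, (mode, ignored) in results.items():
    print(n, mode, "(-model ignored)" if ignored else "")
```

If that reading is right, commands 2 and 3 would actually be training from a fresh initialization, which would explain why they start at iteration 0 and tolerate any change to the net or solver; command 5 would behave like command 1.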
