When fine-tuning a neural network, which mean image should we subtract?

Michael Wilber

Feb 16, 2015, 4:08:08 PM
to caffe...@googlegroups.com
Hello Caffe users! I have a philosophical question.

I'm fine-tuning the provided 'network-in-network' model for a separate dataset of images of my own choosing. I computed the mean image of this new dataset and changed the "data" layer to match the new dataset *and* its mean file.
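Concretely, the train-phase data layer now looks roughly like the sketch below (the mean file came out of Caffe's compute_image_mean tool; the LMDB name, mean-file name, and crop size are placeholders for my actual setup):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 224                          # whatever the model expects
    mean_file: "mydata_mean.binaryproto"    # mean of the new dataset, not the ImageNet mean
  }
  data_param {
    source: "mydata_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}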

When I used the Python bindings to get features out (and thus get the nice 10-crop averaging/ensembling behavior), I forgot to subtract the proper mean: I accidentally subtracted the ImageNet mean because that was the default parameter. It worked well enough (~71% accuracy), but when I switched to my own dataset's mean (the one the network was fine-tuned with), performance improved to about 73%. Incidentally, this is a few percent higher than what Caffe's test interval reports, likely because of the cropping/ensembling.
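For reference, the feature-extraction side looks roughly like this (the model and mean-file names are placeholders for mine; the point is that whatever mean is handed to caffe.Classifier is what gets subtracted, and predict() does the 10-crop averaging by default):

import caffe

# Load the dataset mean written by compute_image_mean (.binaryproto -> numpy array).
blob = caffe.proto.caffe_pb2.BlobProto()
with open('mydata_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]   # shape (C, H, W)
mean = mean.mean(1).mean(1)                   # collapse to a per-channel (BGR) mean

net = caffe.Classifier('nin_deploy.prototxt', 'nin_finetuned.caffemodel',
                       mean=mean,               # my dataset's mean, not the ImageNet default
                       channel_swap=(2, 1, 0),  # load_image gives RGB; the model wants BGR
                       raw_scale=255,           # load_image gives [0, 1]; the model wants [0, 255]
                       image_dims=(256, 256))

img = caffe.io.load_image('example.jpg')
# oversample=True averages predictions over 10 crops (4 corners + center, each mirrored).
probs = net.predict([img], oversample=True)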

My question is: should I have changed the image mean when fine-tuning/training the network in the first place? Since "most" of the model was trained with the ImageNet mean rather than my own dataset's mean, perhaps keeping the ImageNet mean would have given better performance.

Does anyone have some "in-the-trenches" suggestions they can share?

user1979

Nov 11, 2015, 12:41:53 PM
to Caffe Users
Hi Michael,

What was your experience with this?

Thanks!

Emmanuel Benazera

Nov 11, 2015, 2:20:56 PM
to Caffe Users
Hi Michael,

I always use the current dataset's mean image for training, i.e. in your case the one from the dataset you are fine-tuning for, and it works well. My hunch is that how well this works may vary depending on whether, and how much, you allow the intermediate layers to change during the fine-tuning phase.

Em.

ath...@ualberta.ca

Nov 17, 2015, 2:03:03 PM
to Caffe Users
Fine-tuning on a dataset implies that "most" of the model (as you say, and I agree) has been trained on the original images with the original mean subtracted, and that it will be tweaked (tuned) toward a new dataset. From this viewpoint it makes sense to use the original mean, especially if the fine-tuning data is much smaller in quantity than the original set (in which case it would not shift the combined mean much anyway).

On the other hand, one can argue that the point of fine-tuning is not to tweak the model a bit toward the new dataset but rather to model the new dataset as well as possible, without any concern for the original one. In other words, the "fine" in "fine-tuning" comes mainly from the fact that it is much easier to tune than to train from scratch, not because the intent is somehow to respect the original dataset. From this viewpoint, the pre-trained weights are just an initialization, and the model should be moved toward the new dataset's mean image (tuning may benefit from modeling new data normalized with the new data's mean).

So I agree with Emmanuel that the answer will depend on how many layers you are tuning (just the top one? all of them?). Other factors include the relative quantity of the new data vs. the original, the relative image means of the new vs. the original data, and perhaps even the nature of the new vs. old data (e.g. originally trained on cars, fine-tuned on cats). So I don't think there is a single answer to this question in general, but these are some things to consider.
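For what it's worth, the "how many layers are you tuning" knob in Caffe is usually the per-layer lr_mult in the training prototxt. A rough sketch (the layer names and sizes here are placeholders, not the actual network-in-network layers): lr_mult 0 freezes a layer at its pre-trained values, while renaming the final layer makes Caffe re-initialize it for the new classes, and a larger lr_mult lets it adapt faster.

# Frozen lower layer: keeps its ImageNet-trained weights during fine-tuning.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }    # weights
  param { lr_mult: 0 decay_mult: 0 }    # biases
  convolution_param { num_output: 96 kernel_size: 11 stride: 4 }
}

# Renamed top layer: re-initialized because the name no longer matches the .caffemodel,
# and given a higher learning rate so it adapts to the new dataset quickly.
layer {
  name: "fc_new"
  type: "InnerProduct"
  bottom: "pool_last"
  top: "fc_new"
  param { lr_mult: 10 }
  param { lr_mult: 20 }
  inner_product_param { num_output: 20 }   # number of classes in the new dataset
}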
