I would like to train a net such as VGG16 using image-level labels, then fine-tune it using the FCN method (convert the fully connected layers to convolutional ones, etc.).
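For concreteness, this is what I understand by "change fc layers to convolutional". It is a minimal pure-Python sketch with hypothetical toy sizes (`C`, `H`, `W`, `OUT` and all helper names are made up for illustration, not taken from any library): the fc weight rows are reshaped into kernels, so the converted layer gives the same scores on the original input size and a spatial score map on a larger input.

```python
# Hypothetical toy sizes; a real net would use much larger feature maps.
C, H, W, OUT = 2, 2, 2, 3

def fc_forward(weights, feature_map):
    """Fully connected layer: flatten the C x H x W map, then dot products."""
    flat = [feature_map[c][y][x]
            for c in range(C) for y in range(H) for x in range(W)]
    return [sum(w * v for w, v in zip(row, flat)) for row in weights]

def fc_to_conv_kernels(weights):
    """Reshape each fc weight row (length C*H*W) into a C x H x W kernel."""
    return [[[[row[(c * H + y) * W + x] for x in range(W)]
              for y in range(H)]
             for c in range(C)]
            for row in weights]

def conv_forward(kernels, image):
    """Valid, stride-1 cross-correlation over a C x IH x IW input."""
    ih, iw = len(image[0]), len(image[0][0])
    out = []
    for k in kernels:
        rows = []
        for oy in range(ih - H + 1):
            row = []
            for ox in range(iw - W + 1):
                row.append(sum(k[c][y][x] * image[c][oy + y][ox + x]
                               for c in range(C)
                               for y in range(H)
                               for x in range(W)))
            rows.append(row)
        out.append(rows)
    return out

# Made-up weights and inputs, purely for illustration.
weights = [[0.1 * (i + j) for j in range(C * H * W)] for i in range(OUT)]
small = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
large = [[[float(c + y + x) for x in range(3)] for y in range(3)]
         for c in range(C)]

kernels = fc_to_conv_kernels(weights)
fc_out = fc_forward(weights, small)          # OUT scores
conv_small = conv_forward(kernels, small)    # OUT x 1 x 1, same scores
conv_large = conv_forward(kernels, large)    # OUT x 2 x 2 score map
```

On the 2x2 input the convolutional version reproduces the fc scores, and on the 3x3 input it yields one score per spatial position, which is the property the fine-tuning step relies on.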
In my case the image-level labels are multi-class in the sense that several items may be present in a given image (i.e. multi-label). Will this cause trouble when I try to fine-tune, given that the pixel-level output is single-class (one label per pixel)?
I can prepare a set of images with a single class per image, but it will take some effort. I guess the basic question is how different a multi-class net will be from a single-class net if they are predicting the same items. Will the differences be restricted to the final layer, or will they propagate back into the earlier layers?
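To make the two setups concrete, here is a minimal sketch of how I picture them. I am assuming the multi-label net uses per-class sigmoids with binary cross-entropy and the single-class net uses a softmax with cross-entropy; the function names and logit values are hypothetical. By construction the only difference is in how the final scores are turned into probabilities and a loss. It does not show how much the layers underneath end up differing after training, which is what I am asking.

```python
import math

def softmax(logits):
    """Single-class head: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Multi-label head: each class gets an independent probability."""
    return 1.0 / (1.0 + math.exp(-z))

def single_label_loss(logits, target_index):
    """Cross-entropy against exactly one correct class."""
    return -math.log(softmax(logits)[target_index])

def multi_label_loss(logits, targets):
    """Mean binary cross-entropy against a 0/1 vector of present classes."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss / len(logits)

# Made-up final-layer scores for three classes on one image (or one pixel).
logits = [2.0, 1.5, -3.0]
softmax_probs = softmax(logits)                # forced to pick between classes
sigmoid_probs = [sigmoid(z) for z in logits]   # classes 0 and 1 can both be "on"
```

With the same scores, the softmax has to split the probability mass between the first two classes, while the sigmoids can mark both as present.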